Introduction

As we recall from the first vignette, in a way to implement a set of simulations in the AlphaSimR package four steps must be followed:

  1. Simulate founder genomes/haplotypes.
  2. Set global simulation parameters for a target trait/traits.
  3. Model the breeding program.
  4. Examine results by looking into population individuals’ metrics.

In this vignette, we will cover a few useful functions from AlphaSimR that will help to Model the breeding program (step 3). In addition, we will compare two implementations over a few generations of selection.

We will start by loading the packages.

# Loading packages
library(AlphaSimR)
## Loading required package: R6
library(ggplot2)

Modeling a breeding program in AlphaSimR

Founder Genome

Let’s start the simulation by creating the base genome.

# Creating the founding genome
founderGenomes = runMacs(nInd = 100,
                         nChr = 4,
                         segSites = 100,
                         species = "MAIZE")


# Setting up trait characteristics
SP = SimParam$new(founderGenomes)
SP$addTraitA(nQtlPerChr = 100, 
             mean = 10, 
             var = 1)

# Generating base population
Parents = newPop(founderGenomes)

Making crosses

First, at the beginning of each breeding cycle, we generate new crosses. We can implement random crosses as well as make oriented crosses by giving the list of crosses to make.

# 1. Crossing randomly
F1pop = randCross(pop = Parents, nCrosses = 100)

# 2. Make crosses by a list of individuals
# Making the crossing plan
MatingPlan = data.frame(Parent1 = Parents@id[1:5],
                        Parent2 = Parents@id[6:10])
# Crossing based on the crossing list
F1Pop = makeCross(Parents, crossPlan = as.matrix(MatingPlan))

# Checking out the pedigree records out of F1
Pedigree = data.frame(Ind = F1Pop@id,
                 Parent1 = F1Pop@mother,
                 Parent2 = F1Pop@father)

Advance the genotypes

Now, we can advance the materials. One function that we may use is the self().

  # Advancing F1->F2
  F2pop = self(pop = F1pop, nProgeny = 10)

Then, we may assume phenotypes for all those individuals. For each individual, the phenotypes are assumed to come from the genetic values generated. Then, AlphaSimR adds in each genetic value a random error sampled from a multivariate normal distribution in the function setPheno(). Some important arguments in the function setPheno() are:

pop = the target population with the individuals to assume the phenotypes.
h2 = narrow-sense heritability.
H2 = broad-sense heritability.
varE = residual variance.
reps = number of repetitions.
p = the p-value for the environmental covariate used by GxE traits. If NULL, a value is sampled at random.

There are three options for assuming the error variance mentioned. We can set the argument h2, H2, varE. We should use only one of these arguments. If we supply values for more than one, only one will be used according to the order in which they are listed above.

# Setting phenotypes
F2popvarE = setPheno(pop = F2pop, varE = 10, reps = 1) # Option 1

F2poph2 = setPheno(pop = F2pop, h2 = 0.5, reps = 1) # Option 2

F2popH2 = setPheno(pop = F2pop, H2 = 0.6, reps = 1) # Option 3


# Correlation

df = data.frame('VarE' = cor(F2popvarE@gv, F2popvarE@pheno),
                'h2' = cor(F2poph2@gv, F2poph2@pheno),
                'H2' = cor(F2popH2@gv, F2popH2@pheno))
df
##           Trait1  Trait1.1  Trait1.2
## Trait1 0.3840965 0.7793424 0.8256036
# Plotting the values
par(mfrow = c(2,3))

hist(F2popvarE@gv)
hist(F2poph2@gv)
hist(F2popH2@gv)
hist(F2popvarE@pheno)
hist(F2poph2@pheno)
hist(F2popH2@pheno)

The reps argument is for convenient representation of replicated data. It is intended to represent replicated yield trials in plant breeding programs. In this case, varE is set to the plot error and reps is set to the number of plots per entry. The resulting phenotype represents the entry-means.

We will explore more in the following populations:

# Setting phenotypes
F2pheno1 = setPheno(F2pop, varE = 10, reps = 1)

# Setting phenotypes
F2pheno6 = setPheno(F2pop, varE = 10, reps = 6)

# Setting phenotypes
F2pheno10 = setPheno(F2pop, varE = 10, reps = 10)

# Correlation with the genetic value using different numbers of repetitions
cor(F2pheno1@gv, F2pheno1@pheno)
##         Trait1
## Trait1 0.35992
cor(F2pheno6@gv, F2pheno6@pheno)
##           Trait1
## Trait1 0.7055848
cor(F2pheno10@gv, F2pheno10@pheno)
##           Trait1
## Trait1 0.7720663
# Plotting the parameters
df = data.frame(Trait1 = c(F2pheno1@pheno,F2pheno6@pheno,F2pheno10@pheno),
                Id = c(1:(3*nInd(F2pop))),
                Pop = rep(c('rep1','rep6','rep10'), each=nInd(F2pop)))

#require(ggplot2)
ggplot(df, aes(x = Id, y = Trait1, color = Pop)) +
  geom_point(show.legend = TRUE)

Another argument in the setPheno() function is p. It represents the environmental covariate and it varies from 0-1. Values close to 0 and 1 put more weight on the phenotypic values (larger errors) and values close to 0.5, have a smaller weight (lower errors). Generally, we assume uniform values for the p argument.

We will explore more in the following populations:

# Setting phenotypes
F2_pop1 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.9) 

# Setting phenotypes
F2_pop2 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.5) 

# Setting phenotypes
F2_pop3 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.1)

# Correlation of the phenotypes with the genetic values
cor(F2_pop1@gv, F2_pop1@pheno)
##           Trait1
## Trait1 0.5298826
cor(F2_pop2@gv, F2_pop2@pheno)
##           Trait1
## Trait1 0.5199872
cor(F2_pop3@gv, F2_pop3@pheno)
##          Trait1
## Trait1 0.471256

The next step in modeling the breeding program is to make selections on the phenotypes. We may use the function selectInd()

# Advancing F1->F2
F2pop = self(pop = F1pop, nProgeny = 1)

# Setting phenotypes
F2pop = setPheno(F2pop, varE = 10, reps = 1)

# Select the best-performing individuals according to their phenotype
F2popSel = selectInd(pop = F2pop,
                            nInd = 20,
                            use = "pheno") # gv, pheno, bv, ebv

# Create a new population and assume phenotypes
newPop = randCross(F2popSel, nCrosses = 100)
newPop = setPheno(newPop, h2 = 0.5)

# Evaluate observed response to selection between generations (as the difference between the mean of genetic values)
deltaG = mean(gv(newPop)) - mean(gv(F2pop))

Plotting

To see the impact of the selections in the genetic mean of the populations, we can plot the phenotypic and genetic values from both populations:

phenoRange = range(pheno(c(F2pop, newPop)))
par(mfrow = c(2, 2),
    mar = c(4, 4, 2, 1))
# Phenotype values in the base population and the selected individuals
tmp = hist(pheno(F2pop), xlim = phenoRange, xlab = "Phenotype value", main = "Base population")
abline(v = mean(pheno(F2pop)), col = "black", lty = 1, lwd = 3)
hist(pheno(F2popSel), col = "blue", breaks = tmp$breaks, add = TRUE)
abline(v = mean(pheno(F2popSel)), col = "blue", lty = 4, lwd = 3)

# Genetic values in the base population and the selected individuals
tmp = hist(gv(F2pop), xlim = phenoRange, xlab = "Genetic value", main = "Base population")
abline(v = mean(gv(F2pop)), col = "black", lty = 1, lwd = 3)
hist(gv(F2popSel), col = "blue", breaks = tmp$breaks, add = TRUE)
abline(v = mean(gv(F2popSel)), col = "blue", lty = 4, lwd = 3)

# Phenotype values in the new population
hist(pheno(newPop), xlim = phenoRange, xlab = "Phenotype value", main = "New population")
abline(v = mean(pheno(F2pop)), col = "black", lty = 1, lwd = 3)
abline(v = mean(pheno(newPop)), col = "blue", lty = 4, lwd = 3)

# Genetic values in the new population
hist(gv(newPop), xlim = phenoRange, xlab = "Genetic value", main = "New population")
abline(v = mean(gv(F2pop)), col = "black", lty = 1, lwd = 3)
abline(v = mean(gv(newPop)), col = "blue", lty = 4, lwd = 3)

Then, we can advance the materials and do selections to the end of the pipeline.

  # Advancing F2->F3 and selection
  F3pop = self(pop = F2pop, nProgeny = 4)
  F3pop = setPheno(F3pop, varE = 10, reps = 3)
  
  F3popSel = selectWithinFam(F3pop, nInd = 4, use = "pheno")
  F3popSel = selectInd(F3popSel, nInd = 300, use = "pheno")

   # Advancing F3->F4 and selection
  F4pop = self(pop = F3popSel, nProgeny = 2)
  F4pop = setPheno(F4pop, varE = 10, reps = 5)
  
  F4popSel = selectWithinFam(F4pop, nInd = 1, use = "pheno")
  F4popSel = selectInd(F4popSel, nInd = 50, use = "pheno")

  # Advancing F4->F5 and selection
  F5pop = self(pop = F4popSel, nProgeny = 1)
  F5pop = setPheno(F5pop, varE = 10, reps = 8)
  
  F5popSel = selectInd(F5pop, nInd = 10, use = "pheno")
  
  # Advancing F5->F6 and selection
  F6pop = self(pop = F5popSel, nProgeny = 1)
  F6pop = setPheno(F6pop, varE = 10, reps = 10)
  
  F6Sel = selectInd(F6pop, nInd = 5, use = "pheno")

  # Selecting Variety
  F6Sel = setPheno(F6Sel, varE = 10, reps = 20)
  Variety = selectInd(F6Sel, nInd = 1, use = "pheno")

Some (other) useful functions in AlphaSimR.

# 1. Making DHs
F1 = randCross(Parents,120,1)
DHs = makeDH(F1, nDH = 10)

# 2. Hybrid Cross
tester = DHs[1:5] # Five individuals as testers
lines = DHs[6:105] # Lines

Hybrids = hybridCross(tester, lines, crossPlan = 'testcross')
length(unique(Hybrids@father))
## [1] 100
# Set pheno for CGA
linesGCA = setPhenoGCA(lines, tester, h2 = 0.2)
lines = setPheno(lines, h2 = 0.2)

cor(linesGCA@pheno, lines@pheno)
##         Trait1
## [1,] 0.3195568

Examine results over generations

rm(list=ls())

# Creating the founding genome
founderGenomes = runMacs(nInd = 100,
                         nChr = 4,
                         segSites = 20,
                         species = "MAIZE")


# Setting up trait characteristics
SP = SimParam$new(founderGenomes)
SP$addTraitA(nQtlPerChr = 20, 
             mean = 10, 
             var = 1)

# Generating base population
basePop = newPop(founderGenomes)


# Set simulation parameters 
nSelected1 = 10 # Scenario 1
nSelected2 = 50  # Scenario 2

##------------------- For the first scenario
# Select the best performing individuals according to their phenotype
basePopSelected = basePop

# Allocate vectors
nGenerations = 10 + 1 # +1 to store starting generation
meanGAll = numeric(nGenerations)
varGAll = numeric(nGenerations)

# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)

# To make the for loop below simpler we will make a copy of the object basePopSelected
newPopSelected = basePopSelected

# Selection over many generations
for (generation in 1:(nGenerations - 1)) {
  # Cross parents, phenotype progeny, and select new parents
  newPop = randCross(newPopSelected, nCrosses = nInd(basePop))
  newPop = setPheno(newPop, h2 = 0.5)
  newPopSelected = selectInd(pop = newPop,
                             nInd = nSelected1,
                             use = "pheno")
  # Save summaries
  meanGAll[1 + generation] = meanG(newPop)
  varGAll[1 + generation] = varG(newPop)
}

# Now save these outputs by copying the objects
meanGAll_n10 = meanGAll
varGAll_n10 = varGAll

##------------------- For second scenario

# Allocate vectors
nGenerations = 10 + 1 # +1 to store starting generation
meanGAll = numeric(nGenerations)
varGAll = numeric(nGenerations)

# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)

# To make the for loop below simpler we will make a copy of the object basePopSelected
newPopSelected = basePopSelected

# Selection over many generations
for (generation in 1:(nGenerations - 1)) {
  # Cross parents, phenotype progeny, and select new parents
  newPop = randCross(newPopSelected, nCrosses = nInd(basePop))
  newPop = setPheno(newPop, h2 = 0.5)
  newPopSelected = selectInd(pop = newPop,
                             nInd = nSelected2,
                             use = "pheno")
  # Save summaries
  meanGAll[1 + generation] = meanG(newPop)
  varGAll[1 + generation] = varG(newPop)
}

# Now save these outputs by copying the objects
meanGAll_n50 = meanGAll
varGAll_n50 = varGAll

Let’s plot and see the changes over time

par(mfrow = c(2, 1),
    mar = c(4, 4, 1, 1))

# Plot mean of genetic values over time
meanRanges = range(c(meanGAll_n10, meanGAll_n50))
plot(x = 1:nGenerations, y = meanGAll_n10, type = "l", col = "black", lwd = 3,
     xlab = "Generation", ylab = "Mean of genetic values", ylim = meanRanges)
lines(x = 1:nGenerations, y = meanGAll_n50, type = "l", col = "purple", lty = 2, lwd = 3)
legend(x = "topleft", legend = c(10, 50), title = "nSelected",
       lwd = 3, lty = c(1, 2), col = c("black", "purple"), bty = "n")

# Plot variance of genetic values over time
varRanges = range(c(varGAll_n50, varGAll_n10))
plot(x = 1:nGenerations, y = varGAll_n10, type = "l", col = "black", lwd = 3,
     xlab = "Generation", ylab = "Variance of genetic values", ylim = varRanges)
lines(x = 1:nGenerations, y = varGAll_n50, type = "l", col = "purple", lty = 2, lwd = 3)
legend(x = "topright", legend = c(10, 50), title = "nSelected",
       lwd = 3, lty = c(1, 2), col = c("black", "purple"), bty = "n")

Multi traits

# Clean the working environment
rm(list = ls())

# Load AlphaSimR, simulate founder genomes, define a trait, and simulate a base population
founderGenomes = runMacs(nInd = 100,
                         nChr = 10,
                         segSites = 100,
                         species = "MAIZE")

# Trait
SP = SimParam$new(founderGenomes)
SP$addTraitA(nQtlPerChr = 100,
             mean = c(10, 2), 
             var = c(1,1),
             corA = matrix(data=c(1, 0.25, 0.25, 1), ncol = 2))

# Parents
Parents = newPop(founderGenomes)

We can proceed and create crosses that take both traits into account. In addition, we can assume phenotypes and select individuals.

# Crossing randomly
F1pop = randCross(pop = Parents, nCrosses = 100)

# Advancing F1->F2 and phenotypes
F2pop = self(pop = F1pop, nProgeny = 1)
F2pop = setPheno(F2pop, varE = c(10, 2), reps = 1)

# Select the top individuals based on their phenotypes - trait 1
popSelT1 = selectInd(pop = F2pop,
                     nInd = 20,
                     use = "pheno",
                     trait = 1) # gv, pheno, bv, ebv

# Select the top individuals based on their phenotypes - trait 2
popSelT2 = selectInd(pop = F2pop,
                     nInd = 20,
                     use = "pheno",
                     trait = 2) # gv, pheno, bv, ebv


# Select the top individuals based on their phenotypes - trait 2
popSelBoth = selectInd(pop = F2pop,
                     nInd = 20,
                     use = "pheno",
                     trait = selIndex,
                     b=c(0.5,0.5)) # index

# Gain with selections based on individual traits or in the index.
df = data.frame(Basepop = c(apply(F2pop@pheno, 2, FUN = mean)),
                Sel_T1 = c(apply(popSelT1@pheno, 2, FUN = mean)),
                Sel_T2 = c(apply(popSelT2@pheno, 2, FUN = mean)),
                Sel_Both = c(apply(popSelBoth@pheno, 2, FUN = mean)))
df
##         Basepop    Sel_T1   Sel_T2  Sel_Both
## Trait1 9.812046 14.966326 9.882529 14.652945
## Trait2 1.967772  1.691937 4.478636  2.503956

References


  1. professor, University of Florida, ↩︎

  2. Post doc, University of Florida, ↩︎