1. Introduction

To start a set of simulations in the AlphaSimR package (Gaynor, Gorjanc, and Hickey 2021), four steps must be implemented, as follows:

  1. Simulate founder genomes/haplotypes.
  2. Set global simulation parameters for a target trait/traits.
  3. Model the breeding program.
  4. Examine results by looking into population individuals’ metrics.

In this vignette, we will cover a few useful functions from AlphaSimR that will help to model a the breeding program. In addition, we will compare three pipelines over a few generations of selection.

2. Breeding program

Imagine that we got hired by a breeding company. In your first day, your team introduced you to the pipeline below:

Then, your manager mentioned that the you will implement changes in the pipeline and they gave you two options:

  1. Increase the number of crosses from 50 to 70 (20 more families);
  2. Increase the number of doubled-haploid (DH) lines produced from 10 to 14 (400 more DH individuals).

As there are no budget for implementing both, your task will be to choose each strategy is the best and why.

So, in the next topics, and following the 4 steps before mentioned, we will implement the pipeline and the scenarios to leverage evidence for a data-drive decision.

Step 1: Founder Genome

We will start by loading the packages.

# Loading packages
# install.packages("AlphaSimR") # Install if you do not have it
library(AlphaSimR)
## Loading required package: R6
library(ggplot2)

Let’s start the simulation itself by creating the base genome.

set.seed(52684)
# Creating the founding genome
founderGenomes = runMacs(nInd = 100, # Number of individuals that compose the genome
                         nChr = 10,   # Number of chromosome pairs of the target species
                         segSites = 50,  # number of segregation sites
                         species = "MAIZE") # We can use the base parameters and we have available in the package MAIZE, WHEAT, CATTLE, and GENERIC.

# Object created
founderGenomes
## An object of class "MapPop" 
## Ploidy: 2 
## Individuals: 100 
## Chromosomes: 10 
## Loci: 500

Step 2: Traits

With the founder genomes in perspective, we can proceed to add the characteristics of the target trait or traits to the simulation. AlphaSimR has a vignette that we recommend you take a look at link. It gives, thoroughly, an explanation on the traits of interest and how to interpret its effects.

We will start with a trait with only additive effects. For such, we have the following parameters:

set.seed(6985)
# Global simulation parameters from founder genomes.
SP = SimParam$new(founderGenomes)

# Additive trait
SP$addTraitA(nQtlPerChr = 20, # Number of QTL per chromosome
             mean = 10, # Trait additive mean
             var = 10) # Trait additive variance or varA

SP$setVarE(varE=20) # trait heritability

# 
#             varA                 10
# h2 = ------------------- = ------------- = 0.33
#       varA + varE/nreps       10 + 20
#

# QTL effects (for the traits)
SP$traits[[1]]@addEff
##   [1] -0.461665280  0.401970953 -0.162468487  0.102462278 -0.400742643
##   [6] -0.277532766 -0.848304391  0.146805432  0.863171779 -0.016736397
##  [11] -0.696456841 -0.038287308  0.383747681 -0.364386064  0.482625385
##  [16]  0.214603293  0.278871869  0.368198182 -0.131123793  0.249230379
##  [21]  0.442916834 -0.719218994  0.397263153 -0.307487543  0.217171343
##  [26] -0.706694241 -0.047641081 -0.474597289 -0.295749820  0.729934755
##  [31] -0.139512868 -0.501969770  0.145528117  0.844087847  0.511460312
##  [36]  0.117810361  0.578736766  0.314206951  0.078945197 -0.001422712
##  [41]  0.327342163 -0.254465172  0.020528076 -0.455401968 -0.081499700
##  [46]  0.084264300 -0.022395934  0.457563053 -0.299016664 -0.006928243
##  [51] -0.227359777 -0.319400423 -0.398409399 -0.052149974 -0.098296094
##  [56]  0.273530298  1.155093323  0.066288692 -0.371035847  0.382634548
##  [61]  0.400570776  0.045305685 -0.267556246  0.129471951  0.231982597
##  [66] -0.520190901 -0.183675692 -0.553238070  0.187815156 -0.966631627
##  [71] -0.550095604 -0.230195321  0.233429304  0.209879458 -0.286482424
##  [76] -0.094343547 -0.260738929 -0.303447653 -0.037984336 -0.673131397
##  [81]  0.537570019  0.235772621  0.710480986  0.307839374 -0.510079519
##  [86] -0.656417466 -0.032579917 -0.019180429  0.215889572  0.199177047
##  [91] -0.784688428  0.628416729  0.648299176  0.362468757 -0.008699142
##  [96] -0.187075833  0.255910977  0.903406197 -1.053598567 -0.676564539
## [101] -0.834180487  0.886379779 -0.309572693  1.075586514 -0.720294771
## [106]  0.146918546  0.297383055 -0.481376470  0.186993125 -0.403558918
## [111] -0.117300799 -0.660296997 -0.332194465  0.615816408  0.254277603
## [116]  0.576829933  0.586953108 -0.410550839  0.507214170  0.072226681
## [121] -0.335943864 -0.777337079  0.851235589  0.067308575  0.230557604
## [126]  0.373773069 -0.082035295  0.437240692 -0.231714414 -0.282242309
## [131] -0.521099012  0.166914461 -0.431821654  0.006854767 -0.739023472
## [136] -0.195977320  0.001723814  0.004405849  0.391183317 -0.264211586
## [141]  0.175165737  0.180579075 -0.396543956  0.198599954  0.666161745
## [146] -0.426867869  0.073107332  1.024700914 -0.126605618 -0.172316699
## [151]  0.262498175 -0.492893522  0.208737381  0.303638980 -0.471008935
## [156]  0.747689623 -0.086393315 -0.366649436  0.366224632  0.260506182
## [161] -0.743601498 -0.214322527 -0.293096287  0.304135959 -0.443315461
## [166]  0.069288014  0.241518307 -0.057181138 -0.623430693 -0.345305256
## [171] -0.198929640 -0.356346590 -0.196815322 -0.403088367 -0.398741540
## [176] -0.295746876  0.392189742 -0.146491973  0.122058360 -0.541053558
## [181]  0.376497593 -0.452790977  0.045916966 -0.945953444  0.148775350
## [186] -0.093564627  0.108476833 -0.494405994 -0.235406109  0.037356424
## [191]  0.384697106 -0.323549286  0.556348711  0.057812811 -0.368637277
## [196]  0.336914430 -0.069067613 -0.390398499 -0.056477812 -0.128204974
# Generating base population
basePop = newPop(founderGenomes)

# Gen param
# genParam(basePop)

# Parents
Parents = basePop

The implementation of traits in AlphaSimR follows a biological model, which is responsible for converting into a genetic value each individual genotype before created. In a straightforward way, the genetic value is used to create the individuals’ phenotypes. The biological effects presented in AlphaSimR are:

A: additive effect
D: dominance effect
G: genotype by environment interaction effect
E: environmental effect

So, we can create traits with the combinations of those effects (assuming that all of them as, at least, additive) using the ADGE framework, as it follows:

# Traits that can be created in AlphaSimR:
SP$addTraitA()
SP$addTraitAD()
SP$addTraitADG()
SP$addTraitADEG()
SP$addTraitAG()
SP$addTraitAE()
SP$addTraitAEG()

It is important to have in mind that for dominance effects (D) we set the mean of the dominance degree (0-1) and variance, whereas for genotype by environment effect (G) and environmental effect (E) we have just to adjust the variance.

Step 3: Deploying the pipeline

First, at the beginning of each breeding cycle, we generate new crosses. We can implement random crosses as well as make oriented crosses by giving the list of crosses to make.

#------------- Year 1 
F1pop = randCross(pop = Parents, nCrosses = 50, nProgeny = 2) # Crossing block

#------------- Year 2
DHpop = makeDH(pop=F1pop, nDH=10) # Doubled Haploids

#------------- Year 3
rowStage = setPheno(pop=DHpop, h2 = 0.1, reps = 1) # h2 = 0.1 (visual selection)
PrelimTrial = selectWithinFam(rowStage, nInd = 3, use = "pheno") 

#-------------  Year 4
PrelimTrial = setPheno(PrelimTrial, varE = 20, reps = 1) # h2 = 10/(10+20/1) = 0.33
AdvancTrial = selectInd(PrelimTrial, nInd = 50, use = "pheno") # Preliminary Trial

#------------- Year 5
AdvancTrial = setPheno(AdvancTrial, varE = 20, reps = 5) # h2 = 10/(10+20/5) = 0.71
EliteTrial = selectInd(AdvancTrial, nInd = 20, use = "pheno") # Advanced Trial

#------------- Year 6
EliteTrial = setPheno(EliteTrial, varE = 20, reps = 20) # h2 = 10/(10+20/20) = 0.90
Variety = selectInd(EliteTrial, nInd = 2, use = "pheno") # Elite Trial

#------------- Year 7    
# Release varieties

Step 4: Comparision

Scenario 1

# Allocate vectors
nYears = 20 + 1 # +1 to store starting generation
meanGAll = numeric(nYears)
varGAll = numeric(nYears)

# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)

# From the same population
Parents = basePop


# Loop
for (i in 2:nYears){
  set.seed(1)
#------------- Year 1 
F1pop = randCross(pop = Parents, nCrosses = 50, nProgeny = 2) # Crossing block

#------------- Year 2
DHpop = makeDH(pop=F1pop, nDH = 10) # Doubled Haploids

#------------- Year 3
rowStage = setPheno(pop=DHpop, h2 = 0.1, reps = 1)
PrelimTrial = selectWithinFam(rowStage, nInd = 3, use = "pheno") 

#-------------  Year 4
PrelimTrial = setPheno(PrelimTrial, varE = 20, reps = 1)
AdvancTrial = selectInd(PrelimTrial, nInd = 50, use = "pheno") # Preliminary Trial

#------------- Year 5
AdvancTrial = setPheno(AdvancTrial, varE = 20, reps = 5)
EliteTrial = selectInd(AdvancTrial, nInd = 20, use = "pheno") # Advanced Trial

#------------- Year 6
EliteTrial = setPheno(EliteTrial, varE = 20, reps = 20)
Variety = selectInd(EliteTrial, nInd = 2, use = "pheno") # Elite Trial

#------------- Year 7    
# Release varieties

# Updating the parents
Parents = AdvancTrial

# Track performance
meanGAll[i] = meanG(Parents)
varGAll[i] = varG(Parents)

}

# Track performance
meanG_Scen1 = meanGAll
varG_Scen1 = varGAll

Scenario 2

# Allocate vectors
nYears = 20 + 1 # +1 to store starting generation
meanGAll = numeric(nYears)
varGAll = numeric(nYears)

# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)

# From the same population
Parents = basePop


# Loop
for (i in 2:nYears){
set.seed(11)  
#------------- Year 1 
F1pop = randCross(pop = Parents, nCrosses = 70, nProgeny = 2) # Crossing block

#------------- Year 2
DHpop = makeDH(pop=F1pop, nDH=10) # Doubled Haploids

#------------- Year 3
rowStage = setPheno(pop=DHpop, h2 = 0.1, reps = 1)
PrelimTrial = selectWithinFam(rowStage, nInd = 3, use = "pheno") 

#-------------  Year 4
PrelimTrial = setPheno(PrelimTrial, varE = 20, reps = 1)
AdvancTrial = selectInd(PrelimTrial, nInd = 50, use = "pheno") # Preliminary Trial

#------------- Year 5
AdvancTrial = setPheno(AdvancTrial, varE = 20, reps = 5)
EliteTrial = selectInd(AdvancTrial, nInd = 20, use = "pheno") # Advanced Trial

#------------- Year 6
EliteTrial = setPheno(EliteTrial, varE = 20, reps = 20)
Variety = selectInd(EliteTrial, nInd = 2, use = "pheno") # Elite Trial

#------------- Year 7    
# Release varieties

# Updating the parents
Parents = AdvancTrial

# Track performance
meanGAll[i] = meanG(Parents)
varGAll[i] = varG(Parents)

}

# Track performance
meanG_Scen2 = meanGAll
varG_Scen2 = varGAll

Scenario 3

# Allocate vectors
nYears = 20 + 1 # +1 to store starting generation
meanGAll = numeric(nYears)
varGAll = numeric(nYears)

# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)

# From the same population
Parents = basePop


# Loop
for (i in 2:nYears){
  set.seed(111)
#------------- Year 1 
F1pop = randCross(pop = Parents, nCrosses = 50, nProgeny = 2) # Crossing block

#------------- Year 2
DHpop = makeDH(pop=F1pop, nDH=14) # Doubled Haploids

#------------- Year 3
rowStage = setPheno(pop=DHpop, h2 = 0.1, reps = 1)
PrelimTrial = selectWithinFam(rowStage, nInd = 3, use = "pheno") 

#-------------  Year 4
PrelimTrial = setPheno(PrelimTrial, varE = 20, reps = 1)
AdvancTrial = selectInd(PrelimTrial, nInd = 50, use = "pheno") # Preliminary Trial

#------------- Year 5
AdvancTrial = setPheno(AdvancTrial, varE = 20, reps = 5)
EliteTrial = selectInd(AdvancTrial, nInd = 20, use = "pheno") # Advanced Trial

#------------- Year 6
EliteTrial = setPheno(EliteTrial, varE = 20, reps = 20)
Variety = selectInd(EliteTrial, nInd = 2, use = "pheno") # Elite Trial

#------------- Year 7    
# Release varieties

# Updating the parents
Parents = AdvancTrial

# Track performance
meanGAll[i] = meanG(Parents)
varGAll[i] = varG(Parents)

}

# Track performance
meanG_Scen3 = meanGAll
varG_Scen3 = varGAll

Plotting

# Plot mean of genetic values over time
meanRanges = range(c(meanG_Scen1, meanG_Scen2, meanG_Scen3))
plot(x = 1:nYears, y = meanG_Scen1, type = "l", col = "black", lwd = 3,
     xlab = "Years", ylab = "Genetic values", ylim = meanRanges)
lines(x = 1:nYears, y = meanG_Scen2, type = "l", col = "blue", lty = 2, lwd = 3)
lines(x = 1:nYears, y = meanG_Scen3, type = "l", col = "grey", lty = 2, lwd = 3)

legend(x = "topleft",  legend = c('Scen1','Scen2','Scen3'), title = "Scenario",
       lwd = 3, lty = c(1, 2,2), col = c("black", "blue", 'grey'), bty = "n")

References

Gaynor, R Chris, Gregor Gorjanc, and John M Hickey. 2021. “AlphaSimR: An r Package for Breeding Program Simulations.” G3 11 (2): jkaa017.

  1. professor, University of Florida, ↩︎

  2. Post doc, University of Florida, ↩︎