1. Introduction

Simulations have been demonstrated as a powerful tool to improve animal and plant breeding programs. In addition, those tools may offer an alternative to address theoretical concepts in quantitative genetics and breeding.

Here, we will use AlphaSimR package (Gaynor et al. 2021). The package uses stochastic simulations for design and optimization of breeding programs. The package offers a fast, simple, and inexpensive way to test alternative breeding programs.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

rm(list=ls())

#install.packages("AlphaSimR")

require(AlphaSimR)

## Carregando pacotes exigidos: AlphaSimR

## Carregando pacotes exigidos: R6

For a whole set of simulations, four simple steps must be included, as follows:

Simulate founder genomes/haplotypes.
Set global simulation parameters for a target trait.
Modeling the breeding program.
Examining results by looking into population individuals.

2. Simulate founder genomes/haplotypes

The base population is created with the function runMacs, in addition to set all the parameters expected in the founder haplotypes (i.e. number of individuals, number of chromosomes, segregation sites number).

set.seed(53627)

# Creating the founder genome
founderGen = runMacs(nInd = 100,
                     nChr = 10,
                     segSites = 100,
                     species = "MAIZE", #wheat, generic, cattle
                     ploidy = 2L)

# The founder genome
founderGen

## An object of class "MapPop" 
## Ploidy: 2 
## Individuals: 100 
## Chromosomes: 10 
## Loci: 1000

3. Set global simulation parameters for a target trait

After create the base genome, we may set the parameters for a target trait. Here, we will create a trait with only additive genetic control, using the function addTraitA, as follows:

SP <- SimParam$new(founderGen)

#Heritability
h2 = 0.5

# Additive trait (mean and variance)
SP$addTraitA(nQtlPerChr = 100,
             mean = 100,
             var = 30)

# Residual variance (based on the heritability)
SP$setVarE(h2 = h2)

Now, we can create the base population, using the function newPop:

set.seed(53627)

# Base population
basePop = newPop(founderGen)

# Setting phenotypes
basePop = setPheno(basePop)

basePop

## An object of class "Pop" 
## Ploidy: 2 
## Individuals: 100 
## Chromosomes: 10 
## Loci: 1000 
## Traits: 1

The package AlphaSimR allow us to explore several characteristics related with the populations that we created. For instance we can access:

# Individuals
basePop@id

##   [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12" 
##  [13] "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24" 
##  [25] "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"  "36" 
##  [37] "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44"  "45"  "46"  "47"  "48" 
##  [49] "49"  "50"  "51"  "52"  "53"  "54"  "55"  "56"  "57"  "58"  "59"  "60" 
##  [61] "61"  "62"  "63"  "64"  "65"  "66"  "67"  "68"  "69"  "70"  "71"  "72" 
##  [73] "73"  "74"  "75"  "76"  "77"  "78"  "79"  "80"  "81"  "82"  "83"  "84" 
##  [85] "85"  "86"  "87"  "88"  "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96" 
##  [97] "97"  "98"  "99"  "100"

# Phenotypic mean
mean(basePop@pheno)

## [1] 98.95431

# Phenotypic variance
varP(basePop)

##          Trait1
## Trait1 55.46051

# Genetic mean
mean(basePop@gv)

## [1] 100

#Genetic variance
varG(basePop)

##        Trait1
## Trait1     30

# Plotting the phenotypic values
hist(pheno(basePop), xlab = "Phenotype value", main = "")

4. Modeling the breeding program

After simulate the base population and the target trait parameters, we can implement the breeding program simulations by mimic a real breeding program using the functions from the package.

set.seed(53627)

# Selection of individuals
basePopSelected = selectInd(basePop, 50)

# Cross of selected individuals
newPop = randCross(pop = basePopSelected, nCrosses = 10)

# Phenotype the progeny
newPop = setPheno(newPop)

# Selfing the genotypes
pop_self = self(newPop, nProgeny = 5)

5. Examining the results: Response to selection

We may observe how do phenotypic and genetic means perform in the base population and in the new population after selection.

To this end, let’s analyse mean phenotype and genetic values in the base population, the selected individuals, and the new population.

# Mean of phenotype and genetic values in the base population
c(meanP(basePop), meanG(basePop))

##    Trait1    Trait1 
##  98.95431 100.00000

# Mean of phenotype and genetic values in the selected part of the base population
c(meanP(basePopSelected), meanG(basePopSelected))

##   Trait1   Trait1 
## 104.7271 102.8595

# Mean of phenotype and genetic values in the new population
c(meanP(newPop), meanG(newPop))

##   Trait1   Trait1 
## 102.7993 104.6966

We will now visualize these differences in phenotype and genetic values and their means, for all three groups of individuals in each generation. As before, we will highlight with a purple solid line the overall mean of the population in both generations, so we can clearly see response to selection.

phenoRange = range(pheno(c(basePop, newPop)))
par(mfrow = c(2, 2),
    mar = c(4, 4, 2, 1))
# Phenotype values in the base population and the selected individuals
tmp = hist(pheno(basePop), xlim = phenoRange, xlab = "Phenotype value", main = "Base population")
abline(v = mean(pheno(basePop)), col = "black", lty = 1, lwd = 3)
hist(pheno(basePopSelected), col = "blue", breaks = tmp$breaks, add = TRUE)
abline(v = mean(pheno(basePopSelected)), col = "blue", lty = 4, lwd = 3)

# Genetic values in the base population and the selected individuals
tmp = hist(gv(basePop), xlim = phenoRange, xlab = "Genetic value", main = "Base population")
abline(v = mean(gv(basePop)), col = "black", lty = 1, lwd = 3)
hist(gv(basePopSelected), col = "blue", breaks = tmp$breaks, add = TRUE)
abline(v = mean(gv(basePopSelected)), col = "blue", lty = 4, lwd = 3)

# Phenotype values in the new population
hist(pheno(newPop), xlim = phenoRange, xlab = "Phenotype value", main = "New population")
abline(v = mean(pheno(basePop)), col = "black", lty = 1, lwd = 3)
abline(v = mean(pheno(newPop)), col = "blue", lty = 4, lwd = 3)

# Genetic values in the new population
hist(gv(newPop), xlim = phenoRange, xlab = "Genetic value", main = "New population")
abline(v = mean(gv(basePop)), col = "black", lty = 1, lwd = 3)
abline(v = mean(gv(newPop)), col = "blue", lty = 4, lwd = 3)

6. Breeder’s equation

As we discussed before, the Breeder’s equation (Lush, 1937) can measure the gains with selection. It predicts the expected change in mean genetic value due to selection per generation (\(\Delta G\)) by multiplying the intensity of selection (\(i\)), accuracy of selection (\(r\)), and standard deviation of genetic values, as follows:

\[\Delta G = i \times r \times S_G.\]

Intensity of selection (\(i\)) is defined as selection differential (\(SD\)) expressed in units of phenotype standard deviation (\(S_P\)), \(i = SD / S_P\). Accuracy of selection (\(r\)) is defined as Pearson correlation between true genetic values (\(G\)) and estimated genetic values (\(\hat{G}\)), \(cor(G, \hat{G})\). Estimated genetic values are given by the criterion we use to rank selection candidates. Here we used phenotype values.

Lets evaluate these components and expected change in mean genetic value.

# Selection differential
(selDiff = meanP(basePopSelected) - meanP(basePop))

##  Trait1 
## 5.77282

# Intensity of selection
(i = selDiff / sqrt(varP(basePop)))

##           Trait1
## Trait1 0.7751685

# Accuracy of selection
(r = cor(gv(basePop), pheno(basePop)))

##           Trait1
## Trait1 0.6200745

par(mfrow = c(1, 1))
plot(y = gv(basePop), x = pheno(basePop),
     ylab = "Genetic value", xlab = "Phenotypic values")
points(y = gv(basePopSelected), x = pheno(basePopSelected), col = "blue", pch = 19)

# Expected change in the mean of genetic values from the simulation
(deltaG_Exp = i * r * sqrt(varG(basePop)))

##          Trait1
## Trait1 2.632695

# Observed change in the mean of genetic values from the simulation (Observed genetic gain)
(deltaG_Obs = meanG(newPop) - meanG(basePop))

##   Trait1 
## 4.696593

# Breeder's equation with the assumed heritability
selDiff * h2

##  Trait1 
## 2.88641

# Breeder's equation with a realized heritability
selDiff * varG(basePop) / varP(basePop)

##          Trait1
## Trait1 3.122665

# Expanded Breeder's equation with a realized genetic variance and estimated accuracy
i * r * sqrt(varG(basePop))

##          Trait1
## Trait1 2.632695

# Expanded Breeder's equation with a realized genetic variance and assumed accuracy
i * (sqrt(varG(basePop)) / sqrt(varP(basePop))) * sqrt(varG(basePop))

##          Trait1
## Trait1 3.122665

7. Population performance over time

The simulations also allow us to track the performance a trait over several cycles of breeding. Realistically, this represents the best strategy to measure the impact of selection on a trait over multiple cycles or generations of breeding.

# Store the results
nGenerations = 20 + 1 
meanP = numeric(nGenerations)
varP = numeric(nGenerations)
meanG = numeric(nGenerations)
varG = numeric(nGenerations)

# Save the starting values
meanP[1] = meanP(basePop)
varP[1] = varP(basePop)
meanG[1] = meanG(basePop)
varG[1] = varG(basePop)

# Copy of the object basePopSelected
newPopSelected = basePopSelected

# Selection over 20 generations
for (generation in 1:(nGenerations - 1)) {
  cat('Generation:', generation, "\n")
  # Cross parents, phenotype progeny, and select new parents
  newPop = randCross(pop = newPopSelected, nCrosses = nInd(basePop))
  newPop = setPheno(newPop, h2 = 0.5)
  newPopSelected = selectInd(pop = newPop,
                             nInd = 20,
                             use = "pheno")
  
  # Save results for each year
  meanP[1 + generation] = meanP(newPop)
  varP[1 + generation] = varP(newPop)
  meanG[1 + generation] = meanG(newPop)
  varG[1 + generation] = varG(newPop)
  
}

## Generation: 1 
## Generation: 2 
## Generation: 3 
## Generation: 4 
## Generation: 5 
## Generation: 6 
## Generation: 7 
## Generation: 8 
## Generation: 9 
## Generation: 10 
## Generation: 11 
## Generation: 12 
## Generation: 13 
## Generation: 14 
## Generation: 15 
## Generation: 16 
## Generation: 17 
## Generation: 18 
## Generation: 19 
## Generation: 20

meanRanges = range(c(meanP, meanG))
varRanges = range(c(varP, varG))

par(mfrow = c(2, 2),
    mar = c(4, 4, 1, 1))

# Plot mean of phenotype values over time
plot(x = 1:nGenerations, y = meanP, type = "l", col = "blue", lwd = 2, lty = 2,   xlab = "Generation", ylab = "Mean of phenotype values", ylim = meanRanges)

# Plot mean of genetic values over time
plot(x = 1:nGenerations, y = meanG, type = "l", col = "blue", lwd = 2,
     xlab = "Generation", ylab = "Mean of genetic values", ylim = meanRanges)

# Plot variance of phenotype values over time
plot(x = 1:nGenerations, y = varP, type = "l", col = "blue", lwd = 2, lty = 2,
     xlab = "Generation", ylab = "Variance of phenotype values", ylim = varRanges)

# Plot variance of genetic values over time
plot(x = 1:nGenerations, y = varG, type = "l", col = "blue", lwd = 2,
     xlab = "Generation", ylab = "Variance of genetic values", ylim = varRanges)

8. Multi-trait information over several cycles

Breeding programs are often targeting the improvement of several traits at a time. However, in a multi-trait framework, we may consider the correlations in-between the traits. Correlation may vary from -1 to 1, and the signal and the magnitude will impact the response with selection.

Here, we will use AlphaSimR to simulate three scenarios, with different strategy in selection and with different correlations in-between pair of traits.

Positive correlation

First, we will check a case where two traits were simulated, with a positive correlation between them. In addition, the selection will be made in only one trait and we will be able to measure the indirect response in the other trait.

rm(list=ls())

set.seed(53627)

founderGen = runMacs(nInd = 100,
                         nChr = 10,
                         segSites = 100,
                         species = "MAIZE")
SP = SimParam$new(founderGen)


# Define the traits
means = c(100, 100)
vars = c(10, 20)
cors = matrix(data = c( 1.0, 0.5,
                       0.5,  1.0),
                byrow = TRUE, nrow = 2, ncol = 2)
h2s = c(0.5, 0.5)
SP$addTraitA(nQtlPerChr = 100, 
             mean = means, 
             var = vars, 
             corA = cors)

Create the base population for later selection.

# Base population
basePop = newPop(founderGen)

# Phenotype the population
basePop = setPheno(basePop, h2 = h2s)

Plotting the phenotypic and genetic mean/variances through several generations of breeding.

# Store the results
nGenerations = 10 + 1 # +1 to store starting generation
meanG = vector("list", length = nGenerations)
varG = vector("list", length = nGenerations)

# Save the starting values
meanG[[1]] = meanG(basePop)
varG[[1]] = varG(basePop)

# First selection step
nSelected = 20
newPopSelected = selectInd(pop = basePop,
                           nInd = nSelected,
                           use = "pheno",
                           trait = 1)

# Selection over many generations
for (generation in 1:(nGenerations - 1)) {
  newPop = randCross(newPopSelected, nCrosses = nInd(basePop))
  newPop = setPheno(newPop, h2 = h2s)
  newPopSelected = selectInd(pop = newPop,
                             nInd = nSelected,
                             use = "pheno",
                             trait = 1)
  # Save summaries
  meanG[[1 + generation]] = meanG(newPop)
  varG[[1 + generation]] = varG(newPop)
  }

# Plot results
meanGTrait1 = sapply(meanG, function(x) x[1])
meanGTrait2 = sapply(meanG, function(x) x[2])
meanRanges = range(c(meanGTrait1, meanGTrait2))

varGTrait1 = sapply(varG, function(x) x[1, 1])
varGTrait2 = sapply(varG, function(x) x[2, 2])
varRanges = range(c(varGTrait1, varGTrait2))

# Plot mean of genetic values over time
plot(x = 1:nGenerations, y = meanGTrait1, type = "l", col = "grey40", lwd = 3,
     xlab = "Generation", ylab = "Mean of genetic values", ylim = meanRanges)
lines(x = 1:nGenerations, y = meanGTrait2, type = "l", col = "grey40", lty = 3, lwd = 3)
legend(x = "topleft", legend = c("1", "2"), title = "Trait",
       lwd = 3, lty = c(1, 3), col = c("grey60", "grey60"))

Negative correlation

The second example will consider two traits with negative correlation, for the same breeding program.

rm(list=ls())

set.seed(53627)

founderGen = runMacs(nInd = 100,
                         nChr = 10,
                         segSites = 100,
                         species = "MAIZE")
SP = SimParam$new(founderGen)

# Define the traits
means = c(100, 100)
vars = c(10, 20)
cors = matrix(data = c( 1.0, -0.6,
                       -0.6,  1.0),
                byrow = TRUE, nrow = 2, ncol = 2)

h2s = c(0.5, 0.5)

SP$addTraitA(nQtlPerChr = 100, 
             mean = means, 
             var = vars, 
             corA = cors)

Create the base population for later selection

# Base population
basePop = newPop(founderGen)

# Phenotype the population
basePop = setPheno(basePop, h2 = h2s)

Plotting the phenotypic and genetic mean through several generations of breeding.

Negative correlation with index

In the third case we assume two traits with negative correlation in-between them (same magnitude as above mentioned). However, we will add a selection index for individual selection in each cycle. We use the function selIndex from AlphaSimR and we set a weight of 0.5 for both traits.

# Store the results
nGenerations = 10 + 1 
meanG = vector("list", length = nGenerations)
varG = vector("list", length = nGenerations)


# Save the starting values
meanG[[1]] = meanG(basePop)
varG[[1]] = varG(basePop)

# First selection step
nSelected = 20
newPopSelected = selectInd(pop = basePop,
                           nInd = nSelected,
                           use = "pheno",
                           trait = selIndex, b = c(0.5, 0.5), scale = TRUE)

# Selection over many generations
for (generation in 1:(nGenerations - 1)) {
  # Cross parents, phenotype progeny, and select new parents
  newPop = randCross(newPopSelected, nCrosses = nInd(basePop))
  newPop = setPheno(newPop, h2 = h2s)
  newPopSelected = selectInd(pop = newPop,
                             nInd = nSelected,
                             use = "pheno",
                             trait = selIndex, b = c(0.5, 0.5), scale = TRUE)
  # Save summaries
  meanG[[1 + generation]] = meanG(newPop)
  varG[[1 + generation]] = varG(newPop)
  }

# Plot results
meanGTrait1 = sapply(meanG, function(x) x[1])
meanGTrait2 = sapply(meanG, function(x) x[2])
meanRanges = range(c(meanGTrait1, meanGTrait2))

varGTrait1 = sapply(varG, function(x) x[1, 1])
varGTrait2 = sapply(varG, function(x) x[2, 2])
varRanges = range(c(varGTrait1, varGTrait2))

# Plot mean of genetic values over time
plot(x = 1:nGenerations, y = meanGTrait1, type = "l", col = "grey40", lwd = 3,
     xlab = "Generation", ylab = "Mean of genetic values", ylim = meanRanges)
lines(x = 1:nGenerations, y = meanGTrait2, type = "l", col = "grey40", lty = 2, lwd = 3)
legend(x = "topleft", legend = c("1", "2"), title = "Trait",
       lwd = 3, lty = c(1, 2), col = c("grey40", "grey40"))

9. Conclusions

AlphaSimR stands out as an important alternative for breeding program simulations and optimization.

References

Gaynor, R. C., Gorjanc, G., & Hickey, J. M. (2021). AlphaSimR: an R package for breeding program simulations. G3, 11(2), jkaa017.

Lush J (1937) Animal breeding. Plans. Iowa State College Press, Ames

Response to selection in AlphaSimR

Marco Antônio Peixoto & Márcio Resende Jr.

2022-10-24