As we recall from the first vignette, in a way to implement a set of simulations in the AlphaSimR package four steps must be followed:
In this vignette, we will cover a few useful functions from AlphaSimR that will help to Model the breeding program (step 3). In addition, we will compare two implementations over a few generations of selection.
We will start by loading the packages.
## Loading required package: R6
Let’s start the simulation by creating the base genome.
First, at the beginning of each breeding cycle, we generate new crosses. We can implement random crosses as well as make oriented crosses by giving the list of crosses to make.
# 1. Crossing randomly
F1pop = randCross(pop = Parents, nCrosses = 100)
# 2. Make crosses by a list of individuals
# Making the crossing plan
MatingPlan = data.frame(Parent1 = Parents@id[1:5],
Parent2 = Parents@id[6:10])
# Crossing based on the crossing list
F1Pop = makeCross(Parents, crossPlan = as.matrix(MatingPlan))
# Checking out the pedigree records out of F1
Pedigree = data.frame(Ind = F1Pop@id,
Parent1 = F1Pop@mother,
Parent2 = F1Pop@father)
Now, we can advance the materials. One function that we may use is
the self()
.
Then, we may assume phenotypes for all those individuals. For each
individual, the phenotypes are assumed to come from the genetic values
generated. Then, AlphaSimR adds in each genetic value a
random error sampled from a multivariate normal distribution in the
function setPheno()
. Some important arguments in the
function setPheno()
are:
pop = the target population with the individuals to
assume the phenotypes.
h2 = narrow-sense heritability.
H2 = broad-sense heritability.
varE = residual variance.
reps = number of repetitions.
p = the p-value for the environmental covariate used by
GxE traits. If NULL, a value is sampled at random.
There are three options for assuming the error variance mentioned. We can set the argument h2, H2, varE. We should use only one of these arguments. If we supply values for more than one, only one will be used according to the order in which they are listed above.
# Setting phenotypes
F2popvarE = setPheno(pop = F2pop, varE = 10, reps = 1) # Option 1
F2poph2 = setPheno(pop = F2pop, h2 = 0.5, reps = 1) # Option 2
F2popH2 = setPheno(pop = F2pop, H2 = 0.6, reps = 1) # Option 3
# Correlation
df = data.frame('VarE' = cor(F2popvarE@gv, F2popvarE@pheno),
'h2' = cor(F2poph2@gv, F2poph2@pheno),
'H2' = cor(F2popH2@gv, F2popH2@pheno))
df
## Trait1 Trait1.1 Trait1.2
## Trait1 0.3840965 0.7793424 0.8256036
# Plotting the values
par(mfrow = c(2,3))
hist(F2popvarE@gv)
hist(F2poph2@gv)
hist(F2popH2@gv)
hist(F2popvarE@pheno)
hist(F2poph2@pheno)
hist(F2popH2@pheno)
The reps argument is for convenient representation of replicated data. It is intended to represent replicated yield trials in plant breeding programs. In this case, varE is set to the plot error and reps is set to the number of plots per entry. The resulting phenotype represents the entry-means.
We will explore more in the following populations:
# Setting phenotypes
F2pheno1 = setPheno(F2pop, varE = 10, reps = 1)
# Setting phenotypes
F2pheno6 = setPheno(F2pop, varE = 10, reps = 6)
# Setting phenotypes
F2pheno10 = setPheno(F2pop, varE = 10, reps = 10)
# Correlation with the genetic value using different numbers of repetitions
cor(F2pheno1@gv, F2pheno1@pheno)
## Trait1
## Trait1 0.35992
## Trait1
## Trait1 0.7055848
## Trait1
## Trait1 0.7720663
# Plotting the parameters
df = data.frame(Trait1 = c(F2pheno1@pheno,F2pheno6@pheno,F2pheno10@pheno),
Id = c(1:(3*nInd(F2pop))),
Pop = rep(c('rep1','rep6','rep10'), each=nInd(F2pop)))
#require(ggplot2)
ggplot(df, aes(x = Id, y = Trait1, color = Pop)) +
geom_point(show.legend = TRUE)
Another argument in the setPheno()
function is
p. It represents the environmental covariate and it varies from
0-1. Values close to 0 and 1 put more weight on the phenotypic values
(larger errors) and values close to 0.5, have a smaller weight (lower
errors). Generally, we assume uniform values for the p
argument.
We will explore more in the following populations:
# Setting phenotypes
F2_pop1 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.9)
# Setting phenotypes
F2_pop2 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.5)
# Setting phenotypes
F2_pop3 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.1)
# Correlation of the phenotypes with the genetic values
cor(F2_pop1@gv, F2_pop1@pheno)
## Trait1
## Trait1 0.5298826
## Trait1
## Trait1 0.5199872
## Trait1
## Trait1 0.471256
The next step in modeling the breeding program is to make selections
on the phenotypes. We may use the function selectInd()
# Advancing F1->F2
F2pop = self(pop = F1pop, nProgeny = 1)
# Setting phenotypes
F2pop = setPheno(F2pop, varE = 10, reps = 1)
# Select the best-performing individuals according to their phenotype
F2popSel = selectInd(pop = F2pop,
nInd = 20,
use = "pheno") # gv, pheno, bv, ebv
# Create a new population and assume phenotypes
newPop = randCross(F2popSel, nCrosses = 100)
newPop = setPheno(newPop, h2 = 0.5)
# Evaluate observed response to selection between generations (as the difference between the mean of genetic values)
deltaG = mean(gv(newPop)) - mean(gv(F2pop))
To see the impact of the selections in the genetic mean of the populations, we can plot the phenotypic and genetic values from both populations:
phenoRange = range(pheno(c(F2pop, newPop)))
par(mfrow = c(2, 2),
mar = c(4, 4, 2, 1))
# Phenotype values in the base population and the selected individuals
tmp = hist(pheno(F2pop), xlim = phenoRange, xlab = "Phenotype value", main = "Base population")
abline(v = mean(pheno(F2pop)), col = "black", lty = 1, lwd = 3)
hist(pheno(F2popSel), col = "blue", breaks = tmp$breaks, add = TRUE)
abline(v = mean(pheno(F2popSel)), col = "blue", lty = 4, lwd = 3)
# Genetic values in the base population and the selected individuals
tmp = hist(gv(F2pop), xlim = phenoRange, xlab = "Genetic value", main = "Base population")
abline(v = mean(gv(F2pop)), col = "black", lty = 1, lwd = 3)
hist(gv(F2popSel), col = "blue", breaks = tmp$breaks, add = TRUE)
abline(v = mean(gv(F2popSel)), col = "blue", lty = 4, lwd = 3)
# Phenotype values in the new population
hist(pheno(newPop), xlim = phenoRange, xlab = "Phenotype value", main = "New population")
abline(v = mean(pheno(F2pop)), col = "black", lty = 1, lwd = 3)
abline(v = mean(pheno(newPop)), col = "blue", lty = 4, lwd = 3)
# Genetic values in the new population
hist(gv(newPop), xlim = phenoRange, xlab = "Genetic value", main = "New population")
abline(v = mean(gv(F2pop)), col = "black", lty = 1, lwd = 3)
abline(v = mean(gv(newPop)), col = "blue", lty = 4, lwd = 3)
Then, we can advance the materials and do selections to the end of the pipeline.
# Advancing F2->F3 and selection
F3pop = self(pop = F2pop, nProgeny = 4)
F3pop = setPheno(F3pop, varE = 10, reps = 3)
F3popSel = selectWithinFam(F3pop, nInd = 4, use = "pheno")
F3popSel = selectInd(F3popSel, nInd = 300, use = "pheno")
# Advancing F3->F4 and selection
F4pop = self(pop = F3popSel, nProgeny = 2)
F4pop = setPheno(F4pop, varE = 10, reps = 5)
F4popSel = selectWithinFam(F4pop, nInd = 1, use = "pheno")
F4popSel = selectInd(F4popSel, nInd = 50, use = "pheno")
# Advancing F4->F5 and selection
F5pop = self(pop = F4popSel, nProgeny = 1)
F5pop = setPheno(F5pop, varE = 10, reps = 8)
F5popSel = selectInd(F5pop, nInd = 10, use = "pheno")
# Advancing F5->F6 and selection
F6pop = self(pop = F5popSel, nProgeny = 1)
F6pop = setPheno(F6pop, varE = 10, reps = 10)
F6Sel = selectInd(F6pop, nInd = 5, use = "pheno")
# Selecting Variety
F6Sel = setPheno(F6Sel, varE = 10, reps = 20)
Variety = selectInd(F6Sel, nInd = 1, use = "pheno")
# 1. Making DHs
F1 = randCross(Parents,120,1)
DHs = makeDH(F1, nDH = 10)
# 2. Hybrid Cross
tester = DHs[1:5] # Five individuals as testers
lines = DHs[6:105] # Lines
Hybrids = hybridCross(tester, lines, crossPlan = 'testcross')
length(unique(Hybrids@father))
## [1] 100
# Set pheno for CGA
linesGCA = setPhenoGCA(lines, tester, h2 = 0.2)
lines = setPheno(lines, h2 = 0.2)
cor(linesGCA@pheno, lines@pheno)
## Trait1
## [1,] 0.3195568
rm(list=ls())
# Creating the founding genome
founderGenomes = runMacs(nInd = 100,
nChr = 4,
segSites = 20,
species = "MAIZE")
# Setting up trait characteristics
SP = SimParam$new(founderGenomes)
SP$addTraitA(nQtlPerChr = 20,
mean = 10,
var = 1)
# Generating base population
basePop = newPop(founderGenomes)
# Set simulation parameters
nSelected1 = 10 # Scenario 1
nSelected2 = 50 # Scenario 2
##------------------- For the first scenario
# Select the best performing individuals according to their phenotype
basePopSelected = basePop
# Allocate vectors
nGenerations = 10 + 1 # +1 to store starting generation
meanGAll = numeric(nGenerations)
varGAll = numeric(nGenerations)
# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)
# To make the for loop below simpler we will make a copy of the object basePopSelected
newPopSelected = basePopSelected
# Selection over many generations
for (generation in 1:(nGenerations - 1)) {
# Cross parents, phenotype progeny, and select new parents
newPop = randCross(newPopSelected, nCrosses = nInd(basePop))
newPop = setPheno(newPop, h2 = 0.5)
newPopSelected = selectInd(pop = newPop,
nInd = nSelected1,
use = "pheno")
# Save summaries
meanGAll[1 + generation] = meanG(newPop)
varGAll[1 + generation] = varG(newPop)
}
# Now save these outputs by copying the objects
meanGAll_n10 = meanGAll
varGAll_n10 = varGAll
##------------------- For second scenario
# Allocate vectors
nGenerations = 10 + 1 # +1 to store starting generation
meanGAll = numeric(nGenerations)
varGAll = numeric(nGenerations)
# Save the starting values
meanGAll[1] = meanG(basePop)
varGAll[1] = varG(basePop)
# To make the for loop below simpler we will make a copy of the object basePopSelected
newPopSelected = basePopSelected
# Selection over many generations
for (generation in 1:(nGenerations - 1)) {
# Cross parents, phenotype progeny, and select new parents
newPop = randCross(newPopSelected, nCrosses = nInd(basePop))
newPop = setPheno(newPop, h2 = 0.5)
newPopSelected = selectInd(pop = newPop,
nInd = nSelected2,
use = "pheno")
# Save summaries
meanGAll[1 + generation] = meanG(newPop)
varGAll[1 + generation] = varG(newPop)
}
# Now save these outputs by copying the objects
meanGAll_n50 = meanGAll
varGAll_n50 = varGAll
Let’s plot and see the changes over time
par(mfrow = c(2, 1),
mar = c(4, 4, 1, 1))
# Plot mean of genetic values over time
meanRanges = range(c(meanGAll_n10, meanGAll_n50))
plot(x = 1:nGenerations, y = meanGAll_n10, type = "l", col = "black", lwd = 3,
xlab = "Generation", ylab = "Mean of genetic values", ylim = meanRanges)
lines(x = 1:nGenerations, y = meanGAll_n50, type = "l", col = "purple", lty = 2, lwd = 3)
legend(x = "topleft", legend = c(10, 50), title = "nSelected",
lwd = 3, lty = c(1, 2), col = c("black", "purple"), bty = "n")
# Plot variance of genetic values over time
varRanges = range(c(varGAll_n50, varGAll_n10))
plot(x = 1:nGenerations, y = varGAll_n10, type = "l", col = "black", lwd = 3,
xlab = "Generation", ylab = "Variance of genetic values", ylim = varRanges)
lines(x = 1:nGenerations, y = varGAll_n50, type = "l", col = "purple", lty = 2, lwd = 3)
legend(x = "topright", legend = c(10, 50), title = "nSelected",
lwd = 3, lty = c(1, 2), col = c("black", "purple"), bty = "n")
# Clean the working environment
rm(list = ls())
# Load AlphaSimR, simulate founder genomes, define a trait, and simulate a base population
founderGenomes = runMacs(nInd = 100,
nChr = 10,
segSites = 100,
species = "MAIZE")
# Trait
SP = SimParam$new(founderGenomes)
SP$addTraitA(nQtlPerChr = 100,
mean = c(10, 2),
var = c(1,1),
corA = matrix(data=c(1, 0.25, 0.25, 1), ncol = 2))
# Parents
Parents = newPop(founderGenomes)
We can proceed and create crosses that take both traits into account. In addition, we can assume phenotypes and select individuals.
# Crossing randomly
F1pop = randCross(pop = Parents, nCrosses = 100)
# Advancing F1->F2 and phenotypes
F2pop = self(pop = F1pop, nProgeny = 1)
F2pop = setPheno(F2pop, varE = c(10, 2), reps = 1)
# Select the top individuals based on their phenotypes - trait 1
popSelT1 = selectInd(pop = F2pop,
nInd = 20,
use = "pheno",
trait = 1) # gv, pheno, bv, ebv
# Select the top individuals based on their phenotypes - trait 2
popSelT2 = selectInd(pop = F2pop,
nInd = 20,
use = "pheno",
trait = 2) # gv, pheno, bv, ebv
# Select the top individuals based on their phenotypes - trait 2
popSelBoth = selectInd(pop = F2pop,
nInd = 20,
use = "pheno",
trait = selIndex,
b=c(0.5,0.5)) # index
# Gain with selections based on individual traits or in the index.
df = data.frame(Basepop = c(apply(F2pop@pheno, 2, FUN = mean)),
Sel_T1 = c(apply(popSelT1@pheno, 2, FUN = mean)),
Sel_T2 = c(apply(popSelT2@pheno, 2, FUN = mean)),
Sel_Both = c(apply(popSelBoth@pheno, 2, FUN = mean)))
df
## Basepop Sel_T1 Sel_T2 Sel_Both
## Trait1 9.812046 14.966326 9.882529 14.652945
## Trait2 1.967772 1.691937 4.478636 2.503956
professor, University of Florida, mresende@ufl.edu↩︎
Post doc, University of Florida, deamorimpeixotom@ufl.edu↩︎