AlphaSimR: Functions

```{=html}``` ## Introduction As we recall from the first vignette, in a way to implement a set of simulations in the **AlphaSimR** package four steps must be followed: 1. Simulate founder genomes/haplotypes. 2. Set global simulation parameters for a target trait/traits. 3. Model the breeding program. 4. Examine results by looking into population individuals' metrics. In this vignette, we will cover a few useful functions from **AlphaSimR** that will help to Model the breeding program (step 3). In addition, we will compare two implementations over a few generations of selection. We will start by loading the packages. ```{r} # Loading packages library(AlphaSimR) library(ggplot2) ``` ## Modeling a breeding program in **AlphaSimR** ### Founder Genome Let's start the simulation by creating the base genome. ```{r} # Creating the founding genome founderGenomes = runMacs(nInd = 100, nChr = 4, segSites = 100, species = "MAIZE") # Setting up trait characteristics SP = SimParam$new(founderGenomes) SP$addTraitA(nQtlPerChr = 100, mean = 10, var = 1) # Generating base population Parents = newPop(founderGenomes) ``` ### Making crosses First, at the beginning of each breeding cycle, we generate new crosses. We can implement random crosses as well as make oriented crosses by giving the list of crosses to make. ```{r} # 1. Crossing randomly F1pop = randCross(pop = Parents, nCrosses = 100) # 2. Make crosses by a list of individuals # Making the crossing plan MatingPlan = data.frame(Parent1 = Parents@id[1:5], Parent2 = Parents@id[6:10]) # Crossing based on the crossing list F1Pop = makeCross(Parents, crossPlan = as.matrix(MatingPlan)) # Checking out the pedigree records out of F1 Pedigree = data.frame(Ind = F1Pop@id, Parent1 = F1Pop@mother, Parent2 = F1Pop@father) ``` ### Advance the genotypes Now, we can advance the materials. One function that we may use is the `self()`. ```{r} # Advancing F1->F2 F2pop = self(pop = F1pop, nProgeny = 10) ``` Then, we may assume phenotypes for all those individuals. For each individual, the phenotypes are assumed to come from the genetic values generated. Then, **AlphaSimR** adds in each genetic value a random error sampled from a multivariate normal distribution in the function `setPheno()`. Some important arguments in the function `setPheno()` are: **pop** = the target population with the individuals to assume the phenotypes. **h2** = narrow-sense heritability. **H2** = broad-sense heritability. **varE** = residual variance. **reps** = number of repetitions. **p** = the p-value for the environmental covariate used by GxE traits. If NULL, a value is sampled at random. There are three options for assuming the error variance mentioned. We can set the argument *h2*, *H2*, *varE*. We should use only one of these arguments. If we supply values for more than one, only one will be used according to the order in which they are listed above. ```{r} # Setting phenotypes F2popvarE = setPheno(pop = F2pop, varE = 10, reps = 1) # Option 1 F2poph2 = setPheno(pop = F2pop, h2 = 0.5, reps = 1) # Option 2 F2popH2 = setPheno(pop = F2pop, H2 = 0.6, reps = 1) # Option 3 # Correlation df = data.frame('VarE' = cor(F2popvarE@gv, F2popvarE@pheno), 'h2' = cor(F2poph2@gv, F2poph2@pheno), 'H2' = cor(F2popH2@gv, F2popH2@pheno)) df # Plotting the values par(mfrow = c(2,3)) hist(F2popvarE@gv) hist(F2poph2@gv) hist(F2popH2@gv) hist(F2popvarE@pheno) hist(F2poph2@pheno) hist(F2popH2@pheno) ``` The *reps* argument is for convenient representation of replicated data. It is intended to represent replicated yield trials in plant breeding programs. In this case, *varE* is set to the plot error and *reps* is set to the number of plots per entry. The resulting phenotype represents the entry-means. We will explore more in the following populations: ```{r} # Setting phenotypes F2pheno1 = setPheno(F2pop, varE = 10, reps = 1) # Setting phenotypes F2pheno6 = setPheno(F2pop, varE = 10, reps = 6) # Setting phenotypes F2pheno10 = setPheno(F2pop, varE = 10, reps = 10) # Correlation with the genetic value using different numbers of repetitions cor(F2pheno1@gv, F2pheno1@pheno) cor(F2pheno6@gv, F2pheno6@pheno) cor(F2pheno10@gv, F2pheno10@pheno) # Plotting the parameters df = data.frame(Trait1 = c(F2pheno1@pheno,F2pheno6@pheno,F2pheno10@pheno), Id = c(1:(3*nInd(F2pop))), Pop = rep(c('rep1','rep6','rep10'), each=nInd(F2pop))) #require(ggplot2) ggplot(df, aes(x = Id, y = Trait1, color = Pop)) + geom_point(show.legend = TRUE) ``` Another argument in the `setPheno()` function is *p*. It represents the environmental covariate and it varies from 0-1. Values close to 0 and 1 put more weight on the phenotypic values (larger errors) and values close to 0.5, have a smaller weight (lower errors). Generally, we assume uniform values for the *p* argument. We will explore more in the following populations: ```{r} # Setting phenotypes F2_pop1 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.9) # Setting phenotypes F2_pop2 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.5) # Setting phenotypes F2_pop3 = setPheno(F2pop, reps = 2, h2 = 0.1, p = 0.1) # Correlation of the phenotypes with the genetic values cor(F2_pop1@gv, F2_pop1@pheno) cor(F2_pop2@gv, F2_pop2@pheno) cor(F2_pop3@gv, F2_pop3@pheno) ``` The next step in modeling the breeding program is to make selections on the phenotypes. We may use the function `selectInd()` ```{r} # Advancing F1->F2 F2pop = self(pop = F1pop, nProgeny = 1) # Setting phenotypes F2pop = setPheno(F2pop, varE = 10, reps = 1) # Select the best-performing individuals according to their phenotype F2popSel = selectInd(pop = F2pop, nInd = 20, use = "pheno") # gv, pheno, bv, ebv # Create a new population and assume phenotypes newPop = randCross(F2popSel, nCrosses = 100) newPop = setPheno(newPop, h2 = 0.5) # Evaluate observed response to selection between generations (as the difference between the mean of genetic values) deltaG = mean(gv(newPop)) - mean(gv(F2pop)) ``` ### Plotting To see the impact of the selections in the genetic mean of the populations, we can plot the phenotypic and genetic values from both populations: ```{r} phenoRange = range(pheno(c(F2pop, newPop))) par(mfrow = c(2, 2), mar = c(4, 4, 2, 1)) # Phenotype values in the base population and the selected individuals tmp = hist(pheno(F2pop), xlim = phenoRange, xlab = "Phenotype value", main = "Base population") abline(v = mean(pheno(F2pop)), col = "black", lty = 1, lwd = 3) hist(pheno(F2popSel), col = "blue", breaks = tmp$breaks, add = TRUE) abline(v = mean(pheno(F2popSel)), col = "blue", lty = 4, lwd = 3) # Genetic values in the base population and the selected individuals tmp = hist(gv(F2pop), xlim = phenoRange, xlab = "Genetic value", main = "Base population") abline(v = mean(gv(F2pop)), col = "black", lty = 1, lwd = 3) hist(gv(F2popSel), col = "blue", breaks = tmp$breaks, add = TRUE) abline(v = mean(gv(F2popSel)), col = "blue", lty = 4, lwd = 3) # Phenotype values in the new population hist(pheno(newPop), xlim = phenoRange, xlab = "Phenotype value", main = "New population") abline(v = mean(pheno(F2pop)), col = "black", lty = 1, lwd = 3) abline(v = mean(pheno(newPop)), col = "blue", lty = 4, lwd = 3) # Genetic values in the new population hist(gv(newPop), xlim = phenoRange, xlab = "Genetic value", main = "New population") abline(v = mean(gv(F2pop)), col = "black", lty = 1, lwd = 3) abline(v = mean(gv(newPop)), col = "blue", lty = 4, lwd = 3) ``` Then, we can advance the materials and do selections to the end of the pipeline. ```{r} # Advancing F2->F3 and selection F3pop = self(pop = F2pop, nProgeny = 4) F3pop = setPheno(F3pop, varE = 10, reps = 3) F3popSel = selectWithinFam(F3pop, nInd = 4, use = "pheno") F3popSel = selectInd(F3popSel, nInd = 300, use = "pheno") # Advancing F3->F4 and selection F4pop = self(pop = F3popSel, nProgeny = 2) F4pop = setPheno(F4pop, varE = 10, reps = 5) F4popSel = selectWithinFam(F4pop, nInd = 1, use = "pheno") F4popSel = selectInd(F4popSel, nInd = 50, use = "pheno") # Advancing F4->F5 and selection F5pop = self(pop = F4popSel, nProgeny = 1) F5pop = setPheno(F5pop, varE = 10, reps = 8) F5popSel = selectInd(F5pop, nInd = 10, use = "pheno") # Advancing F5->F6 and selection F6pop = self(pop = F5popSel, nProgeny = 1) F6pop = setPheno(F6pop, varE = 10, reps = 10) F6Sel = selectInd(F6pop, nInd = 5, use = "pheno") # Selecting Variety F6Sel = setPheno(F6Sel, varE = 10, reps = 20) Variety = selectInd(F6Sel, nInd = 1, use = "pheno") ``` ## Some (other) useful functions in **AlphaSimR**. ```{r} # 1. Making DHs F1 = randCross(Parents,120,1) DHs = makeDH(F1, nDH = 10) # 2. Hybrid Cross tester = DHs[1:5] # Five individuals as testers lines = DHs[6:105] # Lines Hybrids = hybridCross(tester, lines, crossPlan = 'testcross') length(unique(Hybrids@father)) # Set pheno for CGA linesGCA = setPhenoGCA(lines, tester, h2 = 0.2) lines = setPheno(lines, h2 = 0.2) cor(linesGCA@pheno, lines@pheno) ``` ## Examine results over generations ```{r} rm(list=ls()) # Creating the founding genome founderGenomes = runMacs(nInd = 100, nChr = 4, segSites = 20, species = "MAIZE") # Setting up trait characteristics SP = SimParam$new(founderGenomes) SP$addTraitA(nQtlPerChr = 20, mean = 10, var = 1) # Generating base population basePop = newPop(founderGenomes) # Set simulation parameters nSelected1 = 10 # Scenario 1 nSelected2 = 50 # Scenario 2 ##------------------- For the first scenario # Select the best performing individuals according to their phenotype basePopSelected = basePop # Allocate vectors nGenerations = 10 + 1 # +1 to store starting generation meanGAll = numeric(nGenerations) varGAll = numeric(nGenerations) # Save the starting values meanGAll[1] = meanG(basePop) varGAll[1] = varG(basePop) # To make the for loop below simpler we will make a copy of the object basePopSelected newPopSelected = basePopSelected # Selection over many generations for (generation in 1:(nGenerations - 1)) { # Cross parents, phenotype progeny, and select new parents newPop = randCross(newPopSelected, nCrosses = nInd(basePop)) newPop = setPheno(newPop, h2 = 0.5) newPopSelected = selectInd(pop = newPop, nInd = nSelected1, use = "pheno") # Save summaries meanGAll[1 + generation] = meanG(newPop) varGAll[1 + generation] = varG(newPop) } # Now save these outputs by copying the objects meanGAll_n10 = meanGAll varGAll_n10 = varGAll ##------------------- For second scenario # Allocate vectors nGenerations = 10 + 1 # +1 to store starting generation meanGAll = numeric(nGenerations) varGAll = numeric(nGenerations) # Save the starting values meanGAll[1] = meanG(basePop) varGAll[1] = varG(basePop) # To make the for loop below simpler we will make a copy of the object basePopSelected newPopSelected = basePopSelected # Selection over many generations for (generation in 1:(nGenerations - 1)) { # Cross parents, phenotype progeny, and select new parents newPop = randCross(newPopSelected, nCrosses = nInd(basePop)) newPop = setPheno(newPop, h2 = 0.5) newPopSelected = selectInd(pop = newPop, nInd = nSelected2, use = "pheno") # Save summaries meanGAll[1 + generation] = meanG(newPop) varGAll[1 + generation] = varG(newPop) } # Now save these outputs by copying the objects meanGAll_n50 = meanGAll varGAll_n50 = varGAll ``` Let's plot and see the changes over time ```{r} par(mfrow = c(2, 1), mar = c(4, 4, 1, 1)) # Plot mean of genetic values over time meanRanges = range(c(meanGAll_n10, meanGAll_n50)) plot(x = 1:nGenerations, y = meanGAll_n10, type = "l", col = "black", lwd = 3, xlab = "Generation", ylab = "Mean of genetic values", ylim = meanRanges) lines(x = 1:nGenerations, y = meanGAll_n50, type = "l", col = "purple", lty = 2, lwd = 3) legend(x = "topleft", legend = c(10, 50), title = "nSelected", lwd = 3, lty = c(1, 2), col = c("black", "purple"), bty = "n") # Plot variance of genetic values over time varRanges = range(c(varGAll_n50, varGAll_n10)) plot(x = 1:nGenerations, y = varGAll_n10, type = "l", col = "black", lwd = 3, xlab = "Generation", ylab = "Variance of genetic values", ylim = varRanges) lines(x = 1:nGenerations, y = varGAll_n50, type = "l", col = "purple", lty = 2, lwd = 3) legend(x = "topright", legend = c(10, 50), title = "nSelected", lwd = 3, lty = c(1, 2), col = c("black", "purple"), bty = "n") ``` ## Multi traits ```{r} # Clean the working environment rm(list = ls()) # Load AlphaSimR, simulate founder genomes, define a trait, and simulate a base population founderGenomes = runMacs(nInd = 100, nChr = 10, segSites = 100, species = "MAIZE") # Trait SP = SimParam$new(founderGenomes) SP$addTraitA(nQtlPerChr = 100, mean = c(10, 2), var = c(1,1), corA = matrix(data=c(1, 0.25, 0.25, 1), ncol = 2)) # Parents Parents = newPop(founderGenomes) ``` We can proceed and create crosses that take both traits into account. In addition, we can assume phenotypes and select individuals. ```{r} # Crossing randomly F1pop = randCross(pop = Parents, nCrosses = 100) # Advancing F1->F2 and phenotypes F2pop = self(pop = F1pop, nProgeny = 1) F2pop = setPheno(F2pop, varE = c(10, 2), reps = 1) # Select the top individuals based on their phenotypes - trait 1 popSelT1 = selectInd(pop = F2pop, nInd = 20, use = "pheno", trait = 1) # gv, pheno, bv, ebv # Select the top individuals based on their phenotypes - trait 2 popSelT2 = selectInd(pop = F2pop, nInd = 20, use = "pheno", trait = 2) # gv, pheno, bv, ebv # Select the top individuals based on their phenotypes - trait 2 popSelBoth = selectInd(pop = F2pop, nInd = 20, use = "pheno", trait = selIndex, b=c(0.5,0.5)) # index # Gain with selections based on individual traits or in the index. df = data.frame(Basepop = c(apply(F2pop@pheno, 2, FUN = mean)), Sel_T1 = c(apply(popSelT1@pheno, 2, FUN = mean)), Sel_T2 = c(apply(popSelT2@pheno, 2, FUN = mean)), Sel_Both = c(apply(popSelBoth@pheno, 2, FUN = mean))) df ``` ## References ::: {#refs} :::

Marco Antonio Peixoto