{width=10%} AlphaSimR: Base population and traits

```{=html}```
```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Introduction To start a set of simulations in the `AlphaSimR` package [@gaynor2021alphasimr], four steps must be implemented, as follows: 1. Simulate founder genomes/haplotypes. 2. Set global simulation parameters for a target trait/traits. 3. Model the breeding program. 4. Examine results by looking into population individuals' metrics. In this vignette, we will implement and further explore the first two. ## Genetic basis of base population: Founder genomes First, we clean our working environment and download/load the AlphaSimR package. ```{r} #install.packages('AlphaSimR') require(AlphaSimR) ``` To create the founder genome we may use the function `runMacs()`. It uses the method from Chen et al. [@chen2009fast] to simulate a set of haplotypes and, from that, a species-specific demography. This allows us to simulate a base genome with characteristics of the target crop that we are working with. ```{r} # Founder genome founderGenomes = runMacs(nInd = 10, # Number of individuals that compose the base genome nChr = 10, # Number of chromosome pairs of the target species segSites = 100, # number of segregation sites species = "MAIZE", # We can use the base parameters and we have available in the package MAIZE, WHEAT, CATTLE, and GENERIC. ploidy = 2) # Setting the ploidy # Object created founderGenomes # Genetic map of the segregation sites founderGenomes@genMap ``` When we choose the species in the argument *species*, four parameters are automatically adjusted: ```{r, eval=FALSE} # Genome length (i.e. Soybean) genLen = 1.15 #Gb # Effective population size (Ne) Ne = 50 # Demographic bottlenecks speciesParams = "8E8 -t 4E-7 -r 3.6E-7" # Species history speciesHist = "-eN 0.03 1 -eN 0.05 2 -eN 0.10 4 -eN 0.15 6" ``` **Obs**: `runMacs()` functions offers four target species with their demography already implemented in the package. Other than that, we can use `runMacs2()` function to set our own species characteristics (further discussed). ## Trait characteristics With the founder genomes in perspective, we can proceed to add the characteristics of the target trait or traits to the simulation. **AlphaSimR** has a vignette that we recommend you take a look at [link](https://cran.r-project.org/web/packages/AlphaSimR/vignettes/traits.pdf). It gives, thoroughly, an explanation on the traits of interest and how to interpret its effects. ### Single trait #### **Additive trait** We will start with a trait with only additive effects. For such, we have the following parameters: ```{r} # Global simulation parameters from founder genomes. SP = SimParam$new(founderGenomes) # Additive trait SP$addTraitA(nQtlPerChr = 15, # Number of QTL per chromosome mean = 10, # Trait mean var = 10) # Trait variance # QTL effects (for the traits) SP$traits[[1]]@addEff SP$traits[[1]]@lociLoc ``` After simulating the base genome and to set the trait characteristics, we can create individuals from that base genome for the trait of interest. We will use the function `newPop()`, as follows: ```{r} # Creating individuals/population basePop = newPop(founderGenomes) # Gen param genParam(basePop) # Looking at the population # haplotypes popHaplo = pullSegSiteHaplo(basePop) popHaplo[, 1:10] # Check the genotypes popGeno = pullSegSiteGeno(basePop) popGeno[, 1:10] ``` #### Allele frequency Allele frequency describes the proportion of mutations at a locus and can be estimated from the haplotypes (popHaplo object). ```{r} # Allele frequency alleleFreq = colMeans(popHaplo) plot(alleleFreq) ``` #### Non-additive effects: **Trait with additive and dominance effects** ```{r} # Global simulation parameters from founder genomes. SP = SimParam$new(founderGenomes) # Additive trait SP$addTraitAD(nQtlPerChr = 15, # Number of QTLs controlling the trait per chromosome # nQtlPerChr = c(15, 5), # Alternative: QTL effect by chromosome mean = 10, var = 10, meanDD = 0.1, # dominance degree - it varies from 0-1 (meaning no-dominance and codominance) varDD = 0.2) # Variance for the degree of dominance # QTL effects for additive effects SP$traits[[1]]@addEff # QTL effects for dominance effects SP$traits[[1]]@domEff ``` The implementation of traits in **AlphaSimR** follows a biological model, which is responsible for converting into a genetic value each individual genotype before created. In a straightforward way, the genetic value is used to create the individuals' phenotypes. The biological effects presented in **AlphaSimR** are: **A**: additive effect **D**: dominance effect **G**: genotype by environment interaction effect **E**: environmental effect So, we can create traits with the combinations of those effects (assuming that all of them as, at least, additive) using the **ADGE** framework, as it follows: ```{r, eval = FALSE} # Traits that can be created in AlphaSimR: SP$addTraitA() SP$addTraitAD() SP$addTraitADG() SP$addTraitADEG() SP$addTraitAG() SP$addTraitAE() SP$addTraitAEG() ``` It is important to have in mind that for dominance effects (**D**) we set the mean of the dominance degree (0-1) and variance, whereas for genotype by environment effect (**G**) and environmental effect (**E**) we have just to adjust the variance. ## Multi-trait framework The `AlphaSimR` package allows to simulate a set of traits for the individuals. For such, two strategies may be used. The first strategy is to set all traits at once (using `$addTrait`). In this option, a correlation matrix between each pair of traits, for each effect, should be added, as follows: ```{r} rm(list=ls()) # Founder genome founderGenomes = AlphaSimR::quickHaplo(nInd = 3, # LD, p=q=0.5, haplotypes from 0,1. nChr = 3, segSites = 100) # From the base population SP = SimParam$new(founderGenomes) # Alternative one - Setting the correlation between traits (Additive trait) SP$addTraitA(nQtlPerChr = 20, mean = c(0, 10), var = c(0.5, 2), corA = matrix(c(1.0, 0.5, 0.5, 1.0), nrow = 2)) # QTL effects for trait one SP$traits[[1]]@addEff # QTL effects for trait two SP$traits[[2]]@addEff ``` A second strategy in a multi-trait framework is to add each trait individually. In this case, we do not need to add the information of correlation between them. Another positive aspect of this second strategy is the possibility of setting a different number of QTLs controlling each trait simulated. ```{r} # From the base population SP = SimParam$new(founderGenomes) # Alternative two - One trait at a time # Trait one SP$addTraitA(nQtlPerChr = 20, mean = 0, var = 0.5) # Trait two SP$addTraitA(nQtlPerChr = 5, mean = 100, var = 20) # QTL effects for trait one SP$traits[[1]]@addEff # QTL effects for trait two SP$traits[[2]]@addEff ``` ## Split argument In breeding programs, such as maize, heterosis is harnessed by exploiting the crosses between heterotic groups. Usually, two distinct groups are used (*i*.*e*., maize dent and flint germplasm pools). The genetic divergence of these two pools is explained by their historic geographical separation and adaptation to different environments. It has undergone changes in the population structure and history of the base population. For simulations that involve population separation in terms of generations, the **AlphaSimR** package offers the split argument in the `runMacs()` function. It represents an optional historic population split in terms of generations ago. Below, we will simulate three different base genomes. The first one will use the default value (NULL) for the split argument. The second and third genomes will be split 10 and 50 generations ago, respectively. We will examine the base genome genotypes and explore them through a principal component analysis (PCA). ```{r} rm(list=ls()) #------------------- Founder genome with no split founderGenomes = runMacs(nInd = 200, nChr = 5, segSites = 100, species = "MAIZE", ploidy = 2, split = NULL) # From the base population SP = SimParam$new(founderGenomes) # Creating individuals/population basePop = newPop(founderGenomes) # Check the genotypes popGeno = pullSegSiteGeno(basePop) # PCA pca_noSplit = prcomp(popGeno) #------------------- Founder genome with split equals to 10 founderGenomes = runMacs(nInd = 200, nChr = 5, segSites = 100, species = "MAIZE", ploidy = 2, split = 10) # From the base population SP = SimParam$new(founderGenomes) # Creating individuals/population basePop = newPop(founderGenomes) # Check the genotypes popGeno_10 = pullSegSiteGeno(basePop) # PCA pca_Split10 = prcomp(popGeno_10) #------------------- Founder genome with split equals to 50 founderGenomes = runMacs(nInd = 200, nChr = 5, segSites = 100, species = "MAIZE", ploidy = 2, split = 50) # From the base population SP = SimParam$new(founderGenomes) # Creating individuals/population basePop = newPop(founderGenomes) # Check the genotypes popGeno_50 = pullSegSiteGeno(basePop) # PCA pca_Split50 = prcomp(popGeno_50) ``` Plotting the PCA from each base population created. ```{r, figures-side, fig.show="hold", out.width="50%"} # Plotting the PCA library(ggfortify) # Plot par(mar = c(1, 3, .1, .1)) autoplot(pca_noSplit, data = popGeno) autoplot(pca_Split10, data = popGeno_10) autoplot(pca_Split50, data = popGeno_50) ``` ## References ::: {#refs} :::