AlphaSimR: Base population and traits

```{=html}```
```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Introduction To start a set of simulations in the `AlphaSimR` package [@gaynor2021alphasimr], four steps must be implemented, as follows: 1. Simulate founder genomes/haplotypes. 2. Set global simulation parameters for a target trait/traits. 3. Model the breeding program. 4. Examine results by looking into population individuals' metrics. In this vignette, we will implement and further explore the first two. ## Founder genomes First, we clean our working environment and download/load the AlphaSimR package. ```{r} # install.packages('AlphaSimR') require(AlphaSimR) ``` To create the founder genome we may use the function `runMacs()`. It uses the method from Chen et al. [@chen2009fast] to simulate a set of haplotypes and, from that, a species-specific demography. This allows us to simulate a base genome with characteristics of the target crop that we are working with. ```{r} # Founder genome founderGenomes = runMacs(nInd = 3, # Number of individuals that compose the base population nChr = 3, # Number of chromosome pairs of the target species segSites = 100, # number of segregation sites species = "MAIZE", # We can use the base parameters and we have available in the package MAIZE, WHEAT, CATTLE, and GENERIC. ploidy = 2) # Setting the ploidy # Object created founderGenomes # Genetic map of the segregation sites founderGenomes@genMap ``` When we choose the species in the argument *species*, four parameters are automatically adjusted: ```{r, eval=FALSE} # Genome length (i.e. Maize) genLen = 1.43 # Effective population size (Ne) Ne = 50 # Demographic bottlenecks speciesParams = "8E8 -t 4E-7 -r 3.6E-7" # Species history speciesHist = "-eN 0.03 1 -eN 0.05 2 -eN 0.10 4 -eN 0.15 6" ``` **Obs**: `runMacs()` functions offers four target species with their demography already implemented in the package. Other than that, we can use `runMacs2()` function to set our own species characteristics (further discussed). ## Traits With the founder genomes in perspective, we can proceed to add the characteristics of the target trait or traits to the simulation. **AlphaSimR** has a vignette that we recommend you take a look at [link](https://cran.r-project.org/web/packages/AlphaSimR/vignettes/traits.pdf). It gives, thoroughly, an explanation on the traits of interest and how to interpret its effects. ### Single trait #### **Additive trait** We will start with a trait with only additive effects. For such, we have the following parameters: ```{r} # Global simulation parameters from founder genomes. SP = SimParam$new(founderGenomes) # Additive trait SP$addTraitA(nQtlPerChr = 15, # Number of QTL per chromosome mean = 10, # Trait mean var = 10) # Trait variance # QTL effects (for the traits) SP$traits[[1]]@addEff ``` After simulating the base genome and to set the trait characteristics, we can create individuals from that base genome for the trait of interest. We will use the function `newPop()`, as follows: ```{r} # Creating individuals/population basePop = newPop(founderGenomes) # Gen param genParam(basePop) # Looking at the population # haplotypes popHaplo = pullSegSiteHaplo(basePop) popHaplo[, 1:10] # Check the genotypes popGeno = pullSegSiteGeno(basePop) popGeno[, 1:10] ``` #### Allele frequency Allele frequency describes the proportion of mutations at a locus and can be estimated from the haplotypes (popHaplo object). ```{r} # Allele frequency alleleFreq = colMeans(popHaplo) plot(alleleFreq) ``` #### **Trait with additive and dominance effects** ```{r} # Global simulation parameters from founder genomes. SP = SimParam$new(founderGenomes) # Additive trait SP$addTraitAD(nQtlPerChr = 15, # Number of QTLs controlling the trait per chromosome # nQtlPerChr = c(15, 5), # Alternative: QTL effect by chromosome mean = 10, var = 10, meanDD = 0.1, # dominance degree - it varies from 0-1 (meaning no-dominance and codominance) varDD = 0.2) # Variance for the degree of dominance # QTL effects for additive effects SP$traits[[1]]@addEff # QTL effects for dominance effects SP$traits[[1]]@domEff ``` The implementation of traits in **AlphaSimR** follows a biological model, which is responsible for converting into a genetic value each individual genotype before created. In a straightforward way, the genetic value is used to create the individuals' phenotypes. The biological effects presented in **AlphaSimR** are: **A**: additive effect **D**: dominance effect **G**: genotype by environment interaction effect **E**: epistasis effect So, we can create traits with the combinations of those effects (assuming that all of them as, at least, additive) using the **ADGE** framework, as it follows: ```{r, eval = FALSE} # Traits that can be created in AlphaSimR: SP$addTraitA() SP$addTraitAD() SP$addTraitADG() SP$addTraitADEG() SP$addTraitAG() SP$addTraitAE() SP$addTraitAEG() ``` It is important to have in mind that for dominance effects (**D**) we set the mean of the dominance degree (0-1) and variance, whereas for genotype by environment effect (**G**) and epistasis effect (**E**) we have just to adjust the variance. ## Importing external Data The **AlphaSimR** package allows you to import genomic data and requires a genetic map and phased genotypes. First, we load the genetic map in the format 'Marker name, Chromosome, Position (in Morgans)'. In the example below, we have a diploid species with 20 individuals, 10 chromosomes, each with 10 markers, and these markers are equally spaced along 1 Morgan chromosome. ```{r} # Clean the directory rm(list=ls()) # Load the External Data Set and Change the Directory load("External_Data.RDATA") # Genetic map head(genMap) dim(genMap) # nrows: Number of markers ``` If we have phased haplotypes, the file needs to be 40 rows (the number of individuals times the ploidy level), with each individual having 100 loci based on the map above. ```{r} # Haplotype file haplo[1:6,1:6] dim(haplo) # nrows: Number of individuals multiplied by ploidy level, and ncols: number of markers ``` If we don't have phased haplotypes, we can simulate the haplotype phases multiple times. ```{r} # Loading a function for later sim_haplo_phase <- function(M, ploidy) { haplo <- NULL for (i in 1:nrow(M)) { b <- NULL for (j in 1:ncol(M)) { a <- rep(0, ploidy) if (M[i, j] > 0) { ones_count <- min(M[i, j], ploidy) a[1:ones_count] <- 1 } b <- cbind(b, a) } haplo <- rbind(haplo, b) colnames(haplo)=colnames(M) } return(haplo) } ``` ```{r} # SNP matrix snp[1:6,1:6] dim(snp) # nrows: Number of individuals and ncols: number of markers # Haplotype file simulated haplo=sim_haplo_phase(M=snp, ploidy=2) haplo[1:6,1:6] ``` We can also import the pedigree file. ```{r} # Pedigree file head(ped) ``` To create the founder genome using the imported data set, we can utilize the `importHaplo()` function. ```{r} # Create the founder population founderPop = importHaplo(haplo = haplo, genMap = genMap, ploidy = 2, ped = ped) founderPop # Set simulation parameters # Initialize parameters with founder haplotypes SP = SimParam$new(founderPop) ``` We can also load our own QTL effects or simulated traits using our imported genomic data set. ```{r} # Load our own QTL effects qtlEffects = data.frame(marker = c("M1", "M11","M80"), aditiveEffect = c(1, -1, 1), domEffect = c(0.3, 0, -0.3)) # Import in SimParam SP$importTrait(markerNames = qtlEffects$marker, addEff = qtlEffects$aditiveEffect, intercept = 10, domEff = qtlEffects$domEffect, name = "Trait1") # Create a population from the founder haplotypes pop = newPop(founderPop) # The population now works like any other AlphaSimR population gv(pop)[1:6] ``` ## Multi-trait framework The `AlphaSimR` package allows to simulate a set of traits for the individuals. For such, two strategies may be used. The first strategy is to set all traits at once (using `$addTrait`). In this option, a correlation matrix between each pair of traits, for each effect, should be added, as follows: ```{r} rm(list=ls()) # Founder genome founderGenomes = runMacs(nInd = 3, nChr = 3, segSites = 100, species = "MAIZE", ploidy = 2) # From the base population SP = SimParam$new(founderGenomes) # Alternative one - Setting the correlation between traits (Additive trait) SP$addTraitA(nQtlPerChr = 20, mean = c(0, 10), var = c(0.5, 2), corA = matrix(c(1.0, 0.5, 0.5, 1.0), nrow = 2)) # QTL effects for trait one SP$traits[[1]]@addEff # QTL effects for trait two SP$traits[[2]]@addEff ``` A second strategy in a multi-trait framework is to add each trait individually. In this case, we do not need to add the information of correlation between them. Another positive aspect of this second strategy is the possibility of setting a different number of QTLs controlling each trait simulated. ```{r} # From the base population SP = SimParam$new(founderGenomes) # Alternative two - One trait at a time # Trait one SP$addTraitA(nQtlPerChr = 20, mean = 0, var = 0.5) # Trait two SP$addTraitA(nQtlPerChr = 5, mean = 100, var = 20) # QTL effects for trait one SP$traits[[1]]@addEff # QTL effects for trait two SP$traits[[2]]@addEff ``` ## Split argument In breeding programs, such as maize, heterosis is harnessed by exploiting the crosses between heterotic groups. Usually, two distinct groups are used (*i*.*e*., maize dent and flint germplasm pools). The genetic divergence of these two pools is explained by their historic geographical separation and adaptation to different environments. It has undergone changes in the population structure and history of the base population. For simulations that involve population separation in terms of generations, the **AlphaSimR** package offers the split argument in the `runMacs()` function. It represents an optional historic population split in terms of generations ago. Below, we will simulate three different base genomes. The first one will use the default value (NULL) for the split argument. The second and third genomes will be split 10 and 50 generations ago, respectively. We will examine the base genome genotypes and explore them through a principal component analysis (PCA). ```{r} rm(list=ls()) #------------------- Founder genome with no split founderGenomes = runMacs(nInd = 200, nChr = 5, segSites = 100, species = "MAIZE", ploidy = 2, split = NULL) # From the base population SP = SimParam$new(founderGenomes) # Creating individuals/population basePop = newPop(founderGenomes) # Check the genotypes popGeno = pullSegSiteGeno(basePop) # PCA pca_noSplit = prcomp(popGeno) #------------------- Founder genome with split equals to 10 founderGenomes = runMacs(nInd = 200, nChr = 5, segSites = 100, species = "MAIZE", ploidy = 2, split = 10) # From the base population SP = SimParam$new(founderGenomes) # Creating individuals/population basePop = newPop(founderGenomes) # Check the genotypes popGeno_10 = pullSegSiteGeno(basePop) # PCA pca_Split10 = prcomp(popGeno_10) #------------------- Founder genome with split equals to 50 founderGenomes = runMacs(nInd = 200, nChr = 5, segSites = 100, species = "MAIZE", ploidy = 2, split = 50) # From the base population SP = SimParam$new(founderGenomes) # Creating individuals/population basePop = newPop(founderGenomes) # Check the genotypes popGeno_50 = pullSegSiteGeno(basePop) # PCA pca_Split50 = prcomp(popGeno_50) ``` Plotting the PCA from each base population created. ```{r, figures-side, fig.show="hold", out.width="50%"} # Plotting the PCA library(ggfortify) # Plot par(mar = c(1, 3, .1, .1)) autoplot(pca_noSplit, data = popGeno) autoplot(pca_Split10, data = popGeno_10) autoplot(pca_Split50, data = popGeno_50) ``` ## References ::: {#refs} :::

Marco Antonio Peixoto