Deploying genomic selection

In this vignette, we will present how to use AlphaSimR objects to deploy genomic selection (GS).

Genomic selection is a method that leverages both phenotype and genome data from historical individuals to develop a predictive model. This model is then applied to new individuals, utilizing their genome data alongside the prediction model to forecast their phenotype performance. The foundation of this predictive model lies in the inferred associations between variations in phenotypes and the corresponding variations along the genome. These associations are calculated for numerous positions in the genome, commonly referred to as genomic markers. The prevailing choice for such markers is often bi-allelic Single Nucleotide Polymorphisms (SNPs), a type of genetic variation that AlphaSimR emulates. Once these associations between phenotype and genome are established, it becomes possible to predict the performance of any individual genotyped for the selected set of SNP markers.

One trait with additive effect

The first implementation will use one additive trait. Let’s create the founder genome and also a base population and assume the trait characteristics. In addition to the argument that we have been describing, we need to use the function addSnpChip to add the SNPs for each individual.

library(AlphaSimR)
## Warning: package 'AlphaSimR' was built under R version 4.2.3
## Carregando pacotes exigidos: R6
# Founder haplotypes
founderPop = runMacs(nInd=1000, nChr=10, segSites=600)

# trait parameters
SP = SimParam$
  new(founderPop)$
  addTraitA(100)$
  addSnpChip(500)$ # Number of SNPs per pair of chromosome
  setVarE(h2=0.4)
  
# Split the founders into two population
pop1 = newPop(founderPop[1:500])
pop2 = newPop(founderPop[501:1000])

#SNPs
SNPop = pullSnpGeno(pop1)

SNPop[1:10,1:10]
##    1_1 1_3 1_4 1_7 1_8 1_9 1_10 1_11 1_12 1_13
## 1    0   0   0   0   0   0    2    2    0    0
## 2    0   0   1   1   1   0    1    1    1    0
## 3    0   0   0   0   0   0    2    2    0    0
## 4    0   1   1   0   0   0    1    0    0    2
## 5    0   2   0   0   0   0    2    0    0    2
## 6    0   0   2   2   2   0    0    0    2    0
## 7    0   1   1   0   0   0    1    0    0    1
## 8    0   1   0   0   0   0    2    2    0    0
## 9    0   0   1   1   1   0    0    0    1    0
## 10   0   1   1   0   0   0    0    0    0    1

New population from crosses and training the model

Now, we can create a large F1 population derived from the initial individuals and fit a genomic selection model using the function RRBLUP() from AlphaSimR. We may use the phenotypes assumed out of setPheno() function and, together with the marker’s information, create a model that will give us the marker’s effects, as follows:

# Create a third population derived from the first populations
pop3 = randCross(pop1, nCrosses=1000)
pop1 = setPheno(pop1, h2 = 0.6)

# Train a GS model using the first population
gsModel = RRBLUP(pop1, use = 'pheno') #RRBLUP

Using markers effect to predict the values

Now that we have a calibration, we can deploy genomic selection by using the marker effects to predict, together with the genotypes of the target population, the estimated breeding value of the individuals.

# Set EBVs for all populations using the GS model
pop1 = setEBV(pop1, gsModel)
head(pop1@ebv)
##      est_GV_Trait1
## [1,]     0.1242688
## [2,]    -1.2727601
## [3,]     0.7484424
## [4,]    -1.2779063
## [5,]     0.9605354
## [6,]    -0.2876188
pop2 = setEBV(pop2, gsModel)
pop3 = setEBV(pop3, gsModel)

Measuring the correlation with the genetic value (parametric value of the trait)

We can measure how good was the prediction by comparing the actual true genetic value (gv) with the prediction (ebv) for each individual.

# Measure prediction accuracy
cor(gv(pop1), ebv(pop1)) 
##        est_GV_Trait1
## Trait1     0.8384834
cor(gv(pop2), ebv(pop2)) 
##        est_GV_Trait1
## Trait1     0.6007227
cor(gv(pop3), ebv(pop3))
##        est_GV_Trait1
## Trait1     0.7101821

Practical implementation of a breeding program with genomic selection

In this part of the vignette, we will implement the genomic selection in a Maize pipeline. Two scenarios will be implemented after a fifteen-year Burn-In period. The first scenario will account only for truncated phenotypic selection. In the second scenario, we will implement genomic selection in the pipeline. So, in year 4, the individuals will be selected based on the estimated breeding values that came from a genomic model (ridge regression model). In addition, the parents will be selected based on ‘ebv’.

Package and files for recording the outputs

rm(list=ls())

# Packages
require(AlphaSimR)

# Creating the files to record the results
MeanG_pop = matrix(NA, 35)
MeanA_pop = matrix(NA, 35)
VarG_pop  = matrix(NA, 35)

Create parents and fill the pipeline

# Parameters files
source("aux_files/GlobalParameters.R")

# Create the parents
source("aux_files/Create_parents.R")

# FillPipeline
source("aux_files/FillPipeline.R")

# Environmental covariate
P = runif(burninYears+futureYears)

Burn-In period

for(year in 1:burninYears){
p = P[year]
cat("Working on year:",year,"\n")

source("aux_files/UpdateParents.R")  # Pick new parents based on last year's data
source("aux_files/UpdateTesters.R")  # Pick new testers and hybrid parents
source("aux_files/AdvanceYearPS.R")  # Advances yield trials by a year
source("aux_files/WriteRecordsGS.R") # Write records for GS predictions
source("aux_files/UpdateResults.R")  # Track summary data

}

# Saving the info

save.image("results/BURNIN.RData")

Phenotypic selection program

# 3.0 Loading the scenarios
load("results/BURNIN.RData")


for(year in (burninYears+1):(burninYears+futureYears)){
p = P[year]

# 3.1 Loop
cat("Working on year:",year,"\n")
source("aux_files/UpdateParents.R")  # Pick new parents based on last year's data
source("aux_files/UpdateTesters.R")  # Pick new testers and hybrid parents
source("aux_files/AdvanceYearPS.R")  # Advances yield trials by a year
source("aux_files/UpdateResults.R")  # Track summary data
  
}

# 3.2 Recording results
output1 = data.frame(scenario=rep("PS", 35), # Scenario name
                     MeanG_pop,
                     MeanA_pop,
                     VarG_pop,
                     stringsAsFactors=FALSE)

# 3.3 Saving the results as RDS
saveRDS(output1,"results/Results_PS.rds")

Genomic selection program

# 3.0 Loading the scenarios
load("results/BURNIN.RData")


for(year in (burninYears+1):(burninYears+futureYears)){
p = P[year]

cat("Working on year:",year,"\n")
if(year == (burninYears+1)){
  source('aux_files/fillGS.R')
}
source("aux_files/UpdateParents_GS.R")  # Pick new parents based on last year's data
source("aux_files/UpdateTesters.R")  # Pick new testers and hybrid parents
source("aux_files/AdvanceYearGS.R")  # advance the populations
source("aux_files/WriteRecordsGS.R")  # Track records for GS
source("aux_files/UpdateResults.R")  # Track summary data
  
}

# 3.2 Recording results
output1 = data.frame(scenario=rep("GS", 35), # Scenario name
                     MeanG_pop,
                     MeanA_pop,
                     VarG_pop,
                     stringsAsFactors=FALSE)

# 3.3 Saving the results as RDS
saveRDS(output1,"results/Results_GS.rds")

Plotting the results

# Loading the results
Scenario1 = readRDS("results/Results_PS.rds")
Scenario2 = readRDS("results/Results_GS.rds")


# Plot hybrid mean of genetic values over time
meanRanges = range(c(Scenario1$MeanG_pop[15:35], Scenario2$MeanG_pop[15:35]))
plot(x = 15:35, y = Scenario1$MeanG_pop[15:35], type = "l", col = "black", lwd = 3,
     xlab = "Year", ylab = "Mean of genetic values", ylim = meanRanges)
lines(x = 15:35, y = Scenario2$MeanG_pop[15:35], type = "l", col = "blue", lwd = 3)
legend(x = "topleft", legend = c('Pheno', 'GS'), title = "Scenarios",
       lwd = 3, lty = c(1, 1), col = c("black", "blue"), bty = "n")

References


  1. University of Florida, ↩︎

  2. University of Florida, ↩︎