Deploying Genomic selection and Cross prediction and Optimization

In this vignette, we will present how to use AlphaSimR objects to run genomic selection (GS) and how to guide crosses using SimpleMating package (Peixoto et al. 2025).

Genomic selection is a method that leverages both phenotype and genome data from historical individuals to develop a predictive model. In this way, from the populations created in AlphaSimR we can extract markers and phenotypes to build genomic models. Hence, he can use those models to predict new individuals with only genotypic data.

Loading the packages

library(AlphaSimR)
## Loading required package: R6
library(devtools)
## Loading required package: usethis
#devtools::install_github("Resende-Lab/SimpleMating")
library(SimpleMating)

Genomic selection in AlphaSimR

One trait with additive and dominance effects

The first implementation will use one additive trait. Let’s create the founder genome and also a base population and assume the trait characteristics. In addition to the argument that we have been describing, we need to use the function addSnpChip to add the SNPs for each individual.

# Founder haplotypes
founderPop = runMacs(nInd=1000, nChr=10, segSites=600)

# trait parameters
SP = SimParam$
  new(founderPop)$
  addTraitAD(100, mean = 10, var = 10, 
             meanDD=0.5, # degree of dominance
             varDD = 0.2)$ #variance for the degree of dominance
  addSnpChip(500) # Number of SNPs per pair of chromosome
  
# Split the founders into two population
pop1 = newPop(founderPop[1:500])
pop2 = newPop(founderPop[501:1000])

#SNPs
SNPop = pullSnpGeno(pop1)

SNPop[1:10,1:10]
##    1_1 1_3 1_4 1_5 1_6 1_7 1_8 1_9 1_10 1_12
## 1    0   0   2   0   0   0   2   2    0    0
## 2    0   0   1   1   0   1   2   2    0    0
## 3    0   0   1   1   0   1   2   2    0    0
## 4    0   0   2   0   0   0   1   1    1    0
## 5    0   0   2   0   0   1   2   2    0    0
## 6    0   0   2   0   0   0   2   2    0    0
## 7    0   0   1   1   0   1   2   2    0    0
## 8    0   0   1   1   1   1   2   2    0    0
## 9    0   0   1   1   0   1   2   2    0    0
## 10   0   0   2   0   0   0   1   1    1    0

New population from crosses and training the model

Now, we can create a large F1 population derived from the initial individuals and fit a genomic selection model using the function RRBLUP() from AlphaSimR. We may use the phenotypes assumed out of setPheno() function and, together with the marker’s information, create a model that will give us the marker’s effects, as follows:

# Create a third population derived from the first populations
pop3 = randCross(pop1, nCrosses=1000)
pop1 = setPheno(pop1, h2 = 0.6)

# Train a GS model using the first population
gsModel = RRBLUP(pop1, use = 'pheno') #RRBLUP

Using markers effect to predict the values

Now that we have a calibration, we can deploy genomic selection by using the marker effects to predict, together with the genotypes of the target population, the estimated breeding value of the individuals.

# Set EBVs for all populations using the GS model
pop1 = setEBV(pop1, gsModel)
head(pop1@ebv)
##      est_GV_Trait1
## [1,]     10.406506
## [2,]      4.031753
## [3,]      7.222728
## [4,]      4.219295
## [5,]     11.270597
## [6,]     11.147654
pop2 = setEBV(pop2, gsModel)
pop3 = setEBV(pop3, gsModel)

Measuring the correlation with the genetic value (parametric value of the trait)

We can measure how good was the prediction by comparing the actual true genetic value (gv) with the prediction (ebv) for each individual.

# Measure prediction accuracy
cor(gv(pop1), ebv(pop1)) 
##        est_GV_Trait1
## Trait1     0.8735137
cor(gv(pop2), ebv(pop2)) 
##        est_GV_Trait1
## Trait1     0.5381343
cor(gv(pop3), ebv(pop3))
##        est_GV_Trait1
## Trait1     0.5776633

Using SimpleMating for cross prediction and optimization

In this part of the vignette, we will implement the SimpleMating algorithm to predict cross-performance and to generate a mating plan. There are two modules in SimpleMating: one to predict the cross performance of a group of individuals that are candidate to be parents, and a second module (selectCrosses() function) to optimize and generate the mating plan.

Example 1: Using the mid-parental value (capturing only additivity)

First, lets start by creating the crosses and estimating their performance based on additive values (aka. mid-parental value), following the equation: The mid-parental value (MPV) represents the average of the two parental estimated breeding values:

\[ MPV = \frac{ebv_{P_1} + ebv_{P_2}}{2} \]

where:

  • \(ebv_{P_1}\): estimated breeding value of Parent 1
  • \(ebv_{P_2}\): estimated breeding value of Parent 2
# Creating DH from pop3
pop3subset <- selectInd(pop3, 250, use = "rand")
pop3DH <- makeDH(pop3subset)
pop3DH <- setEBV(pop3DH, gsModel)

# Creating a potential crosses
PlanCross <- SimpleMating::planCross(TargetPop = pop3DH@id, MateDesign = 'half')
## Number of crosses generated: 31125
# Markers and relationship matrix
Criterion = data.frame(ID = pop3DH@id,
                       Crit = pop3DH@ebv)

Markers <- pullSnpGeno(pop3DH) # Markers

relMat = AGHmatrix::Gmatrix(Markers)
## Initial data: 
##  Number of Individuals: 250 
##  Number of Markers: 5000 
## 
## Missing data check: 
##  Total SNPs: 5000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 5000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  5000 SNPs 
##  Final:  5000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.464  seconds
# Mid-parental value
MPV_pop3 <- SimpleMating::getMPV(MatePlan = PlanCross,
                                 Criterion = Criterion,
                                 K = relMat)
## 31125 possible crosses were predicted
head(MPV_pop3)
##   Parent1 Parent2        Y          K
## 1    2181    2207 18.92357 0.28787651
## 2    2179    2207 18.70123 0.10337293
## 3    2179    2181 18.49930 0.05266872
## 4    2005    2207 18.08370 0.13522051
## 5    2129    2207 17.89560 0.12067362
## 6    2081    2207 17.89084 0.08625748

The second step is to use the selectCrosses function to optimize and generate a mating plan. The main arguments are: n.cross: number of crosses in the mating plan. max.cross: maximum number of crosses that a parent can be part of. culling.pairwise.k: threshold coming from the covariances of the relationship matrix. In this case, the the covariance is equal to half of the coancestry of one generation ($ = _t $.) is the inbreeding of the next generation (Kinghorn 1999).

# Optimization
MatingPLan <- selectCrosses(data = MPV_pop3,
                            n.cross = 20, 
                            max.cross = 4, 
                            culling.pairwise.k = 0 )
# stats
MatingPLan[[1]]
##   culling.pairwise.k target.Y    target.K
## 1                  0 16.56294 -0.06618365
# Mating Plan
MatingPLan[[2]]
##    Parent1 Parent2        Y            K
## 1     2081    2181 17.68891 -0.109545263
## 2     2005    2179 17.65942 -0.103074486
## 3     2002    2207 17.50546 -0.019398944
## 4     2030    2181 17.35036 -0.028092507
## 5     2145    2207 17.21703 -0.106803827
## 6     2057    2207 17.03654 -0.056914631
## 7     2068    2179 17.02051 -0.027055207
## 8     2145    2181 17.01510 -0.055630355
## 9     2005    2129 16.85379 -0.042552964
## 10    2005    2081 16.84903 -0.086230708
## 11    2145    2179 16.79275 -0.036378559
## 12    2170    2181 16.77582 -0.100160167
## 13    2081    2129 16.66094 -0.011248729
## 14    2002    2081 16.27080 -0.172931708
## 15    2068    2129 16.21487 -0.028277739
## 16    2002    2068 15.82474 -0.032513381
## 17    2030    2192 15.41909 -0.000628752
## 18    2057    2068 15.35582 -0.174993960
## 19    2057    2192 14.90334 -0.049283066
## 20    2170    2192 14.84455 -0.081958020
# Plot
MatingPLan[[3]]

Example 2: Using additive and dominance effects

In this second example, we will explore the concept of total genetic value, which combines the additive and dominance effects for estimating the performance of the F1 progeny. The equation is as follows (Falconer 1996): The total genetic value (\(\mu_{gv}\)) can be expressed as:

\[ \mu_{gv} = \sum_{i=1}^{n} \left[ a_i (p_i - q_i - y_i) + d_i \left( 2 p_i q_i + y_i (p_i - q_i) \right) \right] \]

where:

  • \(a_i\): additive effect at locus i
  • \(d_i\): dominance effect at locus i
  • \(p_i, q_i\): allele frequencies
  • \(y_i\): indicator variable (e.g., allele substitution term)

So, we can estimate the total-genetic value using the function getTGV() from SimpleMaing package.

# Working on pop 3
pop2subset <- selectInd(pop2, 250, use = "rand")
pop2subset = setPheno(pop2subset, h2 = 0.6)

# Creating a potential crosses
PlanCross <- SimpleMating::planCross(TargetPop = pop2subset@id, MateDesign = 'half')
## Number of crosses generated: 31125
# Markers, marker effects and relationship matrix
Markers <- pullSnpGeno(pop2subset) # Markers

relMat = AGHmatrix::Gmatrix(Markers) # relationship matrix
## Initial data: 
##  Number of Individuals: 250 
##  Number of Markers: 5000 
## 
## Missing data check: 
##  Total SNPs: 5000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 5000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  5000 SNPs 
##  Final:  5000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.438  seconds
ans <- RRBLUP_D(pop2subset) # For markers effects

add_eff <- methods::slot(ans@gv[[1]], "addEff")
dom_eff <- methods::slot(ans@gv[[1]], "domEff")


# Total genetic value
tgvPlan <- SimpleMating::getTGV(MatePlan = PlanCross,
                                 addEff = add_eff, # additive effects
                                 domEff = dom_eff, # dominance effects
                                 Markers = Markers,
                                 K = relMat)
## TGVs predicted for 31125 crosses.
head(tgvPlan)
##   Parent1 Parent2        Y            K
## 1    1000     502 14.96727 -0.001671832
## 2    1000     504 12.77137  0.024823967
## 3    1000     527 14.09503  0.033254726
## 4    1000     533 13.79368 -0.001607685
## 5    1000     535 14.35836 -0.056826102
## 6    1000     538 15.03158 -0.043792026

Using the same assumptions as the first example, we can implement the optimization to get the mating plan.

# Optimization

MatingPLan <- selectCrosses(data = tgvPlan,
                            n.cross = 20, 
                            max.cross = 4, 
                            culling.pairwise.k = 0 )
# stats
MatingPLan[[1]]
##   culling.pairwise.k target.Y   target.K
## 1                  0 18.45314 -0.0575053
# Mating Plan
MatingPLan[[2]]
##    Parent1 Parent2        Y            K
## 1      953     706 19.41335 -0.058383960
## 2      953     525 19.38128 -0.061160612
## 3      872     963 19.37374 -0.053722605
## 4      647     714 19.14585 -0.088621672
## 5      953     740 18.91421 -0.037808020
## 6      950     963 18.90961 -0.069481405
## 7      647     843 18.68552 -0.015555093
## 8      872     525 18.63081 -0.012644037
## 9      950     706 18.60027 -0.054645101
## 10     740     714 18.53050 -0.060222843
## 11     872     843 18.42222 -0.038519747
## 12     725     525 18.34625 -0.041271963
## 13     647     710 18.26297 -0.042047837
## 14     725     524 18.15843 -0.110272838
## 15     725     570 17.97438 -0.131988152
## 16     725     788 17.72841 -0.044592338
## 17     886     769 17.67050 -0.123865910
## 18     740     570 17.65761 -0.085597595
## 19     561     740 17.64305 -0.009494721
## 20     561     524 17.61375 -0.010209503
# Plot
MatingPLan[[3]]

References

Falconer, Douglas Scott. 1996. Introduction to Quantitative Genetics. Pearson Education India.
Kinghorn, Brian. 1999. “19. Mate Selection for the Tactical Implementation of Breeding Programs.” Proc Adv Anim Breed Genet 13: 130–33.
Peixoto, Marco Antônio, Rodrigo Rampazo Amadeu, Leonardo Lopes Bhering, Luı́s Felipe V Ferrão, Patrı́cio R Munoz, and Márcio FR Resende Jr. 2025. “SimpleMating: R-Package for Prediction and Optimization of Breeding Crosses Using Genomic Selection.” The Plant Genome 18 (1): e20533.

  1. University of Florida, ↩︎

  2. University of Florida, ↩︎