|
Genomic selection, cross prediction and optimization via simulations
In this vignette, we will present how to use AlphaSimR objects to run genomic selection (GS) and how to guide crosses using SimpleMating package (Peixoto et al. 2025).
Genomic selection is a method that leverages both phenotype and genome data from historical individuals to develop a predictive model. In this way, from the populations created in AlphaSimR we can extract markers and phenotypes to build genomic models. Hence, he can use those models to predict new individuals with only genotypic data.
The first implementation will use one additive trait. Let’s create
the founder genome and also a base population and assume the trait
characteristics. In addition to the argument that we have been
describing, we need to use the function addSnpChip to add
the SNPs for each individual.
# Founder haplotypes
founderPop = runMacs(nInd=1000, nChr=10, segSites=600)
# trait parameters
SP = SimParam$
new(founderPop)$
addTraitAD(100, mean = 10, var = 10,
meanDD=0.5, # degree of dominance
varDD = 0.2)$ #variance for the degree of dominance
addSnpChip(500) # Number of SNPs per pair of chromosome
# Split the founders into two population
pop1 = newPop(founderPop[1:500])
pop2 = newPop(founderPop[501:1000])
#SNPs
SNPop = pullSnpGeno(pop1)
SNPop[1:10,1:10]## 1_1 1_3 1_4 1_5 1_6 1_7 1_8 1_9 1_10 1_12
## 1 0 0 2 0 0 0 2 2 0 0
## 2 0 0 1 1 0 1 2 2 0 0
## 3 0 0 1 1 0 1 2 2 0 0
## 4 0 0 2 0 0 0 1 1 1 0
## 5 0 0 2 0 0 1 2 2 0 0
## 6 0 0 2 0 0 0 2 2 0 0
## 7 0 0 1 1 0 1 2 2 0 0
## 8 0 0 1 1 1 1 2 2 0 0
## 9 0 0 1 1 0 1 2 2 0 0
## 10 0 0 2 0 0 0 1 1 1 0
Now, we can create a large F1 population derived from the initial
individuals and fit a genomic selection model using the function
RRBLUP() from AlphaSimR. We may use the
phenotypes assumed out of setPheno() function and, together
with the marker’s information, create a model that will give us the
marker’s effects, as follows:
Now that we have a calibration, we can deploy genomic selection by using the marker effects to predict, together with the genotypes of the target population, the estimated breeding value of the individuals.
## est_GV_Trait1
## [1,] 10.406506
## [2,] 4.031753
## [3,] 7.222728
## [4,] 4.219295
## [5,] 11.270597
## [6,] 11.147654
We can measure how good was the prediction by comparing the actual true genetic value (gv) with the prediction (ebv) for each individual.
## est_GV_Trait1
## Trait1 0.8735137
## est_GV_Trait1
## Trait1 0.5381343
## est_GV_Trait1
## Trait1 0.5776633
In this part of the vignette, we will implement the SimpleMating algorithm to predict cross-performance and to generate a mating plan. There are two modules in SimpleMating: one to predict the cross performance of a group of individuals that are candidate to be parents, and a second module (selectCrosses() function) to optimize and generate the mating plan.
First, lets start by creating the crosses and estimating their performance based on additive values (aka. mid-parental value), following the equation: The mid-parental value (MPV) represents the average of the two parental estimated breeding values:
\[ MPV = \frac{ebv_{P_1} + ebv_{P_2}}{2} \]
where:
# Creating DH from pop3
pop3subset <- selectInd(pop3, 250, use = "rand")
pop3DH <- makeDH(pop3subset)
pop3DH <- setEBV(pop3DH, gsModel)
# Creating a potential crosses
PlanCross <- SimpleMating::planCross(TargetPop = pop3DH@id, MateDesign = 'half')## Number of crosses generated: 31125
# Markers and relationship matrix
Criterion = data.frame(ID = pop3DH@id,
Crit = pop3DH@ebv)
Markers <- pullSnpGeno(pop3DH) # Markers
relMat = AGHmatrix::Gmatrix(Markers)## Initial data:
## Number of Individuals: 250
## Number of Markers: 5000
##
## Missing data check:
## Total SNPs: 5000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 5000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 5000 SNPs
## Final: 5000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.464 seconds
# Mid-parental value
MPV_pop3 <- SimpleMating::getMPV(MatePlan = PlanCross,
Criterion = Criterion,
K = relMat)## 31125 possible crosses were predicted
## Parent1 Parent2 Y K
## 1 2181 2207 18.92357 0.28787651
## 2 2179 2207 18.70123 0.10337293
## 3 2179 2181 18.49930 0.05266872
## 4 2005 2207 18.08370 0.13522051
## 5 2129 2207 17.89560 0.12067362
## 6 2081 2207 17.89084 0.08625748
The second step is to use the selectCrosses function to optimize and generate a mating plan. The main arguments are: n.cross: number of crosses in the mating plan. max.cross: maximum number of crosses that a parent can be part of. culling.pairwise.k: threshold coming from the covariances of the relationship matrix. In this case, the the covariance is equal to half of the coancestry of one generation ($ = _t $.) is the inbreeding of the next generation (Kinghorn 1999).
# Optimization
MatingPLan <- selectCrosses(data = MPV_pop3,
n.cross = 20,
max.cross = 4,
culling.pairwise.k = 0 )
# stats
MatingPLan[[1]]## culling.pairwise.k target.Y target.K
## 1 0 16.56294 -0.06618365
## Parent1 Parent2 Y K
## 1 2081 2181 17.68891 -0.109545263
## 2 2005 2179 17.65942 -0.103074486
## 3 2002 2207 17.50546 -0.019398944
## 4 2030 2181 17.35036 -0.028092507
## 5 2145 2207 17.21703 -0.106803827
## 6 2057 2207 17.03654 -0.056914631
## 7 2068 2179 17.02051 -0.027055207
## 8 2145 2181 17.01510 -0.055630355
## 9 2005 2129 16.85379 -0.042552964
## 10 2005 2081 16.84903 -0.086230708
## 11 2145 2179 16.79275 -0.036378559
## 12 2170 2181 16.77582 -0.100160167
## 13 2081 2129 16.66094 -0.011248729
## 14 2002 2081 16.27080 -0.172931708
## 15 2068 2129 16.21487 -0.028277739
## 16 2002 2068 15.82474 -0.032513381
## 17 2030 2192 15.41909 -0.000628752
## 18 2057 2068 15.35582 -0.174993960
## 19 2057 2192 14.90334 -0.049283066
## 20 2170 2192 14.84455 -0.081958020
In this second example, we will explore the concept of total genetic value, which combines the additive and dominance effects for estimating the performance of the F1 progeny. The equation is as follows (Falconer 1996): The total genetic value (\(\mu_{gv}\)) can be expressed as:
\[ \mu_{gv} = \sum_{i=1}^{n} \left[ a_i (p_i - q_i - y_i) + d_i \left( 2 p_i q_i + y_i (p_i - q_i) \right) \right] \]
where:
So, we can estimate the total-genetic value using the function getTGV() from SimpleMaing package.
# Working on pop 3
pop2subset <- selectInd(pop2, 250, use = "rand")
pop2subset = setPheno(pop2subset, h2 = 0.6)
# Creating a potential crosses
PlanCross <- SimpleMating::planCross(TargetPop = pop2subset@id, MateDesign = 'half')## Number of crosses generated: 31125
# Markers, marker effects and relationship matrix
Markers <- pullSnpGeno(pop2subset) # Markers
relMat = AGHmatrix::Gmatrix(Markers) # relationship matrix## Initial data:
## Number of Individuals: 250
## Number of Markers: 5000
##
## Missing data check:
## Total SNPs: 5000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 5000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 5000 SNPs
## Final: 5000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.438 seconds
ans <- RRBLUP_D(pop2subset) # For markers effects
add_eff <- methods::slot(ans@gv[[1]], "addEff")
dom_eff <- methods::slot(ans@gv[[1]], "domEff")
# Total genetic value
tgvPlan <- SimpleMating::getTGV(MatePlan = PlanCross,
addEff = add_eff, # additive effects
domEff = dom_eff, # dominance effects
Markers = Markers,
K = relMat)## TGVs predicted for 31125 crosses.
## Parent1 Parent2 Y K
## 1 1000 502 14.96727 -0.001671832
## 2 1000 504 12.77137 0.024823967
## 3 1000 527 14.09503 0.033254726
## 4 1000 533 13.79368 -0.001607685
## 5 1000 535 14.35836 -0.056826102
## 6 1000 538 15.03158 -0.043792026
Using the same assumptions as the first example, we can implement the optimization to get the mating plan.
# Optimization
MatingPLan <- selectCrosses(data = tgvPlan,
n.cross = 20,
max.cross = 4,
culling.pairwise.k = 0 )
# stats
MatingPLan[[1]]## culling.pairwise.k target.Y target.K
## 1 0 18.45314 -0.0575053
## Parent1 Parent2 Y K
## 1 953 706 19.41335 -0.058383960
## 2 953 525 19.38128 -0.061160612
## 3 872 963 19.37374 -0.053722605
## 4 647 714 19.14585 -0.088621672
## 5 953 740 18.91421 -0.037808020
## 6 950 963 18.90961 -0.069481405
## 7 647 843 18.68552 -0.015555093
## 8 872 525 18.63081 -0.012644037
## 9 950 706 18.60027 -0.054645101
## 10 740 714 18.53050 -0.060222843
## 11 872 843 18.42222 -0.038519747
## 12 725 525 18.34625 -0.041271963
## 13 647 710 18.26297 -0.042047837
## 14 725 524 18.15843 -0.110272838
## 15 725 570 17.97438 -0.131988152
## 16 725 788 17.72841 -0.044592338
## 17 886 769 17.67050 -0.123865910
## 18 740 570 17.65761 -0.085597595
## 19 561 740 17.64305 -0.009494721
## 20 561 524 17.61375 -0.010209503
University of Florida, mresende@ufl.edu↩︎
University of Florida, deamorimpeixotom@ufl.edu↩︎