Genomic selection is a method that leverages both phenotypic and genomic data from previously evaluated individuals to develop a predictive model. A genomic model is calibrated on a training population that has both markers and phenotypes. We can then use that model to predict new individuals that have only genotypic data.
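As a rough illustration of this calibrate-then-predict idea, the sketch below simulates a small population and fits a ridge regression on the markers in base R. The data, the population sizes, and the shrinkage value `lambda` are all assumptions made for the illustration, and ridge regression is only a stand-in for the model fitted later in this vignette.

```r
# Simulated stand-in data (hypothetical sizes, not the maize data set)
set.seed(1)
n <- 120; p <- 300
X <- matrix(sample(0:2, n * p, replace = TRUE), n, p)  # genotypes coded 0/1/2
b_true <- rnorm(p, sd = 0.5)                            # simulated marker effects
y <- 160 + drop(X %*% b_true) + rnorm(n, sd = 3)        # simulated phenotypes

train <- 1:100    # training population: genotypes and phenotypes
new   <- 101:120  # new individuals: genotypes only

# Center phenotypes and markers using the training population
mu     <- mean(y[train])
X_mean <- colMeans(X[train, ])
Z      <- sweep(X[train, ], 2, X_mean)

# Ridge regression marker effects: b = (Z'Z + lambda * I)^-1 Z'(y - mu)
lambda <- 50      # assumed shrinkage value, chosen arbitrarily for this sketch
b_hat  <- solve(crossprod(Z) + diag(lambda, p), crossprod(Z, y[train] - mu))

# Predict the new individuals from their genotypes alone
Z_new    <- sweep(X[new, ], 2, X_mean)
gebv_new <- mu + drop(Z_new %*% b_hat)

# Correlation between predictions and the held-out simulated phenotypes
cor(gebv_new, y[new])
```

The phenotypes of the new individuals are used only in the last line, to check how well the predictions track them.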
In this vignette, we give a simple example of how to implement a genomic selection model and how to use it in mate allocation. To predict individual performance we will use the bWGR package (Xavier 2019), and to predict cross performance and optimize the mating plan we will use the SimpleMating package (Peixoto et al. 2025).
The following data set comes from a diploid maize population. It contains 500 individuals with genotypes (coded as 0, 1, 2 for aa, Aa/aA, and AA) and phenotypes for one trait assessed in one environment. First, let's download the data, load it, and check it.
Step 1: Download the data from this repository
# Download the data from here
url <- "https://raw.githubusercontent.com/marcopxt/marcopxt.github.io/master/talks_teach/Lectures/2025/UniArk/dataGS.RData"
dest <- "dataGS.RData"
download.file(url, destfile = dest, mode = "wb")
Step 2: Load the data from your computer (if it is already downloaded, you only need to load it)
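The loading commands themselves are not shown in this step. A minimal sketch that would print output of the same shape as the listing here, assuming the file sits in the working directory and contains the objects `Pheno` and `Geno`, could look like this:

```r
dest <- "dataGS.RData"   # file written by download.file() in Step 1

if (file.exists(dest)) {
  load(dest, verbose = TRUE)      # lists the objects as they are loaded
  print(head(Pheno))              # first phenotype records
  print(table(as.matrix(Geno)))   # counts of the 0/1/2 genotype codes
  print(Geno[1:10, 1:10])         # first 10 individuals and 10 markers
} else {
  message("dataGS.RData not found; run Step 1 first.")
}
```

The `file.exists()` guard is only there so the sketch fails gracefully when the download step was skipped.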
## Loading objects:
##   Pheno
##   Geno
##   Env Genotype   Trait1
## 1   1    26662 179.5075
## 2   1    30678 161.2597
## 3   1    30129 164.2701
## 4   1    28907 156.8915
## 5   1    30522 159.7874
## 6   1    26995 165.1254
## Geno
##      0      1      2
## 790265 221892 487843
##       SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
## 26662    2    0    2    0    2    2    0    0    0     0
## 30678    0    0    2    0    2    0    0    0    0     2
## 30129    2    0    2    0    2    2    0    0    0     0
## 28907    2    0    2    0    2    2    0    0    0     0
## 30522    2    0    2    0    2    2    0    0    0     0
## 26995    0    0    2    0    0    0    0    2    0     0
## 28468    0    0    2    0    2    0    0    2    0     0
## 31681    0    0    2    0    2    0    0    0    0     2
## 26661    1    0    2    0    2    1    0    1    0     0
## 30398    0    0    2    0    2    0    0    0    0     2
To predict individual performance, we can use the bWGR function wgr(), which solves the problem as a marker model. As an outcome, we obtain both the marker effects and the predictions for the individuals (the package computes the latter internally).
# Model for markers
Model <- bWGR::wgr(y = Pheno$Trait1,      # Phenotypes
                   X = as.matrix(Geno))   # Genotypes
# Predictions for all individuals
plot(Model$hat, Pheno$Trait1)
In this part of the vignette, we will use SimpleMating to predict cross performance and to generate a mating plan. There are two modules in SimpleMating: the first predicts the cross performance of a group of individuals that are candidate parents, and the second (the selectCrosses() function) optimizes and generates the mating plan.
First, let’s begin by generating the crosses and estimating their performance based on additive values. The mid-parental value (MPV) represents the average of the two parents’ estimated breeding values.
\[ MPV = \frac{ebv_{P_1} + ebv_{P_2}}{2} \]
where $ebv_{P_1}$ and $ebv_{P_2}$ are the estimated breeding values of parents 1 and 2, respectively.
# Creating the potential crosses
PlanCross <- SimpleMating::planCross(TargetPop = parents$id, MateDesign = 'half')
## Number of crosses generated: 124750
## Initial data:
## Number of Individuals: 500
## Number of Markers: 3000
##
## Missing data check:
## Total SNPs: 3000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 3000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 3000 SNPs
## Final: 3000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.592 seconds
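The commands that build `parents` and `relMat` are not shown in this section. The log above resembles what a genomic relationship routine prints (for example `AGHmatrix::Gmatrix()`), but that is an assumption. As a hedged illustration of what such a matrix contains, the sketch below computes a VanRaden-type genomic relationship matrix in base R from a small simulated marker matrix. The object names and sizes are hypothetical.

```r
# Toy marker matrix: 6 individuals x 50 SNPs, coded 0/1/2 (simulated)
set.seed(2)
M <- matrix(sample(0:2, 6 * 50, replace = TRUE), nrow = 6,
            dimnames = list(paste0("ind", 1:6), paste0("SNP", 1:50)))

# Allele frequency of the counted allele at each SNP
p_freq <- colMeans(M) / 2

# Center each SNP by twice its allele frequency
Z <- sweep(M, 2, 2 * p_freq)

# Genomic relationship matrix: G = ZZ' / (2 * sum(p * (1 - p)))
G <- tcrossprod(Z) / (2 * sum(p_freq * (1 - p_freq)))

round(G, 2)   # diagonal: self-relationships; off-diagonal: pairwise covariances
```

The off-diagonal elements play the role of the K column used in the following steps. Negative values indicate pairs that are less related than the average pair in the sample.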
# Mid-parental value
MPV_pop <- SimpleMating::getMPV(MatePlan = PlanCross,
                                Criterion = parents,
                                K = relMat)
## 124750 possible crosses were predicted
##   Parent1 Parent2        Y           K
## 1   26662   26632 167.3148  0.22279601
## 2   26662   30133 166.9552 -0.06087269
## 3   26662   26634 166.6616  0.20220529
## 4   26632   30133 166.4763  0.15235285
## 5   26662   26661 166.4238  0.93582357
## 6   26662   30546 166.3924 -0.09635004
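To make the two columns concrete, the sketch below reproduces the calculation by hand in base R for four hypothetical parents: every pair of a half diallel gets Y as the mid-parental value and K as the pairwise entry of a relationship matrix. The breeding values and the matrix are invented for the illustration, and the code does not call SimpleMating.

```r
# Toy estimated breeding values for four hypothetical parents
ebv <- c(A = 10, B = 8, C = 6, D = 4)

# Toy relationship matrix (symmetric, invented values)
K <- matrix(c( 1.0, 0.5, 0.0, -0.1,
               0.5, 1.0, 0.2,  0.0,
               0.0, 0.2, 1.0,  0.3,
              -0.1, 0.0, 0.3,  1.0),
            nrow = 4, dimnames = list(names(ebv), names(ebv)))

# Half diallel: each pair once, no selfs and no reciprocals
pairs   <- t(combn(names(ebv), 2))
crosses <- data.frame(Parent1 = pairs[, 1], Parent2 = pairs[, 2])

# Mid-parental value and pairwise relationship of each cross
crosses$Y <- unname((ebv[crosses$Parent1] + ebv[crosses$Parent2]) / 2)
crosses$K <- K[cbind(crosses$Parent1, crosses$Parent2)]

# Rank the crosses from best to worst predicted performance
crosses <- crosses[order(-crosses$Y), ]
crosses

# A half diallel of n parents has n * (n - 1) / 2 crosses
nrow(crosses)    # 6 crosses for 4 parents
choose(500, 2)   # 124750 crosses for 500 parents
```

The last line matches the number of crosses reported for the 500 candidate parents.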
The second step is to use the selectCrosses() function to optimize and generate the mating plan. The main arguments are:
n.cross: number of crosses in the mating plan.
max.cross: maximum number of crosses that a parent can be part of.
culling.pairwise.k: threshold on the covariances of the relationship matrix, used to cull crosses between related parents.
The coancestry between two parents is half of their covariance, and the coancestry of one generation is the inbreeding of the next generation ($F_{t+1} = \theta_t$) (Kinghorn 1999).
# Optimization
MatingPLan <- SimpleMating::selectCrosses(data = MPV_pop,
                                          n.cross = 20,
                                          max.cross = 4,
                                          culling.pairwise.k = 0)
# Stats
MatingPLan[[1]]
##   culling.pairwise.k target.Y   target.K
## 1                  0 164.8998 -0.0965285
##    Parent1 Parent2        Y           K
## 1    26662   30133 166.9552 -0.06087269
## 2    26662   30546 166.3924 -0.09635004
## 3    26662   30547 166.0928 -0.10064006
## 4    26632   30546 165.9135 -0.16503354
## 5    26632   30547 165.6139 -0.16022972
## 6    26661   30133 165.5852 -0.09060955
## 7    26670   30133 165.2910 -0.01256394
## 8    26672   30133 165.2722 -0.09474270
## 9    26661   30546 165.0224 -0.08516461
## 10   26632   27955 165.0073 -0.09661603
## 11   26632   30666 164.8243 -0.14568185
## 12   26661   30547 164.7229 -0.05194254
## 13   26661   30131 164.4443 -0.10410254
## 14   27930   26996 164.2792 -0.05900846
## 15   27930   26635 164.2707 -0.06474894
## 16   26672   30131 164.1313 -0.05935630
## 17   26672   27955 163.8032 -0.10727174
## 18   26670   30666 163.6391 -0.14804625
## 19   26672   30666 163.6203 -0.14724372
## 20   30398   26996 163.1140 -0.08034488
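To show how the three arguments interact, here is a simplified greedy sketch in base R. It is an illustration only, not the algorithm implemented in SimpleMating: the function `select_greedy()` is hypothetical, and treating the threshold as `K <= culling.pairwise.k` is an assumption. Crosses above the threshold are dropped, the rest are ranked by Y, and crosses are accepted in order as long as neither parent exceeds `max.cross`.

```r
# Hypothetical helper: greedy selection of crosses (not SimpleMating's code)
select_greedy <- function(crosses, n.cross, max.cross, culling.pairwise.k) {
  # 1. Cull crosses whose parents are too related (assumed rule: K <= threshold)
  cand <- crosses[crosses$K <= culling.pairwise.k, ]
  # 2. Rank the remaining crosses by predicted performance
  cand <- cand[order(-cand$Y), ]
  # 3. Accept crosses in order while respecting the per-parent limit
  ids    <- unique(c(cand$Parent1, cand$Parent2))
  counts <- setNames(integer(length(ids)), ids)
  keep   <- logical(nrow(cand))
  for (i in seq_len(nrow(cand))) {
    p1 <- as.character(cand$Parent1[i])
    p2 <- as.character(cand$Parent2[i])
    if (counts[p1] < max.cross && counts[p2] < max.cross) {
      keep[i] <- TRUE
      counts[c(p1, p2)] <- counts[c(p1, p2)] + 1L
    }
    if (sum(keep) == n.cross) break
  }
  cand[keep, ]
}

# Toy table of candidate crosses (invented values)
toy <- data.frame(
  Parent1 = c("A", "A", "A", "B", "B", "C"),
  Parent2 = c("B", "C", "D", "C", "D", "D"),
  Y       = c(9, 8, 7, 7, 6, 5),
  K       = c(0.5, -0.1, -0.2, 0.1, -0.3, -0.05)
)

plan <- select_greedy(toy, n.cross = 2, max.cross = 1, culling.pairwise.k = 0)
plan

# Summary of the plan: mean performance and mean relationship
c(target.Y = mean(plan$Y), target.K = mean(plan$K))
```

In the toy run, A x B is culled because K is 0.5, A x C is accepted, A x D is skipped because A is already used once, and B x D completes the plan. In the vignette output above, target.Y (164.8998) and target.K (-0.0965285) are likewise the means of the Y and K columns of the 20 selected crosses, and no parent appears in more than four crosses.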
University of Florida, deamorimpeixotom@ufl.edu