Introduction

Genomic selection is a method that uses phenotypic and genomic data from previously evaluated individuals to build a predictive model. A training population with both markers and phenotypes is used to calibrate the genomic model, and we can then use that model to predict the performance of new individuals that have only genotypic data.

In this vignette, we give a simple example of how to implement a genomic selection model and how to use it in mate allocation. To predict individual performance we will use the bWGR package (Xavier 2019), and to predict cross performance and optimize the mating plan we will use the SimpleMating package (Peixoto et al. 2025).

Loading the packages

#install.packages("bWGR")
library(bWGR)
#devtools::install_github("Resende-Lab/SimpleMating")
library(SimpleMating)
#install.packages("AGHmatrix")
library(AGHmatrix)

Genomic selection model

Loading the data

The following data set comes from a diploid maize population. It contains 500 individuals with genotypes for 3,000 SNP markers (coded as 0, 1, 2 for aa, Aa/aA, and AA) and phenotypes for one trait assessed in one environment. First, let's download the data, load it, and check it.

Step 1: Download the data from the repository

# Download the data from here
url <- "https://raw.githubusercontent.com/marcopxt/marcopxt.github.io/master/talks_teach/Lectures/2025/UniArk/dataGS.RData"

dest <- "dataGS.RData"

download.file(url, destfile = dest, mode = "wb")

Step 2: Load the data from your computer (if the file has already been downloaded, you can skip Step 1 and only load it)

# Load the dataset
load("dataGS.RData", verbose = TRUE)
## Loading objects:
##   Pheno
##   Geno
# Phenotypes
head(Pheno)
##   Env Genotype   Trait1
## 1   1    26662 179.5075
## 2   1    30678 161.2597
## 3   1    30129 164.2701
## 4   1    28907 156.8915
## 5   1    30522 159.7874
## 6   1    26995 165.1254
# Genotypes
table(Geno)
## Geno
##      0      1      2 
## 790265 221892 487843
Geno[1:10,1:10]
##       SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
## 26662    2    0    2    0    2    2    0    0    0     0
## 30678    0    0    2    0    2    0    0    0    0     2
## 30129    2    0    2    0    2    2    0    0    0     0
## 28907    2    0    2    0    2    2    0    0    0     0
## 30522    2    0    2    0    2    2    0    0    0     0
## 26995    0    0    2    0    0    0    0    2    0     0
## 28468    0    0    2    0    2    0    0    2    0     0
## 31681    0    0    2    0    2    0    0    0    0     2
## 26661    1    0    2    0    2    1    0    1    0     0
## 30398    0    0    2    0    2    0    0    0    0     2
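
Before fitting any model, it can help to confirm that the phenotype and genotype objects describe the same individuals in the same order and that the marker coding is what we expect. The sketch below runs those checks on small simulated stand-ins (toy_geno and toy_pheno are hypothetical objects, not the vignette data); the same expressions should apply to Geno and Pheno, assuming Geno has the individual IDs as row names, as the printout above suggests.

```r
# Toy stand-ins for the vignette's Geno and Pheno objects (simulated, not the real data)
set.seed(1)
n_ind <- 6
n_snp <- 8
toy_geno <- matrix(sample(0:2, n_ind * n_snp, replace = TRUE),
                   nrow = n_ind,
                   dimnames = list(paste0("id", 1:n_ind), paste0("SNP", 1:n_snp)))
toy_pheno <- data.frame(Env = 1,
                        Genotype = rownames(toy_geno),
                        Trait1 = rnorm(n_ind, mean = 165, sd = 5),
                        stringsAsFactors = FALSE)

# 1) Same individuals, in the same order, in both objects
ids_match <- identical(as.character(toy_pheno$Genotype), rownames(toy_geno))

# 2) Only the expected dosage codes (0, 1, 2) and no missing values
codes_ok <- all(toy_geno %in% c(0, 1, 2)) && !anyNA(toy_geno)

# 3) Frequency of the allele counted by the 0/1/2 coding, and the minor allele frequency
freq <- colMeans(toy_geno) / 2
maf <- pmin(freq, 1 - freq)

ids_match
codes_ok
round(maf, 2)
```

If ids_match is FALSE on real data, one option is to reorder the genotype matrix with match() on the ID column before modelling.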

Using a genomic model to predict individual performance

To predict individual performance, we can use the bWGR function wgr(), which fits the data as a marker model. As an outcome, we get the marker predictions and the individual predictions (the package computes both internally).

# Model for markers
Model = wgr(y = Pheno$Trait1, # Phenotypes
          X = as.matrix(Geno)) # Genotypes

# Predicted versus observed values for all individuals
plot(Model$hat, Pheno$Trait1)

# Preparing the outcomes
parents <- data.frame(id = rownames(Geno),
                      blup = Model$hat)
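
The model above is fitted and evaluated on the same individuals, whereas the idea described in the introduction is to calibrate on a training population and predict individuals that have only genotypes. The sketch below illustrates that workflow on simulated data with a plain ridge regression solved in base R. It is a stand-in for illustration only, not the algorithm used by wgr(); the sample sizes, the simulated effects, and the shrinkage value lambda are all arbitrary assumptions.

```r
# Simulated data (hypothetical): 200 individuals, 50 markers coded 0/1/2
set.seed(42)
n_ind <- 200
n_snp <- 50
X <- matrix(sample(0:2, n_ind * n_snp, replace = TRUE), nrow = n_ind)
true_effects <- rnorm(n_snp, sd = 0.5)
y <- 160 + as.vector(X %*% true_effects) + rnorm(n_ind, sd = 2)

# Training individuals have genotypes and phenotypes;
# validation individuals are treated as if only genotypes were known
train <- 1:150
valid <- 151:200

# Center markers using training-set means only
snp_means <- colMeans(X[train, ])
Z_train <- sweep(X[train, ], 2, snp_means)
Z_valid <- sweep(X[valid, ], 2, snp_means)
mu <- mean(y[train])

# Ridge regression: (Z'Z + lambda * I) b = Z'(y - mu), with an arbitrary lambda
lambda <- 10
marker_effects <- solve(crossprod(Z_train) + diag(lambda, n_snp),
                        crossprod(Z_train, y[train] - mu))

# Genomic predictions for the validation individuals
gebv_valid <- mu + as.vector(Z_valid %*% marker_effects)

# Predictive ability: correlation between predictions and held-out phenotypes
predictive_ability <- cor(gebv_valid, y[valid])
predictive_ability
```

With the vignette data, the same split could be applied to the rows of Geno and Pheno, fitting the model on the training rows and multiplying the validation genotypes by the estimated marker effects.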


Using SimpleMating for cross prediction and optimization

In this part of the vignette, we will use SimpleMating to predict cross performance and to generate a mating plan. SimpleMating has two modules: one predicts the performance of crosses among a group of individuals that are candidates to be parents, and the other (the selectCrosses() function) optimizes and generates the mating plan.

Using the mid-parental value (capturing only additivity)

First, let’s begin by generating the crosses and estimating their performance based on additive values. The mid-parental value (MPV) represents the average of the two parents’ estimated breeding values.

\[ MPV = \frac{ebv_{P_1} + ebv_{P_2}}{2} \]

where:

  • \(ebv_{P_1}\): estimated breeding value of Parent 1
  • \(ebv_{P_2}\): estimated breeding value of Parent 2
# Creating the potential crosses
PlanCross <- SimpleMating::planCross(TargetPop = parents$id, MateDesign = 'half')
## Number of crosses generated: 124750
# Relationship matrix
relMat = AGHmatrix::Gmatrix(Geno)
## Initial data: 
##  Number of Individuals: 500 
##  Number of Markers: 3000 
## 
## Missing data check: 
##  Total SNPs: 3000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 3000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  3000 SNPs 
##  Final:  3000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.592  seconds
# Mid-parental value
MPV_pop <- SimpleMating::getMPV(MatePlan = PlanCross,
                                 Criterion = parents,
                                 K = relMat)
## 124750 possible crosses were predicted
head(MPV_pop)
##   Parent1 Parent2        Y           K
## 1   26662   26632 167.3148  0.22279601
## 2   26662   30133 166.9552 -0.06087269
## 3   26662   26634 166.6616  0.20220529
## 4   26632   30133 166.4763  0.15235285
## 5   26662   26661 166.4238  0.93582357
## 6   26662   30546 166.3924 -0.09635004
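
To make the two columns of this table concrete, the sketch below rebuilds them by hand for five hypothetical parents: Y as the mid-parental value from the formula above, and K as the pairwise entry of a genomic relationship matrix. The relationship matrix here uses the VanRaden-type formula G = ZZ' / (2 Σ p(1 − p)), which is an assumption about the settings used by Gmatrix(); the IDs, EBVs, and genotypes are simulated.

```r
# Five hypothetical parents with made-up EBVs and simulated 0/1/2 genotypes
set.seed(7)
ids <- paste0("P", 1:5)
toy_geno <- matrix(sample(0:2, 5 * 40, replace = TRUE), nrow = 5,
                   dimnames = list(ids, NULL))
toy_parents <- data.frame(id = ids, blup = c(170, 168, 165, 163, 160),
                          stringsAsFactors = FALSE)

# Half diallel: every unordered pair once, without selfs or reciprocals
pairs <- t(combn(ids, 2))
crosses <- data.frame(Parent1 = pairs[, 1], Parent2 = pairs[, 2],
                      stringsAsFactors = FALSE)

# Mid-parental value: average of the two parents' EBVs
ebv <- setNames(toy_parents$blup, toy_parents$id)
crosses$Y <- unname((ebv[crosses$Parent1] + ebv[crosses$Parent2]) / 2)

# VanRaden-type genomic relationship matrix (assumed formulation)
p <- colMeans(toy_geno) / 2                  # allele frequencies
Z <- sweep(toy_geno, 2, 2 * p)               # genotypes centered by 2p
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

# Pairwise relationship between the two parents of each cross
crosses$K <- G[cbind(crosses$Parent1, crosses$Parent2)]

# Best crosses first
crosses <- crosses[order(-crosses$Y), ]
head(crosses)

# The number of half-diallel crosses is choose(n, 2)
nrow(crosses)
choose(500, 2)
```

The last line shows where the 124750 crosses reported above come from: it is the number of unordered pairs among 500 parents.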

Optimization algorithm to generate the Mating Plan

The second step is to use the selectCrosses() function to optimize and generate a mating plan. The main arguments are:

  • n.cross: number of crosses in the mating plan.
  • max.cross: maximum number of crosses that a parent can be part of.
  • culling.pairwise.k: threshold applied to the covariances between parents in the relationship matrix.

The threshold controls inbreeding because the coancestry between two parents, which is half of their additive covariance, is the expected inbreeding of their progeny in the next generation (Kinghorn 1999).

# Optimization
MatingPLan <- SimpleMating::selectCrosses(data = MPV_pop,
                            n.cross = 20, 
                            max.cross = 4, 
                            culling.pairwise.k = 0 )
# stats
MatingPLan[[1]]
##   culling.pairwise.k target.Y   target.K
## 1                  0 164.8998 -0.0965285
# Mating Plan
MatingPLan[[2]]
##    Parent1 Parent2        Y           K
## 1    26662   30133 166.9552 -0.06087269
## 2    26662   30546 166.3924 -0.09635004
## 3    26662   30547 166.0928 -0.10064006
## 4    26632   30546 165.9135 -0.16503354
## 5    26632   30547 165.6139 -0.16022972
## 6    26661   30133 165.5852 -0.09060955
## 7    26670   30133 165.2910 -0.01256394
## 8    26672   30133 165.2722 -0.09474270
## 9    26661   30546 165.0224 -0.08516461
## 10   26632   27955 165.0073 -0.09661603
## 11   26632   30666 164.8243 -0.14568185
## 12   26661   30547 164.7229 -0.05194254
## 13   26661   30131 164.4443 -0.10410254
## 14   27930   26996 164.2792 -0.05900846
## 15   27930   26635 164.2707 -0.06474894
## 16   26672   30131 164.1313 -0.05935630
## 17   26672   27955 163.8032 -0.10727174
## 18   26670   30666 163.6391 -0.14804625
## 19   26672   30666 163.6203 -0.14724372
## 20   30398   26996 163.1140 -0.08034488
# Plot
MatingPLan[[3]]
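
To see how the three arguments interact, the sketch below applies them in a simple greedy pass over a toy table of crosses: drop crosses whose K exceeds the threshold, rank the rest by Y, and accept crosses in order while no parent exceeds max.cross. The greedy_plan() helper and the toy table are hypothetical, and this is only an illustration of the constraints, not the algorithm implemented in selectCrosses(). Treating the threshold as inclusive (K <= culling.pairwise.k) is also an assumption.

```r
# Hypothetical helper: greedy selection under the three constraints
greedy_plan <- function(crosses, n.cross, max.cross, culling.pairwise.k) {
  # Drop crosses whose parents are more related than the threshold (assumed inclusive)
  candidates <- crosses[crosses$K <= culling.pairwise.k, ]
  # Rank the remaining crosses from best to worst predicted performance
  candidates <- candidates[order(-candidates$Y), ]

  parent_ids <- unique(c(candidates$Parent1, candidates$Parent2))
  usage <- setNames(integer(length(parent_ids)), parent_ids)
  keep <- logical(nrow(candidates))

  for (i in seq_len(nrow(candidates))) {
    if (sum(keep) >= n.cross) break
    p1 <- candidates$Parent1[i]
    p2 <- candidates$Parent2[i]
    # Accept the cross only if neither parent has reached its limit
    if (usage[[p1]] < max.cross && usage[[p2]] < max.cross) {
      keep[i] <- TRUE
      usage[c(p1, p2)] <- usage[c(p1, p2)] + 1L
    }
  }
  candidates[keep, ]
}

# Toy table of crosses with the same columns as the getMPV() output (made-up values)
toy <- data.frame(
  Parent1 = c("A", "A", "A", "B", "B", "C"),
  Parent2 = c("B", "C", "D", "C", "D", "D"),
  Y = c(169, 167.5, 166.5, 166.5, 165.5, 164),
  K = c(0.22, -0.06, -0.10, 0.15, -0.16, -0.09),
  stringsAsFactors = FALSE
)

plan <- greedy_plan(toy, n.cross = 3, max.cross = 2, culling.pairwise.k = 0)
plan
mean(plan$Y)
```

In this toy case the best cross by Y (A x B) is excluded because its K is above the threshold, which mirrors how the cross 26662 x 26632 ranks first in the MPV table but does not appear in the mating plan.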

References

Kinghorn, Brian. 1999. “19. Mate Selection for the Tactical Implementation of Breeding Programs.” Proc Adv Anim Breed Genet 13: 130–33.
Peixoto, Marco Antônio, Rodrigo Rampazo Amadeu, Leonardo Lopes Bhering, Luís Felipe V Ferrão, Patrício R Munoz, and Márcio FR Resende Jr. 2025. “SimpleMating: R-Package for Prediction and Optimization of Breeding Crosses Using Genomic Selection.” The Plant Genome 18 (1): e20533.
Xavier, Alencar. 2019. “Efficient Estimation of Marker Effects in Plant Breeding.” G3: Genes, Genomes, Genetics 9 (11): 3855–66.

  1. University of Florida