1. Rationale

Genomic selection has transformed plant breeding, enabling more precise and efficient use of genetic information. A natural extension of this framework is the prediction and optimization of new crosses — not just selecting which individuals to advance, but also which parental combinations to make.

Mate selection is the process of simultaneously selecting individuals and allocating mating pairs entering a breeding program. Rather than selecting parents and designing crosses as two separate steps, mate selection integrates both decisions.

An extension of optimum contribution selection (OCS) that delivers a practical crossing plan is to jointly optimize contributions and cross allocations (Kinghorn et al., 2009; Woolliams, 2015; Gorjanc et al., 2018). This approach allows breeders to:

The selection criterion and the covariance (kinship) structure among candidates are the two key ingredients in optimal cross selection. In the sections below, we walk through the main criteria used to evaluate and rank potential

First, we will create a population using AlphaSimR package and extract from the population the information needed to implement cross-selection.


2. Building the Base Population

Before evaluating any cross criterion, we first simulate a structured breeding population. The code below produces a doubled-haploid (DH) population as well as a non-DH outcrossing population — the two contexts in which different cross criteria apply.

set.seed(52684)

# Founding genome: 200 individuals, 10 chromosomes, 500 segregating sites
founderGenomes <- runMacs(nInd    = 200,
                          nChr    = 10,
                          segSites = 500)

# Global simulation parameters
SP <- SimParam$new(founderGenomes)

# Add a trait with additive + dominance effects
SP$addTraitAD(nQtlPerChr = 200,
              mean    = 10,
              var     = 10,
              meanDD  = 0.9,
              varDD   = 0.3)
SP$addSnpChip(300)
SP$setVarE(varE = 20)  # h2 ≈ 0.33

# Base population (Hardy–Weinberg equilibrium)
Parents <- newPop(founderGenomes)

# ---- Advance population for 10 cycles (creates structure / LD) ----
ParentsPop <- Parents

for (year in 1:10) {
  set.seed(6985)
  ParentsPop <- setPheno(ParentsPop, h2 = 0.4)
  ParentsPop <- selectCross(pop = ParentsPop, nInd = 20, nCrosses = 25)
}

# ---- DH population (inbred lines; F = 1) ----
ParentsPop_DH <- makeDH(ParentsPop, nDH = 2)
ParentsPop_DH <- setPheno(ParentsPop_DH, h2 = 0.4)

cat("ParentsPop (outcrossing):", ParentsPop@nInd, "individuals\n")
## ParentsPop (outcrossing): 25 individuals
cat("ParentsPop_DH (DH lines):", ParentsPop_DH@nInd, "individuals\n")
## ParentsPop_DH (DH lines): 50 individuals

Note on heritability. With \(\sigma^2_A = 10\) and \(\sigma^2_E = 20\), the narrow-sense heritability is approximately \(h^2 = 10/(10+20) \approx 0.33\) in the base population.


3. Cross Performance Criteria

Choosing a criterion to rank crosses is at the heart of optimal cross selection. Three main criteria are commonly used, each with different assumptions about genetic architecture and the breeding system:

Criterion Abbrev. Genetic Effects Typical Application
Mid-parental value MPV Additive only DH pipelines, small-grain cereals
Total genetic value TGV Additive + Dominance Clonal crops, outcrossing species
Usefulness criterion UC Additive ± Dominance All pipelines; accounts for progeny variance

3.1 Mid-Parental Value (MPV)

For traits with additive genetic effects, the expected mean of a cross is simply the average of the two parental estimated breeding values (EBVs):

\[ MPV_{ij} = \frac{EBV_i + EBV_j}{2} \]

This is the simplest criterion and requires only genomic (or phenotypic) EBVs. It is most appropriate when dominance deviations are negligible (e.g., inbred line × inbred line crosses in a DH pipeline).

# 1. Criterion: genetic values from the DH population
Crit_DH <- data.frame(Id        = ParentsPop_DH@id,
                      Criterion = ParentsPop_DH@gv)

# 2. Genomic relationship matrix (DH lines)
relMat_DH <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop_DH))
## Initial data: 
##  Number of Individuals: 50 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.053  seconds
# 3. All pairwise half-diallel mating plan
CrossPlan_DH <- planCross(TargetPop  = ParentsPop_DH@id,
                          MateDesign = "half")
## Number of potential crosses generated: 1225
# 4. Mid-parental value
ST_mpv <- getMPV(MatePlan  = CrossPlan_DH,
                 Criterion = Crit_DH,
                 K         = relMat_DH)
## 1225 possible crosses were predicted
head(ST_mpv, 5)
##   Parent1 Parent2         Y           K
## 1     470     487 -21.09289  0.20184784
## 2     470     488 -21.76975  0.28374243
## 3     487     488 -22.11099  1.17490452
## 4     470     490 -22.29350 -0.14229558
## 5     456     470 -22.36950 -0.05146703
# Relationship between cross mean (Y) and mean kinship (K)
plot(ST_mpv$K, ST_mpv$Y,
     xlab = "Mean kinship (K)",
     ylab = "Mid-parental value",
     main = "MPV vs. Kinship — DH population",
     pch  = 16, col = adjustcolor("blue", 0.5))

Interpretation. The horizontal axis shows the average kinship of each parental pair. Crosses on the left (lower kinship) draw from more diverse parents, while those on the right involve more related individuals. Breeders often seek crosses with high MPV but moderate/lower kinship.


3.2 Total Genetic Value (TGV)

When dominance effects contribute substantially to trait expression — as in clonal crops or outcrossing species — the expected genetic value of the progeny must also account for heterosis. The TGV criterion is:

\[ TGV_{ij} = \mathbf{A}(\mathbf{p}_{ik} - \mathbf{q}_{ik} - \mathbf{y}_k) + \mathbf{D}(2\mathbf{p}_{ik}\mathbf{q}_{ik} + \mathbf{y}_k(\mathbf{p}_{ik} - \mathbf{q}_{ik})) \]

where A and D are vectors of additive and dominance marker effects; \(p_{ik}\) and \(q_{ik}\) are allele frequencies for marker \(k\) in parents \(i\) and \(j\); and \(y_k\) is the difference in allele frequencies between the two parents at marker \(k\).

# 1. Mating plan (outcrossing population — not DH)
CrossPlan_out <- planCross(TargetPop  = ParentsPop@id,
                           MateDesign = "half")
## Number of potential crosses generated: 300
# 2. Relationship matrix
relMat_out <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop))
## Initial data: 
##  Number of Individuals: 25 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.035  seconds
# 3. Marker genotypes (unphased)
Markers_out <- pullQtlGeno(ParentsPop)

# 4. Marker effects
add_eff <- SP$traits[[1]]@addEff
dom_eff <- SP$traits[[1]]@domEff

# 5. Total genetic value
ST_tgv <- getTGV(MatePlan = CrossPlan_out,
                 Markers  = Markers_out,
                 addEff   = add_eff,
                 domEff   = dom_eff,
                 ploidy   = 2,
                 K        = relMat_out)
## TGVs predicted for 300 crosses.
head(ST_tgv, 5)
##   Parent1 Parent2        Y           K
## 1     426     427 39.55368  0.10547431
## 2     426     428 40.12604  0.19150780
## 3     426     429 45.26235 -0.20130182
## 4     426     430 41.91924 -0.07864658
## 5     426     431 38.79195  0.13096231
plot(ST_tgv$K, ST_tgv$Y,
     xlab = "Mean kinship (K)",
     ylab = "Total genetic value",
     main = "TGV vs. Kinship — outcrossing population",
     pch  = 16, col = adjustcolor("darkorange", 0.5))


3.3 Usefulness Criterion (UC)

The Usefulness Criterion (Schnell & Utz, 1975) extends the expected cross mean by incorporating the progeny standard deviation (\(\sigma\)). It answers the question: What is the best individual we can realistically select from this cross?

\[ UC_{ij} = \mu_{ij} + i \cdot \sigma_{ij} \]

where:

  • \(\mu_{ij}\) is the expected cross mean (MPV or TGV)
  • \(i\) is the standardised selection intensity (a function of the selection proportion \(p\))
  • \(\sigma_{ij}\) is the expected standard deviation of progeny genetic values

The progeny variance is estimated as:

\[ \sigma^2_{ij} = \boldsymbol{\beta}' \boldsymbol{\Omega}_{ij} \boldsymbol{\beta} \]

where \(\boldsymbol{\beta}\) is the vector of marker effects and \(\boldsymbol{\Omega}_{ij}\) is the marker covariance matrix for cross \(ij\). For doubled-haploid progeny (Lehermeier et al., 2017):

\[ \mathrm{var}(X_{P_i \times P_j}) = \begin{pmatrix} 4\pi_k(1-\pi_k) & \cdots & 4D_{kl} \\ \vdots & \ddots & \vdots \\ 4D_{lk} & \cdots & 4\pi_l(1-\pi_l) \end{pmatrix} \]

The diagonal \(4\pi_k(1-\pi_k)\) captures locus-specific variance; the off-diagonal \(4D_{kl}\) captures linkage disequilibrium between markers \(k\) and \(l\), scaled by recombination via the Haldane mapping function:

\[ \theta_{kl} = 0.5\,(1 - e^{-2d_{kl}}) \]

where \(d_{kl}\) is the genetic distance (Morgans) between loci \(k\) and \(l\). A simplification would be:

\[ \mathrm{var}(X_{P_i \times P_j}) = (1-2\theta_{kl})2D^*_{kl} \] where \(\theta_{kl}\) is the expected recombination matrix between loci k and l, and \(D^*_{kl}\) is the disequilibrium parameter among parental lines. Usefulness will then be a function of Recombination, Linkage disequilibrium, and Parental haplotypes.

Cross usefulness depends on which alleles are inherited together, not just how many alleles are present.

3.3.1 UC — Additive only (DH lines)

# 1. Mating plan (DH lines)
CrossPlan_DH <- planCross(TargetPop  = ParentsPop_DH@id,
                          MateDesign = "half")
## Number of potential crosses generated: 1225
# 2. Relationship matrix
relMat_DH <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop_DH))
## Initial data: 
##  Number of Individuals: 50 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.048  seconds
# 3. Marker matrix
Markers_DH <- pullQtlGeno(ParentsPop_DH)

# 4. Additive marker effects
add_eff <- SP$traits[[1]]@addEff

# 5. Genetic map (cM)
MapIn <- AlphaSimR::getQtlMap(simParam = SP)[, c(2, 4, 1)]
MapIn$pos <- 100 * MapIn$pos   # convert Morgans → cM

# 6. Usefulness criterion (additive; DH)
usefA <- getUsefA(MatePlan = CrossPlan_DH,
                  Markers  = Markers_DH,
                  addEff   = as.matrix(add_eff),
                  Map.In   = MapIn,
                  propSel  = 0.05,
                  Type     = "DH",
                  display_progress = FALSE,
                  K        = relMat_DH)
##   |                                                                              |                                                                      |   0%
## Usefulness predicted for 1225 crosses.
head(usefA[[2]], 5)
##   Parent1 Parent2        Y           K
## 1     470     487 16.95282  0.20184784
## 2     478     487 15.99366 -0.30534027
## 3     455     470 15.74202  0.06467439
## 4     464     470 15.74112 -0.24466382
## 5     470     490 15.73618 -0.14229558
plot(usefA[[2]]$K, usefA[[2]]$Y,
     xlab = "Mean kinship (K)",
     ylab = "Usefulness (additive)",
     main = "UC-A vs. Kinship — DH population",
     pch  = 16, col = adjustcolor("blue", 0.5))

3.3.2 UC — Additive + Dominance (outcrossing)

For outcrossing species with heterozygous parents, phased haplotypes are required to correctly estimate \(D_{kl}\) because the coupling vs. repulsion arrangement of alleles matters for progeny variance:

  • Coupling (AB / ab): preserves favorable allele combinations → higher variance
  • Repulsion (Ab / aB): breaks them apart → lower variance
# 1. Mating plan (outcrossing)
CrossPlan_out <- planCross(TargetPop  = ParentsPop@id,
                           MateDesign = "half")
## Number of potential crosses generated: 300
# 2. Relationship matrix
relMat_out <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop))
## Initial data: 
##  Number of Individuals: 25 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.039  seconds
# 3. Phased haplotype markers (required for dominance UC)
MarkersPhased <- pullQtlHaplo(ParentsPop)

# 4. Marker effects
add_eff <- SP$traits[[1]]@addEff
dom_eff <- SP$traits[[1]]@domEff

# 5. Genetic map (cM)
MapIn <- AlphaSimR::getQtlMap(simParam = SP)[, c(2, 4, 1)]
MapIn$pos <- 100 * MapIn$pos

# 6. Usefulness criterion (additive + dominance; phased)
usefAD <- getUsefAD(MatePlan = CrossPlan_out,
                    Markers  = MarkersPhased,
                    addEff   = as.matrix(add_eff),
                    domEff   = as.matrix(dom_eff),
                    Map.In   = MapIn,
                    propSel  = 0.05,
                    Method   = "Phased",
                    ploidy   = 2,
                    display_progress = FALSE,
                    K        = relMat_out)
##   |                                                                              |                                                                      |   0%
## Usefulness predicted for 300 crosses.
head(usefAD[[2]], 5)
##   Parent1 Parent2        Y           K
## 1     428     439 56.52414 -0.14084834
## 2     439     444 55.42244 -0.12759826
## 3     427     439 55.16119 -0.10036198
## 4     439     442 54.97531 -0.13201495
## 5     434     439 54.70364 -0.03392756
plot(usefAD[[2]]$K, usefAD[[2]]$Y,
     xlab = "Mean kinship (K)",
     ylab = "Usefulness (additive + dominance)",
     main = "UC-AD vs. Kinship — outcrossing population",
     pch  = 16, col = adjustcolor("purple", 0.5))

Key insight: UC is generally preferred over MPV and TGV alone because it explicitly rewards crosses that produce high-variance progeny distributions — i.e., crosses with the potential to generate elite recombinants, even if the cross mean is not the highest.


4. Optimal cross selection vs optimum contribution selection

Comparison between Optimum Contribution Selection and Optimal Cross Selection
Aspect Optimum Contribution Selection Optimal Cross Selection
Decision unit Individuals Parental pairs (crosses)
Controls mating? No (mating often random or secondary) Yes (explicitly selects who mates with whom)
Controls contributions? Yes (optimize parental contributions) Implicit, via cross allocation
Within-family variance Ignored (focus on mean breeding value) Explicitly modeled
Inbreeding control Explicit constraint (ΔF or coancestry) Via cross design or penalties
Output List of selected individuals + contributions List of crosses
When to use Population improvement Population improvement, hybrid prediction

References


  1. University of Florida, ↩︎