|
Mate Allocation of Breeding Crosses
Genomic selection has transformed plant breeding, enabling more precise and efficient use of genetic information. A natural extension of this framework is the prediction and optimization of new crosses — not just selecting which individuals to advance, but also which parental combinations to make.
Mate selection is the process of simultaneously selecting individuals and allocating mating pairs entering a breeding program. Rather than selecting parents and designing crosses as two separate steps, mate selection integrates both decisions.
An extension of optimum contribution selection (OCS) that delivers a practical crossing plan is to jointly optimize contributions and cross allocations (Kinghorn et al., 2009; Woolliams, 2015; Gorjanc et al., 2018). This approach allows breeders to:
The selection criterion and the covariance (kinship) structure among candidates are the two key ingredients in optimal cross selection. In the sections below, we walk through the main criteria used to evaluate and rank potential
First, we will create a population using AlphaSimR package and extract from the population the information needed to implement cross-selection.
Before evaluating any cross criterion, we first simulate a structured breeding population. The code below produces a doubled-haploid (DH) population as well as a non-DH outcrossing population — the two contexts in which different cross criteria apply.
set.seed(52684)
# Founding genome: 200 individuals, 10 chromosomes, 500 segregating sites
founderGenomes <- runMacs(nInd = 200,
nChr = 10,
segSites = 500)
# Global simulation parameters
SP <- SimParam$new(founderGenomes)
# Add a trait with additive + dominance effects
SP$addTraitAD(nQtlPerChr = 200,
mean = 10,
var = 10,
meanDD = 0.9,
varDD = 0.3)
SP$addSnpChip(300)
SP$setVarE(varE = 20) # h2 ≈ 0.33
# Base population (Hardy–Weinberg equilibrium)
Parents <- newPop(founderGenomes)
# ---- Advance population for 10 cycles (creates structure / LD) ----
ParentsPop <- Parents
for (year in 1:10) {
set.seed(6985)
ParentsPop <- setPheno(ParentsPop, h2 = 0.4)
ParentsPop <- selectCross(pop = ParentsPop, nInd = 20, nCrosses = 25)
}
# ---- DH population (inbred lines; F = 1) ----
ParentsPop_DH <- makeDH(ParentsPop, nDH = 2)
ParentsPop_DH <- setPheno(ParentsPop_DH, h2 = 0.4)
cat("ParentsPop (outcrossing):", ParentsPop@nInd, "individuals\n")## ParentsPop (outcrossing): 25 individuals
## ParentsPop_DH (DH lines): 50 individuals
Note on heritability. With \(\sigma^2_A = 10\) and \(\sigma^2_E = 20\), the narrow-sense heritability is approximately \(h^2 = 10/(10+20) \approx 0.33\) in the base population.
Choosing a criterion to rank crosses is at the heart of optimal cross selection. Three main criteria are commonly used, each with different assumptions about genetic architecture and the breeding system:
| Criterion | Abbrev. | Genetic Effects | Typical Application |
|---|---|---|---|
| Mid-parental value | MPV | Additive only | DH pipelines, small-grain cereals |
| Total genetic value | TGV | Additive + Dominance | Clonal crops, outcrossing species |
| Usefulness criterion | UC | Additive ± Dominance | All pipelines; accounts for progeny variance |
For traits with additive genetic effects, the expected mean of a cross is simply the average of the two parental estimated breeding values (EBVs):
\[ MPV_{ij} = \frac{EBV_i + EBV_j}{2} \]
This is the simplest criterion and requires only genomic (or phenotypic) EBVs. It is most appropriate when dominance deviations are negligible (e.g., inbred line × inbred line crosses in a DH pipeline).
# 1. Criterion: genetic values from the DH population
Crit_DH <- data.frame(Id = ParentsPop_DH@id,
Criterion = ParentsPop_DH@gv)
# 2. Genomic relationship matrix (DH lines)
relMat_DH <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop_DH))## Initial data:
## Number of Individuals: 50
## Number of Markers: 2000
##
## Missing data check:
## Total SNPs: 2000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 2000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 2000 SNPs
## Final: 2000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.053 seconds
# 3. All pairwise half-diallel mating plan
CrossPlan_DH <- planCross(TargetPop = ParentsPop_DH@id,
MateDesign = "half")## Number of potential crosses generated: 1225
# 4. Mid-parental value
ST_mpv <- getMPV(MatePlan = CrossPlan_DH,
Criterion = Crit_DH,
K = relMat_DH)## 1225 possible crosses were predicted
## Parent1 Parent2 Y K
## 1 470 487 -21.09289 0.20184784
## 2 470 488 -21.76975 0.28374243
## 3 487 488 -22.11099 1.17490452
## 4 470 490 -22.29350 -0.14229558
## 5 456 470 -22.36950 -0.05146703
# Relationship between cross mean (Y) and mean kinship (K)
plot(ST_mpv$K, ST_mpv$Y,
xlab = "Mean kinship (K)",
ylab = "Mid-parental value",
main = "MPV vs. Kinship — DH population",
pch = 16, col = adjustcolor("blue", 0.5))Interpretation. The horizontal axis shows the average kinship of each parental pair. Crosses on the left (lower kinship) draw from more diverse parents, while those on the right involve more related individuals. Breeders often seek crosses with high MPV but moderate/lower kinship.
When dominance effects contribute substantially to trait expression — as in clonal crops or outcrossing species — the expected genetic value of the progeny must also account for heterosis. The TGV criterion is:
\[ TGV_{ij} = \mathbf{A}(\mathbf{p}_{ik} - \mathbf{q}_{ik} - \mathbf{y}_k) + \mathbf{D}(2\mathbf{p}_{ik}\mathbf{q}_{ik} + \mathbf{y}_k(\mathbf{p}_{ik} - \mathbf{q}_{ik})) \]
where A and D are vectors of additive and dominance marker effects; \(p_{ik}\) and \(q_{ik}\) are allele frequencies for marker \(k\) in parents \(i\) and \(j\); and \(y_k\) is the difference in allele frequencies between the two parents at marker \(k\).
# 1. Mating plan (outcrossing population — not DH)
CrossPlan_out <- planCross(TargetPop = ParentsPop@id,
MateDesign = "half")## Number of potential crosses generated: 300
## Initial data:
## Number of Individuals: 25
## Number of Markers: 2000
##
## Missing data check:
## Total SNPs: 2000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 2000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 2000 SNPs
## Final: 2000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.035 seconds
# 3. Marker genotypes (unphased)
Markers_out <- pullQtlGeno(ParentsPop)
# 4. Marker effects
add_eff <- SP$traits[[1]]@addEff
dom_eff <- SP$traits[[1]]@domEff
# 5. Total genetic value
ST_tgv <- getTGV(MatePlan = CrossPlan_out,
Markers = Markers_out,
addEff = add_eff,
domEff = dom_eff,
ploidy = 2,
K = relMat_out)## TGVs predicted for 300 crosses.
## Parent1 Parent2 Y K
## 1 426 427 39.55368 0.10547431
## 2 426 428 40.12604 0.19150780
## 3 426 429 45.26235 -0.20130182
## 4 426 430 41.91924 -0.07864658
## 5 426 431 38.79195 0.13096231
plot(ST_tgv$K, ST_tgv$Y,
xlab = "Mean kinship (K)",
ylab = "Total genetic value",
main = "TGV vs. Kinship — outcrossing population",
pch = 16, col = adjustcolor("darkorange", 0.5))The Usefulness Criterion (Schnell & Utz, 1975) extends the expected cross mean by incorporating the progeny standard deviation (\(\sigma\)). It answers the question: What is the best individual we can realistically select from this cross?
\[ UC_{ij} = \mu_{ij} + i \cdot \sigma_{ij} \]
where:
The progeny variance is estimated as:
\[ \sigma^2_{ij} = \boldsymbol{\beta}' \boldsymbol{\Omega}_{ij} \boldsymbol{\beta} \]
where \(\boldsymbol{\beta}\) is the vector of marker effects and \(\boldsymbol{\Omega}_{ij}\) is the marker covariance matrix for cross \(ij\). For doubled-haploid progeny (Lehermeier et al., 2017):
\[ \mathrm{var}(X_{P_i \times P_j}) = \begin{pmatrix} 4\pi_k(1-\pi_k) & \cdots & 4D_{kl} \\ \vdots & \ddots & \vdots \\ 4D_{lk} & \cdots & 4\pi_l(1-\pi_l) \end{pmatrix} \]
The diagonal \(4\pi_k(1-\pi_k)\) captures locus-specific variance; the off-diagonal \(4D_{kl}\) captures linkage disequilibrium between markers \(k\) and \(l\), scaled by recombination via the Haldane mapping function:
\[ \theta_{kl} = 0.5\,(1 - e^{-2d_{kl}}) \]
where \(d_{kl}\) is the genetic distance (Morgans) between loci \(k\) and \(l\). A simplification would be:
\[ \mathrm{var}(X_{P_i \times P_j}) = (1-2\theta_{kl})2D^*_{kl} \] where \(\theta_{kl}\) is the expected recombination matrix between loci k and l, and \(D^*_{kl}\) is the disequilibrium parameter among parental lines. Usefulness will then be a function of Recombination, Linkage disequilibrium, and Parental haplotypes.
Cross usefulness depends on which alleles are inherited together, not just how many alleles are present.
# 1. Mating plan (DH lines)
CrossPlan_DH <- planCross(TargetPop = ParentsPop_DH@id,
MateDesign = "half")## Number of potential crosses generated: 1225
## Initial data:
## Number of Individuals: 50
## Number of Markers: 2000
##
## Missing data check:
## Total SNPs: 2000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 2000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 2000 SNPs
## Final: 2000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.048 seconds
# 3. Marker matrix
Markers_DH <- pullQtlGeno(ParentsPop_DH)
# 4. Additive marker effects
add_eff <- SP$traits[[1]]@addEff
# 5. Genetic map (cM)
MapIn <- AlphaSimR::getQtlMap(simParam = SP)[, c(2, 4, 1)]
MapIn$pos <- 100 * MapIn$pos # convert Morgans → cM
# 6. Usefulness criterion (additive; DH)
usefA <- getUsefA(MatePlan = CrossPlan_DH,
Markers = Markers_DH,
addEff = as.matrix(add_eff),
Map.In = MapIn,
propSel = 0.05,
Type = "DH",
display_progress = FALSE,
K = relMat_DH)## | | | 0%
## Usefulness predicted for 1225 crosses.
## Parent1 Parent2 Y K
## 1 470 487 16.95282 0.20184784
## 2 478 487 15.99366 -0.30534027
## 3 455 470 15.74202 0.06467439
## 4 464 470 15.74112 -0.24466382
## 5 470 490 15.73618 -0.14229558
plot(usefA[[2]]$K, usefA[[2]]$Y,
xlab = "Mean kinship (K)",
ylab = "Usefulness (additive)",
main = "UC-A vs. Kinship — DH population",
pch = 16, col = adjustcolor("blue", 0.5))For outcrossing species with heterozygous parents, phased haplotypes are required to correctly estimate \(D_{kl}\) because the coupling vs. repulsion arrangement of alleles matters for progeny variance:
# 1. Mating plan (outcrossing)
CrossPlan_out <- planCross(TargetPop = ParentsPop@id,
MateDesign = "half")## Number of potential crosses generated: 300
## Initial data:
## Number of Individuals: 25
## Number of Markers: 2000
##
## Missing data check:
## Total SNPs: 2000
## 0 SNPs dropped due to missing data threshold of 0.5
## Total of: 2000 SNPs
##
## MAF check:
## No SNPs with MAF below 0
##
## Heterozigosity data check:
## No SNPs with heterozygosity, missing threshold of = 0
##
## Summary check:
## Initial: 2000 SNPs
## Final: 2000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 0.039 seconds
# 3. Phased haplotype markers (required for dominance UC)
MarkersPhased <- pullQtlHaplo(ParentsPop)
# 4. Marker effects
add_eff <- SP$traits[[1]]@addEff
dom_eff <- SP$traits[[1]]@domEff
# 5. Genetic map (cM)
MapIn <- AlphaSimR::getQtlMap(simParam = SP)[, c(2, 4, 1)]
MapIn$pos <- 100 * MapIn$pos
# 6. Usefulness criterion (additive + dominance; phased)
usefAD <- getUsefAD(MatePlan = CrossPlan_out,
Markers = MarkersPhased,
addEff = as.matrix(add_eff),
domEff = as.matrix(dom_eff),
Map.In = MapIn,
propSel = 0.05,
Method = "Phased",
ploidy = 2,
display_progress = FALSE,
K = relMat_out)## | | | 0%
## Usefulness predicted for 300 crosses.
## Parent1 Parent2 Y K
## 1 428 439 56.52414 -0.14084834
## 2 439 444 55.42244 -0.12759826
## 3 427 439 55.16119 -0.10036198
## 4 439 442 54.97531 -0.13201495
## 5 434 439 54.70364 -0.03392756
plot(usefAD[[2]]$K, usefAD[[2]]$Y,
xlab = "Mean kinship (K)",
ylab = "Usefulness (additive + dominance)",
main = "UC-AD vs. Kinship — outcrossing population",
pch = 16, col = adjustcolor("purple", 0.5))Key insight: UC is generally preferred over MPV and TGV alone because it explicitly rewards crosses that produce high-variance progeny distributions — i.e., crosses with the potential to generate elite recombinants, even if the cross mean is not the highest.
| Aspect | Optimum Contribution Selection | Optimal Cross Selection |
|---|---|---|
| Decision unit | Individuals | Parental pairs (crosses) |
| Controls mating? | No (mating often random or secondary) | Yes (explicitly selects who mates with whom) |
| Controls contributions? | Yes (optimize parental contributions) | Implicit, via cross allocation |
| Within-family variance | Ignored (focus on mean breeding value) | Explicitly modeled |
| Inbreeding control | Explicit constraint (ΔF or coancestry) | Via cross design or penalties |
| Output | List of selected individuals + contributions | List of crosses |
| When to use | Population improvement | Population improvement, hybrid prediction |
Akdemir, D., Sanchez, J. I., & Jannink, J.-L. (2016). Optimization of genomic selection training populations with a genetic algorithm. G3: Genes, Genomes, Genetics, 6(3), 721–731.
Allier, A., Teyssèdre, S., Lehermeier, C., Moreau, L., & Charcosset, A. (2019). Optimized breeding strategies to harness genetic resources with different performance levels. G3: Genes, Genomes, Genetics, 9(5), 1593–1605.
Falconer, D. S., & Mackay, T. F. C. (1996). Introduction to Quantitative Genetics (4th ed.). Longman, Harlow, UK.
Fernández-González, J., Metwally, S. M., & Isidro y Sánchez, J. (2026). MateR: a novel genomic mating framework. Genetics, iyag013.
Gorjanc, G., Gaynor, R. C., & Hickey, J. M. (2018). Optimal cross selection for long-term genetic gain in two-part programs with rapid recurrent genomic selection. Genetics, 210(4), 1241–1259.
Kinghorn, B. P. (1999). An algorithm for efficient constrained mate selection. Livestock Production Science, 59(2–3), 119–130.
Kinghorn, B. P., et al. (2009). Mate selection in livestock breeding programs: Advances and challenges. In: Proceedings of the Association for the Advancement of Animal Breeding and Genetics (AABG).
Lehermeier, C., Teyssèdre, S., & Schön, C.-C. (2017). Genomic prediction of within-family variance in biparental populations. Genetics, 207(4), 1631–1644. Peixoto et al. 2025 genomic mate selection plant genome
Peixoto, M. A., Amadeu, R. R., Bhering, L. L., Ferrão, L. F. V., Munoz, P. R., & Resende Jr, M. F. (2025). SimpleMating: R‐package for prediction and optimization of breeding crosses using genomic selection. The plant genome, 18(1), e20533.
Schnell, F. W., & Utz, H. F. (1975). F1-Leistung und Elternwahl in der Züchtung von Selbstbefruchtern. In: Arbeitstagung der Vereinigung österreichischer Pflanzenzüchter, 243–248.
Wolfe, M. D., et al. (2021). Genomic mating in outbred species: predicting cross usefulness in cassava breeding. The Plant Genome, 14(1), e20099.
Woolliams, J. A. (2015). Optimal contribution selection and the management of genetic diversity. In: Proceedings of the 66th Annual Meeting of the European Federation of Animal Science (EAAP).
University of Florida, deamorimpeixotom@ufl.edu↩︎