OptCross_MateAllocation.knit

1. Rationale

Genomic selection has transformed plant breeding, enabling more precise and efficient use of genetic information. A natural extension of this framework is the prediction and optimization of new crosses — not just selecting which individuals to advance, but also which parental combinations to make.

Mate selection is the process of simultaneously selecting individuals and allocating mating pairs entering a breeding program. Rather than selecting parents and designing crosses as two separate steps, mate selection integrates both decisions.

An extension of optimum contribution selection (OCS) that delivers a practical crossing plan is to jointly optimize contributions and cross allocations (Kinghorn et al., 2009; Woolliams, 2015; Gorjanc et al., 2018). This approach allows breeders to:

Control the mean genetic value of the progeny (gain)
Manage the rate of inbreeding and diversity loss
Define an explicit crossing plan to implement in the field

The selection criterion and the covariance (kinship) structure among candidates are the two key ingredients in optimal cross selection. In the sections below, we walk through the main criteria used to evaluate and rank potential

First, we will create a population using AlphaSimR package and extract from the population the information needed to implement cross-selection.

2. Building the Base Population

Before evaluating any cross criterion, we first simulate a structured breeding population. The code below produces a doubled-haploid (DH) population as well as a non-DH outcrossing population — the two contexts in which different cross criteria apply.

set.seed(52684)

# Founding genome: 200 individuals, 10 chromosomes, 500 segregating sites
founderGenomes <- runMacs(nInd    = 200,
                          nChr    = 10,
                          segSites = 500)

# Global simulation parameters
SP <- SimParam$new(founderGenomes)

# Add a trait with additive + dominance effects
SP$addTraitAD(nQtlPerChr = 200,
              mean    = 10,
              var     = 10,
              meanDD  = 0.9,
              varDD   = 0.3)
SP$addSnpChip(300)
SP$setVarE(varE = 20)  # h2 ≈ 0.33

# Base population (Hardy–Weinberg equilibrium)
Parents <- newPop(founderGenomes)

# ---- Advance population for 10 cycles (creates structure / LD) ----
ParentsPop <- Parents

for (year in 1:10) {
  set.seed(6985)
  ParentsPop <- setPheno(ParentsPop, h2 = 0.4)
  ParentsPop <- selectCross(pop = ParentsPop, nInd = 20, nCrosses = 25)
}

# ---- DH population (inbred lines; F = 1) ----
ParentsPop_DH <- makeDH(ParentsPop, nDH = 2)
ParentsPop_DH <- setPheno(ParentsPop_DH, h2 = 0.4)

cat("ParentsPop (outcrossing):", ParentsPop@nInd, "individuals\n")

## ParentsPop (outcrossing): 25 individuals

cat("ParentsPop_DH (DH lines):", ParentsPop_DH@nInd, "individuals\n")

## ParentsPop_DH (DH lines): 50 individuals

Note on heritability. With \(\sigma^2_A = 10\) and \(\sigma^2_E = 20\), the narrow-sense heritability is approximately \(h^2 = 10/(10+20) \approx 0.33\) in the base population.

3. Cross Performance Criteria

Choosing a criterion to rank crosses is at the heart of optimal cross selection. Three main criteria are commonly used, each with different assumptions about genetic architecture and the breeding system:

Criterion	Abbrev.	Genetic Effects	Typical Application
Mid-parental value	MPV	Additive only	DH pipelines, small-grain cereals
Total genetic value	TGV	Additive + Dominance	Clonal crops, outcrossing species
Usefulness criterion	UC	Additive ± Dominance	All pipelines; accounts for progeny variance

3.1 Mid-Parental Value (MPV)

For traits with additive genetic effects, the expected mean of a cross is simply the average of the two parental estimated breeding values (EBVs):

\[ MPV_{ij} = \frac{EBV_i + EBV_j}{2} \]

This is the simplest criterion and requires only genomic (or phenotypic) EBVs. It is most appropriate when dominance deviations are negligible (e.g., inbred line × inbred line crosses in a DH pipeline).

# 1. Criterion: genetic values from the DH population
Crit_DH <- data.frame(Id        = ParentsPop_DH@id,
                      Criterion = ParentsPop_DH@gv)

# 2. Genomic relationship matrix (DH lines)
relMat_DH <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop_DH))

## Initial data: 
##  Number of Individuals: 50 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.053  seconds

# 3. All pairwise half-diallel mating plan
CrossPlan_DH <- planCross(TargetPop  = ParentsPop_DH@id,
                          MateDesign = "half")

## Number of potential crosses generated: 1225

# 4. Mid-parental value
ST_mpv <- getMPV(MatePlan  = CrossPlan_DH,
                 Criterion = Crit_DH,
                 K         = relMat_DH)

## 1225 possible crosses were predicted

head(ST_mpv, 5)

##   Parent1 Parent2         Y           K
## 1     470     487 -21.09289  0.20184784
## 2     470     488 -21.76975  0.28374243
## 3     487     488 -22.11099  1.17490452
## 4     470     490 -22.29350 -0.14229558
## 5     456     470 -22.36950 -0.05146703

# Relationship between cross mean (Y) and mean kinship (K)
plot(ST_mpv$K, ST_mpv$Y,
     xlab = "Mean kinship (K)",
     ylab = "Mid-parental value",
     main = "MPV vs. Kinship — DH population",
     pch  = 16, col = adjustcolor("blue", 0.5))

Interpretation. The horizontal axis shows the average kinship of each parental pair. Crosses on the left (lower kinship) draw from more diverse parents, while those on the right involve more related individuals. Breeders often seek crosses with high MPV but moderate/lower kinship.

3.2 Total Genetic Value (TGV)

When dominance effects contribute substantially to trait expression — as in clonal crops or outcrossing species — the expected genetic value of the progeny must also account for heterosis. The TGV criterion is:

\[ TGV_{ij} = \mathbf{A}(\mathbf{p}_{ik} - \mathbf{q}_{ik} - \mathbf{y}_k) + \mathbf{D}(2\mathbf{p}_{ik}\mathbf{q}_{ik} + \mathbf{y}_k(\mathbf{p}_{ik} - \mathbf{q}_{ik})) \]

where A and D are vectors of additive and dominance marker effects; \(p_{ik}\) and \(q_{ik}\) are allele frequencies for marker \(k\) in parents \(i\) and \(j\); and \(y_k\) is the difference in allele frequencies between the two parents at marker \(k\).

# 1. Mating plan (outcrossing population — not DH)
CrossPlan_out <- planCross(TargetPop  = ParentsPop@id,
                           MateDesign = "half")

## Number of potential crosses generated: 300

# 2. Relationship matrix
relMat_out <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop))

## Initial data: 
##  Number of Individuals: 25 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.035  seconds

# 3. Marker genotypes (unphased)
Markers_out <- pullQtlGeno(ParentsPop)

# 4. Marker effects
add_eff <- SP$traits[[1]]@addEff
dom_eff <- SP$traits[[1]]@domEff

# 5. Total genetic value
ST_tgv <- getTGV(MatePlan = CrossPlan_out,
                 Markers  = Markers_out,
                 addEff   = add_eff,
                 domEff   = dom_eff,
                 ploidy   = 2,
                 K        = relMat_out)

## TGVs predicted for 300 crosses.

head(ST_tgv, 5)

##   Parent1 Parent2        Y           K
## 1     426     427 39.55368  0.10547431
## 2     426     428 40.12604  0.19150780
## 3     426     429 45.26235 -0.20130182
## 4     426     430 41.91924 -0.07864658
## 5     426     431 38.79195  0.13096231

plot(ST_tgv$K, ST_tgv$Y,
     xlab = "Mean kinship (K)",
     ylab = "Total genetic value",
     main = "TGV vs. Kinship — outcrossing population",
     pch  = 16, col = adjustcolor("darkorange", 0.5))

3.3 Usefulness Criterion (UC)

The Usefulness Criterion (Schnell & Utz, 1975) extends the expected cross mean by incorporating the progeny standard deviation (\(\sigma\)). It answers the question: What is the best individual we can realistically select from this cross?

\[ UC_{ij} = \mu_{ij} + i \cdot \sigma_{ij} \]

where:

\(\mu_{ij}\) is the expected cross mean (MPV or TGV)
\(i\) is the standardised selection intensity (a function of the selection proportion \(p\))
\(\sigma_{ij}\) is the expected standard deviation of progeny genetic values

The progeny variance is estimated as:

\[ \sigma^2_{ij} = \boldsymbol{\beta}' \boldsymbol{\Omega}_{ij} \boldsymbol{\beta} \]

where \(\boldsymbol{\beta}\) is the vector of marker effects and \(\boldsymbol{\Omega}_{ij}\) is the marker covariance matrix for cross \(ij\). For doubled-haploid progeny (Lehermeier et al., 2017):

\[ \mathrm{var}(X_{P_i \times P_j}) = \begin{pmatrix} 4\pi_k(1-\pi_k) & \cdots & 4D_{kl} \\ \vdots & \ddots & \vdots \\ 4D_{lk} & \cdots & 4\pi_l(1-\pi_l) \end{pmatrix} \]

The diagonal \(4\pi_k(1-\pi_k)\) captures locus-specific variance; the off-diagonal \(4D_{kl}\) captures linkage disequilibrium between markers \(k\) and \(l\), scaled by recombination via the Haldane mapping function:

\[ \theta_{kl} = 0.5\,(1 - e^{-2d_{kl}}) \]

where \(d_{kl}\) is the genetic distance (Morgans) between loci \(k\) and \(l\). A simplification would be:

\[ \mathrm{var}(X_{P_i \times P_j}) = (1-2\theta_{kl})2D^*_{kl} \] where \(\theta_{kl}\) is the expected recombination matrix between loci k and l, and \(D^*_{kl}\) is the disequilibrium parameter among parental lines. Usefulness will then be a function of Recombination, Linkage disequilibrium, and Parental haplotypes.

Cross usefulness depends on which alleles are inherited together, not just how many alleles are present.

3.3.1 UC — Additive only (DH lines)

# 1. Mating plan (DH lines)
CrossPlan_DH <- planCross(TargetPop  = ParentsPop_DH@id,
                          MateDesign = "half")

## Number of potential crosses generated: 1225

# 2. Relationship matrix
relMat_DH <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop_DH))

## Initial data: 
##  Number of Individuals: 50 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.048  seconds

# 3. Marker matrix
Markers_DH <- pullQtlGeno(ParentsPop_DH)

# 4. Additive marker effects
add_eff <- SP$traits[[1]]@addEff

# 5. Genetic map (cM)
MapIn <- AlphaSimR::getQtlMap(simParam = SP)[, c(2, 4, 1)]
MapIn$pos <- 100 * MapIn$pos   # convert Morgans → cM

# 6. Usefulness criterion (additive; DH)
usefA <- getUsefA(MatePlan = CrossPlan_DH,
                  Markers  = Markers_DH,
                  addEff   = as.matrix(add_eff),
                  Map.In   = MapIn,
                  propSel  = 0.05,
                  Type     = "DH",
                  display_progress = FALSE,
                  K        = relMat_DH)

##   |                                                                              |                                                                      |   0%
## Usefulness predicted for 1225 crosses.

head(usefA[[2]], 5)

##   Parent1 Parent2        Y           K
## 1     470     487 16.95282  0.20184784
## 2     478     487 15.99366 -0.30534027
## 3     455     470 15.74202  0.06467439
## 4     464     470 15.74112 -0.24466382
## 5     470     490 15.73618 -0.14229558

plot(usefA[[2]]$K, usefA[[2]]$Y,
     xlab = "Mean kinship (K)",
     ylab = "Usefulness (additive)",
     main = "UC-A vs. Kinship — DH population",
     pch  = 16, col = adjustcolor("blue", 0.5))

3.3.2 UC — Additive + Dominance (outcrossing)

For outcrossing species with heterozygous parents, phased haplotypes are required to correctly estimate \(D_{kl}\) because the coupling vs. repulsion arrangement of alleles matters for progeny variance:

Coupling (AB / ab): preserves favorable allele combinations → higher variance
Repulsion (Ab / aB): breaks them apart → lower variance

# 1. Mating plan (outcrossing)
CrossPlan_out <- planCross(TargetPop  = ParentsPop@id,
                           MateDesign = "half")

## Number of potential crosses generated: 300

# 2. Relationship matrix
relMat_out <- AGHmatrix::Gmatrix(pullQtlGeno(ParentsPop))

## Initial data: 
##  Number of Individuals: 25 
##  Number of Markers: 2000 
## 
## Missing data check: 
##  Total SNPs: 2000 
##   0 SNPs dropped due to missing data threshold of 0.5 
##  Total of: 2000  SNPs 
## 
## MAF check: 
##  No SNPs with MAF below 0 
## 
## Heterozigosity data check: 
##  No SNPs with heterozygosity, missing threshold of =  0 
## 
## Summary check: 
##  Initial:  2000 SNPs 
##  Final:  2000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 0.039  seconds

# 3. Phased haplotype markers (required for dominance UC)
MarkersPhased <- pullQtlHaplo(ParentsPop)

# 4. Marker effects
add_eff <- SP$traits[[1]]@addEff
dom_eff <- SP$traits[[1]]@domEff

# 5. Genetic map (cM)
MapIn <- AlphaSimR::getQtlMap(simParam = SP)[, c(2, 4, 1)]
MapIn$pos <- 100 * MapIn$pos

# 6. Usefulness criterion (additive + dominance; phased)
usefAD <- getUsefAD(MatePlan = CrossPlan_out,
                    Markers  = MarkersPhased,
                    addEff   = as.matrix(add_eff),
                    domEff   = as.matrix(dom_eff),
                    Map.In   = MapIn,
                    propSel  = 0.05,
                    Method   = "Phased",
                    ploidy   = 2,
                    display_progress = FALSE,
                    K        = relMat_out)

##   |                                                                              |                                                                      |   0%
## Usefulness predicted for 300 crosses.

head(usefAD[[2]], 5)

##   Parent1 Parent2        Y           K
## 1     428     439 56.52414 -0.14084834
## 2     439     444 55.42244 -0.12759826
## 3     427     439 55.16119 -0.10036198
## 4     439     442 54.97531 -0.13201495
## 5     434     439 54.70364 -0.03392756

plot(usefAD[[2]]$K, usefAD[[2]]$Y,
     xlab = "Mean kinship (K)",
     ylab = "Usefulness (additive + dominance)",
     main = "UC-AD vs. Kinship — outcrossing population",
     pch  = 16, col = adjustcolor("purple", 0.5))

Key insight: UC is generally preferred over MPV and TGV alone because it explicitly rewards crosses that produce high-variance progeny distributions — i.e., crosses with the potential to generate elite recombinants, even if the cross mean is not the highest.

4. Optimal cross selection vs optimum contribution selection

Comparison between Optimum Contribution Selection and Optimal Cross Selection
Aspect	Optimum Contribution Selection	Optimal Cross Selection
Decision unit	Individuals	Parental pairs (crosses)
Controls mating?	No (mating often random or secondary)	Yes (explicitly selects who mates with whom)
Controls contributions?	Yes (optimize parental contributions)	Implicit, via cross allocation
Within-family variance	Ignored (focus on mean breeding value)	Explicitly modeled
Inbreeding control	Explicit constraint (ΔF or coancestry)	Via cross design or penalties
Output	List of selected individuals + contributions	List of crosses
When to use	Population improvement	Population improvement, hybrid prediction

References

Akdemir, D., Sanchez, J. I., & Jannink, J.-L. (2016). Optimization of genomic selection training populations with a genetic algorithm. G3: Genes, Genomes, Genetics, 6(3), 721–731.
Allier, A., Teyssèdre, S., Lehermeier, C., Moreau, L., & Charcosset, A. (2019). Optimized breeding strategies to harness genetic resources with different performance levels. G3: Genes, Genomes, Genetics, 9(5), 1593–1605.
Falconer, D. S., & Mackay, T. F. C. (1996). Introduction to Quantitative Genetics (4th ed.). Longman, Harlow, UK.
Fernández-González, J., Metwally, S. M., & Isidro y Sánchez, J. (2026). MateR: a novel genomic mating framework. Genetics, iyag013.
Gorjanc, G., Gaynor, R. C., & Hickey, J. M. (2018). Optimal cross selection for long-term genetic gain in two-part programs with rapid recurrent genomic selection. Genetics, 210(4), 1241–1259.
Kinghorn, B. P. (1999). An algorithm for efficient constrained mate selection. Livestock Production Science, 59(2–3), 119–130.
Kinghorn, B. P., et al. (2009). Mate selection in livestock breeding programs: Advances and challenges. In: Proceedings of the Association for the Advancement of Animal Breeding and Genetics (AABG).
Lehermeier, C., Teyssèdre, S., & Schön, C.-C. (2017). Genomic prediction of within-family variance in biparental populations. Genetics, 207(4), 1631–1644. Peixoto et al. 2025 genomic mate selection plant genome
Peixoto, M. A., Amadeu, R. R., Bhering, L. L., Ferrão, L. F. V., Munoz, P. R., & Resende Jr, M. F. (2025). SimpleMating: R‐package for prediction and optimization of breeding crosses using genomic selection. The plant genome, 18(1), e20533.
Schnell, F. W., & Utz, H. F. (1975). F1-Leistung und Elternwahl in der Züchtung von Selbstbefruchtern. In: Arbeitstagung der Vereinigung österreichischer Pflanzenzüchter, 243–248.
Wolfe, M. D., et al. (2021). Genomic mating in outbred species: predicting cross usefulness in cassava breeding. The Plant Genome, 14(1), e20099.
Woolliams, J. A. (2015). Optimal contribution selection and the management of genetic diversity. In: Proceedings of the 66th Annual Meeting of the European Federation of Animal Science (EAAP).

University of Florida, deamorimpeixotom@ufl.edu ↩︎

|

Mate Allocation of Breeding Crosses

Optimal Cross Selection I: Criterion

Marco Antonio Peixoto, Associate researcher¹

Gainesville, 2026

1. Rationale

2. Building the Base Population

3. Cross Performance Criteria

3.1 Mid-Parental Value (MPV)

3.2 Total Genetic Value (TGV)

3.3 Usefulness Criterion (UC)

3.3.1 UC — Additive only (DH lines)

3.3.2 UC — Additive + Dominance (outcrossing)

4. Optimal cross selection vs optimum contribution selection

References

| Mate Allocation of Breeding Crosses

Optimal Cross Selection I: Criterion

Marco Antonio Peixoto, Associate researcher1

Gainesville, 2026

1. Rationale

2. Building the Base Population

3. Cross Performance Criteria

3.1 Mid-Parental Value (MPV)

3.2 Total Genetic Value (TGV)

3.3 Usefulness Criterion (UC)

3.3.1 UC — Additive only (DH lines)

3.3.2 UC — Additive + Dominance (outcrossing)

4. Optimal cross selection vs optimum contribution selection

References

|

Mate Allocation of Breeding Crosses

Marco Antonio Peixoto, Associate researcher¹