Mate Allocation of Breeding Crosses
Modern plant breeding programs operate through structured pipelines that move individuals across multiple selection stages, from early-generation evaluation to variety release. At each stage, breeders must decide which individuals to advance, which to cross, and how to balance competing objectives. Among the most consequential of these decisions is mate allocation: the strategic pairing of potential parents to produce the next breeding generation.
A central tension in any breeding program is the trade-off between genetic gain and genetic diversity. Aggressive selection and tight crosses accelerate improvement but erode diversity, creating populations that are genetically narrow and vulnerable to environmental stressors or disease.
This document introduces the theoretical and computational tools
needed to reason about mate allocation, covering identity by descent,
coefficients of coancestry and inbreeding, and both pedigree-based and
marker-based relationship matrices. Practical exercises using the
AGHmatrix R package allow hands-on exploration of these
concepts with simulated and real pedigree data.
Parent selection is not straightforward. High-performing individuals are often found in high-performing families, meaning the best candidates may also be the most related. Key considerations include:
Top selected individuals and family structure
These patterns highlight the need for quantitative tools to measure and manage relatedness explicitly.
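The resemblance between relatives is the bedrock of quantitative genetics. It depends on the probability that two individuals share alleles descended from a common ancestor. Two foundational concepts underpin all measures of relatedness:
Identity by descent (IBD): two alleles are copies of the same ancestral allele.
Identity by state (IBS): two alleles are the same allelic type, whether or not they descend from a common ancestor.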
IBD vs IBS representation. Adapted from Lynch & Walsh, 1998
The distinction matters: pedigree-based relationship matrices rely on IBD probabilities derived from family structure, while genomic matrices (e.g., the G matrix of VanRaden 2008) estimate realized sharing from SNP marker data, which captures IBS at a genome-wide scale.
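For any locus in two individuals X and Y, there are 15 possible identity states describing how the four alleles (two per individual) relate through IBD. When the maternal or paternal origin of each allele is ignored, these collapse into 9 condensed identity states, whose probabilities (the condensed coefficients of identity) form the theoretical basis for all pairwise genetic relationships.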
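The counts of identity states can be checked by brute force. The sketch below is illustrative only: it treats an identity state as a partition of the four alleles into IBD groups, then merges states that differ only by swapping the two alleles within an individual. The helper names (`canon`, `orbit_key`) are made up for this example.

```r
# Alleles 1 and 2 belong to individual X; alleles 3 and 4 belong to Y.
# A state is a label vector: equal labels mean "identical by descent".
# canon() relabels groups in order of first appearance, so each partition
# of the four alleles has exactly one representation.
canon <- function(lab) match(lab, unique(lab))

# Enumerate every labeling of four alleles and keep the unique partitions
grid <- as.matrix(expand.grid(1:4, 1:4, 1:4, 1:4))
states <- unique(t(apply(grid, 1, canon)))
n_detailed <- nrow(states)

# Ignoring parental origin: swapping the two alleles within X, within Y,
# or both, should not create a new state
swaps <- list(c(1, 2, 3, 4), c(2, 1, 3, 4), c(1, 2, 4, 3), c(2, 1, 4, 3))
orbit_key <- function(lab) {
  keys <- vapply(swaps,
                 function(p) paste(canon(lab[p]), collapse = ""),
                 character(1))
  min(keys)  # one representative key per group of equivalent states
}
n_condensed <- length(unique(apply(states, 1, orbit_key)))

cat("Detailed identity states:", n_detailed, "\n")
cat("Condensed identity states:", n_condensed, "\n")
```

The enumeration gives 15 detailed states and 9 condensed states.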
The coefficient of coancestry between individuals X and Y, denoted \(\Theta_{xy}\), is the probability that a randomly chosen allele from X and a randomly chosen allele from Y are identical by descent. It provides a single summary of the pairwise genetic relationship between two individuals and is directly related to the additive genetic covariance between relatives, where \(\sigma^2_A\) is the additive genetic variance:
\[\text{Cov}(A_X, A_Y) = 2 \Theta_{xy} \sigma^2_A\]
The coefficient of coancestry is foundational to constructing the numerator relationship matrix (A) used in BLUP-based genetic evaluations.
The inbreeding coefficient \(F\) of an individual X is the probability that its two alleles at a locus are IBD — i.e., that it received copies of the same ancestral allele from both parents. Inbreeding reduces heterozygosity and can expose deleterious recessive alleles, with consequences for fitness and performance. Notably:
\[F_X = \Theta_{PQ}\]
where P and Q are the parents of X: the inbreeding coefficient of an individual equals the coefficient of coancestry between its parents.
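The link between coancestry, inbreeding, and the A matrix can be made concrete with the classical tabular method. The sketch below is a minimal base R version, not the AGHmatrix implementation. It assumes IDs run from 1 to n, parents are listed before their offspring, and 0 marks an unknown parent. The function name `tabular_A` and the five-individual pedigree are hypothetical.

```r
# Tabular method for the numerator relationship matrix A (diploid).
# Assumptions: IDs are 1..n, parents precede offspring, 0 = unknown parent.
tabular_A <- function(sire, dam) {
  n <- length(sire)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- sire[i]
    d <- dam[i]
    # Diagonal: 1 + F_i, with F_i = coancestry of the parents = A[s, d] / 2
    A[i, i] <- 1 + if (s > 0 && d > 0) 0.5 * A[s, d] else 0
    # Off-diagonal: average of the parents' relationships with individual j
    for (j in seq_len(i - 1)) {
      a_sj <- if (s > 0) A[s, j] else 0
      a_dj <- if (d > 0) A[d, j] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_sj + a_dj)
    }
  }
  A
}

# Hypothetical pedigree: 1 and 2 are founders, 3 and 4 are full sibs (1 x 2),
# and 5 is the offspring of the full-sib mating 3 x 4.
sire <- c(0, 0, 1, 1, 3)
dam  <- c(0, 0, 2, 2, 4)
A <- tabular_A(sire, dam)

theta_34 <- A[3, 4] / 2   # coancestry between the parents of individual 5
F_5 <- A[5, 5] - 1        # inbreeding coefficient of individual 5
cat("Coancestry of parents 3 and 4:", theta_34, "\n")
cat("Inbreeding of individual 5:   ", F_5, "\n")
```

Both quantities equal 0.25, as expected for the offspring of a full-sib mating.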
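The coefficient of relationship \(r_{xy}\) rescales the coancestry to the scale of the additive relationship matrix: \(r_{xy} = 2\Theta_{xy} / \sqrt{(1 + F_x)(1 + F_y)}\), which reduces to \(r_{xy} = 2\Theta_{xy}\) when neither individual is inbred. It can be read as the expected proportion of alleles that X and Y share IBD. For non-inbred relatives: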
| Pair | Θ_xy | r_xy |
|---|---|---|
| Parent–offspring | 0.250 | 0.50 |
| Full sibs | 0.250 | 0.50 |
| Half sibs | 0.125 | 0.25 |
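The table values can be reproduced from the coancestries alone. The sketch below uses Wright's standard definition of the coefficient of relationship; the function name `rel_from_coancestry` is made up for this example, and the last column expresses the additive covariance in units of the additive variance.

```r
# Coefficient of relationship from coancestry (Wright's definition).
# F_x and F_y are the inbreeding coefficients of the two individuals.
rel_from_coancestry <- function(theta, F_x = 0, F_y = 0) {
  2 * theta / sqrt((1 + F_x) * (1 + F_y))
}

rel <- data.frame(
  pair  = c("Parent-offspring", "Full sibs", "Half sibs"),
  theta = c(0.25, 0.25, 0.125)
)
rel$r <- rel_from_coancestry(rel$theta)   # non-inbred relatives
rel$cov_in_units_of_varA <- 2 * rel$theta # Cov(A_X, A_Y) / sigma^2_A
print(rel)

# With an inbred individual the two scales no longer coincide:
# the same coancestry of 0.25 with F_x = 0.25 gives r below 0.5
r_inbred <- rel_from_coancestry(0.25, F_x = 0.25)
```

For non-inbred pairs, `r` and the covariance multiplier coincide; they diverge only when at least one individual is inbred.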
Modern breeding programs increasingly use marker-based relationship matrices that leverage genome-wide SNP data to estimate realized relatedness, rather than expected relatedness from pedigree alone.
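Given a marker matrix \(M\) with genotypes coded as \(-1, 0, 1\), let \(Z = M - P\) be the centered genotype matrix, where every entry in column \(i\) of \(P\) equals \(2(p_i - 0.5)\). The genomic relationship matrix (VanRaden 2008, Method 1) is:
\[G = \frac{ZZ'}{2\sum_i p_i(1 - p_i)}\]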
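where \(p_i\) is the allele frequency at marker \(i\). The diagonal elements of \(G\) estimate \(1 + F\) for each individual, so values > 1 indicate above-average homozygosity relative to the reference population and values < 1 indicate above-average heterozygosity. Off-diagonal elements estimate the realized additive relationship between pairs of individuals relative to that reference; because genotypes are centered, they can be negative for pairs that share fewer alleles than average.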
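The centering and scaling steps can be written out in a few lines of base R. This is a sketch under stated assumptions, not the AGHmatrix code: allele frequencies are estimated from the same simulated sample, the seed and dimensions are arbitrary, and no marker filtering is applied.

```r
# Hand computation of a VanRaden-type G matrix in base R (illustrative).
set.seed(1)          # arbitrary seed, for reproducibility only
n_ind <- 12
n_snp <- 10

# Genotypes coded -1 (aa), 0 (Aa), 1 (AA)
M <- matrix(sample(c(-1, 0, 1), n_ind * n_snp, replace = TRUE), ncol = n_snp)

# Frequency of the allele counted by the coding, estimated from the sample
p <- (colMeans(M) + 1) / 2

# Centering matrix: every row holds 2 * (p_i - 0.5) for marker i
P <- matrix(2 * (p - 0.5), nrow = n_ind, ncol = n_snp, byrow = TRUE)
Z <- M - P

# Scale the cross-product by the expected heterozygosity summed over markers
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

round(diag(G), 2)    # estimates of 1 + F per individual
round(G[1:4, 1:4], 2)
```

Because the frequencies come from the same sample, every column of `Z` sums to zero, so each row of `G` sums to zero and some off-diagonal values must be negative.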
The following concepts form an integrated framework for understanding genetic resemblance:
| Concept | Description |
|---|---|
| IBD / IBS | Allelic identity through descent vs. state |
| Coefficients of identity | 15 identity states (9 condensed) for the four alleles of a pair |
| Coefficient of coancestry | Probability of sharing an allele IBD |
| Coefficient of inbreeding | Probability of being homozygous by descent |
| Coefficients of relationship | Standardized kinship used in relationship matrices |
| Genetic covariance | Covariance among relatives tied to heritability |
A central challenge in any breeding program can be visualized along two axes — genetic gain and genetic diversity — producing four quadrants:
Figure: quadrants defined by genetic gain and genetic diversity.
The ideal zone maximizes both genetic gain and genetic diversity. Breeding programs that push too hard for gain risk eroding diversity (moving to RISKY), while programs without directional selection may maintain diversity but stagnate (STABLE). The central question of mate allocation is:
How do we balance rapid improvement with long-term sustainability in breeding programs?
Quantitative tools — including optimal contribution selection, restriction of average kinship, and structured mate allocation algorithms — provide principled answers to this question.
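As a rough illustration of how relatedness can enter a crossing decision, the sketch below ranks candidate crosses by mid-parent value minus a penalty on expected progeny inbreeding. It is a simple penalized index, not optimal contribution selection or any published mate allocation algorithm. The breeding values, the relationship matrix, and the penalty weight `lambda` are all hypothetical.

```r
# Hypothetical candidates with estimated breeding values
bv <- c(P1 = 10, P2 = 9, P3 = 8, P4 = 7)

# Hypothetical A matrix: P1 and P2 are full sibs, P3 and P4 are half sibs
A <- diag(4)
dimnames(A) <- list(names(bv), names(bv))
A["P1", "P2"] <- A["P2", "P1"] <- 0.5
A["P3", "P4"] <- A["P4", "P3"] <- 0.25

# Every possible pairwise cross
crosses <- as.data.frame(t(combn(names(bv), 2)), stringsAsFactors = FALSE)
names(crosses) <- c("p1", "p2")

# Expected gain: mid-parent breeding value
crosses$mean_bv <- unname((bv[crosses$p1] + bv[crosses$p2]) / 2)

# Expected progeny inbreeding: coancestry of the parents = A[p1, p2] / 2
crosses$F_prog <- A[cbind(crosses$p1, crosses$p2)] / 2

# Penalized index; lambda is an arbitrary weight chosen for illustration
lambda <- 10
crosses$index <- crosses$mean_bv - lambda * crosses$F_prog

ranked <- crosses[order(-crosses$index), ]
print(ranked)
```

With these numbers, the cross with the highest mid-parent value (P1 x P2) drops to fifth place once the penalty is applied, and P1 x P3 ranks first. Changing `lambda` shifts the balance between gain and diversity.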
The following exercises use the AGHmatrix R package to
compute pedigree-based and marker-based relationship matrices, and to
derive inbreeding coefficients from both sources.
We begin with a simple 14-individual pedigree. Founders (individuals
with no known parents) are coded with 0 for both FATHER and
MOTHER.
# Load the package that provides Amatrix() and Gmatrix()
library(AGHmatrix)
# Define Pedigree I
ped <- data.frame(
ID = c(1:14),
FATHER = c(0, 0, 0, 0, 0, 1, 3, 3, 0, 0, 4, 6, 8, 10),
MOTHER = c(0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 5, 5, 9, 9)
)
# Convert columns to factors (required by AGHmatrix)
ped$ID <- as.factor(ped$ID)
ped$FATHER <- as.factor(ped$FATHER)
ped$MOTHER <- as.factor(ped$MOTHER)
# Compute the numerator relationship matrix (A matrix)
Aped <- Amatrix(ped, ploidy = 2)
#> Verifying conflicting data...
#> Organizing data...
#> Your data was chronologically organized with success.
#> Constructing matrix A using ploidy = 2
#> Completed! Time = 0 minutes
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14
#> 1 1.00 0.00 0.00 0.0 0.0 0.500 0.000 0.000 0.0 0.0 0.00 0.250 0.000 0.00
#> 2 0.00 1.00 0.00 0.0 0.0 0.500 0.500 0.500 0.0 0.0 0.00 0.250 0.250 0.00
#> 3 0.00 0.00 1.00 0.0 0.0 0.000 0.500 0.500 0.0 0.0 0.00 0.000 0.250 0.00
#> 4 0.00 0.00 0.00 1.0 0.0 0.000 0.000 0.000 0.0 0.0 0.50 0.000 0.000 0.00
#> 5 0.00 0.00 0.00 0.0 1.0 0.000 0.000 0.000 0.0 0.0 0.50 0.500 0.000 0.00
#> 6 0.50 0.50 0.00 0.0 0.0 1.000 0.250 0.250 0.0 0.0 0.00 0.500 0.125 0.00
#> 7 0.00 0.50 0.50 0.0 0.0 0.250 1.000 0.500 0.0 0.0 0.00 0.125 0.250 0.00
#> 8 0.00 0.50 0.50 0.0 0.0 0.250 0.500 1.000 0.0 0.0 0.00 0.125 0.500 0.00
#> 9 0.00 0.00 0.00 0.0 0.0 0.000 0.000 0.000 1.0 0.0 0.00 0.000 0.500 0.50
#> 10 0.00 0.00 0.00 0.0 0.0 0.000 0.000 0.000 0.0 1.0 0.00 0.000 0.000 0.50
#> 11 0.00 0.00 0.00 0.5 0.5 0.000 0.000 0.000 0.0 0.0 1.00 0.250 0.000 0.00
#> 12 0.25 0.25 0.00 0.0 0.5 0.500 0.125 0.125 0.0 0.0 0.25 1.000 0.062 0.00
#> 13 0.00 0.25 0.25 0.0 0.0 0.125 0.250 0.500 0.5 0.0 0.00 0.062 1.000 0.25
#> 14 0.00 0.00 0.00 0.0 0.0 0.000 0.000 0.000 0.5 0.5 0.00 0.000 0.250 1.00
# Mean inbreeding coefficient across the population
# F = diag(A) - 1 (for diploids)
mean_F_I <- mean(diag(Aped)) - 1
cat("Mean inbreeding coefficient (Pedigree I):", round(mean_F_I, 4), "\n")
#> Mean inbreeding coefficient (Pedigree I): 0
Interpretation: The mean inbreeding coefficient is the average probability that an individual in this population carries two alleles that are identical by descent at a locus. Values > 0 indicate that some individuals have related parents; here the mean is 0 because no mating in Pedigree I involves relatives.
Pedigree II changes a single record: individual 14 now has individual 12 as father and individual 5 as mother (instead of 10 and 9). Because individual 5 is also the mother of individual 12, this cross mates a son back to his mother, creating closer genealogical ties.
# Define Pedigree II
pedII <- data.frame(
ID = c(1:14),
FATHER = c(0, 0, 0, 0, 0, 1, 3, 3, 0, 0, 4, 6, 8, 12),
MOTHER = c(0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 5, 5, 9, 5)
)
# Convert columns to factors
pedII$ID <- as.factor(pedII$ID)
pedII$FATHER <- as.factor(pedII$FATHER)
pedII$MOTHER <- as.factor(pedII$MOTHER)
# Compute the A matrix for Pedigree II
ApedII <- Amatrix(pedII, ploidy = 2)
#> Verifying conflicting data...
#> Organizing data...
#> Your data was chronologically organized with success.
#> Constructing matrix A using ploidy = 2
#> Completed! Time = 0 minutes
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14
#> 1 1.000 0.000 0.00 0.0 0.00 0.500 0.000 0.000 0.0 0 0.000 0.250 0.000 0.125
#> 2 0.000 1.000 0.00 0.0 0.00 0.500 0.500 0.500 0.0 0 0.000 0.250 0.250 0.125
#> 3 0.000 0.000 1.00 0.0 0.00 0.000 0.500 0.500 0.0 0 0.000 0.000 0.250 0.000
#> 4 0.000 0.000 0.00 1.0 0.00 0.000 0.000 0.000 0.0 0 0.500 0.000 0.000 0.000
#> 5 0.000 0.000 0.00 0.0 1.00 0.000 0.000 0.000 0.0 0 0.500 0.500 0.000 0.750
#> 6 0.500 0.500 0.00 0.0 0.00 1.000 0.250 0.250 0.0 0 0.000 0.500 0.125 0.250
#> 7 0.000 0.500 0.50 0.0 0.00 0.250 1.000 0.500 0.0 0 0.000 0.125 0.250 0.062
#> 8 0.000 0.500 0.50 0.0 0.00 0.250 0.500 1.000 0.0 0 0.000 0.125 0.500 0.062
#> 9 0.000 0.000 0.00 0.0 0.00 0.000 0.000 0.000 1.0 0 0.000 0.000 0.500 0.000
#> 10 0.000 0.000 0.00 0.0 0.00 0.000 0.000 0.000 0.0 1 0.000 0.000 0.000 0.000
#> 11 0.000 0.000 0.00 0.5 0.50 0.000 0.000 0.000 0.0 0 1.000 0.250 0.000 0.375
#> 12 0.250 0.250 0.00 0.0 0.50 0.500 0.125 0.125 0.0 0 0.250 1.000 0.062 0.750
#> 13 0.000 0.250 0.25 0.0 0.00 0.125 0.250 0.500 0.5 0 0.000 0.062 1.000 0.031
#> 14 0.125 0.125 0.00 0.0 0.75 0.250 0.062 0.062 0.0 0 0.375 0.750 0.031 1.250
# Mean inbreeding coefficient for Pedigree II
mean_F_II <- mean(diag(ApedII)) - 1
cat("Mean inbreeding coefficient (Pedigree II):", round(mean_F_II, 4), "\n")
#> Mean inbreeding coefficient (Pedigree II): 0.0179
Question: How does the change in mating structure between Pedigree I and Pedigree II affect the average inbreeding level? Which crosses are responsible for the difference?
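One way to check an answer is to trace the numbers by hand from the printed matrices. The relationship between individuals 12 and 5 is 0.5 in both A matrices, so their coancestry, and therefore the inbreeding of their offspring 14, is:

\[F_{14} = \Theta_{12,5} = \tfrac{1}{2} A_{12,5} = \tfrac{1}{2}(0.5) = 0.25\]

This matches the diagonal element \(A_{14,14} = 1 + F_{14} = 1.25\) in Pedigree II. Since the other 13 individuals have \(F = 0\), the population mean is:

\[\bar{F} = \frac{0.25}{14} \approx 0.0179\]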
We now construct a genomic relationship matrix (G matrix) from
simulated SNP data, following VanRaden (2008) Method 1, as implemented
in AGHmatrix.
# Simulate a marker matrix: 12 individuals x 10 SNP loci
# Coded as -1 (aa), 0 (Aa), 1 (AA)
set.seed(5632)
M <- matrix(sample(c(-1, 0, 1), 120, replace = TRUE), ncol = 10)
# Inspect raw marker matrix
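cat("Marker matrix M (rows = individuals, columns = SNPs):\n")
print(M)
# Cross-product MM' (before centering and scaling)
MMt <- M %*% t(M)
cat("\nCross-product MM':\n")
print(MMt)
#> Marker matrix M (rows = individuals, columns = SNPs):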
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1 1 1 -1 -1 1 -1 0 0 0
#> [2,] 1 1 1 1 -1 -1 0 0 -1 0
#> [3,] 1 1 0 -1 -1 -1 0 0 0 -1
#> [4,] 1 -1 -1 1 1 1 1 1 0 1
#> [5,] -1 -1 -1 0 1 0 -1 1 -1 0
#> [6,] -1 1 0 0 1 1 -1 1 1 1
#> [7,] 1 0 0 -1 -1 0 0 -1 0 0
#> [8,] -1 0 1 1 0 -1 1 1 1 0
#> [9,] -1 -1 0 1 -1 0 0 0 0 -1
#> [10,] 1 1 -1 0 1 1 1 0 -1 0
#> [11,] 0 0 1 0 0 0 1 1 1 -1
#> [12,] 1 0 0 1 1 0 -1 1 1 1
#>
#> Cross-product MM':
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,] 7 2 3 -3 -3 1 3 -3 -2 0 0 0
#> [2,] 2 7 3 -2 -3 -3 1 1 0 0 0 0
#> [3,] 3 3 6 -4 -3 -3 3 -1 -1 0 1 -2
#> [4,] -3 -2 -4 9 2 1 -2 0 -1 4 0 4
#> [5,] -3 -3 -3 2 7 2 -3 -1 1 0 -2 1
#> [6,] 1 -3 -3 1 2 8 -3 1 -2 0 0 4
#> [7,] 3 1 3 -2 -3 -3 4 -3 -1 0 -1 -2
#> [8,] -3 1 -1 0 -1 1 -3 7 2 -3 4 1
#> [9,] -2 0 -1 -1 1 -2 -1 2 5 -3 1 -2
#> [10,] 0 0 0 4 0 0 0 -3 -3 7 -1 0
#> [11,] 0 0 1 0 -2 0 -1 4 1 -1 5 0
#> [12,] 0 0 -2 4 1 4 -2 1 -2 0 0 7
cat("\nNote: diagonal = sum of squared marker scores per individual;\n",
"off-diagonal = shared marker signal between pairs.\n")
#>
#> Note: diagonal = sum of squared marker scores per individual;
#> off-diagonal = shared marker signal between pairs.
# Recode M to 0, 1, 2 scale (required by AGHmatrix::Gmatrix)
M1 <- M + 1
# Compute G matrix using VanRaden (2008) Method 1
GMat <- AGHmatrix::Gmatrix(M1, ploidy = 2)
#> Initial data:
#> Number of Individuals: 12
#> Number of Markers: 10
#>
#> Missing data check:
#> Total SNPs: 10
#> 0 SNPs dropped due to missing data threshold of 0.5
#> Total of: 10 SNPs
#>
#> MAF check:
#> No SNPs with MAF below 0
#>
#> Heterozigosity data check:
#> No SNPs with heterozygosity, missing threshold of = 0
#>
#> Summary check:
#> Initial: 10 SNPs
#> Final: 10 SNPs ( 0 SNPs removed)
#>
#> Completed! Time = 0.001 seconds
#> Genomic Relationship Matrix (G):
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,] 1.34 0.29 0.56 -0.78 -0.61 0.08 0.67 -0.73 -0.38 -0.09 -0.14 -0.21
#> [2,] 0.29 1.30 0.55 -0.59 -0.62 -0.76 0.24 0.08 0.01 -0.11 -0.16 -0.23
#> [3,] 0.56 0.55 1.23 -0.93 -0.55 -0.69 0.72 -0.26 -0.12 -0.04 0.12 -0.57
#> [4,] -0.78 -0.59 -0.93 1.65 0.37 0.03 -0.42 -0.16 -0.23 0.68 -0.19 0.56
#> [5,] -0.61 -0.62 -0.55 0.37 1.58 0.41 -0.45 -0.19 0.36 0.03 -0.43 0.12
#> [6,] 0.08 -0.76 -0.69 0.03 0.41 1.51 -0.59 0.08 -0.40 -0.11 -0.16 0.60
#> [7,] 0.67 0.24 0.72 -0.42 -0.45 -0.59 1.03 -0.57 -0.02 0.06 -0.19 -0.47
#> [8,] -0.73 0.08 -0.26 -0.16 -0.19 0.08 -0.57 1.34 0.44 -0.71 0.68 0.00
#> [9,] -0.38 0.01 -0.12 -0.23 0.36 -0.40 -0.02 0.44 1.20 -0.57 0.20 -0.49
#> [10,] -0.09 -0.11 -0.04 0.68 0.03 -0.11 0.06 -0.71 -0.57 1.37 -0.33 -0.19
#> [11,] -0.14 -0.16 0.12 -0.19 -0.43 -0.16 -0.19 0.68 0.20 -0.33 0.86 -0.25
#> [12,] -0.21 -0.23 -0.57 0.56 0.12 0.60 -0.47 0.00 -0.49 -0.19 -0.25 1.13
# Genomic inbreeding coefficients
# F_genomic = diag(G) - 1
F_genomic <- diag(GMat) - 1
cat("Genomic inbreeding coefficients per individual:\n")
#> Genomic inbreeding coefficients per individual:
#> [1] 0.3376 0.3032 0.2344 0.6473 0.5785 0.5097 0.0280 0.3376 0.2000
#> [10] 0.3720 -0.1441 0.1312
#>
#> Mean genomic inbreeding: 0.2946
Interpretation: Diagonal values of G greater than 1 indicate individuals that are more homozygous than the population average (positive inbreeding), while values less than 1 indicate above-average heterozygosity. Off-diagonal elements measure realized genomic relationships between pairs of individuals.
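The large mean genomic inbreeding in this exercise is largely a by-product of how the markers were simulated, as a rough calculation suggests. Drawing \(-1, 0, 1\) with equal probability gives an expected allele frequency of \(p = 0.5\) and only one third heterozygotes, whereas Hardy-Weinberg proportions at \(p = 0.5\) would give one half. Assuming \(p_i \approx 0.5\) at every marker and ignoring sampling error in the frequencies, the expected diagonal of \(G\) over \(m\) markers is:

\[E[G_{jj}] \approx \frac{m \cdot E[M_{ji}^2]}{2 \sum_i p_i (1 - p_i)} = \frac{m \cdot \tfrac{2}{3}}{2 \cdot m \cdot \tfrac{1}{4}} = \frac{4}{3}\]

so the expected genomic inbreeding is about \(\tfrac{4}{3} - 1 = \tfrac{1}{3}\), which is close to the mean of 0.2946 observed here. The value reflects the excess of homozygotes built into the simulation rather than any pedigree structure.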
# Summary of mean inbreeding across methods
summary_df <- data.frame(
Method = c("Pedigree I (A matrix)", "Pedigree II (A matrix)", "Simulated SNPs (G matrix)"),
Mean_F = round(c(mean_F_I, mean_F_II, mean(F_genomic)), 4),
Source = c("Pedigree", "Pedigree", "Genomic markers")
)
knitr::kable(summary_df,
col.names = c("Method", "Mean Inbreeding (F)", "Data Source"),
caption = "Summary of inbreeding estimates across methods and pedigree structures.")
| Method | Mean Inbreeding (F) | Data Source |
|---|---|---|
| Pedigree I (A matrix) | 0.0000 | Pedigree |
| Pedigree II (A matrix) | 0.0179 | Pedigree |
| Simulated SNPs (G matrix) | 0.2946 | Genomic markers |
Course: Survey Tools in Breeding and Methods — University of Florida, Gainesville, 2026
University of Florida, deamorimpeixotom@ufl.edu