Introduction

Modern plant breeding programs operate through structured pipelines that move individuals across multiple selection stages, from early-generation evaluation to variety release. At each stage, breeders must decide which individuals to advance, which to cross, and how to balance competing objectives. Among the most consequential decisions is mate allocation: the strategic pairing of potential parents to produce the next breeding generation.

A central tension in any breeding program is the trade-off between genetic gain and genetic diversity. Intense selection and crosses among close relatives accelerate short-term improvement but erode diversity, leaving populations genetically narrow and vulnerable to environmental stressors or disease.

This document introduces the theoretical and computational tools needed to reason about mate allocation, covering identity by descent, coefficients of coancestry and inbreeding, and both pedigree-based and marker-based relationship matrices. Practical exercises using the AGHmatrix R package allow hands-on exploration of these concepts with simulated and real pedigree data.


Topics Explored

Selection of Parents: Who to Select?

Parent selection is not straightforward. High-performing individuals are often found in high-performing families, meaning the best candidates may also be the most related. Key considerations include:

  • Relatedness among top performers: The best individuals in a population tend to share common ancestors, creating a risk of genetic bottlenecks if they are mated repeatedly.
  • Family structure: In many breeding programs, “good individuals in good families” are systematically favored, further concentrating pedigree relationships.

Figure: Top selected individuals and family structure.


These patterns highlight the need for quantitative tools to measure and manage relatedness explicitly.

Resemblance Between Relatives

The resemblance between relatives is the bedrock of quantitative genetics. It depends on the probability that two individuals share alleles descended from a common ancestor. Two foundational concepts underpin all measures of relatedness:

Identity by Descent (IBD) vs. Identity by State (IBS)

  • Identity by Descent (IBD): Two alleles are IBD if they are copies of the exact same ancestral allele, tracing back through a known pedigree. IBD is a genealogical concept.
  • Identity by State (IBS): Two alleles are IBS if they carry the same nucleotide sequence, regardless of ancestry. Two alleles can be IBS without being IBD.

Figure: IBD vs. IBS representation. Adapted from Lynch & Walsh, 1998.

The distinction matters: pedigree-based relationship matrices rely on IBD probabilities derived from family structure, while genomic matrices (e.g., the G matrix of VanRaden 2008) estimate realized sharing from SNP marker data, which captures IBS at a genome-wide scale.
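The IBS side of this distinction can be computed directly from genotypes. The sketch below is illustrative only: the helper name `ibs_similarity` and the two genotype vectors are made up for this example, genotypes are assumed to be coded as allele dosages (0, 1, 2), and per-locus sharing is taken as 1 - |g1 - g2| / 2, which is one common convention rather than the only one.

```r
# Proportion of alleles shared identical by state (IBS) between two diploid
# individuals, assuming genotypes are coded as dosages 0, 1, 2.
# ibs_similarity is a hypothetical helper, not part of any package.
ibs_similarity <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  mean(1 - abs(g1 - g2) / 2)
}

ind_x <- c(2, 1, 0, 2, 1)   # made-up genotypes at five SNPs
ind_y <- c(2, 0, 0, 1, 1)

ibs_xy <- ibs_similarity(ind_x, ind_y)
print(ibs_xy)
#> [1] 0.8
```

A high IBS value does not by itself imply recent common ancestry, since unrelated individuals also share common alleles by state.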

Coefficients of Identity

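For any locus in two individuals X and Y, there are 15 possible identity states describing how the four alleles (two per individual) relate through IBD. When the maternal or paternal origin of each allele is not distinguished, these collapse into 9 condensed identity states. The probabilities of these states, the condensed coefficients of identity, form the theoretical basis for all pairwise genetic relationships.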
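The count of 15 can be checked by brute force, since each identity state is one way of grouping the four alleles into IBD classes. The following is a minimal sketch in base R; the helper names `relabel` and `condensed_key` are hypothetical. It also counts how many distinct states remain when the two alleles within each individual are treated as interchangeable.

```r
# Alleles 1 and 2 belong to X, alleles 3 and 4 belong to Y.
# An identity state assigns each allele to an IBD class; relabel() gives a
# canonical labelling by order of first appearance (hypothetical helper).
relabel <- function(v) match(v, unique(v))

# Candidate labellings; after relabelling, duplicates are the same grouping.
grid <- as.matrix(expand.grid(a1 = 1, a2 = 1:2, a3 = 1:3, a4 = 1:4))
states <- unique(t(apply(grid, 1, relabel)))
n_detailed <- nrow(states)

# Condensed states: ignore the order of alleles within each individual,
# i.e. treat the swaps 1<->2 and 3<->4 as equivalent.
swaps <- list(c(1, 2, 3, 4), c(2, 1, 3, 4), c(1, 2, 4, 3), c(2, 1, 4, 3))
condensed_key <- function(v) {
  keys <- vapply(swaps, function(p) paste(relabel(v[p]), collapse = ""),
                 character(1))
  sort(keys)[1]
}
n_condensed <- length(unique(apply(states, 1, condensed_key)))

cat("Detailed identity states: ", n_detailed, "\n")
cat("Condensed identity states:", n_condensed, "\n")
#> Detailed identity states:  15 
#> Condensed identity states: 9 
```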

Coefficient of Coancestry

The coefficient of coancestry between individuals X and Y, denoted \(\Theta_{XY}\), is the probability that a randomly chosen allele from X and a randomly chosen allele from Y are identical by descent. It provides a single summary of the pairwise genetic relationship between two individuals and is directly related to the additive genetic covariance between relatives, where \(\sigma^2_A\) is the additive genetic variance:

\[\text{Cov}(A_X, A_Y) = 2 \Theta_{XY} \sigma^2_A\]

The coefficient of coancestry is foundational to constructing the numerator relationship matrix (A) used in BLUP-based genetic evaluations.
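
As a worked example of this covariance, take a pair of half sibs, whose coefficient of coancestry is 1/8 (assuming non-inbred, otherwise unrelated parents):

\[\text{Cov}(A_X, A_Y) = 2 \cdot \tfrac{1}{8}\,\sigma^2_A = \tfrac{1}{4}\,\sigma^2_A\]

For a parent and its offspring, with a coancestry of 1/4, the same formula gives \(\tfrac{1}{2}\,\sigma^2_A\). The corresponding entries of the A matrix are twice the coancestry, 1/4 and 1/2 respectively.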

Coefficient of Inbreeding

The inbreeding coefficient \(F_Z\) of an individual Z is the probability that its two alleles at a locus are IBD, i.e., that it received copies of the same ancestral allele from both parents. Inbreeding reduces heterozygosity and can expose deleterious recessive alleles, with consequences for fitness and performance. Notably, if X and Y are the parents of Z:

\[F_Z = \Theta_{XY}\]

The inbreeding coefficient of an individual equals the coefficient of coancestry between its parents.
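
This identity is what the classical tabular method uses to build the A matrix one individual at a time. The sketch below is a base-R illustration, not the AGHmatrix implementation: `tabular_A` is a hypothetical helper, it assumes individuals are numbered so that parents appear before their offspring, and it assumes unknown parents are coded 0. The four-individual pedigree is made up.

```r
# Tabular method for the additive relationship matrix A (entries = 2 * coancestry).
# Assumes parents precede offspring and unknown parents are coded 0.
tabular_A <- function(sire, dam) {
  n <- length(sire)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- sire[i]
    d <- dam[i]
    # F_i = coancestry between the parents = A[s, d] / 2 (0 if a parent is unknown)
    F_i <- if (s > 0 && d > 0) 0.5 * A[s, d] else 0
    A[i, i] <- 1 + F_i
    # Relationship with each earlier individual = average over the two parents
    for (j in seq_len(i - 1)) {
      a_js <- if (s > 0) A[j, s] else 0
      a_jd <- if (d > 0) A[j, d] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
  }
  A
}

# Made-up pedigree: 1 and 2 are founders, 3 = 1 x 2, 4 = 3 x 2
ped_sire <- c(0, 0, 1, 3)
ped_dam  <- c(0, 0, 2, 2)
A <- tabular_A(ped_sire, ped_dam)

F_4      <- A[4, 4] - 1   # inbreeding coefficient of individual 4
theta_32 <- A[3, 2] / 2   # coancestry between its parents, 3 and 2
print(c(F_4 = F_4, theta_parents = theta_32))
#>           F_4 theta_parents 
#>          0.25          0.25 
```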

Coefficient of Relationship

The coefficient of relationship \(r_{XY}\) measures the expected fraction of the genome that X and Y share IBD, taking both alleles of each individual into account. For non-inbred individuals it is twice the coefficient of coancestry, \(r_{XY} = 2\Theta_{XY}\).

Relationship and coancestry coefficients

| Pair | Θ_xy | r_xy |
|---|---|---|
| Parent–offspring | 0.250 | 0.50 |
| Full sibs | 0.250 | 0.50 |
| Half sibs | 0.125 | 0.25 |
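
The tabulated values can be recovered from an additive relationship matrix. The sketch below assumes Wright's definition of the coefficient of relationship, \(r_{XY} = 2\Theta_{XY} / \sqrt{(1 + F_X)(1 + F_Y)}\), which amounts to rescaling A to a correlation matrix. The six-individual matrix is hard-coded from a made-up pedigree described in the comments.

```r
# Additive relationship matrix for a made-up pedigree:
# 1, 2, 3 are unrelated founders; 4 and 5 are full sibs (1 x 2);
# 6 is a half sib of both (1 x 3). Entries are A_xy = 2 * Theta_xy.
A <- matrix(c(
  1.0, 0.0, 0.0, 0.50, 0.50, 0.50,
  0.0, 1.0, 0.0, 0.50, 0.50, 0.00,
  0.0, 0.0, 1.0, 0.00, 0.00, 0.50,
  0.5, 0.5, 0.0, 1.00, 0.50, 0.25,
  0.5, 0.5, 0.0, 0.50, 1.00, 0.25,
  0.5, 0.0, 0.5, 0.25, 0.25, 1.00
), nrow = 6, byrow = TRUE)

theta <- A / 2     # coefficient of coancestry for each pair
r <- cov2cor(A)    # A_xy / sqrt(A_xx * A_yy)

pairs <- data.frame(
  Pair  = c("Parent-offspring (1, 4)", "Full sibs (4, 5)", "Half sibs (4, 6)"),
  Theta = c(theta[1, 4], theta[4, 5], theta[4, 6]),
  r     = c(r[1, 4], r[4, 5], r[4, 6])
)
print(pairs)
```

Because nobody in this pedigree is inbred, the diagonal of A is 1 and r equals 2Θ exactly; with inbred individuals the two would differ.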

Genomic Relationship Matrices

Modern breeding programs increasingly use marker-based relationship matrices that leverage genome-wide SNP data to estimate realized relatedness, rather than expected relatedness from pedigree alone.

VanRaden (2008) Method 1

Given a marker matrix \(M\) with genotypes coded as \(-1, 0, 1\), let \(Z = M - P\) be the centered marker matrix, where every entry in column \(i\) of \(P\) equals \(2(p_i - 0.5)\). The genomic relationship matrix is:

\[G = \frac{ZZ'}{2\sum_i p_i(1 - p_i)}\]

where \(p_i\) is the frequency at marker \(i\) of the allele whose homozygote is coded 1. The diagonal elements of \(G\) estimate \(1 + F\) for each individual, so values > 1 indicate above-average homozygosity relative to the base population. Off-diagonal elements measure the realized relationship between pairs of individuals relative to that same base, and can be negative for pairs that share fewer alleles than average.
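
The formula can be followed step by step in base R. This is a minimal sketch, not the AGHmatrix code: the genotypes are made up, and allele frequencies are estimated from the sample itself, which is an assumption about the base population.

```r
# Made-up genotypes: 4 individuals x 5 SNPs, coded -1, 0, 1.
M <- matrix(c( 1,  0, -1,  1,  0,
               1,  1, -1,  0,  0,
              -1,  0,  1,  0,  1,
               0, -1,  1, -1,  1),
            nrow = 4, byrow = TRUE)

# Frequency of the allele whose homozygote is coded +1, estimated from the sample
p <- (colMeans(M) + 1) / 2

# Centering matrix: every entry of column i equals 2 * (p_i - 0.5)
P <- matrix(2 * (p - 0.5), nrow = nrow(M), ncol = ncol(M), byrow = TRUE)
Z <- M - P

# Scale the cross-product by the expected marker variance
G <- (Z %*% t(Z)) / (2 * sum(p * (1 - p)))
print(round(G, 2))
```

With sample-based allele frequencies the columns of Z sum to zero, so the elements of G also sum to zero: positive relationships for some pairs are necessarily balanced by negative values for others.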

Connecting Concepts

The following concepts form an integrated framework for understanding genetic resemblance:

| Concept | Description |
|---|---|
| IBD / IBS | Allelic identity through descent vs. state |
| Coefficients of identity | 15 possible pairwise allelic states |
| Coefficient of coancestry | Probability of sharing an allele IBD |
| Coefficient of inbreeding | Probability of being homozygous by descent |
| Coefficients of relationship | Standardized kinship used in relationship matrices |
| Genetic covariance | Covariance among relatives tied to heritability |

The Genetic Gain vs. Diversity Trade-off

A central challenge in any breeding program can be visualized along two axes — genetic gain and genetic diversity — producing four quadrants:

Figure: The four quadrants defined by genetic gain and genetic diversity.

The ideal zone maximizes both genetic gain and genetic diversity. Breeding programs that push too hard for gain risk eroding diversity (moving to RISKY), while programs without directional selection may maintain diversity but stagnate (STABLE). The central question of mate allocation is:

How do we balance rapid improvement with long-term sustainability in breeding programs?

Quantitative tools — including optimal contribution selection, restriction of average kinship, and structured mate allocation algorithms — provide principled answers to this question.
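
As a rough illustration of kinship restriction, candidate crosses can be screened by the expected inbreeding of their progeny, which equals the coancestry of the two parents. The sketch below is not an optimal contribution algorithm: the relationship matrix, the parent labels P1 to P4, and the threshold `max_F` are all made up for the example.

```r
# Made-up additive relationship matrix for four candidate parents (P1..P4).
A <- matrix(c(1.00, 0.50, 0.25, 0.00,
              0.50, 1.00, 0.25, 0.00,
              0.25, 0.25, 1.00, 0.00,
              0.00, 0.00, 0.00, 1.00),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("P", 1:4), paste0("P", 1:4)))

# All possible crosses and the expected inbreeding of their progeny (A_xy / 2)
crosses <- as.data.frame(t(combn(rownames(A), 2)), stringsAsFactors = FALSE)
names(crosses) <- c("parent1", "parent2")
crosses$F_progeny <- A[cbind(crosses$parent1, crosses$parent2)] / 2

max_F <- 0.125   # hypothetical threshold chosen for illustration
allowed <- crosses[crosses$F_progeny <= max_F, ]
print(allowed)

# Average coancestry implied by equal contributions from all candidates,
# c'Ac / 2, the kind of quantity that optimal contribution selection constrains
contrib <- rep(1 / nrow(A), nrow(A))
mean_coancestry <- as.numeric(t(contrib) %*% A %*% contrib) / 2
print(mean_coancestry)
#> [1] 0.1875
```

Here only the cross P1 x P2 exceeds the threshold. In practice the screen would be combined with breeding values, so that gain is maximized among the crosses that remain.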


Practical Exercises

The following exercises use the AGHmatrix R package to compute pedigree-based and marker-based relationship matrices, and to derive inbreeding coefficients from both sources.

Setup

# Install AGHmatrix if not already available
# install.packages("AGHmatrix")
library(AGHmatrix)  # library() stops with an error if the package is missing

Exercise 1: Pedigree-based Relationship Matrix (Pedigree I)

We begin with a simple 14-individual pedigree. Founders (individuals with no known parents) are coded with 0 for both FATHER and MOTHER.

# Define Pedigree I
ped <- data.frame(
  ID     = c(1:14),
  FATHER = c(0, 0, 0, 0, 0, 1, 3, 3,  0,  0,  4,  6,  8, 10),
  MOTHER = c(0, 0, 0, 0, 0, 2, 2, 2,  0,  0,  5,  5,  9,  9)
)

# Convert columns to factors (required by AGHmatrix)
ped$ID     <- as.factor(ped$ID)
ped$FATHER <- as.factor(ped$FATHER)
ped$MOTHER <- as.factor(ped$MOTHER)

# Compute the numerator relationship matrix (A matrix)
Aped <- Amatrix(ped, ploidy = 2)
#> Verifying conflicting data... 
#> Organizing data... 
#> Your data was chronologically organized with success. 
#> Constructing matrix A using ploidy = 2 
#> Completed! Time = 0  minutes
# View the A matrix
round(Aped, 3)
#>       1    2    3   4   5     6     7     8   9  10   11    12    13   14
#> 1  1.00 0.00 0.00 0.0 0.0 0.500 0.000 0.000 0.0 0.0 0.00 0.250 0.000 0.00
#> 2  0.00 1.00 0.00 0.0 0.0 0.500 0.500 0.500 0.0 0.0 0.00 0.250 0.250 0.00
#> 3  0.00 0.00 1.00 0.0 0.0 0.000 0.500 0.500 0.0 0.0 0.00 0.000 0.250 0.00
#> 4  0.00 0.00 0.00 1.0 0.0 0.000 0.000 0.000 0.0 0.0 0.50 0.000 0.000 0.00
#> 5  0.00 0.00 0.00 0.0 1.0 0.000 0.000 0.000 0.0 0.0 0.50 0.500 0.000 0.00
#> 6  0.50 0.50 0.00 0.0 0.0 1.000 0.250 0.250 0.0 0.0 0.00 0.500 0.125 0.00
#> 7  0.00 0.50 0.50 0.0 0.0 0.250 1.000 0.500 0.0 0.0 0.00 0.125 0.250 0.00
#> 8  0.00 0.50 0.50 0.0 0.0 0.250 0.500 1.000 0.0 0.0 0.00 0.125 0.500 0.00
#> 9  0.00 0.00 0.00 0.0 0.0 0.000 0.000 0.000 1.0 0.0 0.00 0.000 0.500 0.50
#> 10 0.00 0.00 0.00 0.0 0.0 0.000 0.000 0.000 0.0 1.0 0.00 0.000 0.000 0.50
#> 11 0.00 0.00 0.00 0.5 0.5 0.000 0.000 0.000 0.0 0.0 1.00 0.250 0.000 0.00
#> 12 0.25 0.25 0.00 0.0 0.5 0.500 0.125 0.125 0.0 0.0 0.25 1.000 0.062 0.00
#> 13 0.00 0.25 0.25 0.0 0.0 0.125 0.250 0.500 0.5 0.0 0.00 0.062 1.000 0.25
#> 14 0.00 0.00 0.00 0.0 0.0 0.000 0.000 0.000 0.5 0.5 0.00 0.000 0.250 1.00
# Mean inbreeding coefficient across the population
# F = diag(A) - 1  (for diploids)
mean_F_I <- mean(diag(Aped)) - 1
cat("Mean inbreeding coefficient (Pedigree I):", round(mean_F_I, 4), "\n")
#> Mean inbreeding coefficient (Pedigree I): 0

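Interpretation: The mean inbreeding coefficient represents the average probability that individuals in this population carry two alleles that are identical by descent. Values > 0 indicate that some individuals have parents that are related to each other. Here the mean is 0 because no individual in Pedigree I has parents that share a common ancestor.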


Exercise 2: Pedigree-based Relationship Matrix (Pedigree II)

Pedigree II modifies the structure by changing the parents of one individual, which increases relatedness among later-generation individuals. Individual 14 now has individual 12 as father and individual 5 as mother, creating closer genealogical ties.

# Define Pedigree II
pedII <- data.frame(
  ID     = c(1:14),
  FATHER = c(0, 0, 0, 0, 0, 1, 3, 3,  0,  0,  4,  6,  8, 12),
  MOTHER = c(0, 0, 0, 0, 0, 2, 2, 2,  0,  0,  5,  5,  9,  5)
)

# Convert columns to factors
pedII$ID     <- as.factor(pedII$ID)
pedII$FATHER <- as.factor(pedII$FATHER)
pedII$MOTHER <- as.factor(pedII$MOTHER)

# Compute the A matrix for Pedigree II
ApedII <- Amatrix(pedII, ploidy = 2)
#> Verifying conflicting data... 
#> Organizing data... 
#> Your data was chronologically organized with success. 
#> Constructing matrix A using ploidy = 2 
#> Completed! Time = 0  minutes
# View the A matrix
round(ApedII, 3)
#>        1     2    3   4    5     6     7     8   9 10    11    12    13    14
#> 1  1.000 0.000 0.00 0.0 0.00 0.500 0.000 0.000 0.0  0 0.000 0.250 0.000 0.125
#> 2  0.000 1.000 0.00 0.0 0.00 0.500 0.500 0.500 0.0  0 0.000 0.250 0.250 0.125
#> 3  0.000 0.000 1.00 0.0 0.00 0.000 0.500 0.500 0.0  0 0.000 0.000 0.250 0.000
#> 4  0.000 0.000 0.00 1.0 0.00 0.000 0.000 0.000 0.0  0 0.500 0.000 0.000 0.000
#> 5  0.000 0.000 0.00 0.0 1.00 0.000 0.000 0.000 0.0  0 0.500 0.500 0.000 0.750
#> 6  0.500 0.500 0.00 0.0 0.00 1.000 0.250 0.250 0.0  0 0.000 0.500 0.125 0.250
#> 7  0.000 0.500 0.50 0.0 0.00 0.250 1.000 0.500 0.0  0 0.000 0.125 0.250 0.062
#> 8  0.000 0.500 0.50 0.0 0.00 0.250 0.500 1.000 0.0  0 0.000 0.125 0.500 0.062
#> 9  0.000 0.000 0.00 0.0 0.00 0.000 0.000 0.000 1.0  0 0.000 0.000 0.500 0.000
#> 10 0.000 0.000 0.00 0.0 0.00 0.000 0.000 0.000 0.0  1 0.000 0.000 0.000 0.000
#> 11 0.000 0.000 0.00 0.5 0.50 0.000 0.000 0.000 0.0  0 1.000 0.250 0.000 0.375
#> 12 0.250 0.250 0.00 0.0 0.50 0.500 0.125 0.125 0.0  0 0.250 1.000 0.062 0.750
#> 13 0.000 0.250 0.25 0.0 0.00 0.125 0.250 0.500 0.5  0 0.000 0.062 1.000 0.031
#> 14 0.125 0.125 0.00 0.0 0.75 0.250 0.062 0.062 0.0  0 0.375 0.750 0.031 1.250
# Mean inbreeding coefficient for Pedigree II
mean_F_II <- mean(diag(ApedII)) - 1
cat("Mean inbreeding coefficient (Pedigree II):", round(mean_F_II, 4), "\n")
#> Mean inbreeding coefficient (Pedigree II): 0.0179

Question: How does the change in mating structure between Pedigree I and Pedigree II affect the average inbreeding level? Which crosses are responsible for the difference?


Exercise 3: Marker-based Relationship Matrix (G Matrix)

We now construct a genomic relationship matrix (G matrix) from simulated SNP data, following VanRaden (2008) Method 1, as implemented in AGHmatrix.

# Simulate a marker matrix: 12 individuals x 10 SNP loci
# Coded as -1 (aa), 0 (Aa), 1 (AA)
set.seed(5632)
M <- matrix(sample(c(-1, 0, 1), 120, replace = TRUE), ncol = 10)

# Inspect raw marker matrix
cat("Marker matrix M (rows = individuals, columns = SNPs):\n")
#> Marker matrix M (rows = individuals, columns = SNPs):
print(M)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    1    1    1   -1   -1    1   -1    0    0     0
#>  [2,]    1    1    1    1   -1   -1    0    0   -1     0
#>  [3,]    1    1    0   -1   -1   -1    0    0    0    -1
#>  [4,]    1   -1   -1    1    1    1    1    1    0     1
#>  [5,]   -1   -1   -1    0    1    0   -1    1   -1     0
#>  [6,]   -1    1    0    0    1    1   -1    1    1     1
#>  [7,]    1    0    0   -1   -1    0    0   -1    0     0
#>  [8,]   -1    0    1    1    0   -1    1    1    1     0
#>  [9,]   -1   -1    0    1   -1    0    0    0    0    -1
#> [10,]    1    1   -1    0    1    1    1    0   -1     0
#> [11,]    0    0    1    0    0    0    1    1    1    -1
#> [12,]    1    0    0    1    1    0   -1    1    1     1
# Compute the cross-product MM'
MMt <- M %*% t(M)  # M times its transpose; equivalently tcrossprod(M)

cat("\nCross-product MM':\n")
#> 
#> Cross-product MM':
print(MMt)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#>  [1,]    7    2    3   -3   -3    1    3   -3   -2     0     0     0
#>  [2,]    2    7    3   -2   -3   -3    1    1    0     0     0     0
#>  [3,]    3    3    6   -4   -3   -3    3   -1   -1     0     1    -2
#>  [4,]   -3   -2   -4    9    2    1   -2    0   -1     4     0     4
#>  [5,]   -3   -3   -3    2    7    2   -3   -1    1     0    -2     1
#>  [6,]    1   -3   -3    1    2    8   -3    1   -2     0     0     4
#>  [7,]    3    1    3   -2   -3   -3    4   -3   -1     0    -1    -2
#>  [8,]   -3    1   -1    0   -1    1   -3    7    2    -3     4     1
#>  [9,]   -2    0   -1   -1    1   -2   -1    2    5    -3     1    -2
#> [10,]    0    0    0    4    0    0    0   -3   -3     7    -1     0
#> [11,]    0    0    1    0   -2    0   -1    4    1    -1     5     0
#> [12,]    0    0   -2    4    1    4   -2    1   -2     0     0     7
cat("\nNote: diagonal = sum of squared marker scores per individual;\n",
    "off-diagonal = shared marker signal between pairs.\n")
#> 
#> Note: diagonal = sum of squared marker scores per individual;
#>  off-diagonal = shared marker signal between pairs.
# Recode M to 0, 1, 2 scale (required by AGHmatrix::Gmatrix)
M1 <- M + 1

# Compute G matrix using VanRaden (2008) Method 1
GMat <- AGHmatrix::Gmatrix(M1, ploidy = 2)
#> Initial data: 
#>  Number of Individuals: 12 
#>  Number of Markers: 10 
#> 
#> Missing data check: 
#>  Total SNPs: 10 
#>   0 SNPs dropped due to missing data threshold of 0.5 
#>  Total of: 10  SNPs 
#> 
#> MAF check: 
#>  No SNPs with MAF below 0 
#> 
#> Heterozigosity data check: 
#>  No SNPs with heterozygosity, missing threshold of =  0 
#> 
#> Summary check: 
#>  Initial:  10 SNPs 
#>  Final:  10  SNPs ( 0  SNPs removed) 
#>  
#> Completed! Time = 0.001  seconds
cat("Genomic Relationship Matrix (G):\n")
#> Genomic Relationship Matrix (G):
round(GMat, 2)
#>        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
#>  [1,]  1.34  0.29  0.56 -0.78 -0.61  0.08  0.67 -0.73 -0.38 -0.09 -0.14 -0.21
#>  [2,]  0.29  1.30  0.55 -0.59 -0.62 -0.76  0.24  0.08  0.01 -0.11 -0.16 -0.23
#>  [3,]  0.56  0.55  1.23 -0.93 -0.55 -0.69  0.72 -0.26 -0.12 -0.04  0.12 -0.57
#>  [4,] -0.78 -0.59 -0.93  1.65  0.37  0.03 -0.42 -0.16 -0.23  0.68 -0.19  0.56
#>  [5,] -0.61 -0.62 -0.55  0.37  1.58  0.41 -0.45 -0.19  0.36  0.03 -0.43  0.12
#>  [6,]  0.08 -0.76 -0.69  0.03  0.41  1.51 -0.59  0.08 -0.40 -0.11 -0.16  0.60
#>  [7,]  0.67  0.24  0.72 -0.42 -0.45 -0.59  1.03 -0.57 -0.02  0.06 -0.19 -0.47
#>  [8,] -0.73  0.08 -0.26 -0.16 -0.19  0.08 -0.57  1.34  0.44 -0.71  0.68  0.00
#>  [9,] -0.38  0.01 -0.12 -0.23  0.36 -0.40 -0.02  0.44  1.20 -0.57  0.20 -0.49
#> [10,] -0.09 -0.11 -0.04  0.68  0.03 -0.11  0.06 -0.71 -0.57  1.37 -0.33 -0.19
#> [11,] -0.14 -0.16  0.12 -0.19 -0.43 -0.16 -0.19  0.68  0.20 -0.33  0.86 -0.25
#> [12,] -0.21 -0.23 -0.57  0.56  0.12  0.60 -0.47  0.00 -0.49 -0.19 -0.25  1.13
# Genomic inbreeding coefficients
# F_genomic = diag(G) - 1
F_genomic <- diag(GMat) - 1
cat("Genomic inbreeding coefficients per individual:\n")
#> Genomic inbreeding coefficients per individual:
print(round(F_genomic, 4))
#>  [1]  0.3376  0.3032  0.2344  0.6473  0.5785  0.5097  0.0280  0.3376  0.2000
#> [10]  0.3720 -0.1441  0.1312
cat("\nMean genomic inbreeding:", round(mean(F_genomic), 4), "\n")
#> 
#> Mean genomic inbreeding: 0.2946

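Interpretation: Diagonal values of G greater than 1 indicate individuals that are more homozygous than expected from the population allele frequencies (positive inbreeding), while values less than 1 indicate above-average heterozygosity. Off-diagonal elements measure realized genomic relationships between pairs of individuals. Note that the genotypes here were sampled uniformly from -1, 0, 1, so only about one third are heterozygous instead of the one half expected under Hardy-Weinberg proportions at intermediate allele frequencies. The high mean value reflects this sampling scheme and the small number of markers, not shared ancestry.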
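
Why the mean genomic inbreeding in this exercise sits well above zero, even though the simulated individuals are unrelated, can be checked with a larger simulation. The sketch below uses base R only. `mean_genomic_F` is a hypothetical helper that applies VanRaden-style centering with sample allele frequencies, and the sample sizes and seed are arbitrary choices.

```r
set.seed(1)
n_ind <- 200
n_snp <- 2000

# mean(diag(G)) - 1 for a marker matrix M coded -1, 0, 1, using allele
# frequencies estimated from the sample (hypothetical helper).
mean_genomic_F <- function(M) {
  p <- (colMeans(M) + 1) / 2
  Z <- sweep(M, 2, 2 * (p - 0.5))   # subtract 2 * (p_i - 0.5) from column i
  mean(rowSums(Z^2)) / (2 * sum(p * (1 - p))) - 1
}

# (a) Genotypes drawn uniformly from -1, 0, 1, as in Exercise 3
M_unif <- matrix(sample(c(-1, 0, 1), n_ind * n_snp, replace = TRUE),
                 ncol = n_snp)

# (b) Genotypes drawn under Hardy-Weinberg proportions with p = 0.5
M_hwe <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.5) - 1,
                ncol = n_snp)

F_unif <- mean_genomic_F(M_unif)
F_hwe  <- mean_genomic_F(M_hwe)
cat("Uniform genotypes:        mean F =", round(F_unif, 3), "\n")
cat("Hardy-Weinberg genotypes: mean F =", round(F_hwe, 3), "\n")
```

Under uniform sampling the mean should settle near 1/3, because heterozygotes make up one third of genotypes instead of one half, while under Hardy-Weinberg sampling it should stay close to 0.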


Summary Comparison

# Summary of mean inbreeding across methods
summary_df <- data.frame(
  Method       = c("Pedigree I (A matrix)", "Pedigree II (A matrix)", "Simulated SNPs (G matrix)"),
  Mean_F       = round(c(mean_F_I, mean_F_II, mean(F_genomic)), 4),
  Source       = c("Pedigree", "Pedigree", "Genomic markers")
)

knitr::kable(summary_df,
             col.names = c("Method", "Mean Inbreeding (F)", "Data Source"),
             caption = "Summary of inbreeding estimates across methods and pedigree structures.")
Summary of inbreeding estimates across methods and pedigree structures.

| Method | Mean Inbreeding (F) | Data Source |
|---|---|---|
| Pedigree I (A matrix) | 0.0000 | Pedigree |
| Pedigree II (A matrix) | 0.0179 | Pedigree |
| Simulated SNPs (G matrix) | 0.2946 | Genomic markers |


References


Course: Survey Tools in Breeding and Methods — University of Florida, Gainesville, 2026

