FA_CovStructures.knit

Introduction

In this practice, we will show how to implement models accounting for several traits (estimate variance components and breeding values). We will cover Diagonal, Unstructured, and Factor analytic structures for modelling genetic and residuals effects, all in a Bayesian framework.

To perform the analyses, we will need the following packages:

## Carregando pacotes exigidos: AGHmatrix

## Carregando pacotes exigidos: BGLR

## Carregando pacotes exigidos: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Carregando pacotes exigidos: tidyverse

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.2     ✔ purrr   1.0.1
## ✔ tibble  3.2.1     ✔ stringr 1.5.0
## ✔ tidyr   1.3.0     ✔ forcats 0.5.2
## ✔ readr   2.1.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Dataset

This dataset was simulated using the coalescent theory implemented in \(AlphasimR\) package (Gaynor et al., 2021). This dataset mimic an evaluation of 500 maize genotypes, in four locations, and four traits were measured. Also, a set of 3000 single nucleotide polymorphism (SNPs) were randomly sampled through the 10 pair of chromosomes.

Phenotypic data

We generated BLUEs (best linear unbiased estimation) for each trait, and it can be loaded from Phenotypes.csv.

# Loading data
Phenotypes = read.csv("Phenotypes.csv")

# As factor
Phenotypes$Genotype = Phenotypes$Genotype %>% as.factor
Phenotypes$Env = Phenotypes$Env %>% as.factor # Only one

# Only Environment 1
data = Phenotypes[Phenotypes$Env == 1,] 
head(data)

##   Env Genotype   Trait1   Trait2   Trait3   Trait4
## 1   1    26662 179.5075 32.61931 63.41136 14.89854
## 2   1    30678 161.2597 33.97275 80.52581 19.19048
## 3   1    30129 164.2701 35.10956 60.14341 27.24498
## 4   1    28907 156.8915 34.03063 72.26622 23.37594
## 5   1    30522 159.7874 37.05115 64.53436 11.41274
## 6   1    26995 165.1254 36.30135 72.99611 15.52279

Genomic data

The genomic data (3000 SNP markers) was coded as 0,1,2 for aa,Aa,AA, respectively. The dataset for all 500 genotypes can be loaded from Genotypes.csv.

# SNPs
Genotypes = as.matrix(read.csv("Genotypes.csv"))

# Genotypes names
rownames(Genotypes)= unique(Phenotypes$Genotype)

Creating G Matrix

Here, we will compose the additive relationship matrix (G matrix) following the propositions made by VanRaden (2008).

G.Mat = AGHmatrix::Gmatrix(Genotypes)

## Initial data: 
##  Number of Individuals: 500 
##  Number of Markers: 3000 
## 
## Missing data check: 
##  Total SNPs: 3000 
##   0 SNPs dropped due to missing data threshold of 1 
##  Total of: 3000  SNPs 
## MAF check: 
##  No SNPs with MAF below 0 
## Monomorphic check: 
##  No monomorphic SNPs 
## Summary check: 
##  Initial:  3000 SNPs 
##  Final:  3000  SNPs ( 0  SNPs removed) 
##  
## Completed! Time = 2.31  seconds

Multitrait Model

Statistical model

The GBLUP model in a multitrait framework is given as follows:

\[ y = 1u + Za + e \]

where \(y\) is the matrix of BLUEs data for the target traits; \(u\) represents the mean (fixed); \(a\) is the matrix of genotype effects (assumed to be random), where \(a \sim N(0,\sum_a\otimes G)\), \(G\) is a matrix that describes the genomic similarities between pair of genotypes and \(\sum_a\) represents the genetic covariance matrix; and \(e\) is the residuals (random), where \(e\sim N(0,\sum_{res}\otimes I)\), \(\sum_{res}\) is the residual covariance matrix. The letter \(Z\) refers to the incidence matrix for \(a\). Here, we use the ‘Multitrait()’ function from BGLR package for fitting the model (Pérez-Rodrı́guez & Los Campos, 2022)

Model 1 - DIAG-US

For this model we will consider a DIAG matrix for the genetic effects and an US covariance matrix for the residuals.

# Linear predictor - specify the genetic matrix
ETA <- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'DIAG')))

# Only Y
Y = as.matrix(data[,-c(1:2)])

# Model
fm.DU <- Multitrait(y = Y,
                    ETA = ETA,
                    resCov = list(df0 = 5, S0 = NULL, type = "UN"), #default
                    nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex1_',
                    verbose = FALSE)

## Checking variance co-variance matrix K  for linear term 1

## Ok

## Setting linear term 1

## MSx=1.49553739794528

## DIAGonal covariance matrix

## df0 set to  5 for all the traits

## S0 was set to:

##    Trait1    Trait2    Trait3    Trait4 
## 176.57614  40.36525 125.87882  68.38115

## Initializing resCov

## Setting hyperparameters for UNstructured R

## S0 was set to

##             Trait1      Trait2     Trait3     Trait4
## Trait1 377.2517494  -0.1556996  23.355343   2.222493
## Trait2  -0.1556996  86.2396311 -20.163648   1.783437
## Trait3  23.3553430 -20.1636485 268.937827  -1.551458
## Trait4   2.2224932   1.7834367  -1.551458 146.095096

## Done

Retrieving estimates

# Genetic covariance matrix
fm.DU$ETA$A$Cov$Omega

##          [,1]     [,2]     [,3]     [,4]
## [1,] 23.72315 0.000000  0.00000 0.000000
## [2,]  0.00000 7.561147  0.00000 0.000000
## [3,]  0.00000 0.000000 26.11251 0.000000
## [4,]  0.00000 0.000000  0.00000 7.344573

# Residual covariance matrix
fm.DU$resCov$R

##             Trait1     Trait2     Trait3      Trait4
## Trait1 43.81727609 -1.3071768  1.0578344 -0.01362685
## Trait2 -1.30717682  6.1718599 -0.5155850  0.27952859
## Trait3  1.05783437 -0.5155850 11.3597791  0.66463594
## Trait4 -0.01362685  0.2795286  0.6646359 19.55131259

# Phenotypic variation
fm.DU$ETA$A$Cov$Omega + fm.DU$resCov$R

##             Trait1     Trait2     Trait3      Trait4
## Trait1 67.54042583 -1.3071768  1.0578344 -0.01362685
## Trait2 -1.30717682 13.7330069 -0.5155850  0.27952859
## Trait3  1.05783437 -0.5155850 37.4722845  0.66463594
## Trait4 -0.01362685  0.2795286  0.6646359 26.89588552

Model 2 - US-DIAG

For this model we will consider a US matrix for the genetic effects and an DIAG covariance matrix for the residuals.

# Linear predictor - specify the genetic matrix
ETA <- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'UN')))

# Phenotypic data
Y = as.matrix(data[,-c(1:2)])


# Model
fm.UD <- Multitrait(y = Y,
                    ETA = ETA,
                    resCov = list(type = "DIAG"),
                    nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex2_',
                    verbose = FALSE)

## Checking variance co-variance matrix K  for linear term 1

## Ok

## Setting linear term 1

## MSx=1.49553739794528

## UNstructured covariance matrix

## df0 was set to 5

## S0 set to

##             Trait1      Trait2     Trait3    Trait4
## Trait1 252.2516321  -0.1041094  15.616689  1.486083
## Trait2  -0.1041094  57.6646436 -13.482544  1.192506
## Trait3  15.6166894 -13.4825438 179.826882 -1.037392
## Trait4   1.4860834   1.1925056  -1.037392 97.687357

## Initializing resCov

## Setting hyperparameters for DIAG R

## df0 set to  5 for all the traits

## S0 was set to

##    Trait1    Trait2    Trait3    Trait4 
## 264.07622  60.36774 188.25648 102.26657

## Done

Retrieving estimates

# Genetic covariance matrix
fm.UD$ETA$A$Cov$Omega

##            Trait1     Trait2     Trait3    Trait4
## Trait1 22.7453008  0.4777540  1.6834449 0.4312543
## Trait2  0.4777540  7.4759243 -2.1954825 0.1020759
## Trait3  1.6834449 -2.1954825 26.3616513 0.1770074
## Trait4  0.4312543  0.1020759  0.1770074 7.6961548

# Residual covariance matrix
fm.UD$resCov$R

##         [,1]     [,2]     [,3]     [,4]
## [1,] 44.3539 0.000000  0.00000  0.00000
## [2,]  0.0000 6.039915  0.00000  0.00000
## [3,]  0.0000 0.000000 10.55519  0.00000
## [4,]  0.0000 0.000000  0.00000 19.35977

# Phenotypic variation
fm.UD$resCov$R + fm.UD$ETA$A$Cov$Omega

##            Trait1     Trait2     Trait3     Trait4
## Trait1 67.0992045  0.4777540  1.6834449  0.4312543
## Trait2  0.4777540 13.5158396 -2.1954825  0.1020759
## Trait3  1.6834449 -2.1954825 36.9168395  0.1770074
## Trait4  0.4312543  0.1020759  0.1770074 27.0559283

Model 3 - US-US

For this model we will consider a US matrix for the genetic effects and an US covariance matrix for the residuals.

# Linear predictor - specify the genetic matrix
ETA <- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'UN')))

# Phenotypic data
Y = as.matrix(data[,-c(1:2)])

# Model
fmU <- Multitrait(y = Y,
                  ETA = ETA,
                  resCov = list(type = "UN"), #default
                  nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex3_',
                  verbose = FALSE)

## Checking variance co-variance matrix K  for linear term 1

## Ok

## Setting linear term 1

## MSx=1.49553739794528

## UNstructured covariance matrix

## df0 was set to 5

## S0 set to

##             Trait1      Trait2     Trait3    Trait4
## Trait1 252.2516321  -0.1041094  15.616689  1.486083
## Trait2  -0.1041094  57.6646436 -13.482544  1.192506
## Trait3  15.6166894 -13.4825438 179.826882 -1.037392
## Trait4   1.4860834   1.1925056  -1.037392 97.687357

## Initializing resCov

## Setting hyperparameters for UNstructured R

## df0 was set to 5

## S0 was set to

##             Trait1      Trait2     Trait3     Trait4
## Trait1 377.2517494  -0.1556996  23.355343   2.222493
## Trait2  -0.1556996  86.2396311 -20.163648   1.783437
## Trait3  23.3553430 -20.1636485 268.937827  -1.551458
## Trait4   2.2224932   1.7834367  -1.551458 146.095096

## Done

Retrieving some estimates

# Genetic covariance matrix
fmU$ETA$A$Cov$Omega

##             Trait1       Trait2     Trait3       Trait4
## Trait1 23.16447799  1.539124576  1.6007934 -0.089182710
## Trait2  1.53912458  7.460363154 -2.5709971 -0.009351724
## Trait3  1.60079344 -2.570997139 25.5385417 -0.467832718
## Trait4 -0.08918271 -0.009351724 -0.4678327  7.418943506

# Residual covariance matrix
fmU$resCov$R

##            Trait1     Trait2     Trait3     Trait4
## Trait1 44.6376718 -2.1972030  0.3726908  0.1265974
## Trait2 -2.1972030  6.2591006  0.3790549  0.2474159
## Trait3  0.3726908  0.3790549 11.5912207  0.9028863
## Trait4  0.1265974  0.2474159  0.9028863 19.6760567

# Phenotypic variation
fmU$ETA$A$Cov$Omega +  fmU$resCov$R

##             Trait1     Trait2     Trait3      Trait4
## Trait1 67.80214980 -0.6580784  1.9734842  0.03741465
## Trait2 -0.65807840 13.7194638 -2.1919422  0.23806416
## Trait3  1.97348422 -2.1919422 37.1297624  0.43505362
## Trait4  0.03741465  0.2380642  0.4350536 27.09500019

Comparision

# Genetic covariance matrix
fmU$ETA$A$Cov$Omega

##             Trait1       Trait2     Trait3       Trait4
## Trait1 23.16447799  1.539124576  1.6007934 -0.089182710
## Trait2  1.53912458  7.460363154 -2.5709971 -0.009351724
## Trait3  1.60079344 -2.570997139 25.5385417 -0.467832718
## Trait4 -0.08918271 -0.009351724 -0.4678327  7.418943506

fm.UD$ETA$A$Cov$Omega

##            Trait1     Trait2     Trait3    Trait4
## Trait1 22.7453008  0.4777540  1.6834449 0.4312543
## Trait2  0.4777540  7.4759243 -2.1954825 0.1020759
## Trait3  1.6834449 -2.1954825 26.3616513 0.1770074
## Trait4  0.4312543  0.1020759  0.1770074 7.6961548

fm.DU$ETA$A$Cov$Omega

##          [,1]     [,2]     [,3]     [,4]
## [1,] 23.72315 0.000000  0.00000 0.000000
## [2,]  0.00000 7.561147  0.00000 0.000000
## [3,]  0.00000 0.000000 26.11251 0.000000
## [4,]  0.00000 0.000000  0.00000 7.344573

# Residual covariance matrix
fmU$resCov$R

##            Trait1     Trait2     Trait3     Trait4
## Trait1 44.6376718 -2.1972030  0.3726908  0.1265974
## Trait2 -2.1972030  6.2591006  0.3790549  0.2474159
## Trait3  0.3726908  0.3790549 11.5912207  0.9028863
## Trait4  0.1265974  0.2474159  0.9028863 19.6760567

fm.UD$resCov$R

##         [,1]     [,2]     [,3]     [,4]
## [1,] 44.3539 0.000000  0.00000  0.00000
## [2,]  0.0000 6.039915  0.00000  0.00000
## [3,]  0.0000 0.000000 10.55519  0.00000
## [4,]  0.0000 0.000000  0.00000 19.35977

fm.DU$resCov$R

##             Trait1     Trait2     Trait3      Trait4
## Trait1 43.81727609 -1.3071768  1.0578344 -0.01362685
## Trait2 -1.30717682  6.1718599 -0.5155850  0.27952859
## Trait3  1.05783437 -0.5155850 11.3597791  0.66463594
## Trait4 -0.01362685  0.2795286  0.6646359 19.55131259

# Phenotypic variation
fmU$resCov$R + fmU$ETA$A$Cov$Omega

##             Trait1     Trait2     Trait3      Trait4
## Trait1 67.80214980 -0.6580784  1.9734842  0.03741465
## Trait2 -0.65807840 13.7194638 -2.1919422  0.23806416
## Trait3  1.97348422 -2.1919422 37.1297624  0.43505362
## Trait4  0.03741465  0.2380642  0.4350536 27.09500019

fm.UD$resCov$R + fm.UD$ETA$A$Cov$Omega

##            Trait1     Trait2     Trait3     Trait4
## Trait1 67.0992045  0.4777540  1.6834449  0.4312543
## Trait2  0.4777540 13.5158396 -2.1954825  0.1020759
## Trait3  1.6834449 -2.1954825 36.9168395  0.1770074
## Trait4  0.4312543  0.1020759  0.1770074 27.0559283

fm.DU$resCov$R + fm.DU$ETA$A$Cov$Omega

##             Trait1     Trait2     Trait3      Trait4
## Trait1 67.54042583 -1.3071768  1.0578344 -0.01362685
## Trait2 -1.30717682 13.7330069 -0.5155850  0.27952859
## Trait3  1.05783437 -0.5155850 37.4722845  0.66463594
## Trait4 -0.01362685  0.2795286  0.6646359 26.89588552

Model 4 - Multi-trait

We will implement a model using the information of the same trait assessed in several environments.

# Only Trait 2
data = Phenotypes[,c(1, 2, 4)] %>%
    pivot_wider(names_from = 'Env', values_from = 'Trait2')

head(data)

## # A tibble: 6 × 5
##   Genotype   `1`   `2`   `3`   `4`
##   <fct>    <dbl> <dbl> <dbl> <dbl>
## 1 26662     32.6  22.7  22.4  11.5
## 2 30678     34.0  27.1  30.0  13.8
## 3 30129     35.1  28.9  28.3  13.4
## 4 28907     34.0  30.0  31.1  16.0
## 5 30522     37.1  28.5  36.9  17.7
## 6 26995     36.3  30.1  28.3  16.1

For this model we will consider a US matrix for the genetic effects and an US covariance matrix for the residuals.

# Linear predictor - specify the genetic matrix
ETA <- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'UN')))

# Phenotypic data
Y = as.matrix(data[,-1])

# Model
fmU2 <- Multitrait(y = Y,
                  ETA = ETA,
                  resCov = list(type = "UN"), #default
                  nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex4_',
                  verbose = FALSE)

## Checking variance co-variance matrix K  for linear term 1

## Ok

## Setting linear term 1

## MSx=1.49553739794528

## UNstructured covariance matrix

## df0 was set to 5

## S0 set to

##          1        2        3        4
## 1 57.66464 33.63711 40.83856 36.97746
## 2 33.63711 54.83955 36.68483 34.02574
## 3 40.83856 36.68483 63.36140 37.29528
## 4 36.97746 34.02574 37.29528 58.45911

## Initializing resCov

## Setting hyperparameters for UNstructured R

## df0 was set to 5

## S0 was set to

##          1        2        3        4
## 1 86.23963 50.30556 61.07559 55.30118
## 2 50.30556 82.01460 54.86354 50.88677
## 3 61.07559 54.86354 94.75934 55.77648
## 4 55.30118 50.88677 55.77648 87.42778

## Done

Retrieving some estimates

# Genetic covariance matrix
fmU2$ETA$A$Cov$Omega

##          1        2        3        4
## 1 7.075688 5.565782 6.616675 5.692943
## 2 5.565782 6.222793 6.057264 5.345246
## 3 6.616675 6.057264 8.067617 6.163346
## 4 5.692943 5.345246 6.163346 6.388281

# Residual covariance matrix
fmU2$resCov$R

##           1         2        3        4
## 1 6.3186681 0.7686159 1.531315 1.701129
## 2 0.7686159 6.8128593 1.141632 1.183247
## 3 1.5313148 1.1416319 7.011619 1.113291
## 4 1.7011294 1.1832469 1.113291 7.829411

# Phenotypic variation
fmU2$ETA$A$Cov$Omega +  fmU2$resCov$R

##           1         2         3         4
## 1 13.394356  6.334397  8.147990  7.394073
## 2  6.334397 13.035653  7.198896  6.528493
## 3  8.147990  7.198896 15.079237  7.276637
## 4  7.394073  6.528493  7.276637 14.217692

Model 5 - Factor Analytic

For this model we will consider a FA matrix for the genetic effects and an DIAG covariance matrix for the residuals. To see more details, please check Smith et al. (2019). The variance components via FA structure can be solved via the following expression:

\[ \Omega = WW' + \Psi \] where W is the environmental loading and \(\Psi\) is a diagonal matrix with the specific variance for each environment.

# Creating an FA structure before entering the model
M <- matrix(nrow = 4, ncol = 1, TRUE)
CovFA<-list(type="FA",M=M) # M represents the number of environments/traits

# Linear predictor
ETA <- list(A=list(K = G.Mat, model='RKHS', Cov = CovFA))

# Phenotypic data
Y = as.matrix(data[,-1])


# Model
fmF <- Multitrait(y = Y,
                  ETA = ETA,
                  resCov = list(type='UN'),
                  nIter = 1200, burnIn = 200, thin = 5, saveAt='ex5_',
                  verbose = FALSE)

## Checking variance co-variance matrix K  for linear term 1

## Ok

## Setting linear term 1

## MSx=1.49553739794528

## FA covariance matrix

## df0 set to 5 for all the traits

## S0 was set to:

##        1        2        3        4 
## 40.36525 38.38769 44.35298 40.92138

## var was set to 100

## Initializing resCov

## Setting hyperparameters for UNstructured R

## df0 was set to 5

## S0 was set to

##          1        2        3        4
## 1 86.23963 50.30556 61.07559 55.30118
## 2 50.30556 82.01460 54.86354 50.88677
## 3 61.07559 54.86354 94.75934 55.77648
## 4 55.30118 50.88677 55.77648 87.42778

## Done

Retrieving some estimates

# Genetic covariance matrix
fmF$ETA$A$Cov$Omega    # Posterior mean of BB’+Psi

##          [,1]     [,2]     [,3]     [,4]
## [1,] 7.686612 5.650285 6.427334 5.628976
## [2,] 5.650285 7.035436 6.091097 5.329216
## [3,] 6.427334 6.091097 8.712215 6.064655
## [4,] 5.628976 5.329216 6.064655 6.944147

# Estimated loadings
(W <- fmF$ETA$A$Cov$W)

##          [,1]
## [1,] 2.438731
## [2,] 2.311333
## [3,] 2.630492
## [4,] 2.300876

# Estimated variance of trait-specific factors (Psi)
(PSI <- fmF$ETA$A$Cov$PSI)

## [1] 1.697132 1.662461 1.760868 1.614950

W%*%t(W) + diag(PSI)

##          [,1]     [,2]     [,3]     [,4]
## [1,] 7.644540 5.636718 6.415063 5.611217
## [2,] 5.636718 7.004720 6.079943 5.318090
## [3,] 6.415063 6.079943 8.680357 6.052436
## [4,] 5.611217 5.318090 6.052436 6.908980

# Residual matrix 
fmF$resCov$R

##           1         2        3        4
## 1 6.0726749 0.7403339 1.676004 1.790833
## 2 0.7403339 6.4031317 1.129594 1.156120
## 3 1.6760042 1.1295940 6.765991 1.244392
## 4 1.7908329 1.1561203 1.244392 7.629460

#fmF$ETA[[1]]$Cov

References

Gaynor, R. C., Gorjanc, G., & Hickey, J. M. (2021). AlphaSimR: An r package for breeding program simulations. G3, 11(2), jkaa017.

Pérez-Rodrı́guez, P., & Los Campos, G. de. (2022). Multitrait bayesian shrinkage and variable selection models with the BGLR-r package. Genetics, 222(1), iyac112.

Smith, A. B., Borg, L. M., Gogel, B. J., & Cullis, B. R. (2019). Estimation of factor analytic mixed models for the analysis of multi-treatment multi-environment trial data. Journal of Agricultural, Biological and Environmental Statistics, 24, 573–588.

VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423.

University of Florida, jhernandezjarqui@ufl.edu ↩︎
University of Florida, mresende@ufl.edu ↩︎
University of Florida, deamorimpeixotom@ufl.edu ↩︎

Bayesian Factor Analytic and Other Covariance Structures

Diego Jarquin, professor¹

Marcio Resende, professor²

Marco Antonio Peixoto, Post doc³

Introduction

Dataset

Multitrait Model

Model 1 - DIAG-US

Model 2 - US-DIAG

Model 3 - US-US

Model 4 - Multi-trait

Model 5 - Factor Analytic

References

Bayesian Factor Analytic and Other Covariance Structures

Diego Jarquin, professor1

Marcio Resende, professor2

Marco Antonio Peixoto, Post doc3

Introduction

Dataset

Multitrait Model

Model 1 - DIAG-US

Model 2 - US-DIAG

Model 3 - US-US

Model 4 - Multi-trait

Model 5 - Factor Analytic

References

Diego Jarquin, professor¹

Marcio Resende, professor²

Marco Antonio Peixoto, Post doc³