Bayesian Factor Analytic and Other Covariance Structures
In this practice, we will show how to implement models accounting for several traits (estimate variance components and breeding values). We will cover Diagonal, Unstructured, and Factor analytic structures for modelling genetic and residuals effects, all in a Bayesian framework.
To perform the analyses, we will need the following packages:
## Carregando pacotes exigidos: AGHmatrix
## Carregando pacotes exigidos: BGLR
## Carregando pacotes exigidos: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Carregando pacotes exigidos: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.2 ✔ purrr 1.0.1
## ✔ tibble 3.2.1 ✔ stringr 1.5.0
## ✔ tidyr 1.3.0 ✔ forcats 0.5.2
## ✔ readr 2.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
This dataset was simulated using the coalescent theory implemented in \(AlphasimR\) package (Gaynor et al., 2021). This dataset mimic an evaluation of 500 maize genotypes, in four locations, and four traits were measured. Also, a set of 3000 single nucleotide polymorphism (SNPs) were randomly sampled through the 10 pair of chromosomes.
Phenotypic data
We generated BLUEs (best linear unbiased estimation) for each trait, and it can be loaded from Phenotypes.csv.
# Loading data
= read.csv("Phenotypes.csv")
Phenotypes
# As factor
$Genotype = Phenotypes$Genotype %>% as.factor
Phenotypes$Env = Phenotypes$Env %>% as.factor # Only one
Phenotypes
# Only Environment 1
= Phenotypes[Phenotypes$Env == 1,]
data head(data)
## Env Genotype Trait1 Trait2 Trait3 Trait4
## 1 1 26662 179.5075 32.61931 63.41136 14.89854
## 2 1 30678 161.2597 33.97275 80.52581 19.19048
## 3 1 30129 164.2701 35.10956 60.14341 27.24498
## 4 1 28907 156.8915 34.03063 72.26622 23.37594
## 5 1 30522 159.7874 37.05115 64.53436 11.41274
## 6 1 26995 165.1254 36.30135 72.99611 15.52279
Genomic data
The genomic data (3000 SNP markers) was coded as 0,1,2 for aa,Aa,AA, respectively. The dataset for all 500 genotypes can be loaded from Genotypes.csv.
# SNPs
= as.matrix(read.csv("Genotypes.csv"))
Genotypes
# Genotypes names
rownames(Genotypes)= unique(Phenotypes$Genotype)
Creating G Matrix
Here, we will compose the additive relationship matrix (G matrix) following the propositions made by VanRaden (2008).
= AGHmatrix::Gmatrix(Genotypes) G.Mat
## Initial data:
## Number of Individuals: 500
## Number of Markers: 3000
##
## Missing data check:
## Total SNPs: 3000
## 0 SNPs dropped due to missing data threshold of 1
## Total of: 3000 SNPs
## MAF check:
## No SNPs with MAF below 0
## Monomorphic check:
## No monomorphic SNPs
## Summary check:
## Initial: 3000 SNPs
## Final: 3000 SNPs ( 0 SNPs removed)
##
## Completed! Time = 2.31 seconds
Statistical model
The GBLUP model in a multitrait framework is given as follows:
\[ y = 1u + Za + e \]
where \(y\) is the matrix of BLUEs data for the target traits; \(u\) represents the mean (fixed); \(a\) is the matrix of genotype effects (assumed to be random), where \(a \sim N(0,\sum_a\otimes G)\), \(G\) is a matrix that describes the genomic similarities between pair of genotypes and \(\sum_a\) represents the genetic covariance matrix; and \(e\) is the residuals (random), where \(e\sim N(0,\sum_{res}\otimes I)\), \(\sum_{res}\) is the residual covariance matrix. The letter \(Z\) refers to the incidence matrix for \(a\). Here, we use the ‘Multitrait()’ function from BGLR package for fitting the model (Pérez-Rodrı́guez & Los Campos, 2022)
For this model we will consider a DIAG matrix for the genetic effects and an US covariance matrix for the residuals.
# Linear predictor - specify the genetic matrix
<- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'DIAG')))
ETA
# Only Y
= as.matrix(data[,-c(1:2)])
Y
# Model
<- Multitrait(y = Y,
fm.DU ETA = ETA,
resCov = list(df0 = 5, S0 = NULL, type = "UN"), #default
nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex1_',
verbose = FALSE)
## Checking variance co-variance matrix K for linear term 1
## Ok
## Setting linear term 1
## MSx=1.49553739794528
## DIAGonal covariance matrix
## df0 set to 5 for all the traits
## S0 was set to:
## Trait1 Trait2 Trait3 Trait4
## 176.57614 40.36525 125.87882 68.38115
## Initializing resCov
## Setting hyperparameters for UNstructured R
## S0 was set to
## Trait1 Trait2 Trait3 Trait4
## Trait1 377.2517494 -0.1556996 23.355343 2.222493
## Trait2 -0.1556996 86.2396311 -20.163648 1.783437
## Trait3 23.3553430 -20.1636485 268.937827 -1.551458
## Trait4 2.2224932 1.7834367 -1.551458 146.095096
## Done
Retrieving estimates
# Genetic covariance matrix
$ETA$A$Cov$Omega fm.DU
## [,1] [,2] [,3] [,4]
## [1,] 23.72315 0.000000 0.00000 0.000000
## [2,] 0.00000 7.561147 0.00000 0.000000
## [3,] 0.00000 0.000000 26.11251 0.000000
## [4,] 0.00000 0.000000 0.00000 7.344573
# Residual covariance matrix
$resCov$R fm.DU
## Trait1 Trait2 Trait3 Trait4
## Trait1 43.81727609 -1.3071768 1.0578344 -0.01362685
## Trait2 -1.30717682 6.1718599 -0.5155850 0.27952859
## Trait3 1.05783437 -0.5155850 11.3597791 0.66463594
## Trait4 -0.01362685 0.2795286 0.6646359 19.55131259
# Phenotypic variation
$ETA$A$Cov$Omega + fm.DU$resCov$R fm.DU
## Trait1 Trait2 Trait3 Trait4
## Trait1 67.54042583 -1.3071768 1.0578344 -0.01362685
## Trait2 -1.30717682 13.7330069 -0.5155850 0.27952859
## Trait3 1.05783437 -0.5155850 37.4722845 0.66463594
## Trait4 -0.01362685 0.2795286 0.6646359 26.89588552
For this model we will consider a US matrix for the genetic effects and an DIAG covariance matrix for the residuals.
# Linear predictor - specify the genetic matrix
<- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'UN')))
ETA
# Phenotypic data
= as.matrix(data[,-c(1:2)])
Y
# Model
<- Multitrait(y = Y,
fm.UD ETA = ETA,
resCov = list(type = "DIAG"),
nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex2_',
verbose = FALSE)
## Checking variance co-variance matrix K for linear term 1
## Ok
## Setting linear term 1
## MSx=1.49553739794528
## UNstructured covariance matrix
## df0 was set to 5
## S0 set to
## Trait1 Trait2 Trait3 Trait4
## Trait1 252.2516321 -0.1041094 15.616689 1.486083
## Trait2 -0.1041094 57.6646436 -13.482544 1.192506
## Trait3 15.6166894 -13.4825438 179.826882 -1.037392
## Trait4 1.4860834 1.1925056 -1.037392 97.687357
## Initializing resCov
## Setting hyperparameters for DIAG R
## df0 set to 5 for all the traits
## S0 was set to
## Trait1 Trait2 Trait3 Trait4
## 264.07622 60.36774 188.25648 102.26657
## Done
Retrieving estimates
# Genetic covariance matrix
$ETA$A$Cov$Omega fm.UD
## Trait1 Trait2 Trait3 Trait4
## Trait1 22.7453008 0.4777540 1.6834449 0.4312543
## Trait2 0.4777540 7.4759243 -2.1954825 0.1020759
## Trait3 1.6834449 -2.1954825 26.3616513 0.1770074
## Trait4 0.4312543 0.1020759 0.1770074 7.6961548
# Residual covariance matrix
$resCov$R fm.UD
## [,1] [,2] [,3] [,4]
## [1,] 44.3539 0.000000 0.00000 0.00000
## [2,] 0.0000 6.039915 0.00000 0.00000
## [3,] 0.0000 0.000000 10.55519 0.00000
## [4,] 0.0000 0.000000 0.00000 19.35977
# Phenotypic variation
$resCov$R + fm.UD$ETA$A$Cov$Omega fm.UD
## Trait1 Trait2 Trait3 Trait4
## Trait1 67.0992045 0.4777540 1.6834449 0.4312543
## Trait2 0.4777540 13.5158396 -2.1954825 0.1020759
## Trait3 1.6834449 -2.1954825 36.9168395 0.1770074
## Trait4 0.4312543 0.1020759 0.1770074 27.0559283
For this model we will consider a US matrix for the genetic effects and an US covariance matrix for the residuals.
# Linear predictor - specify the genetic matrix
<- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'UN')))
ETA
# Phenotypic data
= as.matrix(data[,-c(1:2)])
Y
# Model
<- Multitrait(y = Y,
fmU ETA = ETA,
resCov = list(type = "UN"), #default
nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex3_',
verbose = FALSE)
## Checking variance co-variance matrix K for linear term 1
## Ok
## Setting linear term 1
## MSx=1.49553739794528
## UNstructured covariance matrix
## df0 was set to 5
## S0 set to
## Trait1 Trait2 Trait3 Trait4
## Trait1 252.2516321 -0.1041094 15.616689 1.486083
## Trait2 -0.1041094 57.6646436 -13.482544 1.192506
## Trait3 15.6166894 -13.4825438 179.826882 -1.037392
## Trait4 1.4860834 1.1925056 -1.037392 97.687357
## Initializing resCov
## Setting hyperparameters for UNstructured R
## df0 was set to 5
## S0 was set to
## Trait1 Trait2 Trait3 Trait4
## Trait1 377.2517494 -0.1556996 23.355343 2.222493
## Trait2 -0.1556996 86.2396311 -20.163648 1.783437
## Trait3 23.3553430 -20.1636485 268.937827 -1.551458
## Trait4 2.2224932 1.7834367 -1.551458 146.095096
## Done
Retrieving some estimates
# Genetic covariance matrix
$ETA$A$Cov$Omega fmU
## Trait1 Trait2 Trait3 Trait4
## Trait1 23.16447799 1.539124576 1.6007934 -0.089182710
## Trait2 1.53912458 7.460363154 -2.5709971 -0.009351724
## Trait3 1.60079344 -2.570997139 25.5385417 -0.467832718
## Trait4 -0.08918271 -0.009351724 -0.4678327 7.418943506
# Residual covariance matrix
$resCov$R fmU
## Trait1 Trait2 Trait3 Trait4
## Trait1 44.6376718 -2.1972030 0.3726908 0.1265974
## Trait2 -2.1972030 6.2591006 0.3790549 0.2474159
## Trait3 0.3726908 0.3790549 11.5912207 0.9028863
## Trait4 0.1265974 0.2474159 0.9028863 19.6760567
# Phenotypic variation
$ETA$A$Cov$Omega + fmU$resCov$R fmU
## Trait1 Trait2 Trait3 Trait4
## Trait1 67.80214980 -0.6580784 1.9734842 0.03741465
## Trait2 -0.65807840 13.7194638 -2.1919422 0.23806416
## Trait3 1.97348422 -2.1919422 37.1297624 0.43505362
## Trait4 0.03741465 0.2380642 0.4350536 27.09500019
Comparision
# Genetic covariance matrix
$ETA$A$Cov$Omega fmU
## Trait1 Trait2 Trait3 Trait4
## Trait1 23.16447799 1.539124576 1.6007934 -0.089182710
## Trait2 1.53912458 7.460363154 -2.5709971 -0.009351724
## Trait3 1.60079344 -2.570997139 25.5385417 -0.467832718
## Trait4 -0.08918271 -0.009351724 -0.4678327 7.418943506
$ETA$A$Cov$Omega fm.UD
## Trait1 Trait2 Trait3 Trait4
## Trait1 22.7453008 0.4777540 1.6834449 0.4312543
## Trait2 0.4777540 7.4759243 -2.1954825 0.1020759
## Trait3 1.6834449 -2.1954825 26.3616513 0.1770074
## Trait4 0.4312543 0.1020759 0.1770074 7.6961548
$ETA$A$Cov$Omega fm.DU
## [,1] [,2] [,3] [,4]
## [1,] 23.72315 0.000000 0.00000 0.000000
## [2,] 0.00000 7.561147 0.00000 0.000000
## [3,] 0.00000 0.000000 26.11251 0.000000
## [4,] 0.00000 0.000000 0.00000 7.344573
# Residual covariance matrix
$resCov$R fmU
## Trait1 Trait2 Trait3 Trait4
## Trait1 44.6376718 -2.1972030 0.3726908 0.1265974
## Trait2 -2.1972030 6.2591006 0.3790549 0.2474159
## Trait3 0.3726908 0.3790549 11.5912207 0.9028863
## Trait4 0.1265974 0.2474159 0.9028863 19.6760567
$resCov$R fm.UD
## [,1] [,2] [,3] [,4]
## [1,] 44.3539 0.000000 0.00000 0.00000
## [2,] 0.0000 6.039915 0.00000 0.00000
## [3,] 0.0000 0.000000 10.55519 0.00000
## [4,] 0.0000 0.000000 0.00000 19.35977
$resCov$R fm.DU
## Trait1 Trait2 Trait3 Trait4
## Trait1 43.81727609 -1.3071768 1.0578344 -0.01362685
## Trait2 -1.30717682 6.1718599 -0.5155850 0.27952859
## Trait3 1.05783437 -0.5155850 11.3597791 0.66463594
## Trait4 -0.01362685 0.2795286 0.6646359 19.55131259
# Phenotypic variation
$resCov$R + fmU$ETA$A$Cov$Omega fmU
## Trait1 Trait2 Trait3 Trait4
## Trait1 67.80214980 -0.6580784 1.9734842 0.03741465
## Trait2 -0.65807840 13.7194638 -2.1919422 0.23806416
## Trait3 1.97348422 -2.1919422 37.1297624 0.43505362
## Trait4 0.03741465 0.2380642 0.4350536 27.09500019
$resCov$R + fm.UD$ETA$A$Cov$Omega fm.UD
## Trait1 Trait2 Trait3 Trait4
## Trait1 67.0992045 0.4777540 1.6834449 0.4312543
## Trait2 0.4777540 13.5158396 -2.1954825 0.1020759
## Trait3 1.6834449 -2.1954825 36.9168395 0.1770074
## Trait4 0.4312543 0.1020759 0.1770074 27.0559283
$resCov$R + fm.DU$ETA$A$Cov$Omega fm.DU
## Trait1 Trait2 Trait3 Trait4
## Trait1 67.54042583 -1.3071768 1.0578344 -0.01362685
## Trait2 -1.30717682 13.7330069 -0.5155850 0.27952859
## Trait3 1.05783437 -0.5155850 37.4722845 0.66463594
## Trait4 -0.01362685 0.2795286 0.6646359 26.89588552
We will implement a model using the information of the same trait assessed in several environments.
# Only Trait 2
= Phenotypes[,c(1, 2, 4)] %>%
data pivot_wider(names_from = 'Env', values_from = 'Trait2')
head(data)
## # A tibble: 6 × 5
## Genotype `1` `2` `3` `4`
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 26662 32.6 22.7 22.4 11.5
## 2 30678 34.0 27.1 30.0 13.8
## 3 30129 35.1 28.9 28.3 13.4
## 4 28907 34.0 30.0 31.1 16.0
## 5 30522 37.1 28.5 36.9 17.7
## 6 26995 36.3 30.1 28.3 16.1
For this model we will consider a US matrix for the genetic effects and an US covariance matrix for the residuals.
# Linear predictor - specify the genetic matrix
<- list(A=list(K=G.Mat, model="RKHS", Cov = list(type = 'UN')))
ETA
# Phenotypic data
= as.matrix(data[,-1])
Y
# Model
<- Multitrait(y = Y,
fmU2 ETA = ETA,
resCov = list(type = "UN"), #default
nIter = 1200, burnIn = 120, thin = 5, saveAt = 'ex4_',
verbose = FALSE)
## Checking variance co-variance matrix K for linear term 1
## Ok
## Setting linear term 1
## MSx=1.49553739794528
## UNstructured covariance matrix
## df0 was set to 5
## S0 set to
## 1 2 3 4
## 1 57.66464 33.63711 40.83856 36.97746
## 2 33.63711 54.83955 36.68483 34.02574
## 3 40.83856 36.68483 63.36140 37.29528
## 4 36.97746 34.02574 37.29528 58.45911
## Initializing resCov
## Setting hyperparameters for UNstructured R
## df0 was set to 5
## S0 was set to
## 1 2 3 4
## 1 86.23963 50.30556 61.07559 55.30118
## 2 50.30556 82.01460 54.86354 50.88677
## 3 61.07559 54.86354 94.75934 55.77648
## 4 55.30118 50.88677 55.77648 87.42778
## Done
Retrieving some estimates
# Genetic covariance matrix
$ETA$A$Cov$Omega fmU2
## 1 2 3 4
## 1 7.075688 5.565782 6.616675 5.692943
## 2 5.565782 6.222793 6.057264 5.345246
## 3 6.616675 6.057264 8.067617 6.163346
## 4 5.692943 5.345246 6.163346 6.388281
# Residual covariance matrix
$resCov$R fmU2
## 1 2 3 4
## 1 6.3186681 0.7686159 1.531315 1.701129
## 2 0.7686159 6.8128593 1.141632 1.183247
## 3 1.5313148 1.1416319 7.011619 1.113291
## 4 1.7011294 1.1832469 1.113291 7.829411
# Phenotypic variation
$ETA$A$Cov$Omega + fmU2$resCov$R fmU2
## 1 2 3 4
## 1 13.394356 6.334397 8.147990 7.394073
## 2 6.334397 13.035653 7.198896 6.528493
## 3 8.147990 7.198896 15.079237 7.276637
## 4 7.394073 6.528493 7.276637 14.217692
For this model we will consider a FA matrix for the genetic effects and an DIAG covariance matrix for the residuals. To see more details, please check Smith et al. (2019). The variance components via FA structure can be solved via the following expression:
\[ \Omega = WW' + \Psi \] where W is the environmental loading and \(\Psi\) is a diagonal matrix with the specific variance for each environment.
# Creating an FA structure before entering the model
<- matrix(nrow = 4, ncol = 1, TRUE)
M <-list(type="FA",M=M) # M represents the number of environments/traits
CovFA
# Linear predictor
<- list(A=list(K = G.Mat, model='RKHS', Cov = CovFA))
ETA
# Phenotypic data
= as.matrix(data[,-1])
Y
# Model
<- Multitrait(y = Y,
fmF ETA = ETA,
resCov = list(type='UN'),
nIter = 1200, burnIn = 200, thin = 5, saveAt='ex5_',
verbose = FALSE)
## Checking variance co-variance matrix K for linear term 1
## Ok
## Setting linear term 1
## MSx=1.49553739794528
## FA covariance matrix
## df0 set to 5 for all the traits
## S0 was set to:
## 1 2 3 4
## 40.36525 38.38769 44.35298 40.92138
## var was set to 100
## Initializing resCov
## Setting hyperparameters for UNstructured R
## df0 was set to 5
## S0 was set to
## 1 2 3 4
## 1 86.23963 50.30556 61.07559 55.30118
## 2 50.30556 82.01460 54.86354 50.88677
## 3 61.07559 54.86354 94.75934 55.77648
## 4 55.30118 50.88677 55.77648 87.42778
## Done
Retrieving some estimates
# Genetic covariance matrix
$ETA$A$Cov$Omega # Posterior mean of BB’+Psi fmF
## [,1] [,2] [,3] [,4]
## [1,] 7.686612 5.650285 6.427334 5.628976
## [2,] 5.650285 7.035436 6.091097 5.329216
## [3,] 6.427334 6.091097 8.712215 6.064655
## [4,] 5.628976 5.329216 6.064655 6.944147
# Estimated loadings
<- fmF$ETA$A$Cov$W) (W
## [,1]
## [1,] 2.438731
## [2,] 2.311333
## [3,] 2.630492
## [4,] 2.300876
# Estimated variance of trait-specific factors (Psi)
<- fmF$ETA$A$Cov$PSI) (PSI
## [1] 1.697132 1.662461 1.760868 1.614950
%*%t(W) + diag(PSI) W
## [,1] [,2] [,3] [,4]
## [1,] 7.644540 5.636718 6.415063 5.611217
## [2,] 5.636718 7.004720 6.079943 5.318090
## [3,] 6.415063 6.079943 8.680357 6.052436
## [4,] 5.611217 5.318090 6.052436 6.908980
# Residual matrix
$resCov$R fmF
## 1 2 3 4
## 1 6.0726749 0.7403339 1.676004 1.790833
## 2 0.7403339 6.4031317 1.129594 1.156120
## 3 1.6760042 1.1295940 6.765991 1.244392
## 4 1.7908329 1.1561203 1.244392 7.629460
#fmF$ETA[[1]]$Cov
University of Florida, jhernandezjarqui@ufl.edu↩︎
University of Florida, mresende@ufl.edu↩︎
University of Florida, deamorimpeixotom@ufl.edu↩︎