1. Introduction

Spatial analyses represents an optimal tool for post-hoc correction of field trials information. Heterogeneous factors that were not full corrected by the plotting techniques (i.e. soil fertility, diseases, and water holding capacity) can cause some disturbance in the prediction of BLUP values and, at the end, in the selection of the genotypes.

Below, we implement a set of models that is known as spatial analyses. In addition to demonstrate each step of the analysis, we also cover several parameters of interest for estimation/prediction purposes.

2. How to measure the best fit of a model to the data

The goodness-of-fit of a model to the data can be measured by different parameters. Below, we describe the most useful for spatial analyses. It is worth to mentioning that the spatial analyses’ objective is to select the most accurately and parsimoniously model that captures the spatial effects in the field. This will provides the most accurate and precise estimation of variance components and prediction of the genotypes values (Isik et al. 2018).

Heritability (\(h^2\))

The heritability represents how much of the phenotypic variation came from a genetic source. We can measure it based on the the variance components estimated by the model. The simplest representation of heritability (\(h^2\)) follows the equation:

\[ h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_{res}} \]

where \(\sigma^2_g\) is represents the genotypic variance and \(\sigma^2_{res}\) represents the residual variance.

Prediction accuracy (\(r_{\tilde{g}g}\))

The prediction accuracy represents a measure of how close a prediction is to the real genotypic value of a genotype for a target trait. It can be estimated by:

\[ r_{\tilde{g}g} = cor(predicted.value,real.value) \]

One simple way to get an estimation of accuracy is using the square root of the heritability:

\[ r_{\tilde{g}g} = {\sqrt{h^2}} \]

In mixed models, one alternative to estimated the heritability is to use the prediction error variance (PEV) from the prediction. It can be easily estimated by:

\[ PEV = std.error^2 \]

where the \(std.error\) represents the standard error from the genotypes predictions. After, we can use the following equation (Mrode, 2014):

\[ r_{\tilde{g}g} = \sqrt{1-\frac{PEV}{\sigma^2_g}} \]

The accuracy values can vary from 0 to 1. Values close to 1 represents the best accuracy from a prediction of a target trait.

LRT test

The likelihood information test (LRT - Wilks, 1938) is indicated to compared models. However, the comparison take into account that the models share the same fixed effects and one model (reduced model) has a subset of the random effects of the other model (full model). The LRT formula is given as follows:

\[ LRT = -2(logL_{full}-logL_{reduced}) \] where \(logL_{full}\) represents the log likelihood of the full model (i.e. all random effects), and \(logL_{reduced}\) represents the log likelihood of the reduced model (i.e. missing one random effect). The difference in-between models should be higher than 3.84 (\(\chi^2\) test, with 5% of probability and 1 degree of freedom). At this level (higher than 3.84) we consider that we have enough evidence to reject the null hypothesis, where the models are similar.

Information criteria (AIC and BIC)

The most indicated parameters to measure the goodness-of-fit of one model to the data are Akaike Information Criteria (AIC - Akaike, 1974) and Schwarz’s Bayesian Information Criteria (BIC - Schwarz, 1978). They are indicated for models that are not nested and share the same fixed effects. The formula definition for AIC and BIC are the following:

\[ AIC = -2*logL + 2t \]

\[ BIC = -2*logL + 2t*log(v) \]

where LogL represents the log likelihood of a model, \(t\) represents the number of covariance parameters in the model, \(v\) is the residual degrees of freedom, where \(v = n-p\), being \(n\) the number of observations and \(p\) the number of parameters in fixed effect factors. If the difference in-between two models is superior to 2, we can consider that the model with lower value of AIC/BIC is the best fit model to the data. In addition, if the difference is lower than 2, we should choose the most parsimoniously model, in other words, the model with lower number of parameters.

Verbyla information

When we have models with different number of fixed effects, the AIC/BIC criterion becomes prohibitive, once the log-likelihood came from REML estimates. One alternative to compare such kind of models is to use the Verbyla et al. (2019) correction for estimating the AIC/BIc corrected. With this correction, we will be able to compare those models. The calculation of the information criteria is an adaption of the code supplied in File S1 of Verbyla (2019) given in the function below (icREML). The corrected log-likelihood is calculated as:

\[ {loglik = log(REML) - log(|C|)/2}, \]

where C is log.determinant \((LogDet((X'H^{-1X})^{-1}))\) .

The AIC and BIC are calculated as:

\[ AIC = {- 2 * loglik + 2 * (rand.par + fixed.par)} \] \[ BIC = {- 2 * loglik + (fixed.par + rand.par) * log(n - r + fixed.par)}, \]

where rand.par represents the number of variances estimated by the full model , fixed.par represents the fixed degrees of freedom estimated by the full model, n is the number of observations, and r is the rank of the fixed effects design matrix.

## Function icREML from Verbyla et al. (2019)

# Author Ari Verbyla (ari.verbyla@csiro.au)

icREML <- function(fm, scale=1) {
    if(!is.list(fm)) stop(" Models need to be in a list\n")
    if(is.null(names(fm))) namesfm <- paste("fm", 1:length(fm))
    else namesfm <- names(fm)
    require(asreml)
    asreml.options(Cfixed = TRUE, gammaPar=FALSE)
    fm <- lapply(fm, function(el) {
        if(is.null(el$Cfixed)) {
            out <- update(el, maxit=1) }
        else out <- el
        out})
    logl <- lapply(fm, function(el) el$loglik)
    summ <- lapply(fm, function(el) summary(el, coef=TRUE)$coef.fixed)
    which.X0 <- lapply(summ, function(el) !is.na(el[, "z.ratio"]))
    p.0 <- lapply(which.X0, function(el) sum(el))
    Cfixed <- lapply(fm, function(el) el$Cfixed)
    logdet <- lapply(1:length(fm), function(el, Cfixed, which.X0, scale) {
        log(prod(svd(as.matrix(scale*Cfixed[[el]][which.X0[[el]], which.X0[[el]]]))$d))
    }, Cfixed, which.X0, scale)
    vparam <- lapply(fm, function(el) summary(el)$varcomp)
    q.0 <- lapply(vparam, function(el) sum(!(el$bound == "F" | el$bound == "B")))
    b.0 <- lapply(vparam, function(el) sum(el$bound == "F" | el$bound == "B"))
    logl <- lapply(1:length(fm), function(el, logl, logdet, p.0) {
        logl[[el]] - logdet[[el]]/2}, logl, logdet,p.0)
    aic <- unlist(lapply(1:length(fm), function(el, logl, p.0, q.0) {
        -2*logl[[el]] + 2*(p.0[[el]] + q.0[[el]])}, logl, p.0, q.0))
    bic <- unlist(lapply(1:length(fm), function(el, logl, p.0, q.0, fm) {
        -2*logl[[el]] + log(fm[[el]]$nedf+p.0[[el]])*(p.0[[el]] + q.0[[el]])},
        logl, p.0, q.0, fm))
    results <- data.frame(model=namesfm, loglik = unlist(logl), p=unlist(p.0),
                          q=unlist(q.0), b = unlist(b.0), AIC = aic, BIC = bic, logdet=unlist(logdet))
    row.names(results) <- 1:dim(results)[1]
    invisible(results)
}

3. Dataset

Packages

require(asreml)
## Online License checked out Mon Nov 24 15:24:36 2025
require(dplyr)
rm(list=ls())

Dataset

data = read.table("../data/data_spatial.txt", h=TRUE)

This dataset refers to an evaluation of 78 maize genotypes, in one location, and the trait measured was Yield.

Factor Levels
Location One
Genotypes 78
Checks 6
Blocks 3
Ranges 18
Rows 14
 # Setting as factors
 
 data$Row = data$Row %>% as.factor()
 data$Range = data$Range %>% as.factor()
 data$Block = data$Block %>% as.factor()
 data$Genotype = data$Genotype %>% as.factor()
 data$Check  = data$Check %>% as.factor()

Exploring the dataset by graphs

boxplot(Yield~Block,
data=data,
main="Trait performance over Blocks",
xlab="Block Number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)

Exploring the dataset by graphs

boxplot(Yield~Range,
data=data,
main="Trait performance over ranges",
xlab="Range Number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)

Exploring the dataset by graphs

boxplot(Yield~Row,
data=data,
main="Trait performance over rows",
xlab="Row number",
ylab="Yield (kg/ha)",
col="orange",
border="brown"
)

Spatial design

4. Model implementation

For the spatial analyses we list in the table below the possible models for spatial analyses. We used the before mentioned parameters to choose the best model (AIC, BIC, accuracy, heritability). The given model in a matrix notation is as follow:

\[ y = Xb + Zg + e \]

where \(y\) is the vector of phenotypic data; \(b\) is the vector of checks (assumed to be fixed) plus the overall mean; \(g\) is the vector of genotype effects (assumed to be random), where \(g~N(0,\sigma^2_g)\); and \(e\) is the vector of residuals (random), where \(e~N(0,\sigma^2_{res})\). The letters \(X\) and \(Z\) refers to the incidence matrices for \(b\) and \(g\), respectively.

Models fitted for spatial analyses
Model Action Fitted model for residual
Mod1 Only residual effect \(\sigma^2_{res}I_r⊗I_c\)
Mod2 Includes AR1 for row \(\sigma^2_{𝜉}∑_r(đžē_r)⊗I_c\)
Mod3 Includes AR1 for range \(\sigma^2_{𝜉}I_r⊗∑_c(đžē_c)\)
Mod4 Includes AR1 for row/range \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c)\)
Mod5 Includes AR1 for row/range and nugget effect \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c), \sigma^2_đœŧ\)

Model 1: No spatial structures

# Toy model
fm = asreml(fixed = Yield~Block + Check,
            random = ~ Genotype,
            na.action = na.method(x = "include"),
            data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1758.211     764665.89    239   15:24:37
##  2     -1757.840     745540.39    239   15:24:37
##  3     -1757.558     723297.95    239   15:24:37
##  4     -1757.475     705501.14    239   15:24:37
##  5     -1757.475     704507.71    239   15:24:37
plot(fm)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the asreml package.
##   Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Model 1 Same model, but with structure
mod1 = asreml(fixed = Yield~Block + Check,
              random = ~ Genotype,
              na.action = na.method(x = "include"),
              residual = ~id(Row):id(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1758.211     764665.89    239   15:24:37
##  2     -1757.840     745540.39    239   15:24:37
##  3     -1757.558     723297.95    239   15:24:37
##  4     -1757.475     705501.14    239   15:24:37
##  5     -1757.475     704507.71    239   15:24:37

Model outcomes

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod1,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod1)$varcomp
##             component std.error  z.ratio bound %ch
## Genotype     152638.4  68632.40 2.224000     P   0
## Row:Range!R  704507.7  78298.04 8.997769     P   0
# Traditional definition
(H2.mod1<-vpredict(mod1,H2~V1/(V1+V2)))   
##     Estimate         SE
## H2 0.1780775 0.07367362
# Model accuracy

# Estimation via heritability
(Acc.mod1 = sqrt(H2.mod1$Estimate[1]))
## [1] 0.4219923
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod1)$varcomp[1,1]
(Acc.PEV.mod1<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6198571
# Akaike information criterion
aic.mod1 = summary(mod1)$aic[1]

# Bayesian information criterion
bic.mod1 = summary(mod1)$bic[1]

Model 2: Inclusion of AR1 for row

# Model 2 - Trends in Rows
mod2 = asreml(fixed = Yield~Block + Check,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):id(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1754.561     748982.22    239   15:24:37
##  2     -1752.926     726224.94    239   15:24:37
##  3     -1751.483     706478.13    239   15:24:37
##  4     -1750.862     699733.88    239   15:24:37
##  5     -1750.834     701926.22    239   15:24:37
##  6     -1750.833     702401.38    239   15:24:37
plot(varioGram.asreml(mod2))

Model outcomes

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod2,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod2)$varcomp
##                      component    std.error  z.ratio bound %ch
## Genotype          1.733852e+05 6.586114e+04 2.632587     P 0.1
## Row:Range!R       7.024014e+05 8.044066e+04 8.731920     P 0.0
## Row:Range!Row!cor 3.006683e-01 7.272461e-02 4.134341     U 0.2
# Traditional definition
(H2.mod2<-vpredict(mod2,H2~V1/(V1+V2)))   
##     Estimate         SE
## H2 0.1979766 0.06784472
# Model accuracy

# Estimation via heritability
(Acc.mod2 = sqrt(H2.mod2$Estimate[1]))
## [1] 0.4449456
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod2)$varcomp[1,1]
(Acc.PEV.mod2<-sqrt(1-(mean.PEV/sigma2g)))
## [1] 0.6715797
# Akaike information criterion
aic.mod2 = summary(mod2)$aic[1]

# Bayesian information criterion
bic.mod2 = summary(mod2)$bic[1]

Model 3: inclusion of AR1 for range

# Model 3
mod3 = asreml(fixed = Yield~Block + Check,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~Row:ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1753.378     741532.76    239   15:24:37
##  2     -1750.778     714741.81    239   15:24:37
##  3     -1748.471     695384.48    239   15:24:37
##  4     -1747.619     694717.81    239   15:24:37
##  5     -1747.464     700825.27    239   15:24:37
##  6     -1747.458     702887.87    239   15:24:37
##  7     -1747.457     703240.78    239   15:24:37
plot(varioGram(mod3))

Model output

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod3,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod3)$varcomp
##                        component    std.error  z.ratio bound %ch
## Genotype            1.685575e+05 6.246899e+04 2.698259     P 0.0
## Row:Range!R         7.032408e+05 8.130975e+04 8.648910     P 0.0
## Row:Range!Range!cor 3.489894e-01 6.671926e-02 5.230715     U 0.1
# Traditional definition
(H2.mod3<-vpredict(mod3,H2~V1/(V1+V2)))   
##     Estimate         SE
## H2 0.1933446 0.06506777
# Model accuracy

# Estimation via heritability
(Acc.mod3 = sqrt(H2.mod3$Estimate[1]))
## [1] 0.4397097
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod3)$varcomp[1,1]
(Acc.PEV.mod3<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6765499
# Akaike information criterion
aic.mod3 = summary(mod3)$aic[1]

# Bayesian information criterion
bic.mod3 = summary(mod3)$bic[1]

Model 4: Inclusion of AR1 for row and range

# Model 4
mod4 = asreml(fixed = Yield~Block + Check,
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1750.221     729345.30    239   15:24:37
##  2     -1747.591     701054.78    239   15:24:37
##  3     -1745.133     677512.51    239   15:24:37
##  4     -1743.910     671571.65    239   15:24:37
##  5     -1743.814     674922.82    239   15:24:37
##  6     -1743.807     675642.36    239   15:24:37
##  7     -1743.806     675814.98    239   15:24:37
plot(varioGram(mod4))

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod4,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod4)$varcomp
##                        component    std.error  z.ratio bound %ch
## Genotype            1.998635e+05 6.462991e+04 3.092430     P 0.1
## Row:Range!R         6.758150e+05 7.785376e+04 8.680570     P 0.0
## Row:Range!Row!cor   2.416316e-01 7.952655e-02 3.038377     U 0.2
## Row:Range!Range!cor 3.105714e-01 7.302496e-02 4.252948     U 0.1
# Traditional definition
(H2.mod4<-vpredict(mod4,H2~V1/(V1+V2)))   
##     Estimate         SE
## H2 0.2282384 0.06447421
# Model accuracy

# Estimation via heritability
(Acc.mod4 = sqrt(H2.mod4$Estimate[1]))
## [1] 0.4777431
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod4)$varcomp[1,1]
(Acc.PEV.mod4<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.7211822
# Akaike information criterion
aic.mod4 = summary(mod4)$aic[1]

# Bayesian information criterion
bic.mod4 = summary(mod4)$bic[1]

Model 5: Inclusion of nugget effect

# Model 5
mod5 = asreml(fixed = Yield~Block + Check,
              random = ~Genotype + idv(units),
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1750.804     668853.81    239   15:24:37
##  2     -1748.635     380878.77    239   15:24:37  (  2 restrained)
##  3     -1739.354     291125.58    239   15:24:37  (  1 restrained)
##  4     -1734.592     211785.41    239   15:24:37
##  5     -1733.940     264767.79    239   15:24:37
##  6     -1733.869     291729.85    239   15:24:37
##  7     -1733.851     306451.39    239   15:24:37
##  8     -1733.850     297084.96    239   15:24:37
##  9     -1733.850     300364.32    239   15:24:37
## 10     -1733.850     297061.24    239   15:24:37
## 11     -1733.849     298964.34    239   15:24:37
plot(varioGram(mod5))

Model output

#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod5,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod5)$varcomp
##                        component    std.error   z.ratio bound %ch
## Genotype            1.199431e+05 5.377807e+04  2.230335     P 0.1
## units!units         4.664785e+05 6.593710e+04  7.074598     P 0.1
## Row:Range!R         2.989643e+05 1.457003e+05  2.051913     P 0.0
## Row:Range!Row!cor   7.743999e-01 1.643659e-01  4.711439     U 0.1
## Row:Range!Range!cor 8.813115e-01 7.370977e-02 11.956509     U 0.0
# Traditional definition
(H2.mod5<-vpredict(mod5,H2~V1/(V1+V2+V3)))   
##     Estimate         SE
## H2 0.1354698 0.06132768
# Model accuracy

# Estimation via heritability
(Acc.mod5 = sqrt(H2.mod5$Estimate[1]))
## [1] 0.3680623
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod5)$varcomp[1,1]
(Acc.PEV.mod5<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6216511
# Akaike information criterion
aic.mod5 = summary(mod5)$aic[1]

# Bayesian information criterion
bic.mod5 = summary(mod5)$bic[1]

5. Model comparision

Heritabilities

h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                h2 = c(H2.mod1$Estimate[1],H2.mod2$Estimate[1],H2.mod3$Estimate[1],H2.mod4$Estimate[1],H2.mod5$Estimate[1]))

h2
##   Model        h2
## 1  Mod1 0.1780775
## 2  Mod2 0.1979766
## 3  Mod3 0.1933446
## 4  Mod4 0.2282384
## 5  Mod5 0.1354698

Accuracy via heritability

Acc.h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                Acc.h2 = c(Acc.mod1,Acc.mod2,Acc.mod3,Acc.mod4,Acc.mod5))

Acc.h2
##   Model    Acc.h2
## 1  Mod1 0.4219923
## 2  Mod2 0.4449456
## 3  Mod3 0.4397097
## 4  Mod4 0.4777431
## 5  Mod5 0.3680623

Accuracy via PEV

Acc.PEV = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                Acc.PEV = c(Acc.PEV.mod1,Acc.PEV.mod2,Acc.PEV.mod3,Acc.PEV.mod4,Acc.PEV.mod5))

Acc.PEV
##   Model   Acc.PEV
## 1  Mod1 0.6198571
## 2  Mod2 0.6715797
## 3  Mod3 0.6765499
## 4  Mod4 0.7211822
## 5  Mod5 0.6216511

AIC

mod.AIC = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                AIC = c(aic.mod1,aic.mod2,aic.mod3,aic.mod4,aic.mod5))

mod.AIC
##   Model      AIC
## 1  Mod1 3518.950
## 2  Mod2 3507.666
## 3  Mod3 3500.915
## 4  Mod4 3495.612
## 5  Mod5 3477.699

BIC

mod.BIC = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                BIC = c(bic.mod1,bic.mod2,bic.mod3,bic.mod4,bic.mod5))

mod.BIC
##   Model      BIC
## 1  Mod1 3525.902
## 2  Mod2 3518.095
## 3  Mod3 3511.344
## 4  Mod4 3509.518
## 5  Mod5 3495.081

All information for the residual effect modelling

out = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5"),
                 h2 = h2$h2,
                 Acc.h2 = Acc.h2$Acc.h2,
                 Acc.PEV = Acc.PEV$Acc.PEV,
                 AIC = mod.AIC$AIC,
                 BIC = mod.BIC$BIC)
out
##   Model        h2    Acc.h2   Acc.PEV      AIC      BIC
## 1  Mod1 0.1780775 0.4219923 0.6198571 3518.950 3525.902
## 2  Mod2 0.1979766 0.4449456 0.6715797 3507.666 3518.095
## 3  Mod3 0.1933446 0.4397097 0.6765499 3500.915 3511.344
## 4  Mod4 0.2282384 0.4777431 0.7211822 3495.612 3509.518
## 5  Mod5 0.1354698 0.3680623 0.6216511 3477.699 3495.081

Based on AIC and BIC from those models, we can conclude that the goodness-of-fit came from the model 4, where we modeled AR1 for rows and AR1 for columns. The following comparisons will be based on this model.

6. Modelling the global effects

After the modelling of the effects of spatial correlated errors, we can include the information of row and range as an covariate in the model using the function \(lin()\). In this case, we are accounting for an single linear trend for range and another for row effect. In addition, we check the possibility of inclusion of the row and range effects as random effects in the model. We described the models below:

Model Global/Extraneous Natural Random parameter Fixed parameter
Mod5 \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c)\) 4 21
Mod6 \(\beta_r × R\) \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c)\) 4 22
Mod7 \(\beta_r × R,\beta_c × C\) \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c)\) 4 23
Mod8 \(\beta_r × R,\beta_c × C, \sigma^2_rI_r\) \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c)\) 5 23
Mod9 \(\beta_r × R,\beta_c × C, \sigma^2_rI_r, \sigma^2_cI_c\) \(\sigma^2_{𝜉}∑_r(đžē_r)⊗∑_c(đžē_c)\) 6 23

Model 4: Ar1xAr1

For comparison purposes, we will borrow the best residual model from the previous step to be the baseline structure for the next step.

Model 6: Linear effect of row

# Model 6
mod6 = asreml(fixed = Yield~Block + Check + lin(Row),
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:37
##           LogLik        Sigma2     DF     wall
##  1     -1733.881     653688.63    238   15:24:37
##  2     -1732.734     630009.54    238   15:24:37
##  3     -1731.773     605195.67    238   15:24:37
##  4     -1731.403     588098.50    238   15:24:37
##  5     -1731.389     587309.98    238   15:24:37
##  6     -1731.389     587280.38    238   15:24:37
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod6,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod6)$varcomp
##                        component    std.error  z.ratio bound %ch
## Genotype            1.710435e+05 6.024437e+04 2.839162     P 0.1
## Row:Range!R         5.872804e+05 6.638806e+04 8.846174     P 0.0
## Row:Range!Row!cor   1.442504e-01 8.198674e-02 1.759436     U 0.3
## Row:Range!Range!cor 2.060071e-01 7.685306e-02 2.680533     U 0.1
# Traditional definition
(H2.mod6<-vpredict(mod6,H2~V1/(V1+V2)))   
##     Estimate         SE
## H2 0.2255547 0.06969605
# Model accuracy

# Estimation via heritability
(Acc.mod6 = sqrt(H2.mod6$Estimate[1]))
## [1] 0.474926
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod6)$varcomp[1,1]
(Acc.PEV.mod6<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6929692

Model 7: Linear effect of row and range

# Model 7
mod7 = asreml(fixed = Yield~Block+Check+lin(Row)+lin(Range),
              random = ~Genotype,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1729.958     654411.38    237   15:24:38
##  2     -1728.917     632418.04    237   15:24:38
##  3     -1728.031     609197.12    237   15:24:38
##  4     -1727.678     593082.30    237   15:24:38
##  5     -1727.664     592218.41    237   15:24:38
##  6     -1727.663     592155.47    237   15:24:38
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod7,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod7)$varcomp
##                        component    std.error  z.ratio bound %ch
## Genotype            1.672210e+05 6.026109e+04 2.774941     P 0.1
## Row:Range!R         5.921555e+05 6.717669e+04 8.814895     P 0.0
## Row:Range!Row!cor   1.487639e-01 8.203234e-02 1.813479     U 0.3
## Row:Range!Range!cor 2.042100e-01 7.677841e-02 2.659733     U 0.1
# Traditional definition
(H2.mod7<-vpredict(mod7,H2~V1/(V1+V2)))   
##     Estimate         SE
## H2 0.2202083 0.06998301
# Model accuracy

# Estimation via heritability
(Acc.mod7 = sqrt(H2.mod7$Estimate[1]))
## [1] 0.4692635
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod7)$varcomp[1,1]
(Acc.PEV.mod7<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6869388

Model 8: Linear effect of row/range + random effect for row

# Model
mod8 = asreml(fixed = Yield~Block+Check+lin(Row)+lin(Range),
              random = ~Genotype + Row,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1730.543     629043.56    237   15:24:38
##  2     -1728.924     612205.57    237   15:24:38
##  3     -1727.842     593490.36    237   15:24:38
##  4     -1727.492     578823.59    237   15:24:38
##  5     -1727.477     577445.20    237   15:24:38
##  6     -1727.477     577257.86    237   15:24:38
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod8,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod8)$varcomp
##                        component    std.error   z.ratio bound %ch
## Row                 1.434870e+04 2.676769e+04 0.5360457     P 0.1
## Genotype            1.705439e+05 6.065430e+04 2.8117360     P 0.2
## Row:Range!R         5.772579e+05 6.872706e+04 8.3992800     P 0.0
## Row:Range!Row!cor   1.597616e-01 8.443777e-02 1.8920635     U 0.3
## Row:Range!Range!cor 1.824142e-01 8.524195e-02 2.1399585     U 0.1
# Traditional definition
(H2.mod8<-vpredict(mod8,H2~V2/(V1+V2+V3)))   
##     Estimate         SE
## H2 0.2237667 0.06986698
# Model accuracy

# Estimation via heritability
(Acc.mod8 = sqrt(H2.mod8$Estimate[1]))
## [1] 0.4730398
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod8)$varcomp[2,1]
(Acc.PEV.mod8<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6916815

Model 9: Linear effect of row/range + random for row/range

# Model
mod9 = asreml(fixed = Yield~Block+Check+lin(Row)+lin(Range),
              random = ~Genotype + Row + Range,
              na.action = na.method(x = "include"),
              residual = ~ar1(Row):ar1(Range),
              data = data)
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1733.571     617561.33    237   15:24:38  (  1 restrained)
##  2     -1729.082     609433.09    237   15:24:38  (  1 restrained)
##  3     -1727.831     592638.87    237   15:24:38  (  1 restrained)
##  4     -1727.491     578689.89    237   15:24:38  (  1 restrained)
##  5     -1727.477     577432.27    237   15:24:38  (  1 restrained)
##  6     -1727.477     577254.74    237   15:24:38
#---- Outcomes
# BLUPs for the genotypes
BLUPs<-summary(mod9,coef=T)$coef.random
BLUP.geno<-as.matrix(BLUPs[1:78,])

# variance components
summary(mod9)$varcomp
##                        component    std.error   z.ratio bound %ch
## Row                 1.434893e+04 2.676845e+04 0.5360388     P 0.1
## Range               5.841408e-02           NA        NA     B  NA
## Genotype            1.705472e+05 6.065713e+04 2.8116600     P 0.1
## Row:Range!R         5.772547e+05 6.872724e+04 8.3992133     P 0.0
## Row:Range!Row!cor   1.597693e-01 8.443868e-02 1.8921343     U 0.3
## Row:Range!Range!cor 1.824183e-01 8.524316e-02 2.1399763     U 0.1
# Traditional definition
(H2.mod9<-vpredict(mod9,H2~V3/(V1+V2+V3+V4)))   
##     Estimate         SE
## H2 0.2237709 0.06986924
# Model accuracy

# Estimation via heritability
(Acc.mod9 = sqrt(H2.mod9$Estimate[1]))
## [1] 0.4730443
# Definition via PEV (Predicted error variance)
mean.PEV<-mean(BLUP.geno[,2]^2)
sigma2g<-summary(mod9)$varcomp[3,1]
(Acc.PEV.mod9<-sqrt(1-mean.PEV/sigma2g))
## [1] 0.6916728

7. Model Comparision

Heritabilities

## Heritabilities

h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5","Mod6","Mod7", "Mod8", "Mod9"),
                h2 = c(H2.mod1$Estimate[1],H2.mod2$Estimate[1],H2.mod3$Estimate[1],
                       H2.mod4$Estimate[1],H2.mod5$Estimate[1],H2.mod6$Estimate[1],
                       H2.mod7$Estimate[1],H2.mod8$Estimate[1],H2.mod9$Estimate[1]))

h2
##   Model        h2
## 1  Mod1 0.1780775
## 2  Mod2 0.1979766
## 3  Mod3 0.1933446
## 4  Mod4 0.2282384
## 5  Mod5 0.1354698
## 6  Mod6 0.2255547
## 7  Mod7 0.2202083
## 8  Mod8 0.2237667
## 9  Mod9 0.2237709

Accuracy via heritabilities

Acc.h2 = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5","Mod6","Mod7", "Mod8", "Mod9"),
                Acc.h2 = c(Acc.mod1,Acc.mod2,Acc.mod3,Acc.mod4,Acc.mod5,Acc.mod6,Acc.mod7,Acc.mod8,Acc.mod9))

Acc.h2
##   Model    Acc.h2
## 1  Mod1 0.4219923
## 2  Mod2 0.4449456
## 3  Mod3 0.4397097
## 4  Mod4 0.4777431
## 5  Mod5 0.3680623
## 6  Mod6 0.4749260
## 7  Mod7 0.4692635
## 8  Mod8 0.4730398
## 9  Mod9 0.4730443

Accuracy via PEV

Acc.PEV = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5","Mod6","Mod7", "Mod8", "Mod9"),
                Acc.PEV = c(Acc.PEV.mod1,Acc.PEV.mod2,Acc.PEV.mod3,Acc.PEV.mod4,Acc.PEV.mod5,Acc.PEV.mod6,Acc.PEV.mod7,Acc.PEV.mod8,Acc.PEV.mod9))

Acc.PEV
##   Model   Acc.PEV
## 1  Mod1 0.6198571
## 2  Mod2 0.6715797
## 3  Mod3 0.6765499
## 4  Mod4 0.7211822
## 5  Mod5 0.6216511
## 6  Mod6 0.6929692
## 7  Mod7 0.6869388
## 8  Mod8 0.6916815
## 9  Mod9 0.6916728

AIC/BIC

AICBIC = list(mod1,mod2,mod3,mod4,mod5, mod6, mod7, mod8, mod9)

icREML <- function(fm, scale=1) {
    if(!is.list(fm)) stop(" Models need to be in a list\n")
    if(is.null(names(fm))) namesfm <- paste("fm", 1:length(fm))
    else namesfm <- names(fm)
    require(asreml)
    asreml.options(Cfixed = TRUE, gammaPar=FALSE)
    fm <- lapply(fm, function(el) {
        if(is.null(el$Cfixed)) {
            out <- update(el, maxit=1) }
        else out <- el
        out})
    logl <- lapply(fm, function(el) el$loglik)
    summ <- lapply(fm, function(el) summary(el, coef=TRUE)$coef.fixed)
    which.X0 <- lapply(summ, function(el) !is.na(el[, "z.ratio"]))
    p.0 <- lapply(which.X0, function(el) sum(el))
    Cfixed <- lapply(fm, function(el) el$Cfixed)
    logdet <- lapply(1:length(fm), function(el, Cfixed, which.X0, scale) {
        log(prod(svd(as.matrix(scale*Cfixed[[el]][which.X0[[el]], which.X0[[el]]]))$d))
    }, Cfixed, which.X0, scale)
    vparam <- lapply(fm, function(el) summary(el)$varcomp)
    q.0 <- lapply(vparam, function(el) sum(!(el$bound == "F" | el$bound == "B")))
    b.0 <- lapply(vparam, function(el) sum(el$bound == "F" | el$bound == "B"))
    logl <- lapply(1:length(fm), function(el, logl, logdet, p.0) {
        logl[[el]] - logdet[[el]]/2}, logl, logdet,p.0)
    aic <- unlist(lapply(1:length(fm), function(el, logl, p.0, q.0) {
        -2*logl[[el]] + 2*(p.0[[el]] + q.0[[el]])}, logl, p.0, q.0))
    bic <- unlist(lapply(1:length(fm), function(el, logl, p.0, q.0, fm) {
        -2*logl[[el]] + log(fm[[el]]$nedf+p.0[[el]])*(p.0[[el]] + q.0[[el]])},
        logl, p.0, q.0, fm))
    results <- data.frame(model=namesfm, loglik = unlist(logl), p=unlist(p.0),
                          q=unlist(q.0), b = unlist(b.0), AIC = aic, BIC = bic, logdet=unlist(logdet))
    row.names(results) <- 1:dim(results)[1]
    invisible(results)
}


est_AICBIC = icREML(AICBIC)
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1757.475     704499.27    239   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check, random = ~Genotype, residual =
## ~id(Row):id(Range), : Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1750.833     702490.41    239   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check, random = ~Genotype, residual =
## ~ar1(Row):id(Range), : Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1747.457     703307.57    239   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check, random = ~Genotype, residual =
## ~Row:ar1(Range), : Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1743.806     675861.30    239   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check, random = ~Genotype, residual =
## ~ar1(Row):ar1(Range), : Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1733.849     298672.03    239   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check, random = ~Genotype +
## idv(units), : Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1731.389     587268.48    238   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check + lin(Row), random = ~Genotype,
## : Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1727.663     592136.90    237   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check + lin(Row) + lin(Range), :
## Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1727.477     577217.77    237   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check + lin(Row) + lin(Range), :
## Log-likelihood not converged
## ASReml Version 4.2 24/11/2025 15:24:38
##           LogLik        Sigma2     DF     wall
##  1     -1727.477     577217.09    237   15:24:38
## Warning in asreml(fixed = Yield ~ Block + Check + lin(Row) + lin(Range), :
## Log-likelihood not converged

All information for the fixed/random terms

output = data.frame(Model = c("Mod1","Mod2", "Mod3", "Mod4","Mod5","Mod6","Mod7", "Mod8", "Mod9"),
                    h2 = round(h2$h2,3),
                    Acc.h2 = round(Acc.h2$Acc.h2,3),
                    Acc.PEV = round(Acc.PEV$Acc.PEV,3),
                    AIC = est_AICBIC$AIC,
                    BIC = est_AICBIC$BIC)


output
##   Model    h2 Acc.h2 Acc.PEV      AIC      BIC
## 1  Mod1 0.178  0.422   0.620 3638.877 3677.524
## 2  Mod2 0.198  0.445   0.672 3628.146 3670.307
## 3  Mod3 0.193  0.440   0.677 3620.966 3663.127
## 4  Mod4 0.228  0.478   0.721 3616.150 3661.824
## 5  Mod5 0.135  0.368   0.622 3601.421 3650.609
## 6  Mod6 0.226  0.475   0.693 3597.763 3646.951
## 7  Mod7 0.220  0.469   0.687 3599.557 3652.259
## 8  Mod8 0.224  0.473   0.692 3601.364 3657.579
## 9  Mod9 0.224  0.473   0.692 3601.364 3657.579

References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723. https://doi.org/10.1109/TAC.1974.1100705

Coelho, I., Peixoto, M. A., Marcal, T. D. S., Bernardeli, A., Alves, R., Lima, R. O., & Bhering, L. L. (2021). Accounting for spatial trends in multi-environment diallel analysis in maize breeding. PloS one, 16(10), e0258473.

Cullis, B.R., A.B. Smith, and N.E. Coombes. 2006. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11(4): 381–393. doi: 10.1198/108571106X15444

Werner, C. R., Gemenet, D. C., & Tolhurst, D. J. (2024). FieldSimR: an R package for simulating plot data in multi-environment field trials. Frontiers in plant science, 15, 1330574.

Gilmour, A.R.,Cullis, B.R. & Verbyla, A.P. (1997). Accounting for natural and extraneous variationin the analysis of field experiments.Journal of Agricultural, Biological and Environmental Statistics2,269–293

Isik et al., 2017. Genetic Data Analysis for Plant and Animal Breeding.

Mrode, R. A. (2014). Linear models for the prediction of animal breeding values. Cabi.

Schwarz G. Estimating the dimension of a model. Ann Stat. 1978; 6: 461–464.

Verbyla, A. P. (2019). A note on model selection using information criteria for general linear models estimated using REML. Australian & New Zealand Journal of Statistics, 61, 39–50. doi: 10.1111/anzs.12254.

Wilks SS. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann Math Stat. 1938; 9: 60–62. https://doi.org/10.1214/aoms/1177732360


  1. University of Florida, â†Šī¸Ž

  2. University of Florida, â†Šī¸Ž