---
title: "Simulation of plot errors and phenotypes in a plant breeding field trial"
output: html_document
vignette: >
%\VignetteIndexEntry{Simulation of plot errors and phenotypes in a plant breeding field trial}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup, include = FALSE}
library(AlphaSimR)
library(FieldSimR)
library(ggplot2)
```
### **Background**
The **R package ['FieldSimR'](https://cran.r-project.org/package=FieldSimR)** enables simulation of multi-environment plant breeding trial phenotypes through the simulation of plot errors and their subsequent combination with (simulated) genetic values. Its **core function generates plot errors** comprising 1) a spatially correlated error term, 2) a random error term, and 3) an extraneous error term. Spatially correlated errors are simulated using either bivariate interpolation, or a two-dimensional autoregressive process of order one (AR1:AR1).The three error terms are combined at a user-defined ratio.
**This document demonstrates how to:**
1. Simulate plot errors for multiple traits tested in multiple environments, and
2. Simulate phenotypes through the combination of the plot errors with simulated genetic values.
The **simulation of plot errors** requires specification of various simulation parameters that define:
1. The field trial layout
2. The total error variance
3. The spatial error
4. Extraneous variation
**Phenotypes** are then simulated through the combination of the plot errors with the genetic values stored in the package's example data frame `gv_df_unstr`. The data frame contains simulated genetic values for two traits in three environments based on an unstructured model for genotype-by-environment (GxE) interaction. The simulation of the genetic values is shown in the vignette on the [Simulation of genetic values based on an unstructured model for genotype-by-environment (GxE) interaction](https://crwerner.github.io/fieldsimr/articles/unstructured_GxE_demo.html).
### **1. Simulation of plot errors**
#### **1.1 Field trial layout**
We conceive a scenario in which 100 maize hybrids are measured for grain yield (t/ha) and plant height (cm) in three environments. The first and the third environment include two blocks, and the second environment includes three blocks. Each block comprises 20 rows and 5 columns. The blocks are arranged in column direction (“side-by-side”). The plot length (column direction) is 8 meters, and the plot width (row direction) is 2 meters.
```{r}
ntraits <- 2 # Number of traits
nenvs <- 3 # Number of environments
nblocks <- c(2, 2, 3) # Number of blocks per environment
block_dir <- "col" # Arrangement of blocks ("side-by-side")
ncols <- c(10, 10, 15) # Number of columns per environment
nrows <- 20 # Number of rows per environment
plot_length <- 8 # Plot length; here in meters (column direction)
plot_width <- 2 # Plot width; here in meters (row direction)
```
**Note:** `plot_length` and `plot_width` are only required when `spatial_model = "Bivariate"`. The two arguments are used to set the x-coordinates and y-coordinates required for the bivariate interpolation algorithm to model a spatial correlation between plots. Therefore, the assumed unit of length (meters here) has no actual meaning, and the ratio between `plot_length` and `plot_width` is important rather than the absolute values. We recommend to use realistic, absolute values for `plot_length` and `plot_width` regardless of the unit of length.
When spatially correlated errors are simulated based on a **two-dimensional autoregressive process (AR1:AR1)**, `col_cor` and `row_cor` have to be defined instead.
#### **1.2 Total error**
To obtain pre-defined target heritabilities at the plot-level, we need to define the total error variances for the two simulated traits relative to their genetic variances. We assume broad-sense heritabilities at the plot level of ***H2*** **= 0.3 for grain yield** and ***H2*** **= 0.5 for plant height** at all three environments. The heritabilities for the six trait x environment combinations are stored in a single vector.
```{r}
H2 <- c(0.3, 0.3, 0.3, 0.5, 0.5, 0.5) # c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)
```
The total genetic variances of the six trait x environment combinations (environments nested within traits) can be extracted from the description of the simulation of the genetic values in FieldSimR::unstructured_GxE_demo.
```{r}
var <- c(0.086, 0.12, 0.06, 15.1, 8.5, 11.7) # c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)
```
We now create a simple function to calculate the total error variances based on our pre-defined target heritabilities in vector `H2`and the total genetic variances in `var`.
```{r}
# Calculation of error variances based on the genetic variance and target heritability vectors.
calc_varR <- function(var, H2) {
varR <- (var / H2) - var
return(varR)
}
varR <- calc_varR(var, H2)
round(varR, 2) # Vector of error variances: c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)
```
#### **1.3 Spatial error**
We simulate a spatial error term using bivariate interpolation and assume the proportion of spatial error variance to total error variance to be 0.4 in all three environments. Additionally, we assume a correlation between the spatial error of the two traits. However, since we have no information on the magnitude of the correlation, we randomly sample a value between 0 and 0.5.
```{r}
spatial_model <- "Bivariate" # Spatial error model.
prop_spatial <- 0.4 # Proportion of spatial trend.
ScorR <- rand_cor_mat(ntraits, min.cor = 0, max.cor = 0.5, pos.def = TRUE)
round(ScorR, 2)
```
#### **1.4 Extraneous variation**
Extraneous effects can, for example, result from trial management procedures in row and/or column direction, such as soil tillage, sowing or harvesting. We want to simulate extraneous variation in row direction and assume the proportion of extraneous error variance to total error variance to be 0.2. We assume a correlation between the error of the two traits because of extraneous variation and randomly sample a value between 0 and 0.5.
```{r}
ext_ord <- "zig-zag"
ext_dir <- "row"
prop_ext <- 0.2
EcorR <- rand_cor_mat(ntraits, min.cor = 0, max.cor = 0.5, pos.def = TRUE)
round(EcorR, 2)
```
**Note:** the **proportion of the random error variance to total error variance** is defined as 1 - (prop_spatial + prop_ext). Hence, `prop_spatial` and `prop_ext` are set with reference to the random error, and the sum of the two proportions must not be greater than 1.
#### **1.5 Simulation of plot errors**
Finally, we use all the parameters defined above with the function `field_trial_error()` to simulate plot errors for grain yield and plant height in the three test environments:
```{r}
error_ls <- field_trial_error(
ntraits = ntraits,
nenvs = nenvs,
nblocks = nblocks,
block.dir = block_dir,
ncols = ncols,
nrows = nrows,
plot.length = plot_length,
plot.width = plot_width,
varR = varR,
ScorR = ScorR,
EcorR = EcorR,
RcorR = NULL,
spatial.model = spatial_model,
prop.spatial = prop_spatial,
ext.ord = ext_ord,
ext.dir = ext_dir,
prop.ext = prop_ext,
return.effects = TRUE
)
```
**Note:** by default, the function `field_trial_error()` generates a data frame with the following columns: ***environment id*** (environment), ***block id***, ***column id***, ***row id***, and the ***total plot error for each trait***. When `return_effects = TRUE`, ['FieldSimR'](https://cran.r-project.org/package=FieldSimR) returns a list with an additional entry for each trait containing the spatial error, the extraneous effect and the random error.
We will now plot the **total error for grain yield ("e.Trait1") in Environment 1**, as well as the **spatial error component**, the **random error component**, and the **extraneous variation**. Therefore, we first extract the required data from `error_df` and then create an individual graphic for each of the four simulated error terms.
```{r}
e_total_env1 <- error_ls$error.df[error_ls$error.df$env == 1, ]
e_terms_env1 <- error_ls$Trait1[error_ls$Trait1$env == 1, ]
```
**Total plot error**
```{r fig.height = 4, fig.width = 9, fig.align = "center"}
plot_effects(e_total_env1, effect = "e.Trait1", labels = TRUE)
```
**Spatial error** simulated using bivariate interpolation
```{r fig.height = 4, fig.width = 9, fig.align = "center"}
plot_effects(e_terms_env1, effect = "e.spat", labels = TRUE)
```
**Random error**
```{r fig.height = 4, fig.width = 9, fig.align = "center"}
plot_effects(e_terms_env1, effect = "e.rand", labels = TRUE)
```
**Extraneous variation **in the row direction
```{r fig.height = 4, fig.width = 9, fig.align = "center"}
plot_effects(e_terms_env1, effect = "e.ext.row")
```
### **2. Simulation of plot-level phenotypes**
To simulate **grain yield and plant height phenotypes** for our multi-environment maize experiment, we now combine the simulated plot errors with the genetic values stored in the example data frame `gv_df_unstr`. This is done using the function `make_phenotypes()`, which allocates genotypes to plots within blocks according to a randomized complete block design (RCBD).
**Note:** more complex field designs require allocation of genotypes to plots using an external R package or software.
```{r}
gv_df <- gv_df_unstr
pheno_df <- make_phenotypes(
gv_df,
error_ls$error.df,
randomise = TRUE
)
pheno_env1 <- pheno_df[pheno_df$env == 1, ] # Extract phenotypes in environment 1.
```
**Simulated phenotypes**
```{r fig.height = 4, fig.width = 9, fig.align = "center"}
plot_effects(pheno_env1, effect = "y.Trait1")
```
**Histogram showing the phenotypes of the 100 maize hybrids for grain yield in the two blocks in Environment 1**
```{r echo=TRUE, fig.height = 4, fig.width = 9, fig.align = "center"}
ggplot(pheno_env1, aes(x = y.Trait1, fill = factor(block))) +
geom_histogram(color = "#e9ecef", alpha = 0.8, position = "identity", bins = 50) +
scale_fill_manual(values = c("violetred3", "goldenrod3", "skyblue2")) +
labs(x = "Phenotypes for grain yield (t/ha)", y = "Count", fill = "Block")
```