Perform multivariate, one-QTL model fitting for markers on all chromosomes

The function first discards individuals with one or more missing phenotypes or missing covariates. It then infers variance components, Vg and Ve. Both Vg and Ve are d by d covariance matrices. It uses an expectation maximization algorithm, as implemented in the `gemma2` R package. `gemma2` R package is an R implementation of the GEMMA algorithm for multivariate variance component estimation (Zhou & Stephens 2014 Nature methods). Note that variance components are fitted on a model that uses the d-variate phenotype but contains no genetic information. This model does, however, use the specified covariates (after dropping dependent columns in the covariates matrix). These inferred covariance matrices, $\hat{Vg}$ and $\hat{Ve}$, are then used in subsequent model fitting via generalized least squares. Generalized least squares model fitting is applied to every marker on every chromosome. For a single marker, we fit the model: $$vec(Y) = Xvec(B) + vec(G) + vec(E)$$ where $$G \sim MN(0, K, \hat{Vg})$$ and $$E \sim MN(0, I, \hat{Ve})$$ where $MN$ denotes the matrix-variate normal distribution with three parameters: mean matrix, covariance among rows, and covariance among columns. $vec$ denotes the vectorization operation, ie, stacking by columns. $K$ is a kinship matrix, typically calculated by leave-one-chromosome-out methods. $Y$ is the n by d phenotypes matrix. $X$ is a block-diagonal nd by fd matrix consisting of d blocks each of dimension n by f. Each n by f block (on the diagonal) contains a matrix of founder allele probabilities for the n subjects at a single marker. The off-diagonal blocks have only zero entries. The log-likelihood is returned for each model. The outputted object is a tibble with d + 1 columns. The first d columns specify the markers used in the corresponding model fit, while the last column specifies the log-likelihood value at that d-tuple of markers.

scan_multi_oneqtl(
  probs_list,
  pheno,
  kinship_list = NULL,
  addcovar = NULL,
  cores = 1
)

Arguments

probs_list	an list of arrays of founder allele probabilities
pheno	a matrix of phenotypes
kinship_list	a list of kinship matrices, one for each chromosome
addcovar	a matrix, n subjects by c additive covariates
cores	number of cores for parallelization via parallel::mclapply()

Value

a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood

References

Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.

Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.

Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.

Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.

Examples

# read data
n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
scan_multi_oneqtl(probs_list = list(probs, probs), pheno = pheno, cores = 1)
#> [[1]]
#> # A tibble: 5 x 4
#>   Var1  Var2  log10lik null_log10lik
#>   <chr> <chr>    <dbl>         <dbl>
#> 1 m1    m1       -60.0         -60.7
#> 2 m2    m2       -60.5         -60.7
#> 3 m3    m3       -60.2         -60.7
#> 4 m4    m4       -60.5         -60.7
#> 5 m5    m5       -60.3         -60.7
#> 
#> [[2]]
#> # A tibble: 5 x 4
#>   Var1  Var2  log10lik null_log10lik
#>   <chr> <chr>    <dbl>         <dbl>
#> 1 m1    m1       -60.0         -60.7
#> 2 m2    m2       -60.5         -60.7
#> 3 m3    m3       -60.2         -60.7
#> 4 m4    m4       -60.5         -60.7
#> 5 m5    m5       -60.3         -60.7
#>