Perform model fitting for all ordered pairs of markers in a genomic region of interest

`scan_pvl` calculates log likelihood for d-variate phenotype model fits. Inputted parameter `start_snp` indicates where in the `probs` object to start the scan.

scan_pvl(
  probs,
  pheno,
  kinship = NULL,
  addcovar = NULL,
  start_snp = 1,
  n_snp,
  max_iter = 10000,
  max_prec = 1/1e+08,
  cores = 1
)

Arguments

probs	an array of founder allele probabilities for a single chromosome
pheno	a matrix of phenotypes
kinship	a kinship matrix for one chromosome
addcovar	a matrix, n subjects by c additive covariates
start_snp	index of where to start the scan within probs
n_snp	the number of (consecutive) markers to include in the scan
max_iter	maximum number of iterations for EM algorithm
max_prec	stepwise precision for EM algorithm. EM stops once incremental difference in log likelihood is less than max_prec
cores	number of cores to use when parallelizing via parallel::mclapply. Set to 1 for no parallelization.

Value

a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood

Details

The function first discards individuals with one or more missing phenotypes or missing covariates. It then infers variance components, Vg and Ve. Both Vg and Ve are d by d covariance matrices. It uses an expectation maximization algorithm, as implemented in the `gemma2` R package. `gemma2` R package is an R implementation of the GEMMA algorithm for multivariate variance component estimation (Zhou & Stephens 2014 Nature methods). Note that variance components are fitted on a model that uses the d-variate phenotype but contains no genetic information. This model does, however, use the specified covariates (after dropping dependent columns in the covariates matrix). These inferred covariance matrices, and , are then used in subsequent model fitting via generalized least squares. Generalized least squares model fitting is applied to every d-tuple of markers within the specified genomic region for `scan_pvl`. For a single d-tuple of markers, we fit the model:

where

and

where

denotes the matrix-variate normal distribution with three parameters: mean matrix, covariance among rows, and covariance among columns.

denotes the vectorization operation, ie, stacking by columns.

is a kinship matrix, typically calculated by leave-one-chromosome-out methods.

is the n by d phenotypes matrix.

is a block-diagonal nd by fd matrix consisting of d blocks each of dimension n by f. Each n by f block (on the diagonal) contains a matrix of founder allele probabilities for the n subjects at a single marker. The off-diagonal blocks have only zero entries. The log-likelihood is returned for each model. The outputted object is a tibble with d + 1 columns. The first d columns specify the markers used in the corresponding model fit, while the last column specifies the log-likelihood value at that d-tuple of markers.

References

Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.

Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.

Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.

Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.

Examples

# read data
n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
scan_pvl(probs = probs, pheno = pheno, kinship = NULL,
start_snp = 1, n_snp = 5, cores = 1)
#> # A tibble: 25 x 3
#>    Var1  Var2  log10lik
#>    <chr> <chr>    <dbl>
#>  1 m1    m1       -61.7
#>  2 m2    m1       -61.8
#>  3 m3    m1       -61.7
#>  4 m4    m1       -61.8
#>  5 m5    m1       -61.8
#>  6 m1    m2       -62.2
#>  7 m2    m2       -62.2
#>  8 m3    m2       -62.1
#>  9 m4    m2       -62.2
#> 10 m5    m2       -62.1
#> # … with 15 more rows