R/scan_pvl.R
scan_pvl.Rd
`scan_pvl` calculates log likelihood for d-variate phenotype model fits. Inputted parameter `start_snp` indicates where in the `probs` object to start the scan.
scan_pvl( probs, pheno, kinship = NULL, addcovar = NULL, start_snp = 1, n_snp, max_iter = 10000, max_prec = 1/1e+08, cores = 1 )
probs | an array of founder allele probabilities for a single chromosome |
---|---|
pheno | a matrix of phenotypes |
kinship | a kinship matrix for one chromosome |
addcovar | a matrix, n subjects by c additive covariates |
start_snp | index of where to start the scan within probs |
n_snp | the number of (consecutive) markers to include in the scan |
max_iter | maximum number of iterations for EM algorithm |
max_prec | stepwise precision for EM algorithm. EM stops once incremental difference in log likelihood is less than max_prec |
cores | number of cores to use when parallelizing via parallel::mclapply. Set to 1 for no parallelization. |
a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood
The function first discards individuals with one or more missing phenotypes or missing covariates. It then infers variance components, Vg and Ve. Both Vg and Ve are d by d covariance matrices. It uses an expectation maximization algorithm, as implemented in the `gemma2` R package. `gemma2` R package is an R implementation of the GEMMA algorithm for multivariate variance component estimation (Zhou & Stephens 2014 Nature methods). Note that variance components are fitted on a model that uses the d-variate phenotype but contains no genetic information. This model does, however, use the specified covariates (after dropping dependent columns in the covariates matrix). These inferred covariance matrices, \(\hat{Vg}\) and \(\hat{Ve}\), are then used in subsequent model fitting via generalized least squares. Generalized least squares model fitting is applied to every d-tuple of markers within the specified genomic region for `scan_pvl`. For a single d-tuple of markers, we fit the model: $$vec(Y) = Xvec(B) + vec(G) + vec(E)$$ where $$G \sim MN(0, K, \hat{Vg})$$ and $$E \sim MN(0, I, \hat{Ve})$$ where \(MN\) denotes the matrix-variate normal distribution with three parameters: mean matrix, covariance among rows, and covariance among columns. \(vec\) denotes the vectorization operation, ie, stacking by columns. \(K\) is a kinship matrix, typically calculated by leave-one-chromosome-out methods. \(Y\) is the n by d phenotypes matrix. \(X\) is a block-diagonal nd by fd matrix consisting of d blocks each of dimension n by f. Each n by f block (on the diagonal) contains a matrix of founder allele probabilities for the n subjects at a single marker. The off-diagonal blocks have only zero entries. The log-likelihood is returned for each model. The outputted object is a tibble with d + 1 columns. The first d columns specify the markers used in the corresponding model fit, while the last column specifies the log-likelihood value at that d-tuple of markers.
Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.
Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.
Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.
Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.
# read data n <- 50 pheno <- matrix(rnorm(2 * n), ncol = 2) rownames(pheno) <- paste0("s", 1:n) colnames(pheno) <- paste0("tr", 1:2) probs <- array(dim = c(n, 2, 5)) probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2) probs[ , 2, ] <- 1 - probs[ , 1, ] rownames(probs) <- paste0("s", 1:n) colnames(probs) <- LETTERS[1:2] dimnames(probs)[[3]] <- paste0("m", 1:5) scan_pvl(probs = probs, pheno = pheno, kinship = NULL, start_snp = 1, n_snp = 5, cores = 1)#> # A tibble: 25 x 3 #> Var1 Var2 log10lik #> <chr> <chr> <dbl> #> 1 m1 m1 -61.7 #> 2 m2 m1 -61.8 #> 3 m3 m1 -61.7 #> 4 m4 m1 -61.8 #> 5 m5 m1 -61.8 #> 6 m1 m2 -62.2 #> 7 m2 m2 -62.2 #> 8 m3 m2 -62.1 #> 9 m4 m2 -62.2 #> 10 m5 m2 -62.1 #> # … with 15 more rows