For each tissue we have a gene expression matrix \(X\), in which element \(X_{gi}\) measures the expression level of gene \(g\) in individual \(i\). We assume \(\mathbb{E}[X] = UV\) under the null (without genetic effects). Rather than fixing the rank of the matrices \(U\) and \(V\) to a particular value, we aim to identify a suitable rank through Matrix Factorization with Bayesian Group Lasso (MFGL). The group Lasso prior [1] has been used to select groups of features that model response variables in regression problems. Here we use its Bayesian extension [2] to avoid tuning a regularization parameter. We could instead use genetic association to choose an appropriate factorization size, but this can overfit to the given genotype matrix.
We formulate the model likelihood by multivariate Gaussian distributions:
\[\mathcal{L} = \prod_{i=1}^{m} P(\mathbf{y}_{i}|\mathbf{U},\mathbf{v}_{i},\sigma^{2}) = \prod_{i=1}^{m} \mathcal{N}(\mathbf{y}_{i}|U\mathbf{v}_{i},\sigma^{2}I)\]
The prior distribution of column vector \(\mathbf{u}_{j}\) is modeled by \[P(\mathbf{u}_{j}|\lambda) = \mathcal{N}(\mathbf{u}_{j}|\mathbf{0},\sigma^{2}\tau_{u}^{2}I)\] and that of row vector \(\mathbf{v}_{j}\) by \[P(\mathbf{v}_{j}|\lambda) \propto \exp\left( - \|\mathbf{v}_{j}\| \lambda/\sigma \right).\] A more convenient and equivalent hierarchical formulation was proposed in previous work [2]: \[P(\mathbf{v}_{j}|\tau_{v_{j}}^{2}) = \mathcal{N}(\mathbf{v}_{j}|\mathbf{0},\sigma^{2}\tau_{v_{j}}^{2}I_{m\times{m}})\] and \[P(\tau_{v_j}^{2}|\lambda) = \mathsf{Gam}\left(\tau_{v_j}^{2}|\frac{m +1}{2},\frac{\lambda^{2}}{2}\right).\] We can justify this hierarchical model by checking that \(\int d\tau_{v_j}^{2} \, P(\mathbf{v}_{j}|\tau_{v_j}^{2})P(\tau_{v_j}^{2}|\lambda) \propto \exp(-\|\mathbf{v}_{j}\|\lambda/\sigma).\)
Given hyperparameters that regularize the sparsity of the model, we can estimate the columns of \(U\) and rows of \(V\) by sampling from \[P(\mathbf{u}_{j}|\cdot) = \mathcal{N}((\mathbf{v}_{j}\mathbf{v}_{j}^{\top} + \tau_{u}^{-2})^{-1}R_{-j}\mathbf{v}_{j}^\top,\sigma^{2}(\mathbf{v}_{j}\mathbf{v}_{j}^{\top}+\tau_{u}^{-2})^{-1}I)\] and \[P(\mathbf{v}_{j}|\cdot) = \mathcal{N}((\mathbf{u}_{j}^{\top}\mathbf{u}_{j} + \tau_{v_j}^{-2})^{-1} \mathbf{u}_{j}^\top R_{-j},\sigma^{2}(\mathbf{u}_{j}^{\top}\mathbf{u}_{j}+\tau_{v_j}^{-2})^{-1}I),\] where \(R_{-j} = Y - UV + \mathbf{u}_{j}\mathbf{v}_{j}\).
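These conditional draws can be sketched in NumPy; the dimensions, noise level, and single-pass sweep below are illustrative assumptions, not the full sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, J = 50, 30, 5          # genes x individuals x factors (hypothetical sizes)
sigma2, tau_u2 = 1.0, 1.0
U = rng.normal(size=(n, J))
V = rng.normal(size=(J, m))
Y = U @ V + rng.normal(scale=np.sqrt(sigma2), size=(n, m))
tau_v2 = np.ones(J)          # per-factor slab variances

def sample_factor(j):
    """Draw u_j and v_j from their Gaussian conditionals given R_{-j}."""
    R = Y - U @ V + np.outer(U[:, j], V[j])          # residual excluding factor j
    prec_u = V[j] @ V[j] + 1.0 / tau_u2
    U[:, j] = (R @ V[j]) / prec_u + rng.normal(scale=np.sqrt(sigma2 / prec_u), size=n)
    prec_v = U[:, j] @ U[:, j] + 1.0 / tau_v2[j]
    V[j] = (U[:, j] @ R) / prec_v + rng.normal(scale=np.sqrt(sigma2 / prec_v), size=m)

for j in range(J):
    sample_factor(j)
```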
The exact posterior of the unknown precision (inverse variance) can be derived analytically as an inverse Gaussian distribution: \[P(\tau_{v_j}^{-2}|\cdot) = \mathcal{IN}\left(\lambda\sigma/\|\mathbf{v}_{j}\|, \lambda^{2} \right)\] with probability density function \(\mathcal{IN}(x|\mu,\gamma) = \gamma^{1/2}(2\pi x^{3})^{-1/2}\exp\left(-\gamma(x-\mu)^{2} / (2\mu^{2}x)\right)\).
From \(\mathbb{E}[\tau_{v_j}^{2}] = \|\mathbf{v}_j\|/(\lambda\sigma) + 1/\lambda^{2}\) under the posterior (inverse Gaussian) distribution, we derive the following empirical Bayes update of the regularization parameter \(\lambda\): \[\lambda^{-2} \gets \frac{\sum_{j}\mathbb{E}[\tau_{v_j}^{2}]}{mJ+J} = \frac{\sum_{j}\|\mathbf{v}_{j}\|/\sigma}{(m+1)J} \lambda^{-1} + \frac{1}{m + 1} \lambda^{-2}.\] At the stationary point we reach a rather simple solution, \[\hat{\lambda} = \frac{m\sigma}{J^{-1}\sum_{j}\|\mathbf{v}_{j}\|}\,\left(\textrm{or, trivially,}\;\hat{\lambda} = 0\right).\]
We could use Monte Carlo EM, but to simplify calculation we take a rather radical empirical Bayes estimate of \(\lambda\): for each group \(j\) separately, instead of averaging over all \(J\) groups, we set \(\hat{\lambda}_{j} = m\sigma/\|\mathbf{v}_{j}\|.\) With this plug-in estimate, we derive \[\mathbb{E}[\tau_{v_{j}}^{-2}|\cdot] \approx \frac{\hat{\lambda}_{j}\sigma}{\|\mathbf{v}_{j}\|} = \min\{\frac{m\sigma^{2}}{\|\mathbf{v}_{j}\|^{2}}, C\}\] where we bound the estimate by some large constant \(C\) to prevent divergence when \(\|\mathbf{v}_{j}\| = 0\). With sufficiently large \(C\), the inference results were not affected.
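A minimal sketch of the capped per-group estimate; the cap value and helper name are assumptions for illustration.

```python
import numpy as np

def eb_precision(v_j, m, sigma2=1.0, C=1e6):
    """Per-group empirical Bayes estimate E[tau_j^{-2}] = min(m*sigma^2/||v_j||^2, C).

    The cap C guards against division by zero when ||v_j|| = 0.
    """
    norm2 = float(v_j @ v_j)
    if norm2 == 0.0:
        return C
    return min(m * sigma2 / norm2, C)
```

For example, a group with \(\|\mathbf{v}_j\|^2 = 2\) and \(m = 4\) gets precision \(2\), while an all-zero group is clipped at \(C\).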
The gene-specific mean vector is updated by \[\mathbb{E}[\mathbf{m}^{(g)}|\cdot] =\frac{(Y^{(g)}- UV^{(g)})\mathbf{1}}{n_{\textrm{individual}}}\]
\[\mathbb{E}[\mathbf{v}_{j}^{(g)}|\cdot] = \frac{\mathbf{u}_{j}^{\top}Y_{-j}^{(g)}} {\|\mathbf{u}_{j}\|^{2} + \tau_{v_{j}}^{-2}},\,\, \mathbb{V}[\mathbf{v}_{j}^{(g)}|\cdot] = \frac{\sigma^{2}} {\|\mathbf{u}_{j}\|^{2} + \tau_{v_{j}}^{-2}}I\] where \(Y_{-j}^{(g)} = Y^{(g)} - U V^{(g)} - M^{(g)} + \mathbf{u}_{j}\mathbf{v}_{j}^{(g)}\) and \(\mathbb{E}[\tau_{v_{j}}^{-2}] = n_{\textrm{individual}}\sigma^{2}/ \|\mathbf{v}_{j}\|^{2}\) by the empirical Bayes estimate.
\[\mathbb{E}[\mathbf{u}_{j}|\cdot] = \frac{\sum_{g} Y_{-j}^{(g)} {\mathbf{v}_{j}^{(g)}}^\top}{\sum_{g}\|\mathbf{v}_{j}^{(g)}\|^{2} + \tau_{u_j}^{-2}},\quad\textrm{and}\quad \mathbb{V}[\mathbf{u}_{j}|\cdot] = \frac{\sigma^{2}}{\sum_{g}\|\mathbf{v}_{j}^{(g)}\|^{2} + \tau_{u_j}^{-2}}I\] where \(Y_{-j}^{(g)} = Y^{(g)} - M^{(g)} - UV^{(g)} + \mathbf{u}_{j}\mathbf{v}_{j}^{(g)}\).
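The pooled coordinate update of \(\mathbf{u}_j\) across genes can be sketched as follows; the sizes and the zero mean matrices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tissue, m, J, G = 10, 8, 3, 4   # tissues x individuals x factors x genes
tau_u_inv2 = 1.0
U = rng.normal(size=(n_tissue, J))
Vs = [rng.normal(size=(J, m)) for _ in range(G)]
Ms = [np.zeros((n_tissue, m)) for _ in range(G)]   # gene-specific means, zero here
Ys = [U @ V + 0.1 * rng.normal(size=(n_tissue, m)) for V in Vs]

def update_uj(j):
    """E[u_j | .] pools residuals Y_{-j}^{(g)} over all genes."""
    num, den = np.zeros(n_tissue), tau_u_inv2
    for Y, V, M in zip(Ys, Vs, Ms):
        R = Y - M - U @ V + np.outer(U[:, j], V[j])   # Y_{-j}^{(g)}
        num += R @ V[j]
        den += V[j] @ V[j]
    U[:, j] = num / den

for j in range(J):
    update_uj(j)
```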
Let \(\bar{\mathbf{v}}^{(g)} = V^{(g)}\mathbf{1}\) denote the \(J \times 1\) vector of row sums of \(V^{(g)}\). Each gene contributes a term \[Y^{(g)} {V^{(g)}}^\top - U V^{(g)} {V^{(g)}}^\top - \mathbf{m}^{(g)} {\bar{\mathbf{v}}^{(g)}}^\top\] to the update of \(U\). We therefore accumulate the sufficient statistics \(T(\{V^{(g)}\}) = \sum_{g} \left[ Y^{(g)} {V^{(g)}}^{\top} - \mathbf{m}^{(g)} {\bar{\mathbf{v}}^{(g)}}^\top \right]\) and \(S(\{V^{(g)}\}) = \sum_{g} V^{(g)}{V^{(g)}}^\top\).
Looping through the entire corpus of genes is inefficient, so we use memoized online learning [3].
We now assume a spike-and-slab decomposition \(\mathbf{u}_{j} = \mathbf{z}_{j} u_{j}\), so that \[Y_{ti}^{(g)} \approx \sum_{j} Z_{tj} u_{j} V_{ji}^{(g)}.\]
To compute \(p(Z_{tj} = 1 | \cdot)\), compare the squared residual with \(Z_{tj}=0\), \[(Y_{ti}^{(g)} - \sum_{k\neq j} Z_{tk} u_{k} V_{ki}^{(g)})^{2},\] against that with \(Z_{tj}=1\), \[(Y_{ti}^{(g)} - \sum_{k\neq j} Z_{tk} u_{k} V_{ki}^{(g)} - u_{j} V_{ji}^{(g)})^{2}.\] Letting \(Y_{ti,-j}^{(g)} = Y_{ti}^{(g)} - \sum_{k\neq j} Z_{tk} u_{k} V_{ki}^{(g)}\), the difference reduces to \[-2 Y_{ti,-j}^{(g)} u_{j} V_{ji}^{(g)} + (u_{j} V_{ji}^{(g)})^{2}.\]
Posterior probability of effect size: we first characterize the distribution over \(\mathbf{u}_{j} \equiv \mathbf{z}_{j} u_{j}\).
\[\mathbb{E}[u_{j}|\cdot] = \frac{\mathbf{z}_{j}^\top (Y - \sum_{k\neq j} u_{k}\mathbf{z}_{k} \mathbf{v}_{k}) \mathbf{v}_{j}^\top} {\|\mathbf{v}_{j}\|^{2} \sum_{t} Z_{tj}^{2} + n_{\textrm{gene}} \tau_{u_j}^{-2}}\]
\[\mathbb{E}[u_{j}^{(s)}|\cdot] = \frac{\mathbf{z}_{j}^\top (Y\mathbf{v}_{j}^\top) - \mathbf{z}_{j}^\top U (V\mathbf{v}_{j}^\top) + u^{(s-1)}_{j} \|\mathbf{z}_{j}\|^{2} \|\mathbf{v}_{j}\|^{2}} {\|\mathbf{v}_{j}\|^{2} \|\mathbf{z}_{j}\|^{2} + n_{\textrm{gene}} \tau_{u_j}^{-2}}\] where we are in fact dealing with \(n_{\textrm{gene}}\) observations.
spike: \[\mathbb{E}[\mathbf{z}_{j}|\cdot]= \mathsf{sigmoid}\left( -\frac{1}{2\sigma^{2}} u_{j}^{2} \|\mathbf{v}_{j}\|^{2} + \frac{1}{\sigma^{2}} u_{j} (Y - \sum_{k\neq j} u_{k} \mathbf{z}_{k} \mathbf{v}_{k})\mathbf{v}_{j}^\top + n_{\textrm{gene}} \log \frac{\pi}{1-\pi}\right)\]
\[\mathbb{E}[\mathbf{z}_{j}^{(s)}|\cdot]= \mathsf{sigmoid}\left( -\frac{1}{2\sigma^{2}} u_{j}^{2} \|\mathbf{v}_{j}\|^{2} + \frac{u_{j}}{\sigma^{2}} \left( Y \mathbf{v}_{j}^\top - U (V \mathbf{v}_{j}^\top) + u_{j} \mathbf{z}_{j}^{(s-1)} \|\mathbf{v}_{j}\|^{2} \right) + n_{\textrm{gene}} \log \frac{\pi}{1-\pi} \right)\]
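The spike update vectorizes over tissues; a minimal sketch with a single gene, where the sizes, prior \(\pi\), and logit clipping are assumptions.

```python
import numpy as np

def sigmoid(x):
    # Clip to keep exp() numerically safe.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

rng = np.random.default_rng(3)
n_tissue, m, J = 12, 9, 4
sigma2, pi, n_gene = 1.0, 0.5, 1
Z = (rng.random((n_tissue, J)) < 0.5).astype(float)
u = rng.normal(size=J)
V = rng.normal(size=(J, m))
Y = (Z * u) @ V + 0.1 * rng.normal(size=(n_tissue, m))

def update_zj(j):
    """Mean-field update of z_j using cached products Y v_j^T and U (V v_j^T)."""
    vj2 = V[j] @ V[j]
    Yv = Y @ V[j]                     # Y v_j^T
    UVv = (Z * u) @ (V @ V[j])        # U (V v_j^T), with U = Z diag(u)
    logit = (-0.5 * u[j] ** 2 * vj2 / sigma2
             + u[j] / sigma2 * (Yv - UVv + u[j] * Z[:, j] * vj2)
             + n_gene * np.log(pi / (1.0 - pi)))
    Z[:, j] = sigmoid(logit)

for j in range(J):
    update_zj(j)
```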
\[p(\pi_{tj}|\cdot) \sim \mathsf{beta}\left(a_0/n_{\textrm{tissue}} + \mathbb{E}[z_{tj}], b_0 (n_{\textrm{tissue}}-1)/n_{\textrm{tissue}} + (1- \mathbb{E}[z_{tj}]) \right)\] as shown in [4].
We have sought to identify a global tissue factor matrix \(U\) and locally inferred gene-specific factor loading matrices \(V^{(g)}\) that factorize each gene matrix as \(Y^{(g)} = U V^{(g)}\). However, the factors are not readily interpretable and do not by themselves provide insight into the regulatory mechanisms of tissue-specific gene expression. To ascertain biological mechanisms we link these factors to genetic variants, and then link the genetic variants to regulatory / GWAS SNP annotations identified in reference data sets.
Fixing the factor axis \(j\), we can pool all genes and construct a gene \(\times\) individual matrix \(V^{(j)}\). For simplicity we omit the superscript \((j)\); the algorithm is identical for each factor.
We model the factor activity of gene \(g\) in individual \(i\) by a linear model with genetic effects: \[V_{gi} \sim \mathcal{N}\left(\sum_{s\in S(g)} Z_{s} W_{gs} X_{si},\sigma^{2}\right)\]
We are interested in gene-SNP association matrix \(W_{gs}\).
The genetic variant selection process is a priori biased by epigenetic / GWAS marks \(A_{ks}\) on SNP \(s\): \[Z_{s} \sim \mathsf{sigmoid}\left(\theta_{0} + \sum_{k} A_{ks} \theta_{k} \right)\] This selects a sparse model along the SNP axis (columns) of \(W\), since \(P(W_{gs}|Z_{s} =1)\) has mean zero. At the same time we shrink the gene axis using the Bayesian group Lasso, \[\mathbf{w}_{g}^\top\sim\mathcal{N}\left(\mathbf{0},\sigma^{2}\tau_{w_{g}}^{2}I\right)\] where \(\tau_{w_{g}}^{2}\sim\mathsf{Gam}\left((|S(g)|+1)/2,\lambda^{2}/2\right)\).
With the EB estimate of \(\lambda\) discussed above, \[\hat{\tau}_{w_{g}}^{-2} = \frac{|S(g)|\sigma^{2}}{\sum_{s\in S(g)} W_{gs}^{2}}.\]
\[\mathbb{E}\left[W_{gs}|z_{s} = 1\right] = \frac{\mathbf{v}_{g} \mathbf{x}_{s}^\top - \sum_{t\neq s} z_t W_{gt} \mathbf{x}_{t}\mathbf{x}_{s}^\top }{\|\mathbf{x}_{s}\|^{2} + \tau_{w_{g}}^{-2}}\]
\[\mathbb{V}\left[W_{gs}|z_{s} = 1\right] = \frac{\sigma^{2}}{\|\mathbf{x}_{s}\|^{2} + \tau_{w_{g}}^{-2}}\]
\[\mathbb{V}\left[W_{gs}|z_{s} = 0\right] = \tau_{w_g}^{2}\]
Posterior probability of \(Z_{s}\)
Since for each independent \(g\) we have \[\frac{\mathbb{V}\left[W_{gs}|z_{s} = 0\right]}{\mathbb{V}\left[W_{gs}|z_{s} = 1\right]} =\frac{\|\mathbf{x}_{s}\|^{2} + \tau_{w_g}^{-2}}{\sigma^{2}\tau_{w_g}^{-2}},\] the log ratio of the determinants factorizes over genes: \[\Lambda_{s} \equiv \frac{1}{2} \log \frac{\mathsf{det}\left(\mathbb{V}[\mathbf{w}_{s}|z_{s}=0]\right)}{\mathsf{det}\left(\mathbb{V}[\mathbf{w}_{s}|z_{s}=1]\right)} = \frac{1}{2} \sum_{g:\,s \in S(g)} \left[ \log(\|\mathbf{x}_{s}\|^{2}/\tau_{w_g}^{-2} +1 ) - \log \sigma^{2} \right].\]
\[M_{s} = \sigma^{-2}\left[ \mathbf{x}_{s} \left( V - \sum_{t\neq s} z_{t} \mathbf{w}_{t} \mathbf{x}_{t} \right)^\top \mathbf{w}_{s} - \frac{1}{2} \|\mathbf{x}_{s}\|^{2} \|\mathbf{w}_{s}\|^{2}\right]\]
\[P(z_{s}=1|\cdot) = \mathsf{sigmoid}\left(M_{s} + \theta_{0} + \theta^\top \mathbf{a}_{s} + \Lambda_{s} \right)\]
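The SNP inclusion update combines the fit term \(M_s\), the annotation prior, and the volume correction \(\Lambda_s\); a sketch with illustrative sizes and values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

rng = np.random.default_rng(4)
G, n, S, K = 5, 20, 8, 3           # genes, individuals, SNPs, annotations
sigma2 = 1.0
X = rng.normal(size=(S, n))        # genotype rows x_s
A = rng.normal(size=(K, S))        # annotation columns a_s
W = 0.1 * rng.normal(size=(G, S))
z = (rng.random(S) < 0.3).astype(float)
theta0, theta = -2.0, 0.1 * rng.normal(size=K)
V = (W * z) @ X + 0.1 * rng.normal(size=(G, n))
tau_w_inv2 = np.ones(G)            # per-gene EB precisions

def update_zs(s):
    """P(z_s = 1 | .) = sigmoid(M_s + theta_0 + theta^T a_s + Lambda_s)."""
    xs, ws = X[s], W[:, s]
    R = V - (W * z) @ X + np.outer(ws * z[s], xs)   # residual excluding SNP s
    xs2 = xs @ xs
    M = ((R @ xs) @ ws - 0.5 * xs2 * (ws @ ws)) / sigma2
    Lam = 0.5 * np.sum(np.log(xs2 / tau_w_inv2 + 1.0) - np.log(sigma2))
    z[s] = sigmoid(M + theta0 + theta @ A[:, s] + Lam)

for s in range(S):
    update_zs(s)
```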
\[Y_{gi} \,|\, H_{gk} = 1 \sim \mathcal{N}\left(\sum_{s} Z_{sk} W_{gs}^{k} X_{si},\,\sigma^{2}\right)\]
A simple coordinate step gives \[q(H_{gk} = 1 |\cdot) \propto \mathcal{N}\left(\mathbf{y}_{g} \vert X^\top \hat\beta_{k}, \sigma^{2} I \right)\]
where \(\hat\beta_{k} = \mathbf{w}_{g} \circ \mathbf{z}_{k}\)
\[\propto \exp\left( \sigma^{-2} \hat\beta_{k}^\top(X\mathbf{y}_{g}) - \frac{1}{2} \sigma^{-2} (X^\top \hat\beta_{k})^\top (X^\top \hat\beta_{k}) \right)\]
This does not take the variance into account, but it can be fully vectorized.
\(\sigma^{-2} \hat{B}^\top X Y^\top\) is then a cluster \(\times\) gene matrix, and the second term does not depend on the gene.
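A fully vectorized sketch of the cluster responsibilities, with \(Y\) stored gene \(\times\) individual so the cluster \(\times\) gene term is \(\sigma^{-2}\hat B^\top X Y^\top\); all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_snp, n_ind, n_gene, K = 6, 15, 10, 3
sigma2 = 1.0
X = rng.normal(size=(n_snp, n_ind))
Y = rng.normal(size=(n_gene, n_ind))
B = 0.1 * rng.normal(size=(n_snp, K))        # columns are hat beta_k

lin = (B.T @ X @ Y.T) / sigma2               # cluster x gene linear term
XB = X.T @ B                                 # individual x cluster
quad = -0.5 * np.einsum('ik,ik->k', XB, XB) / sigma2   # depends on cluster only
logits = lin + quad[:, None]
q = np.exp(logits - logits.max(axis=0))      # stabilized softmax over clusters
q /= q.sum(axis=0)                           # q(H_gk = 1 | .) for each gene g
```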
Alternatively, we can use locally collapsed variational inference:
\[q(H_{gk}=1|\cdot) \propto \int d \beta\, \mathcal{N}\left(\mathbf{y}_{g}|X^\top \beta,\sigma^{2}I\right) \mathcal{N}\left(\beta|\hat\beta_{k},\hat\Sigma_{k}\right)\] where \(\hat\beta_{k} = \mathbf{w}_{g} \circ \mathbf{z}_{k}\) and \(\hat\Sigma_{k} = \sigma^{2}D_{0}^{-1}\), and \(D_{0} \equiv \mathsf{diag}\left[\ldots,\|\mathbf{x}_{s}\|^{2} Z_{sk} + \tau_{w_g}^{-2},\ldots\right]\).
Let \[\tilde{\Lambda} = \sigma^{-2} X X^\top + \sigma^{-2}D_{0}\] and \[\tilde{\mathbf{m}} = \tilde{\Lambda}^{-1} \sigma^{-2} (X\mathbf{y} + D_{0} \hat\beta_{k})\]
After integrating out \(\beta\): \[q \propto \vert \hat{\Sigma}_{k} \vert^{-1/2} \vert\tilde{\Lambda}\vert^{-1/2} \exp\left( -\frac{1}{2} \hat\beta_{k}^\top \hat\Sigma_{k}^{-1} \hat\beta_{k} + \frac{1}{2} \tilde{\mathbf{m}}^\top \tilde{\Lambda} \tilde{\mathbf{m}}\right)\]
Effect size: \[\mathbb{E}[W_{gs}|Z_{sk}=1] = \frac{\mathbf{y}_{g} \mathbf{x}_{s}^\top - \sum_{s'\neq s} Z_{s'k} W_{gs'} \mathbf{x}_{s'} \mathbf{x}_{s}^\top} {\|\mathbf{x}_{s}\|^{2} + \tau_{w_{g}}^{-2}}\] and \[\mathbb{V}[W_{gs}|Z_{sk}=1] = \frac{\sigma^{2}}{\|\mathbf{x}_{s}\|^{2} + \tau_{w_g}^{-2}}\]
Let \(\tilde{\mathbf{w}}_{s} = \mathbf{h}_{k} \circ \mathbf{w}_{s}\)
SNP selection: \[M_{sk} = \mathbf{x}_{s} \left(Y - \sum_{s' \neq s} Z_{s'k} \tilde{\mathbf{w}}_{s'} \mathbf{x}_{s'}\right)^\top \tilde{\mathbf{w}}_{s} - \frac{1}{2}\|\mathbf{x}_{s}\|^{2} \mathbb{E}\|\tilde{\mathbf{w}}_{s}\|^{2}\] where \(\mathbb{E}\|\tilde{\mathbf{w}}_{s}\|^{2} = \sum_{g} H_{gk} (W_{gs})^{2}\)
\[Z_{sk} | \cdot \sim \mathsf{sigmoid}\left( M_{sk}\sigma^{-2} + \theta_{0} + \theta^\top \mathbf{a}_{s} + \Lambda_{sk}\right)\]
where
\[\Lambda_{sk} = \frac{1}{2} \sum_{g} H_{gk} \left[\log(\|\mathbf{x}_{s}\|^{2}/\tau_{w_{g}}^{-2} + 1) - \log \sigma^{2}\right]\]
\(Z_{s} \sim \mathsf{sigmoid}\left(\theta_{0} + \theta^\top \mathbf{a}_{s}\right)\)
We assume sparse prior \(P(\theta|\lambda) \propto \exp\left(-\|\theta\|_{1}\lambda/2\right)\)
Bayesian Lasso:
\[\mathbb{E}\left[\tau_{\theta_{k}}^{-2}|\cdot\right] = \frac{\lambda}{|\theta_{k}|}\]
Here \[\mathcal{L} = \sum_{s=1}^{n_{\textrm{SNP}}} \left[ Z_{s} (\theta_{0} + \theta^\top\mathbf{a}_{s}) - \log\left(1+e^{\theta_{0}+\theta^\top\mathbf{a}_{s}}\right) \right] - \frac{1}{2} \sum_{k} \frac{\theta_{k}^{2}}{\tau_{\theta_{k}}^{2}}\]
Approximate log likelihood: \[\mathcal{L}_{Q} = - \frac{1}{2} \sum_{s} \omega_{s}(\kappa_{s} - \theta_{0} - \theta^\top \mathbf{a}_{s})^{2} - \frac{1}{2}\sum_{k} \frac{\theta_{k}^{2}}{\tau_{\theta_{k}}^{2}}\] where \(p_{s} = \mathsf{sigmoid}\left(\theta_0 + \theta^\top\mathbf{a}_{s}\right)\), \(\omega_{s} = p_{s}(1-p_{s})\), and \[\kappa_{s} = \hat{\theta}_{0} + \hat{\theta}^\top \mathbf{a}_{s} + \frac{z_s - p_s}{p_s(1-p_s)}\]
We find coordinate descent update of \(\theta\):
\[\hat\theta_{0} \gets \frac{\sum_{s} \omega_{s}(\kappa_{s} - \theta^\top\mathbf{a}_{s})}{\sum_{s} \omega_{s}}\]
\[\hat{\theta}_{k} \gets \frac{\sum_{s} \omega_{s} \gamma_{s,-k} A_{ks} }{\sum_{s}\omega_{s} A_{ks}^{2} + \tau_{\theta_k}^{-2}}\] where \(\gamma_{s,-k} = \kappa_{s} - \theta_0 - \sum_{k'\neq k} \theta_{k'}A_{k's}\).
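The quadratic approximation and coordinate descent above can be sketched as an IRLS-style loop; the data generation, iteration count, and fixed prior precisions are assumptions for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

rng = np.random.default_rng(6)
S, K = 200, 4
A = rng.normal(size=(K, S))
theta_true = np.array([1.0, -0.5, 0.0, 0.0])
z = (rng.random(S) < sigmoid(0.5 + theta_true @ A)).astype(float)

theta0, theta = 0.0, np.zeros(K)
tau_inv2 = 1e-2 * np.ones(K)      # fixed prior precisions for this sketch

for _ in range(50):
    eta = theta0 + theta @ A
    p = sigmoid(eta)
    w = np.clip(p * (1.0 - p), 1e-6, None)    # omega_s
    kappa = eta + (z - p) / w                 # working response kappa_s
    theta0 = np.sum(w * (kappa - theta @ A)) / np.sum(w)
    for k in range(K):
        gamma = kappa - theta0 - (theta @ A - theta[k] * A[k])   # gamma_{s,-k}
        theta[k] = np.sum(w * gamma * A[k]) / (np.sum(w * A[k] ** 2) + tau_inv2[k])

def loglik(t0, t):
    """Logistic log-likelihood at (t0, t), without the penalty."""
    eta = t0 + t @ A
    return np.sum(z * eta - np.logaddexp(0.0, eta))
```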
Empirical Bayes solution: \[\lambda^{-2} = \frac{1}{2K}\sum_{k} \mathbb{E}\left[\tau_{\theta_{k}}^{2}|\cdot\right] = \frac{\lambda^{-1}}{2K} \sum_{k} |\theta_{k}| + \frac{1}{2}\lambda^{-2}\] At the stationary point: \[\hat{\lambda} = \frac{K}{\sum_{k=1}^{K}|\theta_{k}|}\]
A more radical choice is \(\hat{\lambda}_{k} = 1/\sqrt{\mathbb{E}[\theta_{k}^{2}]}\), which gives \[\mathbb{E}\left[\tau_{\theta_k}^{-2}|\cdot\right] \to 1/\mathbb{E}\left[\theta_{k}^{2}\right] \le \hat{\theta}_{k}^{-2}\]
Let \(\mathbb{V}_{\textrm{MLE}}[\theta_{k}] = (\sum_{s=1}^{n_{\textrm{SNP}}}\omega_{s} A_{ks}^{2})^{-1}\)
We can estimate standard error of \(\theta_{k}\): \[\mathbb{V}\left[\theta_{k}\right]^{-1} = \mathbb{V}_{\textrm{MLE}}^{-1} + \mathbb{E}\left[\tau_{\theta_k}^{-2}|\cdot\right] \le \mathbb{V}_{\textrm{MLE}}^{-1} + \hat{\theta}_k^{-2}\]
Or, taking full account of \(\mathbb{E}\left[\theta_{k}^{2}|\cdot\right] = \hat\theta_{k}^{2} + \mathbb{V}\), we have a stationary-point solution: \[\mathbb{V}_{\textrm{EB}} = - \frac{\hat\theta_k^2}{2} + \hat\theta_k\sqrt{\mathbb{V}_{\textrm{MLE}} + \hat\theta_k^2/4}\]
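The stationary-point variance solves \(\mathbb{V}^{-1} = \mathbb{V}_{\textrm{MLE}}^{-1} + 1/(\hat\theta_k^2 + \mathbb{V})\), which can be checked numerically; the numeric values below are arbitrary.

```python
import numpy as np

def v_eb(theta_hat, v_mle):
    """Stationary point of V^{-1} = V_MLE^{-1} + 1/(theta_hat^2 + V)."""
    t = theta_hat ** 2
    return -0.5 * t + np.sqrt(t * v_mle + 0.25 * t ** 2)

theta_hat, v_mle = 0.8, 0.3
V = v_eb(theta_hat, v_mle)
lhs = 1.0 / V
rhs = 1.0 / v_mle + 1.0 / (theta_hat ** 2 + V)
lower = 1.0 / (1.0 / v_mle + 1.0 / theta_hat ** 2)   # bound from the text
```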
\(\mathbb{V}_{\textrm{EB}}[\theta_{k}] \ge (\mathbb{V}_{\textrm{MLE}}[\theta_{k}]^{-1} + \hat\theta_{k}^{-2})^{-1}\)
If \(\mathbb{V}_{\textrm{MLE}}\to 0\) then \(\mathbb{V}_{\textrm{EB}} \to 0\); likewise, if \(\hat\theta_{k} \to 0\) then \(\mathbb{V}_{\textrm{EB}} \to 0\).
Related methods include PEER [5], Matrix eQTL [6], and kruX [7].
1. Yuan, M. & Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 49–67 (2006).
2. Kyung, M., Gill, J., Ghosh, M. & Casella, G. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5, 369–411 (2010).
3. Hughes, M. C. & Sudderth, E. Memoized online variational inference for Dirichlet process mixture models. In Advances in Neural Information Processing Systems 26 (eds. Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z. & Weinberger, K. Q.) 1133–1141 (Curran Associates, Inc., 2013). <http://papers.nips.cc/paper/4969-memoized-online-variational-inference-for-dirichlet-process-mixture-models.pdf>
4. Paisley, J. & Carin, L. Nonparametric factor analysis with beta process priors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09) 1–8 (ACM Press, 2009). doi:10.1145/1553374.1553474
5. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nature Protocols 7, 500–507 (2012).
6. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).
7. Qi, J., Asl, H. F., Björkegren, J. & Michoel, T. kruX: matrix-based non-parametric eQTL discovery. BMC Bioinformatics 15, 11 (2014).