GWAS: \(\boldsymbol{Y} = \boldsymbol{X}\beta + \boldsymbol{\epsilon}\)

Example with n=4 \[\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\]

\(\epsilon_i \sim N(0, \sigma^2_\epsilon)\), independently for each individual, so in vector form:

\[\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix} \sim N \left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma_\epsilon^2\cdot \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}\right) \]
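A minimal simulation sketch of this error model: draw many vectors \((\epsilon_1, \dots, \epsilon_4)\) from \(N(0, \sigma^2_\epsilon I)\) and check that their empirical covariance is close to \(\sigma^2_\epsilon\) on the diagonal and close to 0 elsewhere. The value \(\sigma_\epsilon = 1.5\) and the number of draws are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4              # individuals, as in the n = 4 example
sigma_e = 1.5      # hypothetical residual standard deviation
n_draws = 200_000  # number of simulated error vectors

# Each row is one draw of (eps_1, ..., eps_4) ~ N(0, sigma_e^2 * I)
eps = rng.multivariate_normal(np.zeros(n), sigma_e**2 * np.eye(n), size=n_draws)

# Empirical covariance across draws: about sigma_e^2 = 2.25 on the diagonal,
# about 0 off the diagonal (independent errors)
emp_cov = np.cov(eps, rowvar=False)
print(np.round(emp_cov, 2))
```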

We estimate \(\beta\) using (typically) linear regression. Under the Gaussian error model above, the least-squares estimate \(\hat{\beta}\) is also the MLE (maximum likelihood estimate).
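A minimal sketch of this estimate on simulated data, assuming a single SNP with genotypes coded as allele counts 0/1/2 and no intercept (matching the model above, which has none). The sample size, allele frequency, `beta_true`, and `sigma_e` are hypothetical values. With one regressor the least-squares formula \(\hat{\beta} = (X'X)^{-1}X'Y\) reduces to \(X'Y / X'X\), and NumPy's general solver gives the same number.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5_000         # hypothetical sample size
beta_true = 0.3   # hypothetical SNP effect
sigma_e = 1.0     # hypothetical residual standard deviation

# Genotypes as allele counts 0/1/2 (assumed coding), allele frequency 0.4
x = rng.binomial(2, 0.4, size=n).astype(float)
eps = rng.normal(0.0, sigma_e, size=n)
y = x * beta_true + eps

# Least squares (= Gaussian MLE) for one regressor without intercept
beta_hat = (x @ y) / (x @ x)

# Same estimate via numpy's general least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
print(beta_hat, beta_lstsq[0])
```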

To account for population structure, we add a random effect \(u\) to the model: \(\boldsymbol{Y} = \boldsymbol{X}\cdot\beta + u + \boldsymbol{\epsilon}\)

In contrast to \(\beta\), which is a fixed effect (not random), \(u\) is a random effect. We describe a random effect by its distribution, i.e. by the parameters of the distribution of the random variable. It is common to use a normal distribution with mean zero: \(u_i \sim N(0, \sigma^2_g)\)

The full vector of random effects (one per individual) in the \(n = 4\) example is:

\[\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \sim N \left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_g\cdot \begin{bmatrix} k_{11}&k_{12}&k_{13}&k_{14}\\ k_{21}&k_{22}&k_{23}&k_{24}\\ k_{31}&k_{32}&k_{33}&k_{34} \\ k_{41}&k_{42}&k_{43}&k_{44} \end{bmatrix} \right) \]
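In practice \(K\) is estimated from genotype data. Below is a minimal sketch of one widely used estimator, the standardized genetic relatedness matrix \(K = WW'/m\), where \(m\) is the number of SNPs and \(W\) holds each SNP's genotypes centered by \(2p_j\) and scaled by \(\sqrt{2p_j(1-p_j)}\), with \(p_j\) the sample allele frequency of SNP \(j\). This is an assumption for illustration: the kinship estimator used by EMMAX itself may differ, and the function name `standardized_grm` and the toy genotypes are hypothetical.

```python
import numpy as np

def standardized_grm(G):
    """GRM from an n x m genotype matrix G of allele counts (0/1/2).

    Centers each SNP column by 2p and scales it by sqrt(2p(1-p)),
    then returns K = W W' / m. Assumes no missing genotypes and
    no monomorphic SNPs (p strictly between 0 and 1).
    """
    G = np.asarray(G, dtype=float)
    n, m = G.shape
    p = G.mean(axis=0) / 2.0                        # sample allele frequencies
    W = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return W @ W.T / m

# Toy genotypes (hypothetical): rows are individuals, columns are SNPs.
# Rows 1-2 and rows 3-4 were chosen to share many genotypes.
G = np.array([
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 0],
    [2, 1, 0, 0, 1],
    [2, 2, 0, 0, 2],
])
K = standardized_grm(G)
print(np.round(K, 2))
```

In the toy data the first two individuals resemble each other, so \(K_{12}\) comes out positive while \(K_{13}\) is negative. Because every column of \(W\) is centered, each row of \(K\) sums to zero.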

The authors of EMMAX proposed using the genetic relatedness matrix as \(K\); with this choice, the model accounts for both population structure and relatedness. Let’s look at a very simple example where the population structure is given by a random effect that depends only on population membership: \[u = \begin{bmatrix} u_\text{AFR} \\ u_\text{AFR} \\ u_\text{EUR} \\ u_\text{EUR} \end{bmatrix} \] \(u_\text{AFR} \sim N(0, \sigma^2_g)\), \(u_\text{EUR} \sim N(0, \sigma^2_g)\), \(u_\text{AFR} \perp u_\text{EUR}\)

We assume that the first two individuals have AFR ancestry and that the last two have EUR ancestry; we use \(u_\text{AFR}\) to represent the AFR-specific random effect and \(u_\text{EUR}\) for the EUR-specific random effect. Let’s assume that both have the same variance, \(\sigma^2_g\), and that they are independent of each other. Therefore:

\[E[u_\text{AFR}] = E[u_\text{EUR}] = 0, \qquad E[u_\text{AFR}^2] = E[u_\text{EUR}^2] = \sigma^2_g, \qquad E[u_\text{AFR}\, u_\text{EUR}] = 0\]
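A quick Monte Carlo check of these three identities, as a sketch: sample averages over many independent draws stand in for the expectations. The value \(\sigma_g = 0.8\) (so \(\sigma^2_g = 0.64\)) and the number of draws are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma_g = 0.8      # hypothetical standard deviation, sigma_g^2 = 0.64
n_draws = 500_000  # number of Monte Carlo draws

# Independent population-level random effects
u_afr = rng.normal(0.0, sigma_g, size=n_draws)
u_eur = rng.normal(0.0, sigma_g, size=n_draws)

print(u_afr.mean(), u_eur.mean())            # both close to 0
print((u_afr**2).mean(), (u_eur**2).mean())  # both close to sigma_g^2
print((u_afr * u_eur).mean())                # close to 0 (independence)
```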

\(\boldsymbol{Y} = \boldsymbol{X}\cdot\beta + u + \boldsymbol{\epsilon}\)

\[\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} u_\text{AFR} \\ u_\text{AFR} \\ u_\text{EUR} \\ u_\text{EUR} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\] where \(\beta\) is a fixed effect and \(u\) is a random effect; combining the two is what makes this a mixed-effects model.

\[ u \sim N \left (\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_g\cdot K \right) \] \[ \epsilon \sim N \left (\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_\epsilon\cdot\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} \right) \] Let’s calculate \(K\) now. \(K\) is a similarity matrix and is sometimes called a kernel.

To calculate \(K\), use \(K = \frac{1}{\sigma^2_g}\operatorname{Var}(u)\). Since \[Eu = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},\] the variance is \(\operatorname{Var}(u) = E[(u-Eu)(u-Eu)'] = E[uu']\), so \[ \operatorname{Var} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = E[uu'] = E \left[ \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \cdot \begin{bmatrix} u_1 & u_2 & u_3 & u_4 \end{bmatrix} \right] \]

\[ \sigma^2_g \cdot \mathbf{K}= E \begin{bmatrix} u_1u_1&u_1u_2&u_1u_3&u_1u_4 \\ u_2u_1&u_2u_2&u_2u_3&u_2u_4 \\ u_3u_1&u_3u_2&u_3u_3&u_3u_4\\ u_4u_1&u_4u_2&u_4u_3&u_4u_4 \end{bmatrix} = \begin{bmatrix} E[u_1u_1]&E[u_1u_2]&E[u_1u_3]&E[u_1u_4] \\ E[u_2u_1]&E[u_2u_2]&E[u_2u_3]&E[u_2u_4] \\ E[u_3u_1]&E[u_3u_2]&E[u_3u_3]&E[u_3u_4] \\ E[u_4u_1]&E[u_4u_2]&E[u_4u_3]&E[u_4u_4] \end{bmatrix} \] Because \(u_1 = u_2 = u_\text{AFR}\) and \(u_3 = u_4 = u_\text{EUR}\), each entry is either \(E[u_\text{AFR}^2] = E[u_\text{EUR}^2] = \sigma^2_g\) (both individuals in the same population) or \(E[u_\text{AFR}\, u_\text{EUR}] = 0\) (individuals in different populations). Therefore \[ \sigma^2_{g}\cdot \mathbf{K} = \sigma^2_g\cdot\begin{bmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 0&0&1&1 \\ 0&0&1&1 \end{bmatrix} \]
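The same \(K\) can be obtained in matrix form. Introducing (for illustration only) \(g = (u_\text{AFR}, u_\text{EUR})'\) and the \(4 \times 2\) population-indicator matrix \(Z\), with \(Z_{ij} = 1\) if individual \(i\) belongs to population \(j\), we have \(u = Zg\) and \(\operatorname{Var}(g) = \sigma^2_g I\), so \(\operatorname{Var}(u) = Z\operatorname{Var}(g)Z' = \sigma^2_g ZZ'\) and \(K = ZZ'\). A minimal sketch of that computation, where `labels`, `pops`, and `Z` are hypothetical names:

```python
import numpy as np

# Population labels of the four individuals in the example
labels = ["AFR", "AFR", "EUR", "EUR"]
pops = sorted(set(labels))  # ["AFR", "EUR"]

# Indicator matrix Z: Z[i, j] = 1 if individual i belongs to population j
Z = np.array([[1.0 if lab == p else 0.0 for p in pops] for lab in labels])

# u = Z g with Var(g) = sigma_g^2 * I, hence Var(u) = sigma_g^2 * Z Z'
K = Z @ Z.T
print(K)
```

Individuals in the same population get \(K_{ij} = 1\) and individuals in different populations get \(K_{ij} = 0\), which is the block matrix derived above.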

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.


Citation

For attribution, please cite this work as

R package build (2021). Mixed Effects Model to Handle Population Structure. BIOS 25328 Cancer Genomics Class Notes. /post/2021/02/11/mixed-effects-model-to-handle-population-structure/

BibTeX citation

@misc{
  title = "Mixed Effects Model to Handle Population Structure",
  author = "R package build",
  year = "2021",
  journal = "BIOS 25328 Cancer Genomics Class Notes",
  note = "/post/2021/02/11/mixed-effects-model-to-handle-population-structure/"
}