S-PrediXcan analysis

To recap, S-PrediXcan estimates gene-level associations between predicted expression and a trait. It requires three pieces of data: GWAS summary statistics, gene expression prediction models (weights for SNPs), and reference LD.
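To make the role of the three inputs concrete, here is a minimal sketch of the summary-statistics approximation behind the gene-level z-score: each SNP's GWAS z-score is weighted by its model weight and its standard deviation in the reference panel, then scaled by the standard deviation of predicted expression. The function name and the numbers are made up for illustration; this is not MetaXcan's own code.

```python
import math

def spredixcan_zscore(weights, gwas_z, cov):
    """Approximate gene-level z-score from summary statistics.

    weights: prediction weights for the SNPs in one gene's model
    gwas_z:  GWAS z-scores for the same SNPs, in the same order
    cov:     reference covariance (LD) matrix of those SNPs, as nested lists
    """
    n = len(weights)
    # Per-SNP standard deviations come from the diagonal of the covariance.
    sigma_l = [math.sqrt(cov[i][i]) for i in range(n)]
    # Variance of predicted expression: w' * Gamma * w.
    var_g = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    sigma_g = math.sqrt(var_g)
    # Weighted sum of GWAS z-scores, rescaled to the gene level.
    return sum(weights[i] * sigma_l[i] * gwas_z[i] for i in range(n)) / sigma_g

# Toy, made-up numbers for a gene with two SNPs (not real data).
z = spredixcan_zscore(
    weights=[0.5, -0.2],
    gwas_z=[2.0, -1.0],
    cov=[[1.0, 0.3], [0.3, 1.0]],
)
print(round(z, 3))
```

The sketch shows why all three inputs are needed: the weights come from the prediction model, the z-scores from the GWAS, and the covariance from the reference LD.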

The complete analysis workflow is:

  1. Pick an expression prediction model (which dataset, population, and tissue?)
  2. Harmonize the GWAS so that it uses the same set of variants as the expression prediction model (this may require imputing the GWAS). More details can be found here
  3. Run the MetaXcan script
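One way to check step 2 before running anything is to ask what fraction of the model's variants also appear in the GWAS. The sketch below assumes the model file is a SQLite database with a table named `weights` that has a `varID` column (consistent with the `--model_db_snp_key varID` option used later, but treat the table name as an assumption). The helper names and the toy variant IDs are hypothetical.

```python
import sqlite3

def model_variant_ids(conn, snp_key="varID"):
    """Return the set of variant IDs used by a prediction model database."""
    # Assumption: SNP weights live in a table named `weights` with an
    # identifier column named by `snp_key`.
    if snp_key not in ("varID", "rsid"):
        raise ValueError("unexpected SNP key column")
    rows = conn.execute(f"SELECT DISTINCT {snp_key} FROM weights")
    return {row[0] for row in rows}

def model_coverage(model_ids, gwas_ids):
    """Fraction of model variants that are also present in the GWAS."""
    if not model_ids:
        return 0.0
    return len(model_ids & set(gwas_ids)) / len(model_ids)

# Toy in-memory database standing in for a real *.db model file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weights (gene TEXT, varID TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?)",
    [
        ("geneA", "chr1_100_A_G_b38", 0.5),
        ("geneA", "chr1_200_C_T_b38", -0.2),
        ("geneB", "chr2_300_G_A_b38", 0.1),
        ("geneB", "chr1_100_A_G_b38", 0.3),
    ],
)
model_ids = model_variant_ids(conn)
# Toy GWAS variant IDs; in practice these would come from the GWAS file's
# variant ID column.
gwas_ids = ["chr1_100_A_G_b38", "chr2_300_G_A_b38", "chr3_400_T_C_b38"]
coverage = model_coverage(model_ids, gwas_ids)
print(f"{coverage:.2f} of model variants are in the GWAS")
```

With a real model, replace the in-memory connection with `sqlite3.connect(path_to_db)`. Low coverage suggests the GWAS needs harmonization or imputation first.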

One caveat: we need to make sure that the GWAS and the prediction model are based on the same population, since both the prediction weights and the reference LD are population-specific.

Here we provide expression prediction models built from GTEx V8 data at /project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/ with extension *.db. The variants in these models are called and labelled according to GTEx V8. The reference LD is in the same folder with extension *.txt.gz. Luckily, GWAS results harmonized to GTEx V8 by Alvaro Barbeira are available here.
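To see which tissues have a model in that folder, a short helper can list the *.db files and strip the file-name prefix. This sketch assumes every model follows the `mashr_<Tissue>.db` naming seen in the example file `mashr_Whole_Blood.db`. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed from the example file name mashr_Whole_Blood.db.
MODEL_PREFIX = "mashr_"

def tissue_from_model_path(path):
    """Extract the tissue name from a model path like .../mashr_<Tissue>.db."""
    stem = Path(path).stem
    if not stem.startswith(MODEL_PREFIX):
        raise ValueError(f"unexpected model file name: {path}")
    return stem[len(MODEL_PREFIX):]

def available_tissues(model_dir):
    """List tissue names for all model databases found in a directory."""
    paths = Path(model_dir).glob(MODEL_PREFIX + "*.db")
    return sorted(tissue_from_model_path(p) for p in paths)

example = tissue_from_model_path(
    "/project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/mashr_Whole_Blood.db"
)
print(example)
```

On the cluster, `available_tissues("/project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/")` would return the tissue names to choose from.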

As an example, let’s pick tissue “Whole_Blood” and GWAS “UKB_20002_1223_self_reported_type_2_diabetes”.

python /project2/hgen47100/software2/MetaXcan/software/SPrediXcan.py \
  --model_db_path /project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/mashr_Whole_Blood.db \
  --model_db_snp_key varID \
  --covariance /project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/mashr_Whole_Blood.txt.gz \
  --gwas_file /project2/hgen47100/data/lab6/UKB_20002_1223_self_reported_type_2_diabetes.txt.gz \
  --snp_column panel_variant_id \
  --effect_allele_column effect_allele \
  --non_effect_allele_column non_effect_allele \
  --zscore_column zscore \
  --pvalue_column pvalue \
  --keep_non_rsid \
  --output_file output/spredixcan_UKB_20002_1223_self_reported_type_2_diabetes.csv
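Once the run finishes, the output file can be inspected with a few lines of Python. The sketch below assumes the output is a comma-separated file with columns named `gene`, `gene_name`, and `pvalue`, and that missing values appear as `NA`. Check the header of your own output before relying on this. The toy rows are made up.

```python
import csv
import io

def top_genes(csv_handle, n=5):
    """Return the n rows with the smallest p-values from an open CSV file."""
    rows = []
    for row in csv.DictReader(csv_handle):
        # Skip genes for which no p-value was reported.
        if row["pvalue"] in ("", "NA"):
            continue
        rows.append(
            {
                "gene": row["gene"],
                "gene_name": row["gene_name"],
                "pvalue": float(row["pvalue"]),
            }
        )
    return sorted(rows, key=lambda r: r["pvalue"])[:n]

# Toy stand-in for the real output file (made-up genes and values).
toy_csv = io.StringIO(
    "gene,gene_name,zscore,pvalue\n"
    "ENSG_A,GENE_A,1.2,0.23\n"
    "ENSG_B,GENE_B,-4.5,6.8e-06\n"
    "ENSG_C,GENE_C,NA,NA\n"
)
top = top_genes(toy_csv, n=2)
for row in top:
    print(row["gene_name"], row["pvalue"])
```

For the real results, open the file passed to `--output_file` and hand the file object to `top_genes`.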

Problem 7:

Which gene is the most significant?

Problem 8:

Repeat the same analysis but with tissue liver. Show your command.

Problem 9:

Visualize the results from the two tissues with a QQ-plot. Plot \(-\log_{10}(p)\), with expected p-values on the x-axis and observed p-values on the y-axis, and color the two tissues differently.
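As a starting point, the expected p-values for a QQ-plot can be taken as the quantiles of a uniform distribution: the i-th smallest of n p-values is expected near i / (n + 1). The sketch below computes the plotted points on the conventional -log10 scale and, if matplotlib is installed, draws one scatter series per tissue. The function names and the toy p-values are hypothetical.

```python
import math
import sys

def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values for a QQ-plot."""
    # Guard against p-values that underflowed to exactly 0.
    observed_p = sorted(max(p, sys.float_info.min) for p in pvalues)
    n = len(observed_p)
    # Under the null, the i-th smallest of n p-values is near i / (n + 1).
    expected = [-math.log10((i + 1) / (n + 1)) for i in range(n)]
    observed = [-math.log10(p) for p in observed_p]
    return expected, observed

def plot_qq(pvalues_by_tissue, out_path):
    """Draw one QQ-plot with a different color per tissue (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    top = 0.0
    for tissue, pvalues in pvalues_by_tissue.items():
        expected, observed = qq_points(pvalues)
        if not expected:
            continue
        # Each scatter call picks the next color in matplotlib's color cycle.
        ax.scatter(expected, observed, s=8, label=tissue)
        top = max(top, max(expected))
    # Diagonal reference line: observed equals expected under the null.
    ax.plot([0, top], [0, top], color="gray", linestyle="--")
    ax.set_xlabel("expected -log10(p)")
    ax.set_ylabel("observed -log10(p)")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)

# Toy p-values only; replace with the pvalue column of each tissue's output.
expected, observed = qq_points([0.5, 0.01, 0.1])
```

To draw the figure, call `plot_qq` with a dictionary mapping each tissue name to its list of p-values, plus an output file name.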

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.

Citation

For attribution, please cite this work as

Yanyu Liang, et al (2021). S-PrediXcan. BIOS 25328 Cancer Genomics Class Notes. /post/2021/04/16/s-predixcan/

BibTeX citation

@misc{liang2021spredixcan,
  title = "S-PrediXcan",
  author = "Yanyu Liang, et al",
  year = "2021",
  journal = "BIOS 25328 Cancer Genomics Class Notes",
  note = "/post/2021/04/16/s-predixcan/"
}