S-PrediXcan analysis
To recap, S-PrediXcan takes GWAS summary statistics, gene expression prediction models (weights for SNPs), and reference LD, and computes the gene-level association between predicted expression and the trait. It therefore requires the following three pieces of data:
- GWAS summary statistics
- Expression prediction model (available through http://predictdb.org/)
- Reference LD (also from http://predictdb.org/)
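To illustrate how the three inputs combine, the published S-PrediXcan approximation computes a gene's z-score as \(z_g \approx \sum_{l} w_{lg} \frac{\hat\sigma_l}{\hat\sigma_g} \frac{\hat\beta_l}{\mathrm{se}(\hat\beta_l)}\). Here \(w_{lg}\) are the prediction model weights, \(\hat\beta_l/\mathrm{se}(\hat\beta_l)\) are the GWAS z-scores, \(\hat\sigma_l^2\) is the variance of SNP \(l\) taken from the reference covariance \(\Gamma\), and \(\hat\sigma_g^2 = w_g^\top \Gamma w_g\). The following is a minimal sketch of that formula only. The function name `spredixcan_z` is hypothetical, and it assumes toy inputs already matched to the same SNPs and effect alleles. The real SPrediXcan.py also reads the files, matches SNPs and alleles, and handles missing data, so this is not its implementation.

```python
import math


def spredixcan_z(weights, snp_z, snp_cov):
    """Approximate S-PrediXcan gene z-score (hypothetical helper, toy inputs).

    weights : list of model weights w_lg, one per SNP
    snp_z   : list of GWAS z-scores for the same SNPs, same effect alleles
    snp_cov : reference SNP covariance matrix Gamma (list of lists)
    """
    n = len(weights)
    # Per-SNP standard deviation from the diagonal of the reference covariance.
    sigma_l = [math.sqrt(snp_cov[i][i]) for i in range(n)]
    # Variance of predicted expression: w^T Gamma w.
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    # Weighted sum of SNP z-scores, scaled by sigma_l / sigma_g.
    return sum(w * s / sigma_g * z for w, s, z in zip(weights, sigma_l, snp_z))


# Two independent SNPs with equal weights and z = 1 each give z_g = sqrt(2).
print(spredixcan_z([1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
```

With a single SNP the gene z-score reduces to that SNP's z-score (up to sign of the weight), which is a quick sanity check on the scaling.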
The complete analysis workflow is:
- Pick an expression prediction model (which dataset, population, and tissue?)
- Harmonize the GWAS so that it uses the same set of variants as the expression prediction model (this may require imputing the GWAS). More details can be found here.
- Run the S-PrediXcan script from the MetaXcan software
One caveat: we need to make sure that the GWAS and the prediction model are based on the same population.
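Before running the script, it can help to check how many model variants are actually present in the harmonized GWAS. The sketch below assumes the standard PredictDB schema (a `weights` table with a `varID` column) and a tab-separated, gzipped GWAS file with a `panel_variant_id` column, matching the `--model_db_snp_key` and `--snp_column` values used in the example command. The helper names are hypothetical; verify the table, column names, and delimiter against your own files.

```python
import csv
import gzip
import sqlite3
from contextlib import closing


def model_variants(model_db_path):
    """Return the set of varID values in a PredictDB model (assumed `weights` table)."""
    with closing(sqlite3.connect(model_db_path)) as conn:
        return {row[0] for row in conn.execute("SELECT varID FROM weights")}


def gwas_variants(gwas_path, column="panel_variant_id"):
    """Return variant IDs from a gzipped, tab-separated GWAS file (assumed format)."""
    with gzip.open(gwas_path, "rt") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[column] for row in reader
                if row.get(column) not in (None, "", "NA")}


def overlap_summary(model_ids, gwas_ids):
    """Return (number of shared variants, fraction of model variants found in the GWAS)."""
    model_ids, gwas_ids = set(model_ids), set(gwas_ids)
    shared = model_ids & gwas_ids
    fraction = len(shared) / len(model_ids) if model_ids else 0.0
    return len(shared), fraction
```

On the cluster, `overlap_summary(model_variants(db_path), gwas_variants(gwas_path))` with the Whole_Blood paths from the command below would report the overlap; a low fraction suggests the GWAS was not harmonized to the model's variant IDs.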
Here we provide expression prediction models, built from GTEx V8 data, in /project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/ with the extension *.db. Their variants are called and labelled specifically according to GTEx V8. The reference LD files are in the same folder, with the extension *.txt.gz.
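To see which tissues are available, one can list the folder and keep tissues that have both a model and a covariance file. This sketch assumes every file follows the naming pattern of the Whole_Blood example (`mashr_<tissue>.db` and `mashr_<tissue>.txt.gz`); the function names are hypothetical.

```python
import os
import re

MODEL_DIR = "/project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr"
# Assumed naming pattern, taken from the Whole_Blood example: mashr_<tissue>.db
MODEL_PATTERN = re.compile(r"^mashr_(?P<tissue>.+)\.db$")


def tissues_from_filenames(filenames):
    """Return sorted tissue names having both a .db model and a .txt.gz covariance file."""
    names = set(filenames)
    tissues = []
    for name in names:
        match = MODEL_PATTERN.match(name)
        if match and f"mashr_{match.group('tissue')}.txt.gz" in names:
            tissues.append(match.group("tissue"))
    return sorted(tissues)


def available_tissues(model_dir=MODEL_DIR):
    """List tissues found in model_dir (only works where the directory exists)."""
    return tissues_from_filenames(os.listdir(model_dir))
```

Calling `print(available_tissues())` on the cluster should print names such as `Whole_Blood`, which can be substituted into the file paths of the command below.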
Luckily, Alvaro Barbeira has already harmonized the GWAS results to GTEx V8; they are available here.
As an example, let’s pick the tissue “Whole_Blood” and the GWAS “UKB_20002_1223_self_reported_type_2_diabetes”.
python /project2/hgen47100/software2/MetaXcan/software/SPrediXcan.py \
--model_db_path /project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/mashr_Whole_Blood.db \
--model_db_snp_key varID \
--covariance /project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr/mashr_Whole_Blood.txt.gz \
--gwas_file /project2/hgen47100/data/lab6/UKB_20002_1223_self_reported_type_2_diabetes.txt.gz \
--snp_column panel_variant_id \
--effect_allele_column effect_allele \
--non_effect_allele_column non_effect_allele \
--zscore_column zscore \
--pvalue_column pvalue \
--keep_non_rsid \
--output_file output/spredixcan_UKB_20002_1223_self_reported_type_2_diabetes.csv
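The output is a CSV with one row per gene. The sketch below sorts genes by p-value, skipping missing entries such as "NA". It assumes the output has `gene`, `gene_name`, `zscore`, and `pvalue` columns, as S-PrediXcan CSV output normally does; check the header of your file. The helper names are hypothetical.

```python
import csv
import math


def load_rows(path):
    """Read an S-PrediXcan output CSV into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def top_genes(rows, n=5):
    """Return the n rows with the smallest p-value, ignoring missing or NaN values."""
    def pvalue(row):
        try:
            p = float(row["pvalue"])
        except (KeyError, TypeError, ValueError):
            return None
        return None if math.isnan(p) else p

    scored = [(pvalue(row), row) for row in rows]
    scored = [(p, row) for p, row in scored if p is not None]
    scored.sort(key=lambda item: item[0])
    return [row for _, row in scored[:n]]
```

For example:

```python
rows = load_rows("output/spredixcan_UKB_20002_1223_self_reported_type_2_diabetes.csv")
for row in top_genes(rows):
    print(row["gene"], row["gene_name"], row["zscore"], row["pvalue"])
```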
Problem 7:
Which gene is the most significant?
Problem 8:
Repeat the same analysis with the liver tissue. Show your command.
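Since only the model and covariance paths change between tissues, the command can be generated from the tissue name. This sketch copies the flags from the Whole_Blood command and assumes the liver files follow the same pattern (`mashr_Liver.db`, `mashr_Liver.txt.gz`). The function name and the output file naming (the tissue appended to the file name, so the two runs do not overwrite each other) are hypothetical choices.

```python
import os

SCRIPT = "/project2/hgen47100/software2/MetaXcan/software/SPrediXcan.py"
MODEL_DIR = "/project2/hgen47100/data/lab6/predictdb_mashr_eqtl/mashr"
GWAS_FILE = "/project2/hgen47100/data/lab6/UKB_20002_1223_self_reported_type_2_diabetes.txt.gz"


def spredixcan_args(tissue, gwas_file=GWAS_FILE, model_dir=MODEL_DIR, out_dir="output"):
    """Build the SPrediXcan.py argument list for one tissue (assumed file naming)."""
    trait = os.path.basename(gwas_file)
    if trait.endswith(".txt.gz"):
        trait = trait[: -len(".txt.gz")]
    return [
        "python", SCRIPT,
        "--model_db_path", os.path.join(model_dir, f"mashr_{tissue}.db"),
        "--model_db_snp_key", "varID",
        "--covariance", os.path.join(model_dir, f"mashr_{tissue}.txt.gz"),
        "--gwas_file", gwas_file,
        "--snp_column", "panel_variant_id",
        "--effect_allele_column", "effect_allele",
        "--non_effect_allele_column", "non_effect_allele",
        "--zscore_column", "zscore",
        "--pvalue_column", "pvalue",
        "--keep_non_rsid",
        # Hypothetical naming: include the tissue so outputs from different tissues differ.
        "--output_file", os.path.join(out_dir, f"spredixcan_{trait}_{tissue}.csv"),
    ]
```

`print(shlex.join(spredixcan_args("Liver")))` prints a shell command you can paste, and `subprocess.run(spredixcan_args("Liver"), check=True)` runs it; calling `os.makedirs("output", exist_ok=True)` first is a cheap safeguard in case the output folder does not exist yet.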
Problem 9:
Visualize the results from the two tissues in a QQ plot: plot \(-\log_{10}(p)\), with the expected p-values on the x-axis and the observed p-values on the y-axis, and color the two tissues differently.
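One way to build the QQ plot is to sort each tissue's p-values and pair them with uniform quantiles \(i/(n+1)\), \(i = 1, \dots, n\), both on the \(-\log_{10}\) scale. This is a minimal sketch assuming a `pvalue` column in each S-PrediXcan output CSV and that matplotlib is installed; the function names are hypothetical. Points above the dashed \(y = x\) line are more significant than expected under the null.

```python
import csv
import math


def read_pvalues(csv_path, column="pvalue"):
    """Read p-values in (0, 1] from an S-PrediXcan output CSV (assumed `pvalue` column)."""
    pvalues = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                p = float(row[column])
            except (KeyError, TypeError, ValueError):
                continue  # skip missing or non-numeric entries such as "NA"
            if 0.0 < p <= 1.0:  # also drops NaN, since NaN comparisons are False
                pvalues.append(p)
    return pvalues


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values, most significant first."""
    observed_p = sorted(p for p in pvalues if 0.0 < p <= 1.0)
    n = len(observed_p)
    # Expected uniform quantiles i/(n+1), matched to the sorted observed p-values.
    expected = [-math.log10((i + 1) / (n + 1)) for i in range(n)]
    observed = [-math.log10(p) for p in observed_p]
    return expected, observed


def plot_qq(pvalues_by_tissue, out_path="qq_plot.png"):
    """Draw one colored point set per tissue plus the y = x null line."""
    import matplotlib.pyplot as plt  # imported here so qq_points works without matplotlib

    fig, ax = plt.subplots(figsize=(5, 5))
    upper = 0.0
    for tissue, pvalues in pvalues_by_tissue.items():
        expected, observed = qq_points(pvalues)
        ax.scatter(expected, observed, s=8, label=tissue)
        if expected:
            upper = max(upper, expected[0], observed[0])
    ax.plot([0, upper], [0, upper], color="grey", linestyle="--", linewidth=1)
    ax.set_xlabel("Expected -log10(p)")
    ax.set_ylabel("Observed -log10(p)")
    ax.legend()
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

For example, `plot_qq({"Whole_Blood": read_pvalues(blood_csv), "Liver": read_pvalues(liver_csv)})`, where `blood_csv` and `liver_csv` are whatever `--output_file` paths you used for the two runs.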