Learning Objectives
- run an example GWAS using Plink
- recognize Plink data formats
- perform quality control (QC) with Plink
- interpret Plink's output
This lab covers the quality control (QC) portion of the tutorial and uses 1_Main_script_QC_GWAS.txt from Marees et al. (2018), "A tutorial on conducting genome-wide association studies: Quality control and statistical analysis."
Material
We will follow the tutorial by Marees et al. (2018); the full citation is given under References.
See also the README file that accompanies the tutorial.
Plink
Most of the code in this lab consists of Plink commands. Plink is a command-line tool used for quality control, population stratification analysis, and GWAS. We will use Plink 1.9, and all Plink commands in this lab are run in the terminal. Every command follows the same general pattern: plink --bfile [input prefix] [options] --out [output prefix]. If you get lost, run plink --help to list the available options.
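To make the command pattern concrete, here is a minimal Python sketch that assembles a Plink 1.9 command as an argument list. The helper name plink_command is hypothetical, and the prefixes HapMap_3_r3_1 and HapMap_3_r3_2 are assumptions modeled on the tutorial's file naming; adjust them to your files. The flags --bfile, --geno, --make-bed and --out are standard Plink 1.9 options. Note that --bfile takes a prefix rather than a file name: Plink reads <prefix>.bed (binary genotypes), <prefix>.bim (variant information) and <prefix>.fam (individual information) together.

```python
from typing import Sequence


def plink_command(bfile: str, options: Sequence[str], out: str,
                  executable: str = "plink") -> list[str]:
    """Build: plink --bfile <input prefix> <options...> --out <output prefix>.

    `bfile` is a prefix: Plink reads <prefix>.bed, <prefix>.bim and
    <prefix>.fam as one binary fileset.
    """
    return [executable, "--bfile", bfile, *options, "--out", out]


# Example: drop SNPs missing in more than 2% of individuals and write a new fileset.
cmd = plink_command("HapMap_3_r3_1", ["--geno", "0.02", "--make-bed"], "HapMap_3_r3_2")
print(" ".join(cmd))
# To actually run it (requires plink on your PATH and the input files):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than one string lets subprocess.run pass each argument to plink unchanged, without shell-quoting surprises.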
Lab
Tutorial files
In this lab we will run quality control on a subset of the HapMap 3 data, filtering on missingness, relatedness, minor allele frequency (MAF), and Hardy-Weinberg equilibrium (HWE). Refer back to the tutorial paper for definitions and further explanation of the commands. We will only do the first part of the tutorial, 1_QC_GWAS. The 1_QC_GWAS folder is our working directory for the lab, so be sure to navigate there before you begin.
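Three of these filters act on simple per-SNP statistics, and computing them by hand helps when reading Plink's output. Below is a minimal Python sketch, assuming each genotype is coded as 0, 1 or 2 copies of one allele, with None for a missing call. For HWE it swaps in a 1-df chi-square test; Plink's --hwe uses an exact test by default, so p-values will differ, especially for rare variants. The thresholds (0.02 for missingness, 0.05 for MAF, 1e-6 for HWE) are the values we believe the tutorial passes to --geno, --maf and --hwe; check them against 1_Main_script_QC_GWAS.txt. Relatedness is not covered by this sketch.

```python
import math
from typing import Optional, Sequence

# One call per individual: copies (0, 1 or 2) of the counted allele; None = missing.
Call = Optional[int]


def missing_rate(calls: Sequence[Call]) -> float:
    """Fraction of individuals with no call for this SNP (the quantity --geno filters on)."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Call]) -> float:
    """Frequency of the rarer allele among non-missing calls (the quantity --maf filters on)."""
    observed = [c for c in calls if c is not None]
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def hwe_chisq_pvalue(calls: Sequence[Call]) -> float:
    """1-df chi-square test of HWE (a stand-in for Plink's default exact test)."""
    observed = [c for c in calls if c is not None]
    n = len(observed)
    obs = [observed.count(k) for k in (0, 1, 2)]
    q = (2 * obs[0] + obs[1]) / (2 * n)  # frequency of the allele that is NOT counted
    if q in (0.0, 1.0):
        return 1.0  # monomorphic SNP: nothing to test
    expected = [n * q * q, n * 2 * q * (1 - q), n * (1 - q) * (1 - q)]
    chisq = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))


snp = [0, 0, 1, 1, 2, None, 0, 1, 0, 0]
print(missing_rate(snp))                      # 0.1
print(round(minor_allele_frequency(snp), 3))  # 0.278
keep = (missing_rate(snp) <= 0.02
        and minor_allele_frequency(snp) >= 0.05
        and hwe_chisq_pvalue(snp) >= 1e-6)
print(keep)                                   # False: 10% missingness exceeds 0.02
```

Plink applies the same logic in reverse: a SNP is removed when its missing rate is above the --geno threshold, its MAF is below the --maf threshold, or its HWE p-value is below the --hwe threshold.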
For convenience, I've copied the tutorial's scripts and supporting files below:
1. GWAS QC
- 1_Main_script_QC_GWAS.txt
- check_heterozygosity_rate.R
- Relatedness.R
- hist_miss.R
- pops_HapMap_3_r3
- hwe.R
- MAF_check.R
- gender_check.R
- heterozygosity_outliers_list.R
- inversion.txt
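check_heterozygosity_rate.R and heterozygosity_outliers_list.R work on the .het file written by plink --het. As we understand these scripts, they compute each individual's heterozygosity rate as (N(NM) - O(HOM)) / N(NM) and flag individuals more than 3 standard deviations from the mean, since unusually high or low heterozygosity can point to sample contamination or inbreeding. The Python sketch below mirrors that logic under those assumptions; the function names and the counts are hypothetical, and statistics.stdev uses the same n - 1 denominator as R's sd().

```python
from statistics import mean, stdev
from typing import Dict, List


def het_rate(n_nm: int, o_hom: int) -> float:
    """Heterozygosity rate = (non-missing genotypes - observed homozygotes) / non-missing genotypes."""
    return (n_nm - o_hom) / n_nm


def het_outliers(rates: Dict[str, float], n_sd: float = 3.0) -> List[str]:
    """Return IDs whose rate lies more than n_sd sample standard deviations from the mean."""
    values = list(rates.values())
    mu, sd = mean(values), stdev(values)
    return [iid for iid, r in rates.items() if abs(r - mu) > n_sd * sd]


# Hypothetical (N(NM), O(HOM)) pairs, as they would appear in a .het file.
counts = {f"ind{i}": (100_000, 68_000 + 100 * i) for i in range(20)}
counts["ind_x"] = (100_000, 80_000)  # far fewer heterozygous calls than the rest
rates = {iid: het_rate(*c) for iid, c in counts.items()}
print(het_outliers(rates))  # ['ind_x']
```

With very few samples a single extreme value inflates the standard deviation it is judged against, so the 3 SD rule is most informative on cohort-sized data such as the HapMap subset.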
References
Marees, AT, de Kluiver, H, Stringer, S, et al. A tutorial on conducting genome‐wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res. 2018; 27:e1608. https://doi.org/10.1002/mpr.1608