Methods to identify gene-disease associations have primarily relied on clinical trials or observational cohorts and, more recently, on electronic medical record (EMR)-linked DNA biobanks. At Vanderbilt, we have used an EMR-linked DNA biobank called BioVU to derive case and control populations, using data within the EMR to define clinical phenotypes. Genetic data from these EMR-linked association studies are redeposited into BioVU for future studies. This has opened the possibility of "reverse GWAS", or phenome-wide association studies (PheWAS), which test a given genetic variant against a broad range of phenotypes.
PheWAS using ICD9 codes
Our EMR-based PheWAS uses a custom-developed grouping of International Classification of Diseases, 9th Revision (ICD9) codes. These groupings loosely follow the 3-digit (category) and section groupings defined within the ICD9 code system itself, but vary where a broader grouping is clinically useful; for example, all hypertension codes (401-405) form one group. Each custom PheWAS code group also has an associated control group that excludes patients with related conditions (e.g., a patient with Graves disease cannot be a control for thyroiditis).
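The case and control logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by the PheWAS scripts: the GROUPS table, the function names, and the exclusion range given for thyroiditis are all hypothetical and chosen only to mirror the hypertension and Graves disease examples.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]

# Hypothetical table: group name -> (case ranges, control-exclusion ranges).
# Ranges are inclusive and compared on the 3-digit ICD9 category.
GROUPS: Dict[str, Tuple[List[Range], List[Range]]] = {
    "hypertension": ([(401, 405)], [(401, 405)]),
    # Illustrative assumption: any thyroid disorder (240-246), which includes
    # Graves disease (242.0), disqualifies a patient as a thyroiditis control.
    "thyroiditis": ([(245, 245)], [(240, 246)]),
}


def category(icd9: str) -> Optional[int]:
    """Return the 3-digit ICD9 category, or None for V and E codes."""
    head = icd9.split(".")[0]
    return int(head) if head.isdigit() else None


def in_ranges(cat: Optional[int], ranges: List[Range]) -> bool:
    """True if the category falls inside any of the inclusive ranges."""
    return cat is not None and any(lo <= cat <= hi for lo, hi in ranges)


def classify(patient_codes: List[str], group: str) -> str:
    """Label a patient as 'case', 'excluded', or 'control' for one group."""
    case_ranges, exclude_ranges = GROUPS[group]
    cats = [category(code) for code in patient_codes]
    if any(in_ranges(c, case_ranges) for c in cats):
        return "case"
    if any(in_ranges(c, exclude_ranges) for c in cats):
        return "excluded"  # has a related condition, so not a clean control
    return "control"
```

For instance, under these assumed ranges a patient whose record contains "242.0" and "401.9" would be a hypertension case but would be excluded from the thyroiditis controls.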
Our original PheWAS in 2010, run on ICD9 codes from BioVU records, replicated previously known gene-disease associations for 4 of 7 diseases (see Denny et al. 2010 in the references below). The replicated associations were for multiple sclerosis, rheumatoid arthritis, Crohn's disease, and ischemic heart disease. The original PheWAS had 744 clinical case groups.
A 2013 study using a revised model with 1645 hierarchically arranged phenotypes analyzed 3144 SNPs, replicated 210 of 751 known associations (including 66% of those with adequate sample size), and noted 63 new, potentially pleiotropic associations. See http://phewascatalog.org, an online catalog of these results.
Running Your Own PheWAS
The R PheWAS package is currently the preferred way to run a PheWAS. It allows for covariate adjustment and supports the hierarchical model of the PheWAS codes.
Alternatively, you can run a PheWAS with a Perl script. The necessary files are available in the zip file perl_phewas.zip and are described below.
- code translation file (original 2010 PheWAS): This file groups ICD9 codes into "phewas codes" of like ICD9 codes. It also defines control ranges ("phewas exclude range") for each "phewas code".
- phewas.pl: A Perl script that takes as its input tab-delimited genotype files, a file containing all ICD9 codes for each individual, and a file with race and gender for each individual. Its options are set in the header of the file.
- code_translation_updated.txt: This file contains the latest PheWAS code groupings (~1600 code groups), now arranged hierarchically. A Boolean value "rollup" defines whether the code can be rolled up to the parent code above it (e.g., "427.3" can be rolled up to "427"). Note: Rollup functionality is not supported in the Perl script currently available. Please use the R PheWAS package, which supports the newest hierarchy and provides graphing options.
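To make the "rollup" flag concrete, here is a minimal Python sketch of how a hierarchical code could be expanded to include its parents. It is an illustration only: the ROLLUP table, its flag values, and the function names are hypothetical, and the sketch does not read code_translation_updated.txt or reproduce the R package's behavior.

```python
from typing import Dict, Optional, Set

# Hypothetical rollup flags keyed by PheWAS code (values invented for illustration).
ROLLUP: Dict[str, bool] = {
    "427": False,
    "427.3": True,
    "427.31": True,
    "427.9": False,
}


def parent(code: str) -> Optional[str]:
    """Return the next code up the hierarchy, or None for a top-level code."""
    if "." not in code:
        return None
    head, frac = code.split(".", 1)
    # "427.3" -> "427"; "427.31" -> "427.3"
    return head if len(frac) == 1 else f"{head}.{frac[:-1]}"


def expand(codes: Set[str]) -> Set[str]:
    """Add every ancestor reachable through codes whose rollup flag is true."""
    result = set(codes)
    for code in codes:
        current = code
        # Codes missing from the table are treated as not rolling up.
        while ROLLUP.get(current, False):
            up = parent(current)
            if up is None:
                break
            result.add(up)
            current = up
    return result
```

With these assumed flags, a patient coded "427.31" would also count toward "427.3" and "427", while a patient coded "427.9" would count only toward "427.9".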
PheWAS by Josh Denny, MD MS is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- Denny JC, Ritchie MD, Basford M, Pulley J, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 Mar 24. PMID: 20335276
- Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, Pulley JM, Ramirez AH, Bowton E, Basford MA, Carrell DS, Peissig PL, Kho AN, Pacheco JA, Rasmussen LV, Crosslin DR, Crane PK, Pathak J, Bielinski SJ, Pendergrass SA, Xu H, Hindorff LA, Li R, Manolio TA, Chute CG, Chisholm RL, Larson EB, Jarvik GP, Brilliant MH, McCarty CA, Kullo IJ, Haines JL, Crawford DC, Masys DR, Roden DM. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.
- Carroll RJ, Bastarache L, Denny JC. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014 Aug 15;30(16):2375-6.