Scalable Data-driven Phenotypes via Unsupervised Feature Learning.

Inferring precise phenotypic patterns from population-scale clinical data is a critical computational task of personalized medicine. The dominant approach uses supervised learning, in which a human expert specifies which patterns to look for (by designating a learning task and class labels) and where to look for them (by constructing input features). This approach scales poorly and misses unexpected patterns, which are often the most informative.

Design patterns for the development of electronic health record-driven phenotype extraction algorithms.

Design patterns, in the context of software development and ontologies, provide generalized approaches and guidance to solving commonly occurring problems, or addressing common situations typically informed by intuition, heuristics and experience. While the biomedical literature contains broad coverage of specific phenotype algorithm implementations, no work to date has attempted to generalize common approaches into design patterns, which may then be distributed to the informatics community to efficiently develop more accurate phenotype algorithms.

Design and anticipated outcomes of the eMERGE-PGx project: a multicenter pilot for preemptive pharmacogenomics in electronic health record systems.

  • Rasmussen-Torvik LJ, Stallings SC, Gordon AS, Almoguera B, Basford MA, Bielinski SJ, Brautbar A, Brilliant MH, Carrell DS, Connolly JJ, Crosslin DR, Doheny KF, Gallego CJ, Gottesman O, Kim DS, Leppig KA, Li R, Lin S, Manzi S, Mejia AR, Pacheco JA, Pan V, Pathak J, Perry CL, Peterson JF, Prows CA, Ralston J, Rasmussen LV, Ritchie MD, Sadhasivam S, Scott SA, Smith M, Vega A, Vinks AA, Volpi S, Wolf WA, Bottinger E, Chisholm RL, Chute CG, Haines JL, Harley JB, Keating B, Holm IA, Kullo IJ, Jarvik GP, Larson EB, Manolio T, McCarty CA, Nickerson DA, Scherer SE, Williams MS, Roden DM, Denny JC. Design and anticipated outcomes of the eMERGE-PGx project: a multicenter pilot for preemptive pharmacogenomics in electronic health record systems. Clinical pharmacology and therapeutics. 2014 Oct;96(4):482-9. PMID: 24960519 [PubMed] PMCID: PMC4169732 NIHMSID: NIHMS605974.

We describe here the design and initial implementation of the eMERGE-PGx project.

Extracting research-quality phenotypes from electronic health records to support precision medicine.

The convergence of two rapidly developing technologies - high-throughput genotyping and electronic health records (EHRs) - gives scientists an unprecedented opportunity to utilize routine healthcare data to accelerate genomic discovery. Institutions and healthcare systems have been building EHR-linked DNA biobanks to enable such a vision. However, the precise extraction of detailed disease and drug-response phenotype information hidden in EHRs is not an easy task.

Platelet Inhibitors Reduce Rupture in a Mouse Model of Established Abdominal Aortic Aneurysm.

Rupture of abdominal aortic aneurysms causes high morbidity and mortality in the elderly population. Platelet-rich thrombi form on the surface of aneurysms and may contribute to disease progression. In this study, we used a pharmacological approach to examine the role of platelets in established aneurysms induced by angiotensin II infusion into hypercholesterolemic mice.

A Robust e-Epidemiology Tool in Phenotyping Heart Failure with Differentiation for Preserved and Reduced Ejection Fraction: the Electronic Medical Records and Genomics (eMERGE) Network.

Identifying populations of heart failure (HF) patients is paramount to research efforts aimed at developing strategies to effectively reduce the burden of this disease. The use of electronic medical record (EMR) data for this purpose is challenging given the syndromic nature of HF and the need to distinguish HF with preserved or reduced ejection fraction. Using a gold standard cohort of manually abstracted cases, an EMR-driven phenotype algorithm based on structured and unstructured data was developed to identify all the cases.

Review and evaluation of electronic health records-driven phenotype algorithm authoring tools for clinical and translational research.

To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development.

The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future.

  • Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, Sanderson SC, Kannry J, Zinberg R, Basford MA, Brilliant M, Carey DJ, Chisholm RL, Chute CG, Connolly JJ, Crosslin D, Denny JC, Gallego CJ, Haines JL, Hakonarson H, Harley J, Jarvik GP, Kohane I, Kullo IJ, Larson EB, McCarty C, Ritchie MD, Roden DM, Smith ME, Böttinger EP, Williams MS. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genetics in medicine : official journal of the American College of Medical Genetics. 2013 Oct;15(10):761-71. PMID: 23743551 [PubMed] PMCID: PMC3795928 NIHMSID: NIHMS495335.

The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research.

Development of an ensemble resource linking MEDications to their Indications (MEDI).

Understanding medication-disease relationships is critical to distinguishing indications from adverse effects, and medication exposures serve as important markers of disease and severity in electronic medical records (EMR). We created a computable medication-indication (MEDI) resource by applying natural language processing and ontology relationships to four public medication resources. Physicians evaluated the accuracy of the medication-indication relationships.

A natural language processing algorithm to define a venous thromboembolism phenotype.

Deep venous thrombosis and pulmonary embolism are diseases associated with significant morbidity and mortality. Known risk factors account for only a slight majority of venous thromboembolic disease (VTE), with the remainder of the risk presumably related to unidentified genetic factors. We designed a general-purpose natural language processing (NLP) algorithm to retrospectively capture both acute and historical cases of thromboembolic disease in a de-identified electronic health record.