Informatics | Center for Precision Medicine

Chapter 13: Mining electronic health records in the genomics era.

Denny JC. Chapter 13: Mining electronic health records in the genomics era. PLoS computational biology. 8(12). e1002823. PMID: 23300414 [PubMed] PMCID: PMC3531280

The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information.

A study of transportability of an existing smoking status detection module across institutions.

Liu M, Shah A, Jiang M, Peterson NB, Dai Q, Aldrich MC, Chen Q, Bowton EA, Liu H, Denny JC, Xu H. A study of transportability of an existing smoking status detection module across institutions. AMIA Annual Symposium Proceedings. 2012. 577-86. PMID: 23304330 [PubMed] PMCID: PMC3540509

Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor in many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems have been developed for this specific task, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the cTAKES smoking module to Vanderbilt University Hospital's EMR data.

An evaluation of the NQF Quality Data Model for representing Electronic Health Record driven phenotyping algorithms.

Thompson WK, Rasmussen LV, Pacheco JA, Peissig PL, Denny JC, Kho AN, Miller A, Pathak J. An evaluation of the NQF Quality Data Model for representing Electronic Health Record driven phenotyping algorithms. AMIA Annual Symposium Proceedings. 2012. 911-20. PMID: 23304366 [PubMed] PMCID: PMC3540514

The development of Electronic Health Record (EHR)-based phenotype selection algorithms is a non-trivial and highly iterative process involving domain experts and informaticians. To make it easier to port algorithms across institutions, it is desirable to represent them using an unambiguous formal specification language. For this purpose we evaluated the recently developed National Quality Forum (NQF) information model designed for EHR-based quality measures: the Quality Data Model (QDM).

A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.

Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Xu H. A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. AMIA Annual Symposium Proceedings. 2012. 997-1003. PMID: 23304375 [PubMed] PMCID: PMC3540461

Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compared the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES.

Enabling genomic-phenomic association discovery without sacrificing anonymity.

Heatherly RD, Loukides G, Denny JC, Haines JL, Roden DM, Malin BA. Enabling genomic-phenomic association discovery without sacrificing anonymity. PLoS ONE. 8(2). e53875. PMID: 23405076 [PubMed] PMCID: PMC3566194

Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, many hesitate over concerns that sharing will allow the corresponding patients to be re-identified.

A hybrid system for temporal information extraction from clinical text.

Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. Journal of the American Medical Informatics Association : JAMIA. 20(5). 828-35. PMID: 23571849 [PubMed] PMCID: PMC3756274

The objective was to develop a comprehensive temporal information extraction system that can identify events, temporal expressions, and their temporal relations in clinical text. This project was part of the 2012 i2b2 clinical natural language processing (NLP) challenge on temporal information extraction.

Assessment of a pharmacogenomic marker panel in a polypharmacy population identified from electronic medical records.

Oetjens MT, Denny JC, Ritchie MD, Gillani NB, Richardson DM, Restrepo NA, Pulley JM, Dilks HH, Basford MA, Bowton E, Masys DR, Wilke RA, Roden DM, Crawford DC. Assessment of a pharmacogenomic marker panel in a polypharmacy population identified from electronic medical records. Pharmacogenomics. 2013 May;14(7). 735-44. PMID: 23651022 [PubMed] PMCID: PMC3725600 NIHMSID: NIHMS493002.

The ADME Core Panel assays 184 variants across 34 pharmacogenes, many of which are difficult to accurately genotype with standard multiplexing methods.

Applying active learning to high-throughput phenotyping algorithms for electronic health records data.

Chen Y, Carroll RJ, Hinz ER, Shah A, Eyler AE, Denny JC, Xu H. Applying active learning to high-throughput phenotyping algorithms for electronic health records data. Journal of the American Medical Informatics Association : JAMIA. 2013 Dec;20(e2). e253-9. PMID: 23851443 [PubMed] PMCID: PMC3861916

Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.

Response to 'Use of an algorithm for identifying hidden drug-drug interactions in adverse event reports' by Gooden et al.

Tatonetti NP, Denny JC, Altman RB. Response to 'Use of an algorithm for identifying hidden drug-drug interactions in adverse event reports' by Gooden et al. Journal of the American Medical Informatics Association : JAMIA. 2013 May;20(3). 591. PMID: 23876381 [PubMed] PMCID: PMC3628071

Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages.

Wu Y, Lei J, Wei WQ, Tang B, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Zheng K, Xu H. Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages. Studies in health technology and informatics. 192. 662-6. PMID: 23920639 [PubMed]

Worldwide adoption of Electronic Medical Record (EMR) databases in health care has generated an unprecedented amount of clinical data available electronically. US and other Western institutions are increasingly collaborating with China on medical research using EMR data. However, few studies have investigated the characteristics of EMR data in China and how they differ from the data in US hospitals.
