NLP | Center for Precision Medicine

Tracking medical students' clinical experiences using natural language processing.

Denny JC, Bastarache L, Sastre EA, Spickard A. Tracking medical students' clinical experiences using natural language processing. Journal of biomedical informatics. 2009 Oct;42:781-9. PMID: 19236956 [PubMed]

Graduate medical students must demonstrate competency in clinical skills. Current tracking methods rely either on manual efforts or on simple electronic entry to record clinical experience. We evaluated automated methods to locate 10 institution-defined core clinical problems from three medical students' clinical notes (n=290). Each note was processed with section header identification algorithms and the KnowledgeMap concept identifier to locate Unified Medical Language System (UMLS) concepts.

Evaluation of a method to identify and categorize section headers in clinical documents.

Denny JC, Spickard A, Johnson KB, Peterson NB, Peterson JF, Miller RA. Evaluation of a method to identify and categorize section headers in clinical documents. Journal of the American Medical Informatics Association : JAMIA. 16:806-15. PMID: 19717800 [PubMed] PMCID: PMC3002123

Clinical notes, typically written in natural language, often contain substructure that divides them into sections, such as "History of Present Illness" or "Family Medical History." The authors designed and evaluated an algorithm ("SecTag") to identify both labeled and unlabeled (implied) note section headers in "history and physical examination" documents ("H&P notes").

MedEx: a medication information extraction system for clinical narratives.

Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. Journal of the American Medical Informatics Association : JAMIA. 17:19-24. PMID: 20064797 [PubMed] PMCID: PMC2995636

Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for clinical research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free text and are therefore not accessible to computerized applications that rely on coded data. We describe a new natural language processing system (MedEx), which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries.

An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records.

Schildcrout JS, Basford MA, Pulley JM, Masys DR, Roden DM, Wang D, Chute CG, Kullo IJ, Carrell D, Peissig P, Kho A, Denny JC. An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records. Journal of biomedical informatics. 2010 Dec;43:914-23. PMID: 20688191 [PubMed] PMCID: PMC2991387 NIHMSID: NIHMS231785.

We describe a two-stage analytical approach for characterizing morbidity profile dissimilarity among patient cohorts using electronic medical records. We capture morbidities using the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes. In the first stage of the approach, separate logistic regression analyses are conducted for ICD-9 sections (e.g., "hypertensive disease" or "appendicitis"), and the odds ratios that describe adjusted differences in prevalence between two cohorts are displayed graphically.
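The first stage described above rests on a standard identity: the exponential of a logistic regression coefficient is an odds ratio. The following is a minimal sketch of that conversion only, not the authors' implementation. The function name `odds_ratio_with_ci`, the coefficient values, and the standard errors are hypothetical and chosen purely for illustration; in practice they would come from a model fitted separately for each ICD-9 section with a cohort indicator and adjustment covariates.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Convert a logistic regression coefficient into an odds ratio.

    Returns (odds_ratio, lower, upper), where the bounds form an
    approximate 95% Wald confidence interval when z=1.96.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical cohort-indicator coefficients and standard errors,
# one pair per ICD-9 section (illustrative values, not from the paper).
fitted = {
    "hypertensive disease": (0.405, 0.10),
    "appendicitis": (-0.223, 0.30),
}

for section, (beta, se) in fitted.items():
    ratio, lower, upper = odds_ratio_with_ci(beta, se)
    print(f"{section}: OR={ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An odds ratio above 1 would indicate higher adjusted prevalence in the cohort coded as 1, and an interval that spans 1 would indicate no clear difference for that section.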

Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science.

Denny JC, Ritchie MD, Crawford DC, Schildcrout JS, Ramirez AH, Pulley JM, Basford MA, Masys DR, Haines JL, Roden DM. Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science. Circulation. 2010 Nov 16;122:2016-21. PMID: 21041692 [PubMed] PMCID: PMC2991609 NIHMSID: NIHMS244204.

Recent genome-wide association studies in which selected community populations are used have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown.

Comparing content coverage in medical curriculum to trainee-authored clinical notes.

Denny JC, Speltz P, Maddox R, Stein G, Xu H, Spickard A. Comparing content coverage in medical curriculum to trainee-authored clinical notes. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium. 2010:157-61. PMID: 21346960 [PubMed] PMCID: PMC3041398

Accurate assessment and evaluation of medical curricula have long been goals of medical educators. Current methods rely on manually entered keywords and trainee-recorded logs of case exposure. In this study, we used natural language processing to compare the clinical content coverage in a four-year medical curriculum to the electronic medical record notes written by clinical trainees. The content coverage was compared for each of 25 agreed-upon core clinical problems (CCPs) and seven categories of infectious diseases. Most CCPs were covered in both corpora.

An automated approach to calculating the daily dose of tacrolimus in electronic health records.

Xu H, Doan S, Birdwell KA, Cowan JD, Vincz AJ, Haas DW, Basford MA, Denny JC. An automated approach to calculating the daily dose of tacrolimus in electronic health records. AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science. 2010:71-5. PMID: 21347153 [PubMed] PMCID: PMC3041548

Clinical research often requires extracting detailed drug information, such as medication names and dosages, from Electronic Health Records (EHR). Because medication information is often recorded in both structured and unstructured formats in the EHR, extracting all the relevant drug mentions and determining the daily dose of a medication for a selected patient on a given date can be a challenging and time-consuming task.
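To make the daily-dose task concrete, the sketch below computes a daily dose from a single free-text mention as dose per administration multiplied by administrations per day. It is a deliberately simplified illustration and not the method used in the paper: the frequency vocabulary `FREQ_PER_DAY`, the regular expression, and the function name `daily_dose_mg` are all assumptions, and real notes contain many more dose and frequency variants than are handled here.

```python
import re

# Hypothetical frequency vocabulary (administrations per day).
# An assumption for illustration; real notes use many more variants.
FREQ_PER_DAY = {
    "daily": 1,
    "qd": 1,
    "bid": 2,
    "twice daily": 2,
    "tid": 3,
    "three times daily": 3,
}

# Try longer phrases first so "twice daily" is not matched as "daily".
_FREQ_ALTERNATION = "|".join(
    re.escape(term) for term in sorted(FREQ_PER_DAY, key=len, reverse=True)
)
_MENTION = re.compile(
    r"(?P<dose>\d+(?:\.\d+)?)\s*mg\s+(?P<freq>" + _FREQ_ALTERNATION + r")\b",
    re.IGNORECASE,
)


def daily_dose_mg(text):
    """Return dose-per-administration times frequency, or None if no match."""
    match = _MENTION.search(text)
    if match is None:
        return None
    per_dose = float(match.group("dose"))
    per_day = FREQ_PER_DAY[match.group("freq").lower()]
    return per_dose * per_day


print(daily_dose_mg("tacrolimus 2 mg bid"))
print(daily_dose_mg("Tacrolimus 1.5 mg twice daily"))
print(daily_dose_mg("continue tacrolimus"))
```

A mention without an explicit dose and frequency returns `None` rather than a guess, which leaves such cases for manual review or for combination with structured EHR data.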

A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

Jiang M, Chen Y, Liu M, Rosenbloom ST, Mani S, Denny JC, Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. Journal of the American Medical Informatics Association : JAMIA. 18:601-6. PMID: 21508414 [PubMed] PMCID: PMC3168315

The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities (including medical problems, tests, and treatments, as well as their asserted status) from hospital discharge summaries written using natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge.

Facilitating pharmacogenetic studies using electronic health records and natural-language processing: a case study of warfarin.

Xu H, Jiang M, Oetjens M, Bowton EA, Ramirez AH, Jeff JM, Basford MA, Pulley JM, Cowan JD, Wang X, Ritchie MD, Masys DR, Roden DM, Crawford DC, Denny JC. Facilitating pharmacogenetic studies using electronic health records and natural-language processing: a case study of warfarin. Journal of the American Medical Informatics Association : JAMIA. 18:387-91. PMID: 21672908 [PubMed] PMCID: PMC3128409

DNA biobanks linked to comprehensive electronic health records systems are potentially powerful resources for pharmacogenetic studies. This study sought to develop natural-language-processing algorithms to extract drug-dose information from clinical text, and to assess the capabilities of such tools to automate the data-extraction process for pharmacogenetic studies.

Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences.

Xu H, AbdelRahman S, Lu Y, Denny JC, Doan S. Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences. Journal of biomedical informatics. 2011 Dec;44:1068-75. PMID: 21856440 [PubMed] PMCID: PMC3226929 NIHMSID: NIHMS318637.

Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity.
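The idea of attaching probabilities to productions can be illustrated with a small sketch: each candidate parse is scored by the product of the probabilities of the rules it uses, and the highest-scoring parse is kept. The grammar symbols, rule probabilities, and function names below are hypothetical and are not taken from the paper; the example also assumes the candidate parses have already been produced by a parser.

```python
from math import prod

# Hypothetical production probabilities for a toy medication grammar.
# Keys are (left-hand side, right-hand side); values are P(rule | LHS).
RULE_PROB = {
    ("MED", ("DRUG", "DOSE", "FREQ")): 0.5,
    ("MED", ("DRUG", "DOSE", "QTY")): 0.3,
    ("DOSE", ("NUM", "UNIT")): 0.8,
    ("FREQ", ("NUM",)): 0.1,
    ("QTY", ("NUM",)): 0.7,
}


def parse_probability(rules):
    """Probability of a parse tree: the product of its rule probabilities."""
    return prod(RULE_PROB[rule] for rule in rules)


def best_parse(candidates):
    """Pick the most probable parse among the ambiguous candidates."""
    return max(candidates, key=parse_probability)


# Two competing parses of the same sentence, each listed as the rules used.
# They differ only in whether the trailing number is a frequency or a quantity.
parse_as_freq = [
    ("MED", ("DRUG", "DOSE", "FREQ")),
    ("DOSE", ("NUM", "UNIT")),
    ("FREQ", ("NUM",)),
]
parse_as_qty = [
    ("MED", ("DRUG", "DOSE", "QTY")),
    ("DOSE", ("NUM", "UNIT")),
    ("QTY", ("NUM",)),
]

chosen = best_parse([parse_as_freq, parse_as_qty])
print(chosen[0], parse_probability(chosen))
```

With these illustrative numbers the quantity reading wins, because a bare number is assumed to be far more likely as a quantity than as a frequency. In a real system the probabilities would be estimated from annotated sentences.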
