Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis.

Electronic Health Records (EHRs) provide a real-world patient cohort for clinical and genomic research. Phenotype identification using informatics algorithms has been shown to replicate known genetic associations found in clinical trials and observational cohorts. However, development of accurate phenotype identification methods can be challenging, requiring significant time and effort.

Detecting abbreviations in discharge summaries using machine learning methods.

Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprising all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and test its ability to automatically detect abbreviations in hospital discharge summaries.

Predicting warfarin dosage in European-Americans and African-Americans using DNA samples linked to an electronic health record.

Warfarin pharmacogenomic algorithms reduce dosing error, but perform poorly in non-European-Americans. Electronic health record (EHR) systems linked to biobanks may allow for pharmacogenomic analysis, but they have not yet been used for this purpose.

Detecting temporal expressions in medical narratives.

Clinical practice and epidemiological information aggregation require knowing when, how long, and in what sequence medically relevant events occur. The Temporal Awareness and Reasoning Systems for Question Interpretation (TARSQI) Toolkit (TTK) is a complete, open source software package for the temporal ordering of events within narrative text documents. TTK was developed on newspaper articles. We extended TTK to support medical notes using Veterans Affairs (VA) clinical notes and compared the extended system with the original TTK.
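As a rough illustration of what detecting temporal expressions in a clinical note involves, the sketch below uses two regular expressions to find calendar dates and simple durations. It is not TTK's method or the extension described above: the patterns, the function name `find_temporal_expressions`, and the sample sentence are all assumptions made for this example, and real notes contain many more forms than these two.

```python
import re

# Hypothetical patterns, chosen only for illustration:
# numeric dates such as 03/15/2010, and durations such as "for 3 days".
DATE = r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"
DURATION = r"\bfor \d+ (?:day|week|month|year)s?\b"
PATTERN = re.compile(f"{DATE}|{DURATION}", re.IGNORECASE)


def find_temporal_expressions(text):
    """Return the matched temporal expressions in order of appearance."""
    return [match.group(0) for match in PATTERN.finditer(text)]


note = "Patient seen on 03/15/2010, fever for 3 days."
expressions = find_temporal_expressions(note)
print(expressions)
```

A pattern list like this only covers the "when" and "how long" questions for a narrow set of surface forms; ordering events in sequence requires relating each expression to the events around it.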

A study of transportability of an existing smoking status detection module across institutions.

Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor in many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems have been developed for this specific task, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the cTAKES smoking module to the Vanderbilt University Hospital's EMR data.

A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.

Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compare the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES.

A hybrid system for temporal information extraction from clinical text.

The objective was to develop a comprehensive temporal information extraction system that can identify events, temporal expressions, and their temporal relations in clinical text. This project was part of the 2012 i2b2 clinical natural language processing (NLP) challenge on temporal information extraction.

Automated identification of drug and food allergies entered using non-standard terminology.

An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems.

Applying active learning to high-throughput phenotyping algorithms for electronic health records data.

Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.
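The idea behind active learning can be sketched with uncertainty sampling, one common query strategy: rather than annotating records at random, reviewers label the records the current classifier is least sure about. The excerpt above does not say which strategy the study used, so uncertainty sampling, the function name `select_most_uncertain`, and the example probabilities are assumptions made for illustration.

```python
def select_most_uncertain(probabilities, batch_size=1):
    """Pick the unlabeled samples to send for annotation next.

    `probabilities` holds a binary classifier's predicted probability of the
    positive class (e.g. "has the phenotype") for each unlabeled sample.
    Samples whose probability is closest to 0.5 are the most uncertain.
    Returns their indices, most uncertain first.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabeled patient records.
predicted = [0.9, 0.55, 0.1, 0.48]
to_annotate = select_most_uncertain(predicted, batch_size=2)
print(to_annotate)
```

In a full loop, the selected records would be annotated, added to the training set, and the classifier retrained before the next round of selection.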