A Robust e-Epidemiology Tool in Phenotyping Heart Failure with Differentiation for Preserved and Reduced Ejection Fraction: the Electronic Medical Records and Genomics (eMERGE) Network.

Identifying populations of heart failure (HF) patients is paramount to research efforts aimed at developing strategies to effectively reduce the burden of this disease. The use of electronic medical record (EMR) data for this purpose is challenging given the syndromic nature of HF and the need to distinguish HF with preserved or reduced ejection fraction. Using a gold standard cohort of manually abstracted cases, an EMR-driven phenotype algorithm based on structured and unstructured data was developed to identify all the cases.

A randomized study of feedback on student write-ups using an electronic portfolio.

Traditional methods allowing medical students and residents to review their work and receive feedback are lacking. We developed a web-based portfolio system that collects all clinical documentation and allows teachers to give feedback electronically. In a randomized controlled trial, we found that this system significantly increased feedback to students, often exceeding clerkship expectations. Seventy-five percent of students found the system a "valuable teaching tool". Students in control and portfolio groups agreed that the system increased feedback.

Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases.

Identification of a cohort of patients with specific diseases is an important step for clinical research that is based on electronic health records (EHRs). Informatics approaches combining structured EHR data, such as billing records, with narrative text data have demonstrated utility for such tasks. This paper describes an algorithm combining machine learning and natural language processing to detect patients with colorectal cancer (CRC) from entire EHRs at Vanderbilt University Hospital.

Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method.

Semantic lexicons that link words and phrases to specific semantic types such as diseases are valuable assets for clinical natural language processing (NLP) systems. Although terms with predefined semantic types can be generated easily from existing knowledge bases such as the Unified Medical Language System (UMLS), they are often limited and do not have good coverage for narrative clinical text. In this study, we developed a method for building semantic lexicons from a clinical corpus.

Development of a natural language processing system to identify timing and status of colonoscopy testing in electronic medical records.

Colorectal cancer (CRC) screening rates are low despite proven benefits. The authors developed natural language processing (NLP) algorithms to identify temporal expressions and status indicators, such as "patient refused" or "test scheduled." The authors incorporated the algorithms into the KnowledgeMap Concept Identifier system in order to detect references to completed colonoscopies within electronic text. The modified NLP system was evaluated using 200 randomly selected electronic medical records (EMRs) from a primary care population aged ≥50 years.
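
As a rough illustration of what identifying status indicators and temporal expressions around a colonoscopy mention can look like, here is a minimal regex-based sketch in Python. It is not the KnowledgeMap Concept Identifier and not the authors' algorithm. The pattern lists, the `Mention` class, the `find_colonoscopy_mentions` function, and the "unspecified" default label are all hypothetical names chosen for this example; only the phrases "patient refused" and "test scheduled" come from the abstract.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical status patterns; "refused" and "scheduled" mirror the
# example phrases in the abstract, "declined" is an added assumption.
STATUS_PATTERNS = {
    "refused": re.compile(r"\b(?:refused|declined)\b", re.IGNORECASE),
    "scheduled": re.compile(r"\bscheduled\b", re.IGNORECASE),
}

# A deliberately tiny set of temporal expressions (assumption):
# four-digit years and phrases such as "3 years ago".
TEMPORAL = re.compile(
    r"\b(?:(?:19|20)\d{2}|\d+\s+years?\s+ago)\b", re.IGNORECASE
)

# The test of interest.
TEST = re.compile(r"\bcolonoscop(?:y|ies)\b", re.IGNORECASE)


@dataclass
class Mention:
    sentence: str
    status: str          # "refused", "scheduled", or "unspecified"
    when: Optional[str]  # matched temporal expression, if any


def find_colonoscopy_mentions(text: str) -> List[Mention]:
    """Return one Mention per sentence that refers to a colonoscopy."""
    mentions = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not TEST.search(sentence):
            continue
        status = "unspecified"  # no status indicator found in the sentence
        for label, pattern in STATUS_PATTERNS.items():
            if pattern.search(sentence):
                status = label
                break
        match = TEMPORAL.search(sentence)
        mentions.append(
            Mention(sentence, status, match.group(0) if match else None)
        )
    return mentions


note = (
    "Colonoscopy in 2005 was normal. "
    "Patient refused colonoscopy today. "
    "Blood pressure is well controlled."
)
for m in find_colonoscopy_mentions(note):
    print(m.status, m.when)
```

A sentence-level keyword match like this ignores negation scope, section context, and concept normalization, which is the kind of work a full concept identification system would be expected to handle.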

Extracting timing and status descriptors for colonoscopy testing from electronic medical records.

Colorectal cancer (CRC) screening rates are low despite confirmed benefits. The authors investigated the use of natural language processing (NLP) to identify previous colonoscopy screening in electronic records from a random sample of 200 patients at least 50 years old. The authors developed algorithms to recognize temporal expressions and 'status indicators', such as 'patient refused' or 'test scheduled'.

Integrating existing natural language processing tools for medication extraction from discharge summaries.

To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration.

Natural language processing improves identification of colorectal cancer testing in the electronic medical record.

Difficulty identifying patients in need of colorectal cancer (CRC) screening contributes to low screening rates.

Development and evaluation of an ensemble resource linking medications to their indications.

To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs).