An automated approach to calculating the daily dose of tacrolimus in electronic health records.

Clinical research often requires extracting detailed drug information, such as medication names and dosages, from Electronic Health Records (EHR). Since medication information is often recorded in both structured and unstructured formats in the EHR, extracting all the relevant drug mentions and determining the daily dose of a medication for a selected patient at a given date can be a challenging and time-consuming task.
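
As a rough illustration of what a daily-dose calculation involves once a drug mention has been extracted, the sketch below multiplies strength, amount per administration, and frequency. It is not the method of the article above; the `DrugMention` fields and the `daily_dose_mg` helper are hypothetical names introduced only for this example, and it assumes all three values were already parsed from the record.

```python
from dataclasses import dataclass


@dataclass
class DrugMention:
    """One extracted medication mention (hypothetical structure)."""
    strength_mg: float     # strength of one unit, e.g. a 1 mg capsule
    units_per_dose: float  # units taken per administration
    doses_per_day: float   # administrations per day, e.g. 2 for twice daily


def daily_dose_mg(mention: DrugMention) -> float:
    """Daily dose = strength x units per administration x administrations per day."""
    return mention.strength_mg * mention.units_per_dose * mention.doses_per_day


# Example: two 1 mg capsules taken twice daily.
example = DrugMention(strength_mg=1.0, units_per_dose=2.0, doses_per_day=2.0)
example_daily_dose = daily_dose_mg(example)
```

In practice the difficult part is the extraction itself: mentions in free text may omit the strength or frequency, and several mentions on the same date may conflict.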

Electronic medical records for genetic research: results of the eMERGE consortium.

Clinical data in electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%.
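
For readers unfamiliar with the metrics quoted above, positive and negative predictive values follow directly from the counts of a chart-review validation. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not taken from the eMERGE study.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of algorithm-identified cases that are true cases.
    NPV = TN / (TN + FN): share of algorithm-identified controls that are true controls.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("PPV and NPV need at least one predicted case and one predicted control")
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative counts only (not study data).
ppv, npv = predictive_values(tp=90, fp=10, tn=98, fn=2)
```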

A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities (including medical problems, tests, and treatments, as well as their asserted status) from hospital discharge summaries written using natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge.

Detecting drug interactions from adverse-event reports: interaction between paroxetine and pravastatin increases blood glucose levels.

The lipid-lowering agent pravastatin and the antidepressant paroxetine are among the most widely prescribed drugs in the world. Unexpected interactions between them could have important public health implications. We mined the US Food and Drug Administration's (FDA's) Adverse Event Reporting System (AERS) for side-effect profiles involving glucose homeostasis and found a surprisingly strong signal for comedication with pravastatin and paroxetine.

Analyses of longitudinal, hospital clinical laboratory data with application to blood glucose concentrations.

Electronic medical record (EMR) systems afford researchers opportunities to investigate a broad range of scientific questions. In contrast to purposeful study designs, however, EMR data acquisition procedures typically do not align with any specific hypothesis. Subsequent investigations therefore require detailed characterization of clinical procedures and protocols that underlie EMR data, as well as careful consideration of model choice. For example, many intensive care units currently implement insulin infusion protocols to better control patients' blood glucose levels.

Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study.

Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems.

Predicting clopidogrel response using DNA samples linked to an electronic health record.

Variants in ABCB1 and CYP2C19 have been identified as predictors of cardiac events during clopidogrel therapy initiated after myocardial infarction (MI) or percutaneous coronary intervention (PCI). In addition, PON1 has recently been associated with stent thrombosis. The reported effects of these variants have not yet been replicated in a real-world setting.

Modeling drug exposure data in electronic medical records: an application to warfarin.

Identification of patients' drug exposure information is critical to drug-related research that is based on electronic medical records (EMRs). Drug information is often embedded in clinical narratives, and drug regimens change frequently for reasons such as intolerance or insurance issues, making accurate modeling challenging. Here, we developed an informatics framework to determine patient drug exposure histories from EMRs by combining natural language processing (NLP) and machine learning (ML) technologies.

Detecting abbreviations in discharge summaries using machine learning methods.

Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprising all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and test its ability to automatically detect abbreviations from hospital discharge summaries.