Natural language processing improves identification of colorectal cancer testing in the electronic medical record.

Difficulty identifying patients in need of colorectal cancer (CRC) screening contributes to low screening rates.

Development and evaluation of an ensemble resource linking medications to their indications.

To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs).

The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future.

  • Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, Sanderson SC, Kannry J, Zinberg R, Basford MA, Brilliant M, Carey DJ, Chisholm RL, Chute CG, Connolly JJ, Crosslin D, Denny JC, Gallego CJ, Haines JL, Hakonarson H, Harley J, Jarvik GP, Kohane I, Kullo IJ, Larson EB, McCarty C, Ritchie MD, Roden DM, Smith ME, Böttinger EP, Williams MS. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genetics in Medicine: official journal of the American College of Medical Genetics. 2013 Oct;15(10):761-71. PMID: 23743551 [PubMed] PMCID: PMC3795928 NIHMSID: NIHMS495335.

The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research.

Development of an ensemble resource linking MEDications to their Indications (MEDI).

Understanding medication-disease relationships is critical for distinguishing indications from adverse effects, and medication exposures serve as important markers of disease and severity in electronic medical records (EMRs). We created a computable medication-indication resource (MEDI) by applying natural language processing and ontology relationships to four public medication resources. Physicians evaluated the accuracy of the medication-indication relationships.

Validation and enhancement of a computable medication indication resource (MEDI) using a large practice-based dataset.

Linking medications with their indications is important for clinical care and research. We have recently developed a freely available, computable medication-indication resource, called MEDI, which links RxNorm medications to indications mapped to ICD-9 codes. In this paper, we identified the medications and diagnoses for 1.3 million individuals at Vanderbilt University Medical Center to evaluate the medication coverage of MEDI and then to calculate the prevalence of each indication for each medication. Our results demonstrated that MEDI covered 97.3% of medications recorded in medical records.
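The two quantities named in this abstract, medication coverage and per-medication indication prevalence, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the data layout (one `(medications, diagnoses)` pair per patient), and the choice to count distinct medications rather than individual mentions are all assumptions made for illustration, and the identifiers are placeholders rather than real RxNorm or ICD-9 codes.

```python
from collections import defaultdict


def medication_coverage(recorded_meds, resource_meds):
    """Fraction of distinct recorded medications that appear in the resource.

    Assumption: coverage is computed over distinct medications, not mentions.
    """
    recorded = set(recorded_meds)
    if not recorded:
        return 0.0
    return len(recorded & set(resource_meds)) / len(recorded)


def indication_prevalence(patients, med_indications):
    """Share of patients on each medication who also carry each listed indication.

    patients: iterable of (medications, diagnoses) pairs, one per patient.
    med_indications: dict mapping a medication to its list of indications
        (a hypothetical stand-in for a medication-indication resource).
    """
    exposed = defaultdict(int)   # patients exposed to each medication
    with_dx = defaultdict(int)   # exposed patients who also have the indication
    for meds, diagnoses in patients:
        for med in set(meds):
            if med not in med_indications:
                continue
            exposed[med] += 1
            for indication in med_indications[med]:
                if indication in diagnoses:
                    with_dx[(med, indication)] += 1
    return {
        (med, indication): with_dx[(med, indication)] / exposed[med]
        for med, indications in med_indications.items()
        if exposed[med]
        for indication in indications
    }


# Placeholder data: three patients, two medications covered by the resource.
patients = [
    (["drug_a"], {"dx_1"}),
    (["drug_a", "drug_b"], {"dx_2"}),
    (["drug_c"], {"dx_1"}),
]
med_indications = {"drug_a": ["dx_1", "dx_2"], "drug_b": ["dx_2"]}

all_meds = [med for meds, _ in patients for med in meds]
coverage = medication_coverage(all_meds, med_indications)
prevalence = indication_prevalence(patients, med_indications)
```

Counting each medication once per patient keeps repeated prescriptions from inflating the denominator; a mention-weighted variant would instead iterate over `meds` without deduplicating.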

Electronic health record design and implementation for pharmacogenomics: a local perspective.

The design of electronic health records to translate genomic medicine into clinical care is crucial to successful introduction of new genomic services, yet there are few published guides to implementation.

Integrating EMR-linked and in vivo functional genetic data to identify new genotype-phenotype associations.

The coupling of electronic medical records (EMR) with genetic data has created the potential for implementing reverse genetic approaches in humans, whereby the function of a gene is inferred from the shared pattern of morbidity among homozygotes of a genetic variant. We explored the feasibility of this approach to identify phenotypes associated with low-frequency variants using Vanderbilt's EMR-based BioVU resource. We analyzed 1,658 low-frequency non-synonymous SNPs (nsSNPs) with a minor allele frequency (MAF) <10% collected on 8,546 subjects.
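For readers unfamiliar with the MAF threshold mentioned above, the following sketch shows how a minor allele frequency is conventionally computed from biallelic genotype calls and compared against a 10% cutoff. The encoding (0, 1, or 2 copies of the alternate allele, `None` for a missing call) and the function names are assumptions for illustration and are not taken from the study.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for one biallelic SNP.

    genotypes: per-subject alternate-allele counts (0, 1, or 2);
    None marks a missing call and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each called subject contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


def is_low_frequency(genotypes, threshold=0.10):
    """True when the SNP's MAF falls below the threshold (10% by default)."""
    return minor_allele_frequency(genotypes) < threshold


def homozygous_alt_subjects(genotypes):
    """Indices of subjects carrying two copies of the alternate allele."""
    return [i for i, g in enumerate(genotypes) if g == 2]
```

For example, one heterozygote among ten called subjects gives 1 alternate allele out of 20, so `is_low_frequency([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])` holds.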

Parsing clinical text: how good are the state-of-the-art parsers?

Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain.

Size matters: how population size influences genotype-phenotype association studies in anonymized data.

Electronic medical record (EMR) data are increasingly incorporated into genome-phenome association studies. Investigators hope to share data, but there are concerns that the data may be "re-identified" through the exploitation of various features, such as combinations of standardized clinical codes. Formal anonymization algorithms (e.g., k-anonymization) can prevent such violations, but prior studies suggest that the size of the population available for anonymization may influence the utility of the resulting data.
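As a rough illustration of the k-anonymity property referred to above: a dataset is k-anonymous with respect to a set of quasi-identifying fields when every combination of values in those fields is shared by at least k records. The sketch below checks that property and applies record suppression, which is only one simple way to reach it (generalizing values is another) and is not claimed to be the algorithm evaluated in the study. The field names and records are hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(count >= k for count in sizes.values())


def suppress_small_classes(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] >= k
    ]


# Hypothetical records: diagnosis-code combinations act as quasi-identifiers.
records = [
    {"codes": ("code_1", "code_2"), "genotype": "AA"},
    {"codes": ("code_1", "code_2"), "genotype": "AG"},
    {"codes": ("code_3",), "genotype": "GG"},
]
kept = suppress_small_classes(records, ["codes"], k=2)
```

With a small population, rare code combinations are more likely to fall below k and be suppressed, which is one way population size can affect the utility of the anonymized data.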

Limestone: high-throughput candidate phenotype generation via tensor factorization.

The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use.