Developing Electronic Health Record Algorithms That Accurately Identify Patients With Systemic Lupus Erythematosus.

To study systemic lupus erythematosus (SLE) in the electronic health record (EHR), we must accurately identify patients with SLE. Our objective was to develop and validate novel EHR algorithms that use International Classification of Diseases, Ninth Revision (ICD-9), Clinical Modification codes, laboratory testing, and medications to identify SLE patients.
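A rule of this kind can be illustrated with a minimal sketch. This is not the published algorithm: the record layout, the medication list, and the `min_code_count` threshold are all hypothetical choices made for illustration. Only the ICD-9-CM code 710.0 for SLE is a fixed fact.

```python
from dataclasses import dataclass, field
from typing import List, Set

SLE_ICD9 = "710.0"  # ICD-9-CM code for systemic lupus erythematosus

# Hypothetical medication list, for illustration only.
SLE_MEDICATIONS = {"hydroxychloroquine"}


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR data."""
    icd9_codes: List[str] = field(default_factory=list)  # one entry per coded encounter
    ana_positive: bool = False                            # any positive antinuclear antibody test
    medications: Set[str] = field(default_factory=set)


def flag_possible_sle(rec: PatientRecord, min_code_count: int = 3) -> bool:
    """Flag a patient only when codes, laboratory data, and medications all agree.

    The threshold of 3 coded encounters is an assumption, not a validated cutoff.
    """
    code_count = rec.icd9_codes.count(SLE_ICD9)
    on_sle_med = bool({m.lower() for m in rec.medications} & SLE_MEDICATIONS)
    return code_count >= min_code_count and rec.ana_positive and on_sle_med
```

Requiring agreement across all three data sources trades sensitivity for a higher positive predictive value. A real algorithm would be tuned and validated against chart review.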

The phenotypic legacy of admixture between modern humans and Neandertals.

Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neandertals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neandertal variants to over 1000 electronic health record (EHR)-derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neandertal alleles with neurological, psychiatric, immunological, and dermatological phenotypes.

A gene-based association method for mapping traits using reference transcriptome data.

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype.

Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance.

Our objective was to evaluate the phenotyping performance of three major electronic health record (EHR) components: International Classification of Diseases (ICD) diagnosis codes, primary notes, and specific medications.
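Phenotyping performance is commonly summarized by comparing algorithm output with a chart-reviewed gold standard. The sketch below assumes that setup. The function name and the choice of metrics (positive predictive value and sensitivity) are illustrative and are not taken from the abstract.

```python
from typing import Dict, Sequence


def phenotype_metrics(predicted: Sequence[bool], gold: Sequence[bool]) -> Dict[str, float]:
    """Compute PPV and sensitivity of predicted case labels against gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    tp = sum(p and g for p, g in zip(predicted, gold))        # true positives
    fp = sum(p and not g for p, g in zip(predicted, gold))    # false positives
    fn = sum(not p and g for p, g in zip(predicted, gold))    # false negatives

    # Return NaN when a denominator is zero rather than raising.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return {"ppv": ppv, "sensitivity": sensitivity}
```

Running the same function on labels derived from each component alone, and then from their combination, gives a like-for-like comparison of the components.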

Desiderata for computable representations of electronic health records-driven phenotype algorithms.

Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM).

Identifying UMLS concepts from ECG Impressions using KnowledgeMap.

Electrocardiogram (ECG) impressions represent a wealth of medical information for potential decision support and drug-effect discovery. Much of this information is inaccessible to automated methods in the free-text portion of the ECG report. We studied the application of the KnowledgeMap concept identifier (KMCI) to map Unified Medical Language System (UMLS) concepts from ECG impressions. ECGs were processed by KMCI and the results scored for accuracy by multiple raters. Reviewers also recorded unidentified concepts through the scoring interface.

Identifying QT prolongation from ECG impressions using natural language processing and negation detection.

Electrocardiogram (ECG) impressions provide significant information for decision support and clinical research. We investigated how reports of QT prolongation, an important risk factor for sudden cardiac death, in ECG impressions compare with the automated calculation of corrected QT (QTc) by ECG machines. We integrated a negation tagging algorithm into the KnowledgeMap concept identifier (KMCI), then applied it to impressions from 44,080 ECGs to identify Unified Medical Language System concepts. We compared the instances of QT prolongation identified by KMCI to the calculated QTc.
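The idea of negation-aware detection followed by a comparison against QTc can be sketched with regular expressions. This is a simplified stand-in, not KMCI or its negation tagger: the patterns, the function names, and the 450 ms threshold are assumptions made for illustration.

```python
import re

# Hypothetical patterns; a real system maps text to UMLS concepts instead.
QT_MENTION = re.compile(
    r"\b(?:prolonged\s+qtc?|qtc?\s+prolongation|long\s+qtc?)\b", re.IGNORECASE
)
NEGATION_CUE = re.compile(
    r"\b(?:no|not|without|negative\s+for)\b", re.IGNORECASE
)


def mentions_qt_prolongation(impression: str) -> bool:
    """Return True if any sentence asserts QT prolongation without a preceding negation cue."""
    for sentence in re.split(r"[.;\n]", impression):
        match = QT_MENTION.search(sentence)
        if match is None:
            continue
        # Treat the mention as negated when a cue precedes it in the same sentence.
        if NEGATION_CUE.search(sentence[: match.start()]):
            continue
        return True
    return False


def agrees_with_qtc(impression: str, qtc_ms: float, threshold_ms: float = 450) -> bool:
    """Check whether the text finding matches a machine QTc above an assumed threshold."""
    return mentions_qt_prolongation(impression) == (qtc_ms > threshold_ms)
```

For example, `mentions_qt_prolongation("Sinus rhythm. No evidence of QT prolongation.")` is false because the cue "No" precedes the mention within the same sentence, while `"Prolonged QT interval."` is counted as a positive finding.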

Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor.

Typically detected via electrocardiograms (ECGs), QT interval prolongation is a known risk factor for sudden cardiac death. Since medications can promote or exacerbate the condition, detection of QT interval prolongation is important for clinical decision support. We investigated the accuracy of natural language processing (NLP) for identifying QT prolongation from cardiologist-generated, free-text ECG impressions compared to corrected QT (QTc) thresholds reported by ECG machines.