PheMAP: High-throughput Phenotyping by Measured, Automated Profile

PheMAP is a general, automatic, and portable approach to enable accurate high-throughput phenotyping within electronic health records (EHRs). PheMAP quantifies relationships between phenotypes and relevant clinical concepts represented by standard medical terminologies. For each individual, PheMAP uses the related concepts identified in that individual's EHR to assign a score and a probability of having a particular phenotype.

We parsed phenotype descriptions from multiple publicly available resources (e.g., MedlinePlus, MedicineNet, and Wikipedia) using natural language processing (NLP). We mapped the identified concepts to concept unique identifiers (CUIs) from the Unified Medical Language System (UMLS) and to codes of standard clinical terminologies (e.g., ICD-9-CM, ICD-10-CM, SNOMED CT, CPT, LOINC, and RxNorm). We then weighted each concept relative to each phenotype to reflect how important the concept is to that phenotype within the collection of all phenotype documents.
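The weighting step described above can be illustrated with a TF-IDF-style calculation. This is a minimal sketch under stated assumptions, not the PheMAP implementation: the text here does not name the exact weighting formula, so TF-IDF is used only as one common way to express "importance of a concept to one document within a collection." The function name `tfidf_weights` and the concept IDs are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Weight each concept per phenotype with a TF-IDF-style score.

    docs maps a phenotype name to the list of concept IDs extracted from
    its description (repeats allowed). This is an assumed formula for
    illustration only, not necessarily the one PheMAP uses.
    """
    n_docs = len(docs)

    # Document frequency: how many phenotype documents mention each concept.
    doc_freq = Counter()
    for concepts in docs.values():
        doc_freq.update(set(concepts))

    weights = {}
    for phenotype, concepts in docs.items():
        counts = Counter(concepts)
        total = sum(counts.values())
        weights[phenotype] = {
            # Term frequency times inverse document frequency.
            concept: (n / total) * math.log(n_docs / doc_freq[concept])
            for concept, n in counts.items()
        }
    return weights


# Toy input with made-up concept IDs (not real CUIs).
docs = {
    "type 2 diabetes": ["C_GLUCOSE", "C_GLUCOSE", "C_METFORMIN", "C_FATIGUE"],
    "hypothyroidism": ["C_TSH", "C_LEVOTHYROXINE", "C_FATIGUE"],
}
weights = tfidf_weights(docs)
```

In this toy input, a concept that appears in every phenotype document (here `C_FATIGUE`) receives a weight of zero, while a concept that is frequent in one document and absent from the others receives the largest weight for that phenotype.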

PheMAP is available for free and is ready to be implemented for 1400 unique phenotypes with EHRs in the OMOP Common Data Model. Both the knowledge base and a Python script for calculating phenotype scores and phenotype probabilities are provided for download.
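As a rough illustration of how a phenotype score could be derived from weighted concepts, the sketch below sums the weights of the knowledge-base concepts found in each patient's record. Several things here are assumptions: summation is only one plausible reading of "score," the function name `phenotype_score` is hypothetical, the weights are invented, and the step that turns scores into probabilities is not shown. The released phemap_phenotyping.py may work differently.

```python
def phenotype_score(patient_codes, concept_weights):
    """Sum the weights of knowledge-base concepts present in a patient's record.

    patient_codes: set of (vocabulary, code) pairs observed for one patient.
    concept_weights: dict mapping (vocabulary, code) to a weight for one phenotype.
    Each concept counts once, however often it occurs (an assumption).
    """
    return sum(
        weight
        for code, weight in concept_weights.items()
        if code in patient_codes
    )


# Invented weights for a single phenotype; the values are illustrative only.
concept_weights = {
    ("ICD10CM", "E11.9"): 2.5,
    ("RxNorm", "6809"): 1.5,
    ("LOINC", "4548-4"): 1.0,
}

# Toy patients, each reduced to the set of codes seen in their record.
patients = {
    "p1": {("ICD10CM", "E11.9"), ("RxNorm", "6809")},
    "p2": {("LOINC", "4548-4")},
    "p3": set(),
}

scores = {
    patient_id: phenotype_score(codes, concept_weights)
    for patient_id, codes in patients.items()
}
```

Keying the weights by (vocabulary, code) pairs keeps codes from different terminologies from colliding, which matters when ICD, LOINC, and RxNorm codes are pooled in one lookup.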

Please contact neil.zheng@vumc.org or wei-qi.wei@vumc.org with any questions.

Citation:

Zheng NS, Feng QP, Kerchberger VE, Zhao J, Edwards TL, Cox NJ, Stein CM, Roden DM, Denny JC, Wei WQ. PheMap: a Multi-resource Knowledgebase for High-throughput Phenotyping within Electronic Health Records. Journal of the American Medical Informatics Association. https://doi.org/10.1093/jamia/ocaa104

PheMAP Knowledgebase Downloads:

PheMap_Mapped_Terminologies_1.1.csv – The main knowledge base file containing weighted concepts mapped to standard medical terminologies, e.g., ICD, SNOMED CT, CPT, LOINC, and RxNorm.

PheMap_UMLS_Concepts_1.1.csv – The raw PheMAP knowledge base containing weighted concepts mapped to CUIs from UMLS.

ICD_to_Phecode_mapping.csv – Mapping of ICD9CM and ICD10CM to phecode (used in phemap_phenotyping.py).

Phecode_Relationship.csv – The hierarchical relationship mapping between phecodes (used in phemap_phenotyping.py).

README.txt – Description of data elements in the above files.

Scripts:

phemap_phenotyping.py – Python script that calculates PheMap phenotype scores and probabilities for EHRs structured with the OMOP Common Data Model. The script is meant to be run line by line.

Changelog:

PheMap v1.1 (07/07/20)

  • Added Medscape as an additional resource for concept extraction and weighting.
  • Improved article text to phecode mapping, increasing available unique phenotypes from 841 to 1400.
  • Improved UMLS CUI to ICD mapping, capturing additional ICD diagnosis codes.

PheMap v1.0 (05/14/20)

  • First release of PheMap built from information from five publicly available resources: Mayo Clinic, MedlinePlus, MedicineNet, WikiDoc, Wikipedia.
  • Contains PheMap quantified concepts for 841 phenotypes.
  • Please refer to the original paper for more details.