Mining Biomedical Literature for Terms Related to Epidemiologic Exposures

Abstract

Epidemiologic studies contribute greatly to evidence-based medicine by identifying risk factors for diseases and determining optimal treatments for clinical practice. However, little effort has been devoted to automatically extracting knowledge from epidemiologic articles, such as exposures, outcomes, and their relations. In this initial study, we developed a system, consisting of a natural language processing (NLP) engine and a rule-based classifier, that automatically extracts exposure-related terms from the titles of epidemiologic articles. An evaluation on 450 titles annotated by an epidemiologist yielded a highest F-measure of 0.646 (precision 0.610, recall 0.688) under inexact matching, indicating that automated methods for mining epidemiologic literature are feasible. Further analysis of terms related to epidemiologic exposures suggested that although the Unified Medical Language System (UMLS) appears to offer reasonable coverage of these terms, more appropriate semantic classifications of epidemiologic exposures are needed.
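The abstract does not define "inexact matching." One common interpretation is that an extracted term counts as correct when it overlaps an annotated exposure term, rather than matching its boundaries exactly. The following is a minimal Python sketch of precision, recall, and F-measure under that assumed overlap criterion. The character-offset span representation and the function names (`overlaps`, `f_measure`, `inexact_prf`) are hypothetical and illustrative; this is not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end) character offsets in a title,
# with end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def inexact_prf(predicted: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure where any overlap counts as a match.

    The overlap criterion is an assumption; the abstract does not specify
    how "inexact matching" was defined.
    """
    # A predicted span is correct if it overlaps at least one gold span.
    correct_pred = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    # A gold span is found if at least one predicted span overlaps it.
    found_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy example with hypothetical spans: one gold exposure term at (0, 18);
# the system predicts (0, 6), which overlaps it, and (20, 24), which does not.
p, r, f = inexact_prf([(0, 6), (20, 24)], [(0, 18)])
print(p, r, round(f, 3))
```

As a consistency check, taking the harmonic mean of the reported precision (0.610) and recall (0.688) with `f_measure` gives about 0.647, which agrees with the reported F-measure of 0.646 up to rounding of the published figures.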