Replicating known SNP-disease associations using an EMR
(reported odds ratios 1.14–2.36) in at least two previous studies. We developed automated phenotype identification algorithms that combined natural language processing (NLP) techniques (to identify key findings, medication names, and family history), billing code queries, and structured data elements (such as laboratory results) to identify cases (n=70–698) and controls (n=808–3818). When evaluated on randomly selected cases and controls, the final algorithms achieved positive predictive values (PPVs) of ≥97% for cases and 100% for controls.
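The PPV figures quoted above can be illustrated with a minimal sketch, assuming the standard definition: of the records an algorithm labels as cases (or as controls), the fraction confirmed as correct on review. The function name and the counts below are hypothetical and chosen only for illustration; they are not data from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP) for a set of reviewed, algorithm-labeled records."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("PPV is undefined when no records were reviewed")
    return true_positives / reviewed


# Hypothetical review: 100 algorithm-identified cases sampled at random,
# of which 97 are confirmed and 3 are not (illustrative numbers only).
case_ppv = positive_predictive_value(true_positives=97, false_positives=3)
print(f"Case PPV: {case_ppv:.2%}")  # Case PPV: 97.00%
```

Under this definition, a PPV of 100% for controls would mean that every reviewed record the algorithm labeled as a control was confirmed as one.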