Natural Language De-identification

This project develops natural language processing strategies for redacting identifiers from clinical text. It covers three approaches: 1) machine learning (e.g., conditional random fields), 2) rules (e.g., a dictionary of patient names), and 3) surrogation (i.e., replacement of real identifiers with fake identifiers). The project is funded by the NSF Team for Research in Ubiquitous Secure Technologies (TRUST) and by an R01 from the National Library of Medicine, through a collaboration with the Group Health Research Institute.
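
As a rough illustration of how the rule-based and surrogation approaches can fit together, the Python sketch below matches names from a small dictionary and replaces each with a consistent fake name. It is not the project's implementation: the name list, the surrogate pool, and the function names (`build_pattern`, `deidentify`) are all hypothetical, and a real system would combine this with a trained model such as a conditional random field.

```python
import re
from itertools import cycle

# Hypothetical dictionary of known patient names and a pool of fake names.
# Both are made up for this illustration.
PATIENT_NAMES = {"John Smith", "Mary Jones"}
SURROGATE_NAMES = ["Alex Doe", "Sam Roe", "Pat Poe"]


def build_pattern(names):
    """Compile one case-insensitive regex that matches any dictionary name."""
    # Longest names first, so a longer entry wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def deidentify(text, names=PATIENT_NAMES, surrogates=SURROGATE_NAMES):
    """Replace each dictionary name in `text` with a surrogate name.

    Returns the rewritten text and the real-to-surrogate mapping used.
    The same real name always maps to the same surrogate within one call.
    """
    if not names:
        return text, {}

    pattern = build_pattern(names)
    pool = cycle(surrogates)  # reuse surrogates if the pool runs out
    mapping = {}

    def replace(match):
        key = match.group(0).lower()
        if key not in mapping:
            mapping[key] = next(pool)
        return mapping[key]

    return pattern.sub(replace, text), mapping


note = "John Smith was seen today. Follow-up with john smith in 2 weeks."
clean, mapping = deidentify(note)
print(clean)
```

Keeping the mapping consistent within a note preserves readability (the same patient remains the same fake person throughout), which is one reason surrogation is often preferred over blanking identifiers out.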