Data Science

  • The goal of this project is to develop and evaluate data mining methods for detecting suspicious accesses to electronic medical record systems. These methods include user-level profiling (e.g., who accesses which records, and when?), relational or social network analysis of users, and analysis of patients' temporal workflows. This project is sponsored by an R01 grant from the National Library of Medicine and is supplemented by funds from the National Science Foundation TRUST program.

  • This project focuses on the development and application of formal computational and statistical models for protecting patient information from re-identification. Unlike text de-identification, which can leave potential inferences open to exploitation, these approaches provide explicit guarantees about the extent to which data can be linked to external resources to re-identify patients by name. The project is mainly concerned with how patient information can be anonymized to support genome-phenome association studies. It is sponsored, in part, by several U01 grants from the NHGRI/NIH and by funding from Canada and Australia.

  • The goal of this project is to evolve access control schemas for clinical information systems by auditing user actions. It is a collaborative project with the University of Illinois and Northwestern University, sponsored by the NSF.

  • This project focuses on the development of natural language processing strategies for redacting identifiers from clinical text. It emphasizes three strategies: 1) machine learning (e.g., conditional random fields), 2) rules (e.g., dictionaries of patient names), and 3) surrogation (i.e., replacing real identifiers with fake ones). This project is sponsored by the NSF Team for Research in Ubiquitous Secure Technologies (TRUST) and by an R01 grant from the National Library of Medicine through a collaboration with the Group Health Research Institute.
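The user-level profiling idea from the suspicious-access project (who accesses what, and when) can be illustrated with a minimal Python sketch. It assumes a hypothetical access log of (user, patient, timestamp) records and flags an access made during an hour of the day that the user has rarely or never worked before. The `Access` class, the `min_share` threshold, and the hour-of-day feature are illustrative assumptions rather than the project's published method, and the relational and workflow analyses are not shown.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Access:
    """One hypothetical audit-log entry: who accessed which record, and when."""
    user: str
    patient: str
    time: datetime

def build_hour_profiles(log):
    """Count how often each user accessed records during each hour of the day."""
    profiles = defaultdict(Counter)
    for access in log:
        profiles[access.user][access.time.hour] += 1
    return profiles

def is_unusual(access, profiles, min_share=0.05):
    """Flag an access made at an hour that accounts for less than min_share
    (a hypothetical threshold) of the user's past accesses."""
    hours = profiles.get(access.user)  # .get does not create an empty profile
    if not hours:
        return True  # no history at all for this user
    share = hours[access.time.hour] / sum(hours.values())
    return share < min_share

# Hypothetical history: 40 accesses spread evenly over 09:00 to 16:59.
log = [Access("nurse1", f"p{i}", datetime(2024, 1, 1, 9 + i % 8)) for i in range(40)]
profiles = build_hour_profiles(log)
print(is_unusual(Access("nurse1", "p7", datetime(2024, 1, 2, 3)), profiles))   # True
print(is_unusual(Access("nurse1", "p7", datetime(2024, 1, 2, 10)), profiles))  # False
```

A real profile would combine many such features (departments, patients previously treated, access volume); a single hour-of-day share is used here only to keep the example small.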
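The re-identification project's notion of an explicit linkage guarantee can be made concrete with k-anonymity, one well-known formal model of this kind. The project description does not name the models it uses, so treat this as an illustrative substitute. A table is k-anonymous when every combination of quasi-identifier values (attributes such as age or ZIP code that could be matched against external resources) is shared by at least k records. The sketch below measures k and applies simple generalization; the records, attribute names, and generalization rules are all hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    identical values on every quasi-identifier (0 for an empty table)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def generalize(record):
    """Coarsen the hypothetical quasi-identifiers: exact age -> 10-year band,
    five-digit ZIP code -> three-digit prefix."""
    low = record["age"] // 10 * 10
    return {**record, "age": f"{low}-{low + 9}", "zip": record["zip"][:3] + "**"}

# Hypothetical records: age and zip are quasi-identifiers; genotype is the
# research attribute that should not be linkable to a named patient.
records = [
    {"age": 34, "zip": "37203", "genotype": "AA"},
    {"age": 36, "zip": "37205", "genotype": "AG"},
    {"age": 52, "zip": "37212", "genotype": "GG"},
    {"age": 58, "zip": "37215", "genotype": "AG"},
]
qis = ["age", "zip"]
print(k_anonymity(records, qis))                           # 1
print(k_anonymity([generalize(r) for r in records], qis))  # 2
```

In this example, coarsening age and ZIP code raises k from 1 to 2, at the cost of less precise values being available to an association study.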
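The rule-based and surrogation strategies from the clinical-text project can be sketched in a few lines of Python, assuming a hypothetical dictionary of known patient names and a pool of fake replacement names. Each dictionary match is replaced with a surrogate, and the same real name always maps to the same surrogate so the redacted note stays readable. The machine learning component (e.g., a conditional random field tagger) is not shown, and the function name and parameters are illustrative assumptions.

```python
import random
import re

def redact_with_surrogates(text, patient_names, surrogate_pool, seed=0):
    """Replace each dictionary name in text with a consistent, distinct surrogate.

    Returns the redacted text and the real-to-surrogate mapping. Raises
    IndexError if there are more distinct names than surrogates.
    """
    if not patient_names:
        return text, {}
    available = list(surrogate_pool)
    random.Random(seed).shuffle(available)
    mapping = {}
    # Longest names first, so "Mary Ann" wins over "Mary" when both match at
    # the same position; \b keeps "Smith" from matching inside "Smithson".
    names = sorted(patient_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")

    def substitute(match):
        real = match.group(0)
        if real not in mapping:
            mapping[real] = available.pop()  # each real name gets its own fake
        return mapping[real]

    return pattern.sub(substitute, text), mapping

note = "John Smith was seen today. Smith reports chest pain; John Smith's wife agrees."
redacted, mapping = redact_with_surrogates(
    note, ["Smith", "John Smith"], ["Alex Doe", "Sam Roe"]
)
print(redacted)
```

Matching here is exact and case-sensitive, so a misspelled or lowercase name would slip through; that gap is what the learned component of the project is meant to address.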