Software

The following are software tools that our team developed, or helped develop, for open source dissemination.

 

ARX.  This is a comprehensive open source software program for anonymizing sensitive personal data. It supports: 1) Risk-based anonymization using super-population models, strict-average risk and k-map; 2) Syntactic privacy models, such as k-anonymity, ℓ-diversity, t-closeness, δ-disclosure privacy and δ-presence; 3) Semantic privacy models, such as (ɛ, δ)-differential privacy; 4) Methods for optimizing the profitability of data publishing based on monetary cost-benefit analyses; 5) Data transformation with generalization, suppression, microaggregation and top/bottom-coding as well as global and local recoding; 6) Methods for analyzing data utility; 7) Methods for analyzing re-identification risks. The software can handle very large datasets on commodity hardware and features an intuitive cross-platform graphical user interface.
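
As an illustration of one of the syntactic privacy models listed above, the following minimal Python sketch checks k-anonymity before and after generalizing an attribute. It does not use ARX or its API (ARX is a Java program); the function names, the toy records, and the choice of quasi-identifiers are all assumptions made for this example.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width=10):
    """Replace an exact age with a range of the given width (a simple generalization)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

# Hypothetical toy records; "age" and "zip" are treated as quasi-identifiers.
records = [
    {"age": 34, "zip": "37203", "diagnosis": "flu"},
    {"age": 36, "zip": "37203", "diagnosis": "asthma"},
    {"age": 41, "zip": "37212", "diagnosis": "flu"},
    {"age": 47, "zip": "37212", "diagnosis": "diabetes"},
]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)
generalized = [generalize_age(r) for r in records]
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
```

With exact ages every record is unique, so the raw table is not 2-anonymous; once ages are generalized to ten-year ranges, each (age range, zip) group contains two records.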

 

Genomic Summary Data Game Solver.  This software program finds the best strategy for sharing genomic summary statistics (minor allele frequencies of SNPs) when the recipient is an economically motivated adversary who may mount an inference attack. The interaction is modeled as a Stackelberg game that we refer to as the Genomic Privacy Game. The goal of the attack is to infer whether the DNA record of a targeted individual is in a genome pool whose summary statistics have been published. The sharer's strategy set is relaxed, so it is not limited to releasing only the top SNPs. The software implements a genetic algorithm (GA) to find a solution in a way that scales to large datasets.
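
To give a feel for how a genetic algorithm can search over which SNPs to release, here is a toy Python sketch. It is not the solver's actual code or payoff model: the per-SNP utility and risk values, the attack-cost threshold, the breach loss, and all function names are hypothetical. The sharer (leader) picks a release mask, and the adversary (follower) is assumed to attack only when the total exposure exceeds its cost.

```python
import random

# Hypothetical per-SNP values: benefit of sharing and contribution to attack power.
UTILITY = [5, 3, 8, 2, 7, 4, 6, 1]
RISK = [4, 1, 6, 1, 5, 2, 3, 1]
ATTACK_COST = 10   # assumed: the adversary attacks only if total exposure exceeds this
BREACH_LOSS = 50   # assumed: the sharer's loss when an attack happens

def fitness(mask):
    """Sharer's payoff for a release mask, given the adversary's best response."""
    shared_utility = sum(u for u, m in zip(UTILITY, mask) if m)
    exposure = sum(r for r, m in zip(RISK, mask) if m)
    attacked = exposure > ATTACK_COST
    return shared_utility - (BREACH_LOSS if attacked else 0)

def evolve(generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(UTILITY)
    # Seed the population with the "share nothing" mask plus random masks.
    population = [[0] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best masks
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

best = evolve()
```

Because the best masks are carried over unchanged each generation, the returned mask is never worse than sharing nothing. A GA of this kind gives no optimality guarantee; it trades exactness for the ability to search very large strategy spaces.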

 

MITRE Identification Scrubber (MIST).  This software incorporates a suite of tools for identifying and redacting personally identifiable information (PII) in free-text medical records. MIST helps replace the identified PII with either obscuring fillers, such as [NAME], or artificial, synthesized, but realistic English fillers.
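
The idea of replacing PII with obscuring fillers can be sketched in a few lines of Python. This is not MIST's code or interface, and it does not reflect how MIST identifies PII; the hand-written patterns, the labels other than [NAME], and the sample note are assumptions chosen only to show the replacement step.

```python
import re

# Hypothetical patterns for illustration; real de-identification needs far broader coverage.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "NAME": re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+"),
}

def redact(text):
    """Replace each matched span with an obscuring filler such as [NAME]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Dr. Smith saw the patient on 03/14/2019; call 615-555-0100."
redacted = redact(note)
# redacted == "[NAME] saw the patient on [DATE]; call [PHONE]."
```

Swapping the bracketed filler for a synthesized but realistic surrogate (a different name, a shifted date) would correspond to the second replacement style described above.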

 

Secure Meta-Analysis (SecureMA).  This software implements a secure framework for performing multi-site genomic association studies across large consortia without violating the privacy or confidentiality of individual participants or substudy sites.  Further details can be found in our Bioinformatics paper.