Data Resources

VUMC offers multiple resources for investigators to extract or obtain datasets, with options for both fully identified and de-identified data. The Vanderbilt Institute for Clinical and Translational Research (VICTR) maintains two data resources: the Synthetic Derivative (de-identified) and the Research Derivative (fully identified). Both can be accessed via StarBRITE. VICTR resources use the OMOP data model, so queries can easily be shared between sites whose data warehouses are mapped to OMOP. Epic also has two data models for research data: Clarity and Caboodle. Both use the Epic data model, which allows queries to be transferred between Epic sites. In general, VICTR resources contain most clinical data available in the health record, with longitudinal data covering more than 15 years. The Epic resources contain all data from the health record, but for most data only since the migration to Epic in November 2017. Epic resources also include EHR-related metadata, including user access logs and build data.

  

| | Synthetic Derivative (VICTR) | Research Derivative (VICTR) | Hyperspace (Epic) | Clarity (Epic) | Caboodle (Epic) |
|---|---|---|---|---|---|
| Access Options | Record Counter; SD Discover; Custom Data Extract | RD Discover; Custom Data Extract | eStar; Reporting Workbench | SQL Queries; Custom Data Extract | SlicerDicer; Custom Data Extract |
| Data Format | OMOP | OMOP | Chronicles (Epic) | Epic Data Model | Epic Data Model |
| De-identified? | Yes | No | No | No | No |
| Update Frequency | Monthly | Monthly | Immediately | Nightly | Nightly |
| Contains pre-eStar data? | Yes | Yes | No | No | No |
| Contains external data sources? | Yes | Yes | No | No | Yes |
| Cost | Free* | Free* | Free | Yes | Free* |
| Turnaround Time | Immediate | Immediate | Immediate | Days | Immediate |
| Volume of eStar data | Some | Some | All | All | Most |

*Custom data extracts incur costs which are charged through the respective VUMC core 
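
As a rough illustration of how the comparison table above can guide the choice of resource, the sketch below encodes a few of its rows as a small Python data structure and filters on study requirements. The class, field, and function names are hypothetical and exist only for this example; the values are transcribed from the table and should be checked against it before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Field names are hypothetical; values come from the comparison table.
    name: str
    data_format: str
    deidentified: bool
    update_frequency: str
    has_pre_estar_data: bool
    has_external_sources: bool


RESOURCES = [
    Resource("Synthetic Derivative", "OMOP", True, "Monthly", True, True),
    Resource("Research Derivative", "OMOP", False, "Monthly", True, True),
    Resource("Hyperspace", "Chronicles (Epic)", False, "Immediately", False, False),
    Resource("Clarity", "Epic Data Model", False, "Nightly", False, False),
    Resource("Caboodle", "Epic Data Model", False, "Nightly", False, True),
]


def candidate_resources(*, deidentified=None, pre_estar=None, external=None):
    """Return names of resources that satisfy every requirement that is not None."""
    matches = []
    for r in RESOURCES:
        if deidentified is not None and r.deidentified != deidentified:
            continue
        if pre_estar is not None and r.has_pre_estar_data != pre_estar:
            continue
        if external is not None and r.has_external_sources != external:
            continue
        matches.append(r.name)
    return matches


# A study that needs de-identified data:
print(candidate_resources(deidentified=True))
# A study that needs fully identified data from before the Epic migration:
print(candidate_resources(deidentified=False, pre_estar=True))
```

A filter like this only narrows the options; cost, turnaround time, and access requirements from the table and the sections below still need to be weighed by hand.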

VICTR Data Resources

Synthetic Derivative

The Synthetic Derivative (SD) is Vanderbilt’s de-identified research data repository, which contains longitudinal data from over 15 years. Researchers most commonly access the SD through either the SD Discover or Record Counter tools, which are maintained by VICTR. SD Discover and Record Counter are similar in functionality, each using a drag-and-drop graphical interface to quickly query data. The Record Counter tool is primarily used to explore cohorts for future studies and yields numeric counts of patients according to specified selection or exclusion criteria. IRB approval is not required to use the Record Counter. In contrast, the SD Discover tool yields de-identified patient records matching the selection criteria, which can be used for research studies. The SD also links to BioVU, Vanderbilt’s de-identified DNA biorepository. Access to the SD is free for all Vanderbilt researchers, but access to the SD Discover tool requires a data use agreement and IRB approval. Highly complex queries that require multiple linkages to different data sources may benefit from the SD Custom Extract service, available from the IDASC core.
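
To make the idea of counting patients by selection and exclusion criteria concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not how Record Counter is implemented. The table names person and condition_occurrence follow the OMOP data model, but the rows, the concept IDs (1001 and 2002), and the function names are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database shaped like two OMOP tables (fake data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        CREATE TABLE condition_occurrence (
            person_id INTEGER,
            condition_concept_id INTEGER
        );
        INSERT INTO person VALUES (1, 1950), (2, 1962), (3, 1975), (4, 1988);
        -- 1001 and 2002 are placeholder concept IDs, not real OMOP concepts.
        INSERT INTO condition_occurrence VALUES
            (1, 1001), (1, 2002),
            (2, 1001),
            (3, 1001), (3, 1001),
            (4, 2002);
    """)
    return conn


def count_cohort(conn, include_concept_id, exclude_concept_id):
    """Count distinct patients with the inclusion condition and without the exclusion condition."""
    sql = """
        SELECT COUNT(DISTINCT p.person_id)
        FROM person AS p
        JOIN condition_occurrence AS inc
          ON inc.person_id = p.person_id
         AND inc.condition_concept_id = ?
        WHERE NOT EXISTS (
            SELECT 1
            FROM condition_occurrence AS exc
            WHERE exc.person_id = p.person_id
              AND exc.condition_concept_id = ?
        )
    """
    (count,) = conn.execute(sql, (include_concept_id, exclude_concept_id)).fetchone()
    return count


conn = build_demo_db()
# Patients 1, 2, and 3 have condition 1001; patient 1 also has 2002 and is excluded.
print(count_cohort(conn, include_concept_id=1001, exclude_concept_id=2002))  # prints 2
```

The query returns only a count, never patient rows, which mirrors the feasibility-check role described above.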

Research Derivative

The Research Derivative (RD) is similar in functionality to the SD, but it contains fully identifiable longitudinal data from over 15 years. Researchers most commonly access the RD through the RD Discover tool, which is maintained by VICTR. RD Discover utilizes a drag-and-drop interface to quickly query data, yielding fully identified patient records for the respective selection criteria. Access to RD Discover is free for all Vanderbilt researchers, but it requires a data use agreement and IRB approval prior to use. Highly complex queries that require multiple linkages from different data sources may benefit from the RD Custom Extract service, available from the IDASC core. 

 

Record Counter 

In the video below, Dr. Steitz demonstrates how to use VUMC’s Record Counter tool to quickly assess record numbers available after applying inclusion and exclusion criteria. This is helpful when planning a research study and needing to assess if there’s a large enough sample for the work to be feasible.

 

Epic Data Resources

Hyperspace 

Hyperspace is the front-end interface to Epic. While it is most familiar to practicing clinicians, two tools accessed through Hyperspace, SlicerDicer and Reporting Workbench, are important to the Vanderbilt research data ecosystem. SlicerDicer is Epic’s data exploration tool that allows users to quickly visualize data on large patient populations, but it does not allow for record-level data extracts. Data in SlicerDicer are queried from the Caboodle data model and available with a one-day delay. Hyperspace also provides the Reporting Workbench utility, a self-service data extraction tool that provides datasets in a tabular format, with optional summaries and visualizations, for research and operational use. Reporting Workbench extracts data directly from Hyperspace (Chronicles data model), so data are available immediately once they are entered in the EHR. Both SlicerDicer and Reporting Workbench are free to Vanderbilt researchers with access to Epic, but access to both the tools and specific data elements may be limited by individual security permissions. Highly complex reports with custom criteria and columns may benefit from the EHR build service from the VCLIC core.

Clarity 

Clarity is Epic’s data warehouse, which contains all data from the EHR and is updated nightly. Clarity contains the most comprehensive corpus of EHR data at VUMC, but most data are available only since November 2017. Data in Clarity are fully identifiable and, unlike the RD and SD, do not contain linkages to outside data sources. Access to Clarity is available through VCLIC’s Data Access Work Group (VDAWGS), which requires certification or proficiency in Epic’s Clinical Data Model. Many researchers may choose to obtain Clarity data from the VCLIC core, which offers professional data extraction services.

Caboodle

Caboodle is also an Epic data warehouse, focused specifically on writing efficient reports. Caboodle contains most, but not all, of the data available in Clarity. Data in Caboodle are fully identifiable and updated nightly. However, unlike Clarity, Caboodle contains linkages to external (non-eStar) data to help simplify data extracts. Caboodle is most commonly accessed using SlicerDicer, which is available through Hyperspace. There is also a SQL interface that is available to VDAWGS members.

 

Reporting Workbench

Reporting Workbench is a tool within Epic that allows users to run administrator-created reports, build reports from templates, and take actions based on the results of these reports. Reporting Workbench queries Epic’s Chronicles database, which means it can retrieve the most up-to-date information available. This is different from other tools like SlicerDicer, which uses the Caboodle and Clarity databases, which are extracted from Chronicles nightly. All reports within Reporting Workbench are created from templates, which in part determine the master file searched, such as patients or orders. Reports themselves include search criteria and display elements. If you use Chart Search to look for My Reports, you can view the library of available templates and reports, which can vary based on security and access permissions. In the video below, recorded in Epic's playground environment, Dr. McCoy shows how to run a report, view results, and take actions like sending a bulk message to select patients (no real patient data shown). She also shows how to create your own reports and options to consider for different needs.

 

SlicerDicer

SlicerDicer is a self-service reporting tool within Epic that allows users to investigate data and generate visualizations without the need to run reports. In the video below, Dr. McCoy demonstrates how to use SlicerDicer, including how to include or exclude results from the various data models, like Patients, and slice them into buckets to look for patterns in the data.

 

Data Management Exercise Using Excel

In the video below, Dr. Steitz guides users through an exercise demonstrating the power of Excel for handling and wrangling large data sets, including calculating descriptive statistics and generating some presentation-ready data plots.

Excel Exercise Data

 
