Preparing for Big Data in Biomedical Research
These two free software programs will be used throughout the course. Please download them before July 13:
Also, please bookmark these online tools:
If you have limited experience with using R and Linux, please review these materials prior to the start of the course:
Participants who would like a primer on biomedical terminology are encouraged to consult the Beginner’s Guide to RNA-seq or similar resources.
Preparing for Regression and Modeling in R
Before the course starts, students not already familiar with R should practice data importing and simple programming with R and RStudio (both available as free downloads).
An introduction to R that many previous students found helpful can be found at https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf (Venables, Smith, and the R Core Team). See also DataCamp's Quick-R tutorial and Frank E. Harrell Jr.'s Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis.
Preparing for Bayesian Adaptive Clinical Trials for the 21st Century
The course will involve small-group breakout activities to facilitate learning. It is recommended that you download and install the following free software on your laptop prior to the course:
Preparing for Terra-Based Cloud Computing
This course uses Visual Studio Code (VS Code) for navigating course materials and conducting hands-on exercises.
- VS Code is free and can be downloaded at https://code.visualstudio.com/Download
- Required extensions: Remote-SSH, Python, Jupyter, R
Students must have the following accounts set up by the start of this course:
- GitHub – for code version control
  - Personal GitHub accounts created with non-VU/non-VUMC emails are acceptable.
  - You must provide your GitHub account username to Dr. Sheng at least one week prior to the start of the course.
- Docker Hub – for publishing Docker images
  - Personal Docker accounts created with non-VU or non-VUMC emails are acceptable.
- Google Cloud Platform (GCP)
  - Your GCP account must be registered to your VU (vanderbilt.edu) or VUMC (vumc.org) email address.
  - You must provide your Google Cloud Platform (GCP) account email to Dr. Sheng at least one week prior to the start of the course.
- Terra
  - Your Terra account must be linked to your GCP account (with your VU or VUMC email address).
  - When you register for Terra, select “Sign in with Google” (not “Sign in with Microsoft”) to ensure proper access to AGD (Alliance for Genomic Discovery) genomics data, the Synthetic Derivative (SD) BigQuery database, and the introductory data portal.
For students with limited or no experience in Python or SQL, Dr. Sheng strongly recommends reviewing the following resources prior to the start of the course:
- Wes McKinney’s Python for Data Analysis, 3rd Edition. Python is used in the course for data manipulation and for querying the SD BigQuery database.
- SQL. You will use SQL during the course to extract and analyze phenotype data from the SD BigQuery database.
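For a feel of how Python and SQL work together, here is a minimal practice sketch. It uses Python's built-in sqlite3 module as a local stand-in, not the course's BigQuery setup, and the table name `diagnoses`, its columns, the sample rows, and the helper `count_people_with_code` are all hypothetical; the actual SD BigQuery schema is not described here and may look quite different.

```python
import sqlite3

# Local, in-memory stand-in database for practice only.
# Table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnoses (person_id INTEGER, icd_code TEXT)")
conn.executemany(
    "INSERT INTO diagnoses VALUES (?, ?)",
    [(1, "E11.9"), (2, "I10"), (3, "E11.9"), (3, "I10")],
)


def count_people_with_code(connection, icd_code):
    """Return the number of distinct people recorded with the given code."""
    query = (
        "SELECT COUNT(DISTINCT person_id) "
        "FROM diagnoses WHERE icd_code = ?"
    )
    # The ? placeholder passes the value as a parameter instead of
    # pasting it into the SQL string.
    (count,) = connection.execute(query, (icd_code,)).fetchone()
    return count


n_people = count_people_with_code(conn, "E11.9")
print(n_people)
```

The same pattern (write a SELECT statement, run it from Python, collect the result) is the general idea behind querying a database from a notebook, although the client library and SQL dialect used in the course may differ.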
Need to brush up on the prerequisites? Dr. Sheng recommends the following:
- Knowledge of genomics and GWAS fundamentals
- Familiarity with Python and Jupyter Notebook
- Basic proficiency in SQL
- Experience using Linux command-line interfaces