Statistical Programming and Computation for Health Data

Course Number
CHL5233H
Series
5200 (Biostatistics)
Course Instructor(s)
Aya Mitani

Course Description

This course covers essential R programming and computational tools for health sciences research. Topics include data manipulation, data visualization, loops and functions, optimization, and package development, with an emphasis on reproducibility and replicability.

Course Objectives

Students who complete this course will be able to:

  • Develop a reproducible workflow while integrating version control for conducting research in health data sciences;
  • Use R to prepare an analytical data set and perform descriptive analysis;
  • Produce a visual (graphical or tabular) display of the data that effectively communicates the trend or pattern based on the research question of interest;
  • Write efficient, reproducible code throughout a data science project;
  • Write an efficient simulation program;
  • Develop an R package and deploy it on GitHub;
  • Create reproducible, professional-grade documents (reports, articles, blog posts, presentation slides) using R;
  • Use computational tools (bootstrap, multiple imputation, etc.) to aid in statistical analyses.

Methods of Assessment

Attendance and participation 10%
Pre-class assignments and quizzes 40%
Final project 30%
Final presentation 20%

General Requirements

Students are expected to have some experience with a statistical programming language, preferably R, and a basic understanding of linear regression and logistic regression.