Modern Biostatistics and Statistical Learning

Course Number: CHL5229H
Series: 5200 (Biostatistics)
Course Instructor(s): Rafal Kustra

Course Description

This course will introduce students to statistical methods suitable for analysing large observational data sets, data constructed from multiple institutional databases, web-based data, and any data that may benefit from non-classical approaches. The theory will be presented as an extension of classical tools, such as linear and logistic regression, parametric hypothesis testing, and multivariate Gaussian theory, to make it more intuitive and accessible.

Course Objectives

At the end of the course, students should be aware of:

The distinction between supervised and unsupervised statistical learning problems, and the application of each;
Classification problems, and the similarities between classifiers and regression models;
Non-classical regression and classification tools: loess and spline smoothing, tree-based methods, and kernel-based methods;
Importance and implementation of prediction error control in statistical modelling using v-fold cross-validation and leave-one-out bootstrap; and
Importance of, and tools for, complex data handling, testing, and manipulation.

Methods of Assessment

Assignments (2)	50%
Midterm	25%
Paper presentation	15%
Participation	10%