Directed Reading: Modern Biostatistics and Data Mining

Course Number
CHL7001H S8
7000 (Reading Courses & Research Projects)
Course Instructor(s)
Rafal Kustra

Course Description

This course introduces students to statistical methods suitable for analysing large observational data sets, data constructed from multiple institutional databases, web-based data, and any other data that may benefit from non-classical approaches. To make the theory more intuitive and accessible, it is presented as an extension of classical tools such as linear and logistic regression, parametric hypothesis testing, and multivariate Gaussian theory.

Course Objectives

By the end of the course, students should be familiar with:

1. the distinction between, and application of, supervised and unsupervised statistical learning;
2. classification problems, and the similarities between classifiers and regression models;
3. non-classical regression and classification tools: loess and spline smoothing, tree-based methods, and kernel-based methods;
4. the importance and implementation of prediction error control in statistical modelling using v-fold cross-validation and the leave-one-out bootstrap; and
5. the importance of, and tools for, complex data handling, testing, and manipulation.
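
As an illustration of the prediction error control named in objective 4, the following is a minimal sketch of v-fold cross-validation for a simple linear regression. It is not taken from the course materials: the function names (fit_line, v_fold_cv_mse), the simulated data, and the choice of squared-error loss are all assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def v_fold_cv_mse(xs, ys, v=5, seed=0):
    """Estimate mean squared prediction error by v-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Split the shuffled indices into v disjoint folds of roughly equal size.
    folds = [idx[k::v] for k in range(v)]
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the other v-1 folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return sq_err / len(xs)


# Simulated data (an assumption for this sketch): y = 2 + 3x + Gaussian noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
cv_mse = v_fold_cv_mse(xs, ys, v=5)
```

Each observation is predicted exactly once by a model that was fitted without it, so cv_mse estimates out-of-sample error rather than the optimistic in-sample fit.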

Methods of Assessment

Assignments (2): 50%
Midterm: 25%
Paper presentation: 15%
Participation: 10%