Rooms 10031 & 10032 700 University Avenue, Toronto, ON M5G 1X6
  • May 22-23, 2024 from 3:30pm to 4:00pm


2024 Distinguished Lecture Series in Statistical Sciences: Susan Holmes

Free Hybrid Event (Virtual/In person) | Registration Required

Join us for this year’s Distinguished Lecture Series in Statistical Sciences with:

Susan Holmes
Department of Statistics
Stanford University

Speaker Profile
Susan Holmes has been working in nonparametric multivariate statistics applied to Biology since 1985. She started her research career in France at the INRAE institute in Montpellier. She has taught at MIT and Harvard, and was an Associate Professor of Biometry at Cornell before moving to Stanford in 1998. She likes working on big, messy data sets, mostly from the areas of Immunology, Cancer Biology and Microbial Ecology, and her group developed the popular Bioconductor packages phyloseq and dada2 for microbiome data analysis.

Professor Holmes has co-authored, with Wolfgang Huber (EMBL), the open-access book Modern Statistics for Modern Biology, published by Cambridge University Press and based on a popular course she teaches at Stanford. Her work is funded by the NIH and the Bill and Melinda Gates Foundation. Her theoretical interests include applied probability, MCMC (Markov chain Monte Carlo), Graph Limit Theory, Differential Geometry and the topology of the space of Phylogenetic Trees.

Hourly Schedule

Wednesday, May 22, 2024

3:30 – 3:35 | CANSSI Ontario Introduction & Welcome

3:35 – 4:30 | Talk Title: Hidden variables: using statistics to decode heterogeneous microbiome data
Abstract: Most studies of clinical or environmental microbiota involve data that are heterogeneous at multiple levels. Some of the studies involve response variables that we aim to predict and understand; preterm birth, growth rates in undernourished children and insulin levels in diabetes are some examples. Standard statistical methods usefully separate unknown parameters from the data themselves and provide insight into the optimality properties of some standard estimates. This clarification provides useful insight into uncertainty quantification and enables optimized downstream experimental design. Analogies with methods in textual analysis (Natural Language Processing), such as the use of latent variable methods, provide useful interpretations, as shown by Sankaran and Holmes, 2018. Testing in the context of combined heterogeneous longitudinal data in perturbation studies of the human microbiome can be even more challenging because there are often a small number of samples with strong dependencies as well as a large number of features from multiple domains. These provide interesting data science challenges where mathematical models of the underlying factors can be plagued with non-identifiability that can make effective uncertainty quantification difficult. We have shown that Bayesian and Bootstrap approaches can provide nonparametric answers to the statistical challenges and have supplemented these with effective visualization techniques distributed as R packages (phyloseq, agPCA, treelapse, bootLong, dada2). This presentation will include joint work with Kris Sankaran, Julia Fukuyama, Ben Callahan, Claire Donnat, Joey McMurdie, Pratheepa Jeganathan, Lan Huong Nguyen and David Relman’s group at Stanford.

4:30 – 5:30 | Reception

Thursday, May 23, 2024

3:30 – 4:30 | Talk Title: Statistics and Geometry for Heterogeneous Data
Abstract: Today’s challenges in immunology and microbiology center around the quantification of uncertainty and the design of experiments for heterogeneous multimodal data. We often have tens of thousands of features and only a few hundred samples. We need to create embeddings for graphs, trees and other non-Euclidean objects. Using the sample/feature duality in the data can often provide effective low-dimensional representations. However, some of the nonlinearities in the underlying factors and non-uniformity in the sampling pose extra challenges. Using local methods inspired by differential geometry, special maps and transformations can enable us to construct accompanying uncertainty contours even for data on curved manifolds. This talk gives examples where we have built software and geometrical tools that provide consensus spaces where we can build the uncertainty maps that we need when designing follow-up experiments. This talk includes joint work with my past lab members: Lan Huong Nguyen, Elisabeth Purdom, Christof Seiler, Nina Miolane, Claire Donnat, Kris Sankaran and Laura Symul.