Location
Virtual
Dates
  • November 29, 2021 from 3:30pm to 4:30pm

Talk Title: Policy Questions, Messy Data: Three approaches to turning messy data into information for public policy

Abstract:

This talk describes three projects that develop statistical methods for using messy data on people to draw conclusions for public health or policy. It concludes with some comments on the contribution of statistics in such settings.

For the first part, consider: how do you sample populations such as people who inject drugs? You find a few, then get them to recruit their friends. But how do you make inferences from the resulting sample? Respondent-Driven Sampling (RDS) attempts to allow for statistical inference in this challenging data setting. In particular, we address the question of how to cluster the tree-structured network data collected using RDS, based on covariates and partial network observation.
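To make the inference problem concrete, here is a minimal Python sketch of one classical RDS estimator, the inverse-degree-weighted (Volz-Heckathorn, or "RDS-II") estimate of a population proportion. It is not the clustering method described in the talk. The function name `rds_ii_estimate` and the toy data are assumptions made for illustration only.

```python
def rds_ii_estimate(outcomes, degrees):
    """Inverse-degree-weighted estimate of a population mean.

    outcomes: 0/1 (or numeric) values observed for each respondent.
    degrees:  each respondent's self-reported network size (must be > 0).

    Well-connected people are more likely to be recruited, so each
    respondent is down-weighted by 1 / degree.
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if not outcomes:
        raise ValueError("at least one respondent is required")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")

    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Hypothetical toy sample: 1 = has the trait of interest, 0 = does not.
outcomes = [1, 0, 1, 0]
degrees = [10, 2, 5, 1]

naive = sum(outcomes) / len(outcomes)
adjusted = rds_ii_estimate(outcomes, degrees)
print(f"naive sample proportion: {naive:.3f}")
print(f"degree-weighted estimate: {adjusted:.3f}")
```

In this toy sample the respondents with the trait also report the largest networks, so the weighted estimate falls below the naive sample proportion.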

In the second part, we address the challenge of estimating the number of killings in the Syrian conflict. The challenge would be easy if there were a complete list of killings, and several groups are working on creating such lists. Unfortunately, none of the lists is complete. We introduce a method that uses hierarchical clustering to characterize the partial overlap between four separately collected lists of killings in order to estimate the total number of killings. This is an extension of the classical capture-recapture, or multiple systems estimation, methods.
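For readers unfamiliar with capture-recapture, the simplest two-list case can be sketched in a few lines of Python. This shows only the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant, not the four-list hierarchical clustering extension from the talk. The record identifiers are invented for illustration, and the sketch assumes records have already been matched across lists and that the two lists are independent.

```python
def lincoln_petersen(n1, n2, m):
    """Classical two-list estimate of total population size.

    n1, n2: number of records on each list.
    m:      number of records appearing on both lists.
    """
    if m <= 0:
        raise ValueError("the lists must overlap for the estimate to exist")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical, already de-duplicated record identifiers on two lists.
list_a = {"v01", "v02", "v03", "v04", "v05", "v06"}
list_b = {"v04", "v05", "v06", "v07", "v08"}

overlap = len(list_a & list_b)      # records found on both lists
observed = len(list_a | list_b)     # distinct records actually documented

lp = lincoln_petersen(len(list_a), len(list_b), overlap)
ch = chapman(len(list_a), len(list_b), overlap)
print(f"documented: {observed}, Lincoln-Petersen: {lp:.1f}, Chapman: {ch:.1f}")
```

The smaller the overlap relative to the list sizes, the larger the estimated number of undocumented cases. With more than two lists and dependence between them, the simple formula no longer applies, which is the setting multiple systems estimation is designed for.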

Finally, we discuss a difficult issue in the analysis of network data: in many cases, networks are undirected (if I had lunch with you, you also had lunch with me), but two parties reporting on the same relation may give conflicting reports. We address this problem in the context of a longitudinal study of several types of social relations and health behaviors among middle school students. Leveraging the multiple networks observed for each student, we estimate each student's false reporting rates in order to infer the true network structure.