Private Hypothesis Testing over Sensitive Groups
As companies work to provide the best possible experience for members, users, and customers, it is crucial to understand how experiences differ across people, particularly for individuals from sensitive groups. For example, do women visit our platform less frequently than members of other genders? Are people with disabilities disproportionately affected by a change to our user interface? Running statistical tests or forming estimates to answer these questions requires knowing users' sensitive attributes, such as race/ethnicity or gender, and personal data of this kind calls for privacy protections. We study a new privacy model in which users belong to certain sensitive groups, and we show how to conduct statistical inference on whether outcomes differ significantly between the groups. We introduce a general chi-squared test that accounts for differential privacy in group membership, and show that it covers a broad set of hypothesis tests and improves statistical power over tests that ignore the noise due to privacy. The presentation is based on joint work with Ryan Rogers.
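To make "noise due to privacy" concrete, here is a minimal sketch of one standard way to privatize group membership, k-ary randomized response, together with a simple debiasing of the resulting group counts. This is an illustrative assumption, not the mechanism or the test from the talk: the abstract does not say which differentially private mechanism is used, and the function names (`rr_probs`, `privatize_group`, `debias_counts`) are hypothetical.

```python
import math
import random


def rr_probs(epsilon, k):
    """Return (p_keep, p_other) for k-ary randomized response.

    Assumed mechanism: report the true group with probability p_keep,
    otherwise report each of the other k - 1 groups with probability p_other.
    """
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def privatize_group(group, k, epsilon, rng):
    """Randomize one user's group label (an integer in 0..k-1)."""
    p_keep, _ = rr_probs(epsilon, k)
    if rng.random() < p_keep:
        return group
    # Otherwise pick uniformly among the other k - 1 groups.
    return rng.choice([g for g in range(k) if g != group])


def debias_counts(noisy_counts, epsilon):
    """Unbiased estimates of the true group sizes from noisy counts.

    Uses E[noisy_j] = (p_keep - p_other) * true_j + p_other * n.
    Requires epsilon > 0, since p_keep == p_other when epsilon == 0.
    """
    k = len(noisy_counts)
    n = sum(noisy_counts)
    p_keep, p_other = rr_probs(epsilon, k)
    return [(c - p_other * n) / (p_keep - p_other) for c in noisy_counts]


if __name__ == "__main__":
    rng = random.Random(0)
    true_groups = [0] * 7000 + [1] * 3000  # synthetic data, illustration only
    noisy = [privatize_group(g, 2, 1.0, rng) for g in true_groups]
    noisy_counts = [noisy.count(0), noisy.count(1)]
    print("noisy counts:   ", noisy_counts)
    print("debiased counts:", debias_counts(noisy_counts, 1.0))
```

The randomized labels pull the observed group sizes toward uniform, so a test that treats the noisy labels as exact compares diluted groups. A test that models the randomization, as the abstract describes, can account for that dilution.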
Rina Friedberg is a senior data scientist at LinkedIn, specializing in data privacy research. She focuses on practical problems in data science, including hypothesis testing under differential privacy and quantifying demographic representation changes in large-scale A/B testing systems. She has previously worked in global health statistics, studying clinical trials to evaluate gender-based violence prevention programs for adolescents in Nairobi. Friedberg holds a PhD in statistics from Stanford University.