Virtual Seminar, Daniela Witten, 'Selective Inference for Trees'

Monday, November 1, 2021 - 11:00am to 12:00pm
Daniela Witten is a coauthor of the popular text An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani. Professor Witten earned her PhD at Stanford in 2010. At the 2018 Women in Data Science Conference, she gave a lecture watched by 100,000 people via livestream.

Abstract, via Professor Witten:

As datasets grow in size, the focus of data collection has increasingly shifted away from testing pre-specified hypotheses, and towards hypothesis generation. Researchers are often interested in performing an exploratory data analysis to generate hypotheses, and then testing those hypotheses on the same data. Unfortunately, this type of 'double dipping' can lead to highly-inflated Type 1 errors. In this talk, I will consider double-dipping on trees. First, I will focus on trees generated via hierarchical clustering, and will consider testing the null hypothesis of equality of cluster means. I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process, using a selective inference framework. Second, I'll consider trees generated using the CART procedure, and will again use selective inference to conduct inference on the means of the terminal nodes. Applications include single-cell RNA-sequencing data and the Box Lunch Study. This is collaborative work with Lucy Gao (U. Waterloo), Anna Neufeld (U. Washington), and Jacob Bien (USC).
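
To make the 'double dipping' problem concrete, here is a minimal simulation sketch. It is not the selective inference test from the talk; it only illustrates the problem that test is meant to address. Several things here are assumptions chosen for illustration: one-dimensional standard normal data with no true clusters, Ward linkage cut into two clusters, and a naive two-sample z-test with known unit variance. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import norm


def naive_p_value(x, labels):
    """Two-sided z-test for a difference in means between clusters 1 and 2.

    Treats the cluster labels as if they were fixed in advance and
    assumes unit variance (true by construction in this simulation).
    """
    a, b = x[labels == 1], x[labels == 2]
    se = np.sqrt(1.0 / len(a) + 1.0 / len(b))
    z = (a.mean() - b.mean()) / se
    return 2.0 * norm.sf(abs(z))


def double_dipping_rejection_rate(n_sims=200, n=50, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Null data: a single Gaussian population, so no real clusters exist.
        x = rng.standard_normal(n)
        # Step 1 (exploration): estimate two clusters from the data.
        tree = linkage(x.reshape(-1, 1), method="ward")
        labels = fcluster(tree, t=2, criterion="maxclust")
        # Step 2 (testing): test the same data for a difference in cluster means.
        if naive_p_value(x, labels) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    rate = double_dipping_rejection_rate()
    print(f"Naive rejection rate under the null: {rate:.2f} (nominal level 0.05)")
```

Because the clusters are chosen to separate the observations, their sample means differ even though every observation comes from the same distribution, so the naive rejection rate should come out far above the nominal 0.05 level.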