Theoretical and Applied Data Science Lunch-n-learn - Yan Wang

Thursday, August 20, 2020 - 12:00pm to 1:00pm

Presenter: Yan Wang

Title: Wasserstein subsampling: Theory and Empirical Performance  

Abstract: Predictive models are typically built from data points for which both the covariates and the response are available. Making predictions with only partial information, however, remains a challenging problem in statistics and machine learning. This work considers a setting in which the covariates are available for all $n$ data points, while the response can be obtained for only $m$ of them, with $m<n$. The Wasserstein distance is proposed as the criterion for selecting these $m$ points. Risk bounds are established in terms of the Wasserstein distance and the Kullback-Leibler divergence, and the method is evaluated on several real-world datasets.
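
For readers new to the idea, the selection step described in the abstract can be illustrated with a rough sketch. This is not the speaker's algorithm: it assumes one-dimensional covariates, uses the 1-Wasserstein distance between uniform empirical distributions, and picks points greedily. The function names `wasserstein_1d` and `greedy_subsample` are hypothetical and chosen only for this illustration.

```python
import random

def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two uniform empirical distributions on the line.

    Computed as the integral of |F(t) - G(t)|, where F and G are the empirical CDFs.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Advance both empirical CDFs up to and including `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total

def greedy_subsample(covariates, m):
    """Greedily choose m indices whose empirical distribution stays close to the full one.

    Assumption for illustration only: a greedy heuristic, not an exact minimizer.
    """
    chosen = []
    remaining = list(range(len(covariates)))
    for _ in range(m):
        best = min(
            remaining,
            key=lambda k: wasserstein_1d(
                covariates, [covariates[t] for t in chosen + [k]]
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]  # all n covariates are known
idx = greedy_subsample(x, 10)                     # responses would be collected only here
print(sorted(x[k] for k in idx))
```

In this toy setting, the responses would then be collected only for the points indexed by `idx`, and a model fitted on those $m$ pairs. Multivariate covariates would need a general optimal-transport solver rather than the CDF formula used here.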

Bio: Yan Wang is a PhD student in statistics. He earned his first PhD, in physics, from Beijing Normal University in 2010 and then served as a faculty member at China University of Petroleum before coming to Ames in 2017. His research interests center on applying stochastic and statistical methods in science and engineering.

After the presentation, there will be a short time for discussion and questions.