Theoretical and Applied Data Science Lunch-n-learn - Yan Wang
Presenter: Yan Wang
Title: Wasserstein subsampling: Theory and Empirical Performance
Abstract: Predictive models are typically built from a set of data points for which both the covariates and the response are available. Making predictions with only partial information, however, is a challenging problem in statistics and machine learning. In this work, we consider a situation where the covariates are available for all $n$ data points, while the response can be obtained for only $m$ of them, with $m<n$. The Wasserstein distance is proposed as a criterion for subsampling these $m$ points. Risk bounds are established in terms of the Wasserstein distance and the Kullback-Leibler divergence. The performance of this method is evaluated on several real-world datasets.
Bio: Yan Wang is a PhD student in statistics. He earned his first PhD, in physics, at Beijing Normal University in 2010, then served as a faculty member at China University of Petroleum before coming to Ames in 2017. His research interests center on applying stochastic and statistical methods in science and engineering.
After the presentation, there will be a short time for discussion and questions.