Theoretical and Applied Data Science Lunch-n-learn - In-Ho Cho
Presenter: In-Ho Cho
Title: Parallel Fractional Hot Deck Imputation for Large/Big Missing Data Curing for Improving Machine Learning and Statistical Inference
Abstract: In this talk, I will present a recently developed parallel fractional hot deck imputation (P-FHDI) platform that is capable of curing big-n (many instances) or big-p (many variables) incomplete data. Careful data curing can positively affect subsequent machine learning and statistical inference, but most of the available statistical methods and tools are not suitable for large/big incomplete data. The talk will cover how to extend the well-proven fractional hot deck imputation method, which is free from statistical or distributional assumptions and thereby enables purely data-driven imputation. It will briefly cover the central notions and theories of P-FHDI and introduce the publicly available P-FHDI program, with scalability results on examples of big-n data sets (millions of instances or more) and big-p data sets (more than 10,000 variables). This work is supported by NSF CSSI and conducted jointly with Jae-kwang Kim, Professor in the Department of Statistics, ISU.
Bio: In-Ho Cho received his PhD in civil engineering, with a minor in computational science and engineering, from the California Institute of Technology, USA, in 2012. He is currently an associate professor in the Department of CCEE at ISU. His research interests include data-driven engineering and science, computational statistics, parallel computing, parallel multi-scale finite element analysis, and computational and engineering mechanics for soft micro-robotics and nanoscale materials.
After the presentation, there will be a short time for discussion and questions.