You are here
Theoretical and Applied Data Science Seminar - Vipin Kumar
Presenter: Vipin Kumar
Title: Physics Guided Machine Learning: A New Framework for Accelerating Scientific Discovery
Abstract: Physics-based models of dynamical systems are often used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to incomplete or inaccurate representations of the physical processes being modeled. Given rapid data growth due to advances in sensor technologies, there is a tremendous opportunity to systematically advance modeling in these domains by using machine learning (ML) methods. However, capturing this opportunity is contingent on a paradigm shift in data-intensive scientific discovery since the “black box” use of ML often leads to serious false discoveries in scientific applications. Because the hypothesis space of scientific applications is often complex and exponentially large, an uninformed data-driven search can easily select a highly complex model that is neither generalizable nor physically interpretable, resulting in the discovery of spurious relationships, predictors, and patterns. This problem becomes worse when there is a scarcity of labeled samples, which is quite common in science and engineering domains.
This talk makes a case that in a real-world systems that are governed by physical processes, there is an opportunity to take advantage of fundamental physical principles to inform the search of a physically meaningful and accurate ML model. Even though this will be illustrated for a few problems in the domain of aquatic sciences and hydrology, the paradigm has the potential to greatly advance the pace of discovery in a number of scientific and engineering disciplines where physics-based models are used, e.g., power engineering, climate science, weather forecasting, materials science, and biomedicine.
Bio: Vipin Kumar is a Regents Professor and holds William Norris Chair in the department of Computer Science and Engineering at the University of Minnesota. His research interests include data mining, high-performance computing, and their applications in Climate/Ecosystems and health care. He also served as the Director of Army High Performance Computing Research Center (AHPCRC) from 1998 to 2005. His research has resulted in the development of the concept of isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES) and graph partitioning (METIS, ParMetis, hMetis). He has authored over 300 research articles, and co-edited or coauthored 10 books including the widely used text book "Introduction to Parallel Computing", and "Introduction to Data Mining".
Kumar's current major research focus is on bringing the power of big data and machine learning to understand the impact of human induced changes on the Earth and its environment. Kumar’s research on this topic is funded by NSF’s BIGDATA, INFEWS, and HDR programs, as well as DARPA and USGS He has recently finished serving as the Lead PI of a 5-year, $10 Million project, "Understanding Climate Change - A Data Driven Approach", funded by the NSF's Expeditions in Computing program that is aimed at pushing the boundaries of computer science research. Kumar is a Fellow of the ACM, IEEE, AAAS, and SIAM. Kumar's foundational research in data mining and high performance computing has been honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD), and the 2016 IEEE Computer Society Sidney Fernbach Award, one of IEEE Computer Society's highest awards in high performance computing.
After the presentation, there will be a short time for discussion and questions afterwards.