Venue: Iowa State University Memorial Union (Various Rooms - see schedule below)
May 16th (Memorial Union - Great Hall):
8:30 AM: Opening Remarks
8:45–10:00 AM: Invited Talk, Natural Language Trends in Visual Analysis, Dr. Vidya Setlur, Tableau Research
10:00 AM–Noon: Introduction to Data Science with Python, Dr. Hongyang Gao
1:30–4:30 PM: Data Science with R, Dr. Heike Hoffman
May 17th (Memorial Union - Great Hall):
9:00–10:00 AM: Invited Talk, Robust Routing Using Electrical Flows, Dr. Ali Sinop, Google Research
10:00 AM–Noon: Mathematics of Data Science, Dr. Man Basnet
1:30–4:30 PM: Introduction to Data Visualization, Dr. Heike Hoffman
May 18th (Memorial Union - Great Hall):
9:00 AM–Noon: A Deep Dive into Deep Learning Algorithms and Architectures by Dr. Aditya Balu
1:30–4:00 PM: A Deep Dive into Deep Learning Algorithms and Architectures by Dr. Aditya Balu
May 19th - Tracks (Concurrent):
Fairness and Accountability in Data Science (Memorial Union - Oak Room)
8:30 AM - 10:00 AM: Facilitating Public Trust in Automated Decision-Making: A Los Angeles Police Department (LAPD) Case Study by Dr. Shannon Harper.
10:30 AM - 12:00 PM: Learning Twitter Policy Agenda of State Legislatures by Dr. Wallapak Tavanapong and Dr. David Peterson
1:30 PM - 3:00 PM: Challenges in Content Moderation: detecting disinformation, hate speech, and offensive imagery by Dr. Matthew Lease
3:15 PM - 4:45 PM: Public Trust in AI and Data Science Lifecycles by Dr. Eric Weber
Digital Agriculture (Memorial Union - Cardinal Room)
8:30 AM - 10:00 AM: From 3D optical sensing to 4D hyperspectral imaging by Dr. Beiwen Li
10:30 AM - 12:00 PM: Deep Learning and Computer Vision in Agriculture by Harman Singh Sangha
1:30 PM - 3:00 PM: YOLOv5 Custom Training Tutorial for Object Detection by Ryan Jeon
3:15 PM - 4:45 PM: Simulation of Realistic Granular Agricultural Material Behavior Using a Physics-Based Game Engine by Joy Li and Dr. Hantao He
Data Mining and Machine Learning (Memorial Union - Campanile Room)
8:30 AM - 10:00 AM: Genomic data resources and data mining for everyone by Dr. Krishna Kalari
10:30 AM - 12:00 PM: Information Surgery: Faking Multimedia Fake News for Real Fake News Detection by Dr. Heng Ji
1:30 PM - 3:00 PM: Harnessing AI for Software Testing and Repair by Dr. Myra Cohen
3:15 PM - 4:45 PM: Multimodal Computer Vision: Foundations, Applications, and Societal Implications by Dr. Chris Thomas
Tutorial will cover some or all of the following as time permits:
Introduction to Data Science
Basic data exploration and visualization techniques
Foundational Statistics concepts for data analysis
Python programming for Data Science: data types, selections, iterations, functions, and working with lists
Working with Data Sets and Data Frames: working with pandas DataFrames
Data Manipulation and Visualization
Data Science Process: how to carry out a data project
Introduction to Interactive Data Visualization with Bokeh
Intro to Machine Learning with scikit-learn: Classification and Regression
Hongyang Gao received his Ph.D. degree from Texas A&M University in College Station, Texas, in 2020. Currently, he is an Assistant Professor in the Department of Computer Science at Iowa State University in Ames, Iowa. His research interests include machine learning, deep learning, and data mining. Before his Ph.D. work, he received his M.S. from Tsinghua University in 2012 and his B.S. from Peking University in 2009.
Topics include some or all of the following as time permits:
- Set theory and logic
- Variables, functions, equations, and graphs
- Review of Calculus
- Review of Mathematical Statistics
- Review of Discrete Mathematics
- Optimization used in Data Science, including gradient descent algorithm
- Basics of Linear Algebra, Matrices, Eigenvalues, and Eigenvectors
- Vector spaces, inner products, different types of norms, and distances
- Application of Eigenvalues and Eigenvectors in Data Science
- Some advanced topics such as sparsity
Man Basnet, Ph.D., is an Associate Teaching Professor in the Department of Mathematics and Data Science at Iowa State University. He teaches data science and mathematics courses and coordinates the differential equations course. He is also involved in designing Mathematical Methods in Data Science (Math 408x) and guides undergraduate and university honors students in their undergraduate research in data science. He has received grants from ELO/LAS for developing online courses and attending conferences in learning analytics and data science. He has received teaching excellence awards from the College of LAS and the Math Department at Iowa State University.
- In this tutorial, we will cover the theory and practical implementation of modern deep learning algorithms. The tutorial is divided into two sessions. The first session covers the building blocks of neural networks, such as back-propagation and optimization algorithms, along with some hands-on supervised deep learning coding. The second session dives deeper into the nuts and bolts of deep learning, covering more modern architectures such as ResNets and Transformers and algorithms such as Generative Adversarial Networks and Reinforcement Learning. We will conclude with some emerging topics in deep learning.
Aditya Balu is currently a Data Scientist at the Translational AI Center at Iowa State University, where he is working on generative design and physics-aware deep learning methodologies. Previously, he finished his Ph.D. in Deep Learning and GPU computing for Design and Manufacturing at Iowa State University. During his graduate studies, he also interned at ANSYS Inc., where he contributed to Deep Learning-based Topology Optimization. Before his graduate studies, he worked in the oil and gas industry at FMC Technologies (now known as Technip FMC) as a product design engineer, where he designed connectors and manifolds for subsea high-pressure and high-temperature applications.
- Come and learn about the basics of Computer Vision and Machine Learning. The workshop will include a simple end-to-end machine learning computer vision project, image data collection best practices, deep learning with computer vision, and Python deep learning packages. Participants should have a basic working knowledge of the Python programming language. A laptop with a web browser is required for this workshop.
Harman is a third-year Ph.D. candidate in the Agricultural and Biosystems Engineering department at Iowa State. His research focus is the development of machine learning and computer vision systems for agricultural machinery. He has worked on several projects optimizing object detection, classification, and monocular depth estimation deep learning models for agricultural applications.
- In this class, I will cover genomics data resources that are publicly available for all of us to query and use for efficient data analytics. Specifically, I will focus on cancer and neuro genomics data resources that are freely available. I will walk the users through how to query the data efficiently and conduct simple analyses on their own using web browsers and databases.
Dr. Kalari graduated from the University of Iowa with a Ph.D. in Biomedical Engineering. She is now an Associate Professor in the department of quantitative health sciences at the Mayo Clinic. She leads a computational group that studies problems at the intersection of cancer, pharmacogenomics, and individualized medicine. Her research covers topics including cancer genomics, individual variation in drug response, gene regulation, mutational processes, machine learning, kinetics modeling, and microenvironment. Her primary expertise is in developing cutting-edge computational and statistical genomics approaches to analyze high-throughput datasets. Her group has developed and published several algorithms and computational workflows to investigate omics datasets, and she currently leads a team of research scientists and graduate students.
Dr. Vidya Setlur, Tableau Research, speaking on 'Natural Language Trends in Visual Analysis'
'Natural language processing has garnered interest in helping people interact with computer systems to make sense and meaning of the world. In the area of visual analytics, natural language has been shown to help improve the overall cognition of visualization tasks. In this talk, Vidya will discuss how natural language can be leveraged in various aspects of the analytical workflow ranging from smarter data transformations, visual encodings, and autocompletion to supporting analytical intent. More recently, chatbot systems have garnered interest as conversational interfaces for a variety of tasks. Vidya will explore the implications of these data-driven approaches in broadening the scope for visual analysis workflows. She will also discuss the future directions for research and innovation in this space.'
Vidya Setlur is the director of Tableau Research. She leads an interdisciplinary team of research scientists in areas including data visualization, multimodal interaction, statistics, applied ML, and NLP. She earned her doctorate in Computer Graphics in 2005 at Northwestern University. Prior to joining Tableau in 2012, she worked as a principal research scientist at the Nokia Research Center for seven years. Her personal research interests lie at the intersection of natural language processing and computer graphics. She combines concepts and methods from information retrieval, human perception, and cognitive science to help users effectively interact with devices and information in their environment. A significant portion of her work covers the investigation, prototyping, and evaluation of such novel concepts.
Dr. Ali Sinop, Google Research, speaking on 'Robust Routing Using Electrical Flows'
'Generating alternative routes in road networks is an application of significant interest for online navigation systems. A high quality set of diverse alternate routes offers two functionalities - a) support multiple (unknown) preferences that the user may have; and b) robustness to changes in network conditions. In this talk, I will first describe an approach for computing alternate routes using electrical flows; and other uses of electrical flows in data mining. Then I will talk about the core component of our algorithm, which is a fast solver for electrical flows: Our algorithm allows one to extract the subgraph that carries the most amount of electrical flow in sub-linear time.'
Ali Kemal Sinop is a senior research scientist at Google Research. He received a Ph.D. degree in Computer Science from Carnegie Mellon University in 2012. His research interests are in theoretical computer science, linear algebra, and computational mobility.
Chris Thomas, Columbia University, speaking on 'Multimodal Computer Vision: Foundations, Applications, and Societal Implications'
'This talk will begin with a crash course on machine learning and computer vision. We will begin by providing a high-level introduction to deep neural networks. Students will get an intuitive understanding of how modern computer vision models are trained and how they work. Next, we will discuss how these models can be extended to also reason about other modalities of data (e.g. text), in addition to visual data. We will then spotlight a few exciting projects that apply the ideas we have discussed, including disinformation detection, image synthesis, and others. We will then show practical use cases, including how vision algorithms can be used to help non-computer scientists (e.g. journalists). We will also discuss the practical, ethical, and societal implications of this work. Finally, we will point students to additional resources where they can learn more and get involved.'
Chris Thomas is a postdoctoral researcher at Columbia University working with Professor Shih-Fu Chang. He will be joining Virginia Tech as an assistant professor in Fall 2022. He currently represents Columbia as part of the DARPA Semantic Forensics project focused on detecting, attributing, and characterizing disinformation. His research lies at the intersection of computer vision, natural language processing, and machine learning. He received his PhD in Computer Science from the University of Pittsburgh in 2020, where he was advised by Professor Adriana Kovashka. His work has appeared in top conferences and journals, including CVPR, NeurIPS, ECCV, and IJCV.
Shannon B. Harper, Ph.D., Department of Sociology and Criminal Justice, Iowa State University, speaking on 'Facilitating Public Trust in Automated Decision-Making: A Los Angeles Police Department (LAPD) Case Study'
Automated decision-making systems are being increasingly deployed and affect the public in a multitude of positive and negative ways. Governmental and private institutions use these systems to process information according to certain human-devised rules in order to address social problems or organizational challenges. Both research and real-world experience indicate that the public has a lack of trust in automated decision-making systems and the institutions that deploy them. The recreancy theorem argues that the public is more likely to trust and support decisions made or influenced by automated decision-making systems if the institutions that administer them meet their fiduciary responsibility. However, often the public is never informed of how these systems operate and resultant institutional decisions are made. A “black box” effect of automated decision-making systems reduces the public’s perceptions of integrity and trustworthiness. Consequently, the institutions administering these systems are less able to assess whether the decisions suggested are just; and the public loses the capacity to identify, challenge, and rectify unfairness or the costs associated with the loss of public goods or benefits. This presentation will define and explain the role of fiduciary responsibility within an automated decision-making system. To instantiate fiduciary responsibility, a Los Angeles Police Department (LAPD) predictive policing case study is examined, and large dataset options that can be used to model this argument within a criminal justice context will be explored.
Dr. Shannon Harper’s research explores the relationship between intimate partner violence (IPV) and intimate partner homicide and the gendered and intersectional contexts through which both are experienced and occur. Her work utilizes big data and advanced statistical models to examine homicide and IPV homicide trends across U.S. cities. Dr. Harper also applies mixed-methods to examine the community social, cultural, and structural factors that shape/affect women’s victimization and offending experiences, and the ways in which survivors perceive the usefulness of the criminal justice system and community resources in facilitating justice and safety.
Matthew Lease, Ph.D., Department of Information Science, University of Texas, speaking on 'Challenges in Content Moderation: Detecting disinformation, hate speech, and offensive imagery'
While technology now enables us to share content with one another more quickly and easily than ever before, some content can be individually or collectively harmful, such as fake news, hate speech, and disturbing imagery and multimedia content. This necessitates content moderation: detection and filtering of objectionable content. How can we moderate content accurately, scalably, affordably, safely, and fairly? Because even state-of-the-art AI is imperfect and the stakes can be high, approaches span automated AI, human-in-the-loop hybrid systems, and manual, human review. The challenges are vast: potential human harm from exposure (to researchers, annotators, testers, end-users, etc.), bias in how raw data is sampled and annotated, legal limitations on possessing or sharing data, how much context is required for effective task representation and annotation, lack of clear definitions and disagreement in identifying problematic content, dataset incompatibility for multi-dataset training or testing, repeat offenders being prominent but also biasing dataset coverage and posing new risks of user profiling, societal polarization amplifying challenges in using or trusting blackbox AI models, politics complicating source modeling of unreliable information providers, and different groups and often vulnerable populations being disproportionately targeted and impacted by disinformation. This talk will review such challenges and potential ways for addressing them.
Matthew Lease is an Amazon Scholar and an Associate Professor in the School of Information at the University of Texas at Austin. He is a faculty leader of Good Systems (http://goodsystems.utexas.edu/), a university-wide Grand Challenge "moonshot" to design responsible AI technologies. Lease's research spans the fields of crowdsourcing / human computation, information retrieval, and natural language processing. His work combines human-centered and system-centered approaches to build quality datasets, create fair and explainable predictive models, design human-in-the-loop systems, and find win-wins for data annotators as well as requesters. Lease is the recipient of three early career awards (NSF, DARPA, and IMLS) and several paper awards. Lease's industry experience also includes stints at Intel Research, computer game company HyperBole Studios, image compression startup LizardTech, and crowdsourcing startup CrowdFlower. For more information, please see his homepage: https://www.ischool.utexas.edu/~ml/.
Wallapak Tavanapong, Department of Computer Science, Iowa State University and David Peterson, Department of Political Science, Iowa State University, speaking on 'Learning Twitter Policy Agenda of State Legislatures'
Legislators and legislatures have an incentive to communicate their policy preferences and which issues are at the top of their agendas. Twitter provides a unique tool for communicating directly to an audience, and tweets serve an important role as the public-facing agenda of their policy attention. We collect the Twitter activity of every state legislator in America to measure the attention that state legislatures pay to the categories developed in the Policy Agenda Project (PAP). We apply our recently developed machine learning tool to measure the proportion of tweets in the PAP policy categories for every state legislature. Our results show that the legislatures from states that have similar geography, politics, institutional capacity, and populations have similar public-facing agendas. These results further our understanding of state politics, legislator communications, and agenda setting. This talk will also describe our deep quantification learning framework (DQN). DQN provides a more accurate estimation of the proportion of documents (tweets) in the categories than recent classification-based quantification methods.
David Peterson is the Lucken Professor of Political Science and former editor of Political Behavior. His research focuses on American politics, particularly elections, public opinion, and voting behavior. His recent book Ignored Racism was published by Cambridge University Press and was the co-recipient of the 2021 Best Book Award from the Race, Ethnicity, and Politics Section of the American Political Science Association.
Wallapak Tavanapong is a Professor of Computer Science. She is also a co-founder and Chief Technology Officer of EndoMetric Corporation, offering computer-aided technology to improve patient care. Her research has been partially funded by the National Science Foundation, National Institutes of Health, and others. She was awarded a U.S. patent on Colonoscopy Video Processing for Quality Metrics Determination in 2011 and the Association for Education in Journalism and Mass Communication “Top Teaching Paper” award in 2017. Her recent IEEE journal article is Artificial Intelligence for Colonoscopy: Past, Present, and Future.
Dr. Beiwen Li, Department of Mechanical Engineering and High-dimensional Optical Sensing Laboratory, Iowa State University, 'From 3D optical sensing to 4D hyperspectral imaging'
Recent major advancements in computing power and graphics have allowed 3D optical sensing to dramatically benefit many fields including engineering, medicine, and entertainment. Capabilities (e.g., speeds and accuracies) that were barely possible a few years ago are now being introduced into the commercial marketplace. Despite the remarkable progress, the majority of research in 3D optical sensing is still centered on extracting accurate surface geometric profiles. Yet, obtaining solely the surface topographical information may not satisfy the need in certain applications where additional physicochemical properties need to be characterized. This talk will provide an overview of the recent progress in 3D optical sensing research in my research lab, as well as a newly developed 4D hyperspectral imaging system which can obtain high-resolution spectral information and accurate geometrical information. The success of this proposed method will be demonstrated through its application to the non-destructive evaluation of spinach leaf samples. Such a method has the potential for future applications in the food industry.
Dr. Beiwen Li is a William and Virginia Binger Assistant Professor of Mechanical Engineering and the director of the High-dimensional Optical Sensing Laboratory at Iowa State University. He received his Ph.D. degree in Mechanical Engineering from Purdue University in 2017. His research focuses on superfast kilohertz 3D optical sensing, precision 3D optical metrology, 3D point cloud data analysis, and in-situ monitoring for additive manufacturing. Several of his research works have been highlighted on the cover page of prestigious journals including Optics Express, Applied Optics, and Geotechnique Letters. He is the recipient of the 2020 SPIE Defense & Commercial Sensing Rising Researcher Award and was named a 2021 Emerging Leader by Measurement Science and Technology Journal.
Ryan Jeon, Department of Agricultural and Biosystems Engineering, Iowa State University, 'YOLOv5 Custom Training Tutorial for Object Detection'
This talk will present key findings from a computer vision-based approach to automatically determine leg angles from images of sows. In this study, a deep learning model was trained to classify and detect sow body landmarks from the side and rear view. The demo will feature a short tutorial on custom object detection using YOLOv5. This approach will be of interest to data scientists looking to detect and localize objects in images.
Ryan Jeon is a PhD student from Iowa State University’s department of Agricultural and Biosystems Engineering. At ISU, Ryan explored different computer vision techniques for objectively and automatically assessing reproductive traits in sows.
Dr. Myra Cohen, Department of Computer Science, Iowa State University, 'Harnessing AI for Software Testing and Repair'
Many tasks in software engineering can be formulated as an optimization problem where the goal is to find the best solution while obeying a given set of constraints. One way to solve such a problem is to enumerate all feasible solutions, visit each in some order, and select the best. However, the space of solutions is often exponential with respect to the size of the inputs, meaning this approach will not scale to most real software engineering tasks. Instead, we can leverage meta-heuristic search algorithms, a class of artificial intelligence algorithms that often mimic biological processes such as evolution or insect swarming and can guide us efficiently towards good solutions. In this talk I will present an overview of evolutionary algorithms and describe how they have been used successfully to automate two difficult tasks in software engineering: test generation and program repair.
Myra Cohen is a Professor and the Lanh and Oanh Nguyen Chair in Software Engineering in the Department of Computer Science at Iowa State University. Her research interests are in software testing of highly configurable software, search-based software engineering, applications of combinatorial designs, and synergies between software engineering, and systems and synthetic biology. She was the recipient of both an NSF CAREER and AFOSR Young Investigator Award and has received 4 ACM distinguished paper awards. She is a Fellow of the IEEE/ACM International Conference on Automated Software Engineering and is active in several software engineering conference organizational roles and steering committees. She was the program co-chair for ESEC/FSE 2020 and ICST 2019 and the general chair of Automated Software Engineering in 2015. She is an ACM Distinguished Scientist.
Dr. Heng Ji, Department of Computer Science and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 'Information Surgery: Faking Multimedia Fake News for Real Fake News Detection'
We are living in an era of information pollution. The dissemination of falsified information can cause chaos, hatred, and trust issues among humans, and can eventually hinder the development of society. In particular, human-written disinformation, which is often used to manipulate certain populations, has had a catastrophic impact on multiple events, such as the 2016 US Presidential Election, Brexit, the COVID-19 pandemic, and Russia's recent assault on Ukraine. Hence, we are in urgent need of a defense mechanism against human-written disinformation. While there has been a lot of research and many recent advances in neural fake news detection, many challenges remain. In particular, the accuracy of existing techniques at detecting human-written fake news is barely above random. In this talk I will present our recent attempts at tackling four unique challenges in the frontline of combating fake news written by both machines and humans: (1) Define a new task on knowledge element level misinformation detection based on cross-media knowledge extraction and reasoning to make the detector more accurate and explainable; (2) Generate training data for the detector based on knowledge graph manipulation and knowledge graph guided natural language generation; (3) Use Natural Language Inference to ensure the fake information cannot be inferred from the rest of the real document; (4) Propose the first work to generate propaganda for more robust detection of human-written fake news.
Heng Ji is a professor in the Computer Science Department and an affiliated faculty member in the Electrical and Computer Engineering Department of the University of Illinois at Urbana-Champaign. She is an Amazon Scholar. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially on Multimedia Multilingual Information Extraction, Knowledge Base Population, and Knowledge-driven Generation. She was selected as "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. The awards she received include "AI's 10 to Watch" Award by IEEE Intelligent Systems in 2013, NSF CAREER award in 2009, Google Research Award in 2009 and 2014, IBM Watson Faculty Award in 2012 and 2014, Bosch Research Award in 2014-2018, Best-of-ICDM2013 Paper, Best-of-SDM2013 Paper, ACL2020 Best Demo Paper Award, and NAACL2021 Best Demo Paper Award. She was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2021. She has served as the Program Committee Co-Chair of many conferences including NAACL-HLT2018 and AACL2022, and she has been the coordinator for the NIST TAC Knowledge Base Population track since 2010.
Joy Li and Dr. Hantao He, 'Simulation of Realistic Granular Agricultural Material Behavior Using a Physics-Based Game Engine'
Dr. Eric Weber, Department of Mathematics, Iowa State University, 'Public Trust in AI and Data Science Lifecycles'
Artificial intelligence and other data science methods are being increasingly deployed and affect the public in a multitude of positive and negative ways. Governmental and private institutions use these systems to process information according to certain human-devised rules in order to address social problems or organizational challenges. Both research and real-world experience indicate that the public has a lack of trust in automated decision-making systems and the institutions that deploy them. How can individuals and institutions that deploy data-driven algorithms gain the trust of the public? We will present a literature review, discussing several methods for formalizing public trust in data science lifecycles. We will also discuss possible methods for obtaining public trust in these lifecycles as well as how institutions have failed to establish the trust of the public.
Dr. Eric Weber holds a Ph.D. in Mathematics from the University of Colorado. His research interests include harmonic analysis, approximation theory and data science. Past research includes developing novel wavelet transforms for image processing, and reproducing kernel methods for the harmonic analysis of fractals. Current research projects include the development of new algorithms for processing distributed spatiotemporal datasets; extending alternating projection methods for optimization in non-Euclidean geometries; using harmonic analysis techniques for understanding the approximation properties of neural networks; formalizing public trust in data science lifecycles; and developing machine learning techniques to improve the diagnosis of severe wind occurrences.
Please check back later for updates and full schedules for this year's Midwest Big Data Summer School!