Abstract. Deep learning is machine learning using neural networks with many hidden layers, and it has become a primary tool in a wide variety of practical learning tasks. In this talk, we begin with a simple optimization problem and show how it can be reformulated as a gradient flow, whose discretizations in turn lead to different optimization solvers. We further introduce the mathematical formulation of deep residual neural networks as a PDE optimal control problem. We state and prove optimality conditions for the inverse deep learning problem, using the Hamilton-Jacobi-Bellman equation and the Pontryagin maximum principle.
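The connection between gradient flows and optimization solvers mentioned above can be sketched in a few lines. This is a minimal illustration, not from the talk: discretizing the gradient flow x'(t) = -∇f(x(t)) by forward Euler with step size h recovers plain gradient descent; the objective f used here is a hypothetical example.

```python
import numpy as np

def grad_f(x):
    # Example objective f(x) = 0.5 * ||x||^2, so ∇f(x) = x.
    return x

def gradient_flow_euler(x0, h=0.1, steps=100):
    """Integrate x'(t) = -grad_f(x) by forward Euler.

    Each step x <- x - h * grad_f(x) is exactly one iteration
    of gradient descent with learning rate h.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - h * grad_f(x)  # one forward-Euler step of the flow
    return x

x_final = gradient_flow_euler([1.0, -2.0])
```

Other discretizations of the same flow (e.g. implicit Euler, higher-order schemes) yield different solvers, which is one way the ODE/PDE viewpoint organizes optimization methods.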