This course will offer a systematic treatment of optimization methods with a focus on applications in machine learning (ML). There are key differences between optimization as used in ML and traditional optimization: for instance, the performance of an ML model is assessed by how well it generalizes to test data (rather than on the whole dataset). Thus, the widely used stochastic gradient-descent methods may achieve lower accuracy than gradient descent on the training data, yet often perform better on the test data!
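
To make this concrete, here is a minimal sketch (not drawn from the course or the textbook) of how such a comparison could be set up in Python with NumPy: full-batch gradient descent versus stochastic gradient descent on a toy least-squares problem, reporting both training and test error. All problem sizes, step sizes, and iteration counts are illustrative assumptions, and the exact numbers will vary with the random data.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + noise, split into a small training set and a test set.
# (All dimensions and the noise level are illustrative choices.)
n_train, n_test, d = 50, 500, 20
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.5 * rng.normal(size=n_test)

def mse(w, X, y):
    # Mean squared error of the linear model with weights w.
    return np.mean((X @ w - y) ** 2)

lr, n_steps = 0.01, 2000  # illustrative step size and iteration budget

# Full-batch gradient descent on the training loss.
w_gd = np.zeros(d)
for _ in range(n_steps):
    grad = (2.0 / n_train) * X_train.T @ (X_train @ w_gd - y_train)
    w_gd -= lr * grad

# Stochastic gradient descent: one randomly chosen training example per step.
w_sgd = np.zeros(d)
for _ in range(n_steps):
    i = rng.integers(n_train)
    grad = 2.0 * X_train[i] * (X_train[i] @ w_sgd - y_train[i])
    w_sgd -= lr * grad

print("GD : train MSE %.3f, test MSE %.3f" % (mse(w_gd, X_train, y_train), mse(w_gd, X_test, y_test)))
print("SGD: train MSE %.3f, test MSE %.3f" % (mse(w_sgd, X_train, y_train), mse(w_sgd, X_test, y_test)))
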
We will cover a selection of relevant topics from the book Linear Algebra and Optimization for Machine Learning by Charu Aggarwal, including gradient descent and stochastic gradient descent, Newton's method for ML problems such as regression and support vector machines (SVMs), Lagrangian relaxation and duality for SVMs, penalty-based methods, optimization in computational graphs including neural networks (NNs), and backpropagation in NNs.
Homework assignments will include proof-type problems as well as others requiring the use of software. Students will also work on a computational project. No exams will be given. This course will (ideally) be a follow-up to Math 564: Nonlinear Optimization (offered in Fall 2025). Students who took that course will be best prepared to do well in Math 565. Independent of Math 564, the prerequisites for Math 565 are familiarity with analysis and linear algebra, with proof-based work at the undergraduate (400-) level, or permission of the instructor. Familiarity with programming languages or packages such as MATLAB or Python will also be expected.