This course will offer a systematic treatment of optimization methods with a focus on applications in machine learning (ML). There are key differences between optimization as used in ML and traditional optimization: for instance, the performance of an ML model is assessed by how well it generalizes to test data (rather than on the whole dataset). Thus, the widely used stochastic gradient-descent methods may achieve lower accuracy than gradient descent on the training data, yet often perform better on the test data!
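
To make this concrete, here is a minimal sketch (not drawn from the course or the textbook) of how such a comparison could be set up in Python with NumPy: full-batch gradient descent versus stochastic gradient descent on a toy least-squares problem, reporting both training and test error. All problem sizes, step sizes, and iteration counts are illustrative assumptions, and the exact numbers will vary with the random data.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + noise, split into a small training set and a test set.
# (All dimensions and the noise level are illustrative choices.)
n_train, n_test, d = 50, 500, 20
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.5 * rng.normal(size=n_test)

def mse(w, X, y):
    # Mean squared error of the linear model with weights w.
    return np.mean((X @ w - y) ** 2)

lr, n_steps = 0.01, 2000  # illustrative step size and iteration budget

# Full-batch gradient descent on the training loss.
w_gd = np.zeros(d)
for _ in range(n_steps):
    grad = (2.0 / n_train) * X_train.T @ (X_train @ w_gd - y_train)
    w_gd -= lr * grad

# Stochastic gradient descent: one randomly chosen training example per step.
w_sgd = np.zeros(d)
for _ in range(n_steps):
    i = rng.integers(n_train)
    grad = 2.0 * X_train[i] * (X_train[i] @ w_sgd - y_train[i])
    w_sgd -= lr * grad

print("GD : train MSE %.3f, test MSE %.3f" % (mse(w_gd, X_train, y_train), mse(w_gd, X_test, y_test)))
print("SGD: train MSE %.3f, test MSE %.3f" % (mse(w_sgd, X_train, y_train), mse(w_sgd, X_test, y_test)))
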
We will cover a selection of relevant topics from the book Linear Algebra and Optimization for Machine Learning by Charu Aggarwal, including gradient descent and stochastic gradient descent, Newton's method for ML problems such as regression and support vector machines (SVMs), Lagrangian relaxation and duality for SVMs, penalty-based methods, optimization in computational graphs including neural networks (NNs), and backpropagation in NNs.
Homework assignments will include proof-type problems as well as others requiring the use of software. Students will also work on a computational project. No exams will be given. This course will (ideally) be a follow-up to Math 564: Nonlinear Optimization (offered in Fall 2025). Students who took that course will be best prepared to do well in Math 565. Independent of Math 564, the prerequisites for Math 565 are familiarity with analysis and linear algebra, with proof-based work at the undergraduate (400-) level, or permission of the instructor. Familiarity with programming languages or packages such as MATLAB or Python will also be expected.