Math 565: Lecture Notes and Videos on Optimization for Machine Learning
Copyright: I (Bala Krishnamoorthy) hold the copyright
for all lecture scribes/notes, documents, and other
materials, including videos, posted on these course web
pages. These materials may not be used for commercial
purposes without my consent.
Scribes from all lectures so far (as a single big file)
| Lec | Date | Topic(s) | Scribe | Video |
| 1 | Jan 13 | syllabus, logistics, ML problems: clustering, classification, regression, optimization in 1D, regression via minimization | Scb1 | Vid1 |
| 2 | Jan 15 | \(\nabla J = \mathbf{0} \Rightarrow D^TD \mathbf{w} = D^T \mathbf{y}\), optimization in graphs, using \(D = QR\), Tikhonov regularization, binary classification | Scb2 | Vid2 |
| 3 | Jan 20 | support vector machine (SVM), Taylor expansion, local optimality in 1D, gradient descent (python), optimality in \(d\)-dim | Scb3 | Vid3 |
| 4 | Jan 22 | local optimality: second order conditions, convex (cvx) sets + functions, properties, \(f(g(\mathbf{w}))\) cvx when \(f\) cvx + \(g\) linear | Scb4 | Vid4 |
| 5 | Jan 27 | local min of cvx \(f \Rightarrow\) global min, first+second derivative conditions for convexity, strict convexity, computing \(\nabla J\), updating \(\alpha_t\) | Scb5 | Vid5 |
| 6 | Jan 29 | second order conditions example, line search for \(\alpha_t\), additively separable loss \(J = \sum_i J_i\), stochastic gradient descent (SGD) | Scb6 | Vid6 |
| 7 | Feb 3 | optimization: general vs ML, hyperparameter tuning, cross validation, SGD for regression, hinge & \(L_2\)-SVM loss (for \(\pm 1\)) | Scb7 | Vid7 |
| 8 | Feb 5 | logistic regression loss \(J_{LR}\) smooth & strictly convex w/o regularizer, coordinate descent (CD), linear regression with CD | Scb8 | Vid8 |
| 9 | Feb 10 | block coordinate descent (BCD), k-means clustering as BCD, CD challenges, momentum-based learning \(\mathbf{v} \leftarrow \beta\mathbf{v} - \alpha \nabla J\) | Scb9 | Vid9 |
| 10 | Feb 12 | variants of GD: AdaGrad, RMSProp, AdaM (Adam), Newton method \(\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla J\), optimal for quadratic \(J\) | Scb10 | Vid10 |
| 11 | Feb 17 | Newton may increase \(J\) for nonquadratic \(J\), line search for Newton method, Newton method in regression and in \(L_2\)-SVM | Scb11 | Vid11 |
| 12 | Feb 19 | Newton for logistic regression & SVM, probability/uncertainty in misclassification, summary of Newton for ML problems | Scb12 | Vid12 |
| 13 | Feb 24 | Newton challenges: ill-conditioned Hessian, saddle points, convergence problems, trust region method, trust radius \(\delta_t\) | Scb13 | Vid13 |
| 14 | Feb 26 | conjugate gradient method (CGM), \(H\)-orthogonality (conjugacy): \(\mathbf{q}_i^T H \mathbf{q}_j = 0\), projections of \(H\) using finite differences | Scb14 | Vid14 |
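Several lectures above revolve around the least-squares loss \(J(\mathbf{w}) = \frac{1}{2}\|D\mathbf{w} - \mathbf{y}\|^2\), whose stationarity condition \(\nabla J = \mathbf{0}\) gives the normal equations \(D^TD \mathbf{w} = D^T \mathbf{y}\) (Lecture 2), and the momentum update \(\mathbf{v} \leftarrow \beta\mathbf{v} - \alpha \nabla J\) (Lecture 9). A minimal sketch tying the two together — not the course's own code; the step size `alpha`, momentum `beta`, and iteration count are illustrative choices:

```python
import numpy as np

def momentum_gd(D, y, alpha=0.01, beta=0.9, iters=500):
    """Minimize J(w) = 0.5 * ||D w - y||^2 by gradient descent with momentum.

    Momentum update (Lecture 9): v <- beta*v - alpha * grad J;  w <- w + v.
    """
    w = np.zeros(D.shape[1])
    v = np.zeros_like(w)
    for _ in range(iters):
        grad = D.T @ (D @ w - y)   # grad J = D^T (D w - y), as in Lecture 2
        v = beta * v - alpha * grad
        w = w + v
    return w

# Usage: the minimizer found by GD should solve the normal equations
# D^T D w = D^T y, so compare against a direct solve.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 3))
y = D @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
w = momentum_gd(D, y)
w_exact = np.linalg.solve(D.T @ D, D.T @ y)
print(np.allclose(w, w_exact, atol=1e-4))
```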
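Lecture 10 states that the Newton update \(\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla J\) is optimal for quadratic \(J\); concretely, it reaches the minimizer in a single step. A small sketch of that fact (illustrative matrices, not from the lectures), solving \(H\mathbf{d} = \nabla J\) rather than forming \(H^{-1}\):

```python
import numpy as np

def newton_step(w, grad, H):
    # Newton update (Lecture 10): w <- w - H^{-1} grad J,
    # computed via a linear solve instead of an explicit inverse.
    return w - np.linalg.solve(H, grad)

# For a quadratic J(w) = 0.5 w^T A w - b^T w with A symmetric positive
# definite, grad J = A w - b and the Hessian H = A, so one Newton step
# from any starting point lands exactly on the minimizer w* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
w0 = np.array([10.0, -7.0])
w1 = newton_step(w0, A @ w0 - b, A)
print(np.allclose(w1, np.linalg.solve(A, b)))  # True: exact in one step
```

For nonquadratic \(J\) (Lecture 11) this single-step guarantee disappears, which is why a line search on top of the Newton direction is introduced there.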
Last modified: Thu Feb 26 22:48:18 PST 2026