Math 565: Lecture Notes and Videos on Optimization for Machine Learning

Copyright: I (Bala Krishnamoorthy) hold the copyright for all lecture scribes/notes, documents, and other materials including videos posted on these course web pages. These materials may not be used for commercial purposes without my consent.

Scribes from all lectures so far (as a single big file)

Lec Date Topic(s) Scribe Video
1 Jan 13 syllabus, logistics, ML problems: clustering, classification, regression, optimization in 1D, regression via minimization Scb1 Vid1
2 Jan 15 \(\nabla J = \mathbf{0} \Rightarrow D^TD \mathbf{w} = D^T \mathbf{y}\), optimization in graphs, using \(D = QR\), Tikhonov regularization, binary classification Scb2 Vid2
3 Jan 20 support vector machine (SVM), Taylor expansion, local optimality in 1D, gradient descent (python), optimality in \(d\)-dim Scb3 Vid3
4 Jan 22 local optimality: second order conditions, convex (cvx) sets + functions, properties, \(f(g(\mathbf{w}))\) cvx when \(f\) cvx + \(g\) linear Scb4 Vid4
5 Jan 27 local min of cvx \(f \Rightarrow\) global min, first+second derivative cndn of convexity, strict convexity, computing \(\nabla J\), updating \(\alpha_t\) Scb5 Vid5
6 Jan 29 second order cndtns example, line search for \(\alpha_t\), additively separable loss \(J\)\(=\)\(\sum_i J_i\), stochastic gradient descent (SGD) Scb6 Vid6
7 Feb  3 optimization: general vs ML, hyperparameter tuning, cross validation, SGD for regression, hinge & \(L_2\)-SVM loss (for \(\pm 1\)) Scb7 Vid7
8 Feb  5 logistic regression loss \(J_{LR}\) smooth & strictly convex w/o regularizer, coordinate descent (CD), linear regression with CD Scb8 Vid8
9 Feb 10 block coordinate descent (BCD), k-means clustering as BCD, CD challenges, momentum-based learning \(\mathbf{v}\)\(\leftarrow\)\(\beta\mathbf{v}\)\(-\)\(\alpha \nabla J\) Scb9 Vid9
10 Feb 12 variants of GD: AdaGrad, RMSProp, AdaM (Adam), Newton method \(\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla J\), optimal for quadratic \(J\) Scb10 Vid10
11 Feb 17 Newton increases \(J\) for nonquadratic \(J\), line search for Newton method, Newton method in regression and in \(L_2\)-SVM Scb11 Vid11
12 Feb 19 Newton for logistic regression & SVM, probability/uncertainty in misclassification, summary of Newton for ML problems Scb12 Vid12
13 Feb 24 Newton challenges: ill-conditioned Hessian, saddle points, convergence problems, trust region method, trust radius \(\delta_t\) Scb13 Vid13
14 Feb 26 conjugate gradient method (CGM), \(H\)-orthogonality (conjugacy): \(\mathbf{q}_i^T H \mathbf{q}_j = 0\), projections of \(H\) using finite differences Scb14 Vid14
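A few of the formulas in the table above can be turned into short runnable sketches. First, the Lecture 2 result \(\nabla J = \mathbf{0} \Rightarrow D^TD \mathbf{w} = D^T \mathbf{y}\), solved via \(D = QR\): since \(D^TD = R^TQ^TQR = R^TR\), the normal equations reduce to the triangular system \(R\mathbf{w} = Q^T\mathbf{y}\). The data below is illustrative, not from the lectures.

```python
import numpy as np

# Solving the normal equations D^T D w = D^T y via D = QR,
# which reduces to the triangular system R w = Q^T y.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 3))      # design matrix: 20 samples, 3 features
y = D @ np.array([1.0, -2.0, 0.5])    # targets generated from known weights

Q, R = np.linalg.qr(D)                # thin QR: D = Q R, R upper triangular
w = np.linalg.solve(R, Q.T @ y)       # back-substitution for R w = Q^T y

# w recovers the generating weights, since y lies in the column space of D
print(np.allclose(w, [1.0, -2.0, 0.5]))  # True
```

Solving via QR avoids forming \(D^TD\) explicitly, which squares the condition number of the problem.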
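The additively separable loss \(J = \sum_i J_i\) from Lecture 6 is what makes SGD (Lectures 6–7) work: each update uses the gradient of a single randomly sampled term \(J_i\). A minimal sketch on least-squares regression, with an illustrative step size and dataset (not course settings):

```python
import numpy as np

# SGD on the separable loss J(w) = sum_i 0.5*(d_i^T w - y_i)^2:
# each step uses the gradient of one sampled term J_i only.
rng = np.random.default_rng(1)
D = rng.standard_normal((100, 2))
w_true = np.array([2.0, -1.0])
y = D @ w_true                        # noiseless targets for illustration

w = np.zeros(2)
alpha = 0.05                          # constant step size (illustrative)
for t in range(2000):
    i = rng.integers(len(y))              # sample one data point
    grad_i = (D[i] @ w - y[i]) * D[i]     # gradient of J_i alone
    w = w - alpha * grad_i

print(np.allclose(w, w_true, atol=1e-3))  # True
```

With noiseless data every per-sample gradient vanishes at \(\mathbf{w}^*\), so even a constant step size converges; with noisy data one would decay \(\alpha_t\), as discussed in Lecture 5.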
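The momentum update from Lecture 9, \(\mathbf{v} \leftarrow \beta\mathbf{v} - \alpha \nabla J\) followed by \(\mathbf{w} \leftarrow \mathbf{w} + \mathbf{v}\), can be sketched on a toy quadratic. The objective, step size, and \(\beta\) below are illustrative choices:

```python
import numpy as np

# Momentum-based learning: v <- beta*v - alpha*grad J(w), then w <- w + v,
# demonstrated on J(w) = 0.5*||w||^2, whose gradient is grad J(w) = w.
def momentum_descent(grad, w0, alpha=0.1, beta=0.9, steps=500):
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - alpha * grad(w)   # accumulate a velocity term
        w = w + v                        # move along the velocity
    return w

w_star = momentum_descent(lambda w: w, w0=[5.0, -3.0])
print(np.linalg.norm(w_star) < 1e-6)  # True: converged to the minimizer 0
```

The velocity term averages past gradients, which damps oscillation across narrow valleys compared with plain gradient descent.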
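Lecture 10's claim that the Newton update \(\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla J\) is optimal for quadratic \(J\) means one step lands exactly on the minimizer. A sketch with an arbitrary illustrative positive definite \(H\) and vector \(\mathbf{b}\):

```python
import numpy as np

# For quadratic J(w) = 0.5*w^T H w - b^T w, grad J(w) = H w - b, so one
# Newton step w - H^{-1}(H w - b) = H^{-1} b hits the minimizer exactly.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])             # symmetric positive definite Hessian
b = np.array([1.0, -1.0])
grad = lambda w: H @ w - b

w = np.array([10.0, -7.0])             # any starting point
w = w - np.linalg.solve(H, grad(w))    # single Newton step

print(np.allclose(grad(w), 0.0))  # True: exact stationary point in one step
```

For nonquadratic \(J\) this exactness fails, which is precisely the Lecture 11 observation that Newton can increase \(J\) without a line search.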
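Finally, the \(H\)-orthogonality (conjugacy) condition \(\mathbf{q}_i^T H \mathbf{q}_j = 0\) from Lecture 14 can be verified numerically by running the conjugate gradient method and checking the collected search directions. \(H\) and \(\mathbf{b}\) below are illustrative:

```python
import numpy as np

# Minimal CG on J(w) = 0.5*w^T H w - b^T w; the directions q_i it generates
# are pairwise H-orthogonal, and it solves an n x n system in n steps.
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])

w = np.zeros(3)
r = b - H @ w                          # residual = -grad J
q = r.copy()                           # first search direction
dirs = []
for _ in range(3):                     # n = 3 steps suffice
    dirs.append(q.copy())
    alpha = (r @ r) / (q @ H @ q)      # exact line search along q
    w = w + alpha * q
    r_new = r - alpha * (H @ q)
    beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves style update
    q = r_new + beta * q
    r = r_new

print(abs(dirs[0] @ H @ dirs[1]) < 1e-10)  # True: conjugate directions
print(np.allclose(H @ w, b))               # True: solved in n steps
```

Conjugacy is what lets CG minimize along each direction once and never revisit it, avoiding the explicit \(H^{-1}\) of Newton's method.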


Last modified: Thu Feb 26 22:48:18 PST 2026