Math 565: Lecture Notes and Videos on Optimization for Machine Learning

Copyright: I (Bala Krishnamoorthy) hold the copyright for all lecture scribes/notes, documents, and other materials, including videos, posted on these course web pages. These materials may not be used for commercial purposes without my consent.

Scribes from all lectures so far (as a single big file)

Lec Date Topic(s) Scribe Video
1 Jan 13 syllabus, logistics, ML problems: clustering, classification, regression, optimization in 1D, regression via minimization Scb1 Vid1
2 Jan 15 \(\nabla J = \mathbf{0} \Rightarrow D^TD \mathbf{w} = D^T \mathbf{y}\), optimization in graphs, using \(D = QR\), Tikhonov regularization, binary classification (see the QR sketch below) Scb2 Vid2
3 Jan 20 support vector machine (SVM), Taylor expansion, local optimality in 1D, gradient descent (python; see the GD sketch below), optimality in \(d\)-dim Scb3 Vid3
4 Jan 22 local optimality: second order conditions, convex (cvx) sets + functions, properties, \(f(g(\mathbf{w}))\) cvx when \(f\) cvx + \(g\) linear Scb4 Vid4
5 Jan 27 local min of cvx \(f \Rightarrow\) global min, first+second derivative cndn of convexity, strict convexity, computing \(\nabla J\), updating \(\alpha_t\) Scb5 Vid5
6 Jan 29 second order cndtns example, line search for \(\alpha_t\), additively separable loss \(J = \sum_i J_i\), stochastic gradient descent (SGD) Scb6 Vid6
7 Feb  3 optimization: general vs ML, hyperparameter tuning, cross validation, SGD for regression, hinge & \(L_2\)-SVM loss (for \(\pm 1\)) Scb7 Vid7
8 Feb  5 logistic regression loss \(J_{LR}\) smooth & strictly convex w/o regularizer, coordinate descent (CD), linear regression with CD Scb8 Vid8
9 Feb 10 block coordinate descent (BCD), k-means clustering as BCD, CD challenges, momentum-based learning \(\mathbf{v} \leftarrow \beta\mathbf{v} - \alpha \nabla J\) (see the GD sketch below) Scb9 Vid9
10 Feb 12 variants of GD: AdaGrad, RMSProp, AdaM (Adam), Newton method \(\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla J\), optimal for quadratic \(J\) (see the Newton sketch below) Scb10 Vid10
11 Feb 17 Newton may increase \(J\) for nonquadratic \(J\), line search for Newton method, Newton method in regression and in \(L_2\)-SVM Scb11 Vid11
12 Feb 19 Newton for logistic regression & SVM, probability/uncertainty in misclassification, summary of Newton for ML problems Scb12 Vid12
13 Feb 24 Newton challenges: ill-conditioned Hessian, saddle points, convergence problems, trust region method, trust radius \(\delta_t\) Scb13 Vid13
14 Feb 26 conjugate gradient method (CGM), \(H\)-orthogonality (conjugacy): \(\mathbf{q}_i^T H \mathbf{q}_j = 0\), projections of \(H\) using finite differences Scb14 Vid14
15 Mar  3 CGM for non-quadratic \(J\): non/linear CGM, quasi-Newton method, secant condition, BFGS method, preserves \(G_t \succeq 0\) Scb15 Vid15
16 Mar  5 L-BFGS: \(O(md)\) vs \(O(d^2)\) for BFGS, details for \(m=2\), subgradients for convex \(J\), regression with \(L_1\)-regularization Scb16 Vid16
17 Mar 10 subgradients with CD, proximal gradient method (PGM), primal and dual approaches for constrained optimization Scb17 Vid17
18 Mar 12 projected gradient method (PGM) for linear constraints, projection matrix, convex quadratic programs (CQPs) Scb18 Vid18
19 Mar 24 \(A\mathbf{w} \leq \mathbf{b}\) constraints, conditional gradient using LP, box constraints, \(A\) w/ LI rows, Lagrangian relaxation, weak duality Scb19 Vid19
20 Mar 26 minimax inequality, \(\min_{\mathbf{w}} \max_{\boldsymbol{\alpha} \geq \mathbf{0}} \{H(\mathbf{w},\boldsymbol{\alpha}) = F(\mathbf{w}) + \sum_i \alpha_i f_i(\mathbf{w})\} \equiv \) primal (P), Slater's condition, KKT conditions Scb20 Vid20
21 Mar 31 hinge-loss SVM dual (D) using KKT conditions, using stationarity conditions to eliminate \(\mathbf{w}, \xi_i\), solution of (D) using GD Scb21 Vid21
22 Apr  2 kernel methods using dual H-SVM, dual of unconstrained problems, linear regression, norm-constrained optimization Scb22 Vid22
23 Apr  7 VC dimension, shattering, types of SVM kernels, computational graphs (CGs), local function at node, global function Scb23 Vid23
24 Apr   9 video: regression as a CG, neural networks (NNs), activation functions, training using CGs, node-to-node derivatives Scb24 Vid24
25 Apr 14 video: path aggregation lemma, dynamic programming (DP) for node-to-node derivatives, backpropagation algorithm Scb25 Vid25
26 Apr 16 video: loss function-to-weights derivatives using backpropagation (DP), CGs with vector variables, Jacobian products Scb26 Vid26
27 Apr 21 video: details for NNs: edge derivative, sensitivity recursion, and gradient w.r.t. edge weights; decoupled formulation Scb27 Vid27
28 Apr 23 video: NNs as vector ops, vector backpropagation, exponential decay of sensitivities (not with ReLU), skip connection Scb28 Vid28
29 Apr 28 video: transformer networks, attention weight/output, multi-head attention, Jacobian of softmax, residual connections Scb29 Vid29
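
QR sketch: a minimal NumPy sketch of the Lecture 2 formulas, solving the normal equations \(D^TD \mathbf{w} = D^T \mathbf{y}\) directly, via the factorization \(D = QR\), and with Tikhonov regularization. The data matrix D, targets y, and the regularization weight lam are hypothetical stand-ins, not taken from the scribes.

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.standard_normal((100, 5))                  # hypothetical data matrix
    y = D @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) \
        + 0.1 * rng.standard_normal(100)               # hypothetical noisy targets

    # Setting grad J = 0 for J(w) = 0.5 ||D w - y||^2 gives D^T D w = D^T y.
    w_normal = np.linalg.solve(D.T @ D, D.T @ y)

    # Better-conditioned route via D = QR: solve R w = Q^T y.
    Q, R = np.linalg.qr(D)
    w_qr = np.linalg.solve(R, Q.T @ y)

    # Tikhonov regularization: (D^T D + lam I) w = D^T y.
    lam = 0.1
    w_tik = np.linalg.solve(D.T @ D + lam * np.eye(5), D.T @ y)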
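
GD sketch: a minimal sketch of plain gradient descent (Lecture 3) and the momentum update \(\mathbf{v} \leftarrow \beta\mathbf{v} - \alpha \nabla J\) (Lecture 9) on the least-squares loss. The data, fixed step size \(\alpha\), \(\beta = 0.9\), iteration counts, and the companion update \(\mathbf{w} \leftarrow \mathbf{w} + \mathbf{v}\) are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.standard_normal((100, 5))                  # hypothetical data
    y = D @ np.ones(5)                                 # hypothetical targets

    def grad_J(w):
        # gradient of J(w) = 0.5 ||D w - y||^2 is D^T (D w - y)
        return D.T @ (D @ w - y)

    alpha = 1.0 / np.linalg.eigvalsh(D.T @ D).max()    # safe fixed step (assumed)

    # Plain gradient descent: w <- w - alpha grad J(w)
    w = np.zeros(5)
    for _ in range(500):
        w = w - alpha * grad_J(w)

    # Momentum: v <- beta v - alpha grad J(w), then (assumed companion) w <- w + v
    w_m, v, beta = np.zeros(5), np.zeros(5), 0.9
    for _ in range(500):
        v = beta * v - alpha * grad_J(w_m)
        w_m = w_m + v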
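
Newton sketch: a minimal sketch of the Newton update \(\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla J\) (Lecture 10) on a randomly generated convex quadratic, illustrating the "optimal for quadratic \(J\)" entry: a single Newton step lands on the exact minimizer. The matrix A, vector b, and starting point are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5.0 * np.eye(5)           # random positive-definite Hessian (assumed)
    b = rng.standard_normal(5)

    # For quadratic J(w) = 0.5 w^T A w - b^T w:  grad J = A w - b  and  H = A.
    w = rng.standard_normal(5)              # arbitrary starting point
    w = w - np.linalg.solve(A, A @ w - b)   # Newton step: solve H s = grad J, avoid H^{-1}

    assert np.allclose(A @ w, b)            # one Newton step reaches the exact minimizer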


Last modified: Wed Apr 29 12:27:44 PDT 2026