Unsupervised learning 
Introduction to ML or introductory statistics 

Understands the difference between supervised and unsupervised learning. Understands the principle of probabilistic learning. 

Optimization 
Vector analysis 
Knows the definition of the gradient method and the idea of local vs. global maxima. Can derive the gradient method in basic cases. 
Understands the gradient method and its variants (projected, stochastic). Can define Newton's method. Can derive practical algorithms based on these. 
Understands the Lagrangian, and how to derive the projected gradient from it. Can derive and reproduce some variant of the conjugate gradient algorithm. 
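The gradient method and its projected variant described above can be sketched as follows; the quadratic objective, step size, constraint set, and iteration count are illustrative choices, not part of the course material.

```python
import numpy as np

# Gradient method on a simple quadratic f(x) = 0.5 x'Ax - b'x,
# whose gradient is A x - b. A is symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

x = np.zeros(2)
step = 0.1
for _ in range(200):
    x = x - step * grad(x)          # plain gradient step

# Projected gradient variant: after each step, project back
# onto the feasible set, here the unit ball {x : ||x|| <= 1}.
y = np.zeros(2)
for _ in range(200):
    y = y - step * grad(y)
    norm = np.linalg.norm(y)
    if norm > 1.0:
        y = y / norm                # projection step

print(x)   # approaches the unconstrained minimizer A^{-1} b
```

With a step size below 2 divided by the largest eigenvalue of A, the plain iteration converges to the unique minimizer; the projection only changes the answer when the unconstrained minimizer lies outside the feasible set.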
Principal Component Analysis (PCA) and Factor Analysis 
Linear algebra I&II, Introduction to probability theory 
Can give the definition of PCA and explain its main uses. Can describe the computation using the eigenvectors of the covariance matrix. Can define the factor analysis model. Understands the connection between PCA and factor analysis. Can formulate the factor rotation problem. 
Can derive PCA as the solution to one or more optimization problems. Can determine whether the solution is unique. Can show the connection between PCA and factor analysis. Knows at least one basic solution to the factor rotation problem. 
Understands the connection to singular value decomposition. Knows more than one classic factor rotation method, and can compare them. 
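The two computations mentioned above, eigenvectors of the covariance matrix and the singular value decomposition of the centered data, can be compared in a short sketch; the random data here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Data with three directions of clearly different variance.
X = rng.normal(size=(500, 3)) @ np.diag([2.0, 1.0, 0.2])
Xc = X - X.mean(axis=0)                 # center the data

# Route 1: eigendecomposition of the sample covariance matrix.
C = Xc.T @ Xc / (len(Xc) - 1)
eigval, eigvec = np.linalg.eigh(C)      # eigenvalues in ascending order
W_eig = eigvec[:, ::-1]                 # principal directions, descending variance

# Route 2: SVD of the centered data; the right singular vectors
# span the same principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W_svd = Vt.T

# The two bases agree up to the sign of each column.
agree = np.allclose(np.abs(W_eig.T @ W_svd), np.eye(3), atol=1e-6)
print(agree)
```

The sign ambiguity of each column is exactly the (partial) non-uniqueness of the PCA solution mentioned in the objectives: each principal direction is determined only up to sign, and fully degenerate only when eigenvalues coincide.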
Independent Component Analysis (ICA) 
All the above 
Can reproduce the definition of ICA. Understands the uniqueness result and the relevance of nongaussianity. Knows two basic applications of ICA. 
Understands how ICA estimation is related to maximization of nongaussianity based on the central limit theorem; can formulate at least two measures of nongaussianity. Can reproduce the basic formulae for the likelihood and mutual information; can show how they are related, and how they are related to nongaussianity. 
Can show the effect of whitening. Can show the problem is impossible to solve for gaussian data. Can derive methods for computationally maximizing measures of nongaussianity. Can compare different measures of nongaussianity. Can derive a practical algorithm for maximizing likelihood, including a simple family of density models. 
Can show in more than one way that the problem is impossible for gaussian data. Can reproduce the optimality proof of kurtosis. Understands the nonparametric nature of the likelihood, and different methods for tackling it. 
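The pipeline of whitening followed by computational maximization of a nongaussianity measure can be sketched with a kurtosis-based fixed-point iteration (the one-unit FastICA update); the sources, mixing matrix, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
S = np.vstack([rng.laplace(size=n),            # supergaussian source
               rng.uniform(-1, 1, size=n)])    # subgaussian source
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S                                      # observed mixtures

# Whitening: decorrelate and normalize variances using the
# eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc          # whitened: cov(Z) ~ I

# Kurtosis-based fixed point: w <- E[z (w'z)^3] - 3w, then normalize.
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    w = (Z * (w @ Z) ** 3).mean(axis=1) - 3 * w
    w /= np.linalg.norm(w)

# w'Z should recover one source up to sign and scale;
# check via absolute correlation with the true sources.
y = w @ Z
corr = max(abs(np.corrcoef(y, S[0])[0, 1]),
           abs(np.corrcoef(y, S[1])[0, 1]))
```

Running the same iteration on purely gaussian sources fails in the sense described above: kurtosis is zero for all directions, so no direction is preferred and the rotation left after whitening cannot be identified.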
Clustering 
Introduction to ML or introductory statistics; Introduction to probability theory. 
Can reproduce the k-means algorithm. Understands the gaussian mixture model and the basic idea of the EM algorithm. Can explain the differences between k-means and the gaussian mixture model estimated with EM. 
Can derive the likelihood of the gaussian mixture model. Can derive (with some help) the EM algorithm for that model. 
Understands the theory of the EM algorithm. 
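The EM algorithm for a gaussian mixture can be sketched in a few lines for the one-dimensional, two-component case; the data-generating parameters, initial guesses, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two well-separated gaussian clusters.
x = np.concatenate([rng.normal(-2.0, 1.0, 300),
                    rng.normal(3.0, 1.0, 300)])

# Initial guesses for mixing weights, means, variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    r = pi[:, None] * gauss(x[None, :], mu[:, None], var[:, None])
    r /= r.sum(axis=0, keepdims=True)
    # M-step: weighted maximum-likelihood updates of the parameters.
    nk = r.sum(axis=1)
    pi = nk / len(x)
    mu = (r * x).sum(axis=1) / nk
    var = (r * (x - mu[:, None]) ** 2).sum(axis=1) / nk
```

Replacing the soft responsibilities in the E-step with a hard 0/1 assignment to the nearest mean, and dropping the weights and variances, recovers k-means, which is one concrete way to explain the difference between the two methods.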
Nonlinear projections 
PCA&FA section above 
Understands the concept of nonlinear projections. Can explain at least two of the following methods: linear MDS, kernel PCA, Laplacian Eigenmaps, IsoMap. Understands the connection of these methods to PCA. 
Can show the equivalence of linear MDS and PCA. Can reproduce the following algorithms: kernel PCA, Laplacian Eigenmaps, IsoMap, SOM. 
Knows further algorithms such as curvilinear component analysis and locally linear embedding. 
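The equivalence of linear (classical) MDS and PCA mentioned above can be checked numerically: double-centering the squared-distance matrix recovers the Gram matrix of the centered data, whose top eigenvectors give the same configuration as the PCA projection. The random data is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

# PCA scores: project centered data onto top-2 principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = Xc @ Vt[:2].T

# Classical MDS: double-center squared Euclidean distances
# to obtain the Gram matrix B = Xc Xc', then embed with its
# top eigenvectors scaled by the square roots of the eigenvalues.
D2 = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
J = np.eye(50) - np.ones((50, 50)) / 50       # centering matrix
B = -0.5 * J @ D2 @ J
eigval, eigvec = np.linalg.eigh(B)
idx = np.argsort(eigval)[::-1][:2]
mds_scores = eigvec[:, idx] * np.sqrt(eigval[idx])

# The embeddings agree up to the sign of each coordinate axis.
for j in range(2):
    if mds_scores[:, j] @ pca_scores[:, j] < 0:
        mds_scores[:, j] *= -1
ok = np.allclose(mds_scores, pca_scores, atol=1e-6)
print(ok)
```

Kernel PCA, Laplacian Eigenmaps, and IsoMap all vary this recipe by replacing the Euclidean Gram matrix with a different similarity or graph-based matrix, which is the connection to PCA the objectives refer to.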