\[\newcommand{\bm}[1]{\boldsymbol{#1}}\]

Modeling

Classification vs Regression

Parametric Models vs Non-parametric Models

Non-parametric Models: SVM, CART, KNN

Probabilistic Models vs Non-probabilistic Models

Non-probabilistic Models: SVM, K-means, KNN

  • Non-probabilistic Models可以在supervised learning中用cross validation做model selection,而在unsupervised learning中不行。Favor probabilistic model

Discriminative Models vs Generative Models

我认识中的机器学习可以概括为:已知一些变量$\bm{x}$的取值,我们希望得到另一些变量$\bm{y}$的情况。当我们可以用$x$和$y$的显式的函数关系建模时,便可以选择Linear Regression (for regression) 或Logistic Regression (for classification)。这两个模型可以都属于Generalized Linear Model (GLM),在GLM中输出的分布属于exponential family,且模型的mean parameter是输入的线性组合(可能外加一个非线性激活函数)。

Sparse Linear Models Kernel: SVM Gaussian Processes (GP) CART Ensemble learning: bagging, boosting, stacking

除了用discriminative model对已知和未知变量的关系进行显式建模,我们还可以用generative model对变量的生成和依赖关系进行隐式建模。Generative model的核心问题是,如何表示joint distribution $p(\bm{x} \mid \bm{\theta})$,以及如何做inference和learning。Graphical model通过作出conditional independence (CI) assumption来表示joint distribution。Directed Graphical Model (DGM),即Bayes Net,用有向边表示依赖关系,每个节点只依赖于它的immediate parents。Naive Bayes classifier假设所有features之间的independence,将label和features的生成建模为$p(y, \bm{x}) = p(y) \prod{p(x_i \mid y)}$。Markov chain和Hidden Markov Model (HMM) 也属于DGM。

描述变量之间的依赖关系,一种是通过DGM中图的边,而另一种是Latent Variable Model (LVM),记为$\bm{z}_i \rightarrow \bm{x}_i$。最简单的LVM是mixture model,通过一个discrete variable控制子模型的选择:$p(\bm{x} \mid \bm{\theta}) = \sum{\pi_k p_k(\bm{x} \mid \bm{\theta})}$。自模型可以是Gaussian, multinoulli, experts等。在模型存在latent variables或数据中存在丢失时,我们可以用EM算法来求模型的MLE/MAP estimate。

Undirected Graphical Model (UGM): Markov Random Field (MRF), CRF

  • Ising Model, Hopfield Network, Potts Model, Gaussian MRF, Markov Logic Network
  • CRF, MEMM, SSVM

Learning and Inference

Supervised Learning, Unsupervised Learning, Semi-supervised Learning, Self-supervised Learning Multi-task Learning Few-shot Learning, Zero-shot Learning

Frequetist vs Bayesian Frequentist – MLE, MAP Bayesian – VI, MCMC

Unsupervised learning

  • PCA
  • Clustering

从Probabilistic Approach的角度看,inference是指求一个未知变量对已知变量的posterior。比如在LVM中,我们想从observed variable推出latent variable,即是求$p(z | x, \theta)$。 Inference

  • Exact inference
  • VI
  • MCMC

Structured Data vs Unstructured Data