Coursera Study Notes | Machine Learning by Stanford University (Andrew Ng)
/ 20220404 Week 1 - 2 /
Chapter 1 - Introduction
1.1 Definition
- Arthur Samuel: The field of study that gives computers the ability to learn without being explicitly programmed.
- Tom Mitchell: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
- Supervised Learning: given a labeled data set;
we already know what a correct output/result should look like
- Regression: continuous output
- Classification: discrete output
- Unsupervised Learning: given an unlabeled data set, or a data set in which every example carries the same label;
we have to group the data ourselves
- Clustering: group the data into different clusters
- Non-Clustering
- Others: Reinforcement Learning, Recommender Systems...
- Training Set
\[\begin{matrix} x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\ x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\ \vdots&\vdots&\ddots&\vdots&&\vdots\\ x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)} \end{matrix}\]
- Notation
\(m=\) the number of training examples (rows)
\(n=\) the number of features (columns)
\(x=\) input variable/feature
\(y=\) output variable/target variable
\((x^{(i)}_j,y^{(i)})\): the \(i\)-th training example, where \(x^{(i)}_j\) is the value of its \(j\)-th feature, \(i=1, ..., m\), \(j=1, ..., n\)
1.2.4 Gradient Descent
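Gradient descent minimizes a cost function \(J\) by repeatedly stepping each parameter in the direction of steepest descent: repeat until convergence, updating every \(\theta_j\) simultaneously,
\[\theta_j:=\theta_j-\alpha{\partial\over\partial\theta_j}J(\theta) \]
where \(\alpha\) is the learning rate. The version of this update specialized to linear regression is worked out in Chapter 2.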
Chapter 2 - Linear Regression
\[\begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\ \vdots&\vdots&\vdots&\ddots&\vdots&&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}\\ \\ \theta_0&\theta_1&\theta_2&\cdots&\theta_n&& \end{matrix}\]
where the bias feature \(x_0=1\) for every training example.
2.1 Linear Regression with One Variable
- Hypothesis Function
\[h_{\theta}(x)=\theta_0+\theta_1x \]
- Cost Function: Squared Error Cost Function (see the NumPy sketch after this list)
\[J(\theta_0,\theta_1)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 \]
- Goal
\[\min_{(\theta_0,\theta_1)}J(\theta_0,\theta_1) \]
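A minimal NumPy sketch of the squared-error cost (the function and variable names are my own, not from the course):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) over m training examples."""
    m = len(y)
    h = theta0 + theta1 * x               # hypothesis h_theta(x) for all examples
    return np.sum((h - y) ** 2) / (2 * m)

# toy data generated by y = 1 + 2x, so the cost at (1, 2) is exactly 0
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
print(cost(1.0, 2.0, x, y))   # 0.0
print(cost(0.0, 0.0, x, y))   # (1 + 9 + 25) / 6 ≈ 5.833
```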
2.2 Linear Regression with Multiple Variables
- Hypothesis Function
\[\theta= \left[ \begin{matrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{matrix} \right],\ x= \left[ \begin{matrix} x_0\\ x_1\\ \vdots\\ x_n \end{matrix} \right]\]
\[\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\ &=\theta^Tx \end{aligned}\]
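A quick NumPy check of the vectorized form (the numbers are a made-up example):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 5.0, 6.0])       # x_0 = 1 (bias), x_1, x_2
h = theta @ x                       # theta^T x = 1 + 2*5 + 3*6 = 29.0
print(h)
```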
- Cost Function
\[J(\theta)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 \]
- Goal
\[\min_{\theta}J(\theta) \]
- Gradient descent procedure
Repeat until convergence (simultaneous update for each \(j=0, 1, ..., n\))
\[\begin{aligned} \theta_j &:=\theta_j-\alpha{\partial\over\partial\theta_j}J(\theta)\\ &:=\theta_j-\alpha{1\over{m}}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j \end{aligned}\]
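A runnable sketch of this update in NumPy (a bare-bones illustration under the conventions above; the names and the fixed iteration count are my own simplifications):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    y: (m,) target vector.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), all j at once
        theta -= alpha * grad              # simultaneous update of theta_0, ..., theta_n
    return theta

# toy data from y = 1 + 2*x_1: theta should approach [1, 2]
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(gradient_descent(X, y))   # ≈ [1. 2.]
```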
- Feature Scaling (a NumPy sketch follows below)
For each feature \(x_j\): $$x_j:={{x_j-\mu_j}\over{s_j}}$$
where \(\mu_j\) is the mean of the \(m\) values of feature \(x_j\), and \(s_j\) is either their range (max minus min) or their standard deviation.
- Learning Rate
If the learning rate \(\alpha\) is too small, gradient descent converges slowly; if it is too large, \(J(\theta)\) may fail to decrease on every iteration and may not converge at all.
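A minimal mean-normalization sketch in NumPy (names are my own; it assumes the bias column \(x_0\) is added after scaling, since a constant column must not be normalized):

```python
import numpy as np

def feature_scale(X):
    """Mean-normalize each feature column: (x_j - mu_j) / s_j.

    Here s_j is the standard deviation; the range max - min works too.
    Returns mu and s so the same transform can be applied to new inputs.
    """
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0]])
X_scaled, mu, s = feature_scale(X)
print(X_scaled.mean(axis=0))   # ≈ [0, 0] after scaling
```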
- Normal Equation
Let
\[X=\left[ \begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n\\ \end{matrix} \right],\ y=\left[ \begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)}\\ \end{matrix} \right]\]
where \(X\) is an \(m\times(n+1)\) matrix and \(y\) is an \(m\)-dimensional column vector. Then (a NumPy sketch of this formula appears after the list below)
\[\theta=(X^TX)^{-1}X^Ty \]
If \(X^TX\) is noninvertible, the likely causes are:
- Redundant features: two features are linearly dependent; delete one of them;
- Too many features, e.g. \(m\leq n\): delete some features, or apply regularization.
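A one-line sketch of the normal equation in NumPy (using the pseudo-inverse, which still returns a sensible answer when \(X^TX\) is noninvertible; names are my own):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y."""
    # np.linalg.pinv is the pseudo-inverse, so this behaves reasonably
    # even when X^T X is noninvertible
    return np.linalg.pinv(X.T @ X) @ X.T @ y

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(normal_equation(X, y))   # [1. 2.], same answer as gradient descent
```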
- Polynomial Regression: construct new features from existing ones and fit them with linear regression, e.g. (a NumPy sketch follows these examples)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2,\ x_2=x_1^2\)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3,\ x_2=x_1^2,\ x_3=x_1^3\)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1},\ x_2=\sqrt{x_1}\)
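A sketch of building polynomial features (the toy data are my own; note that \(x_1^2\) and \(x_1^3\) have very different ranges, which is exactly where feature scaling matters):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 9.0, 28.0, 65.0])     # toy data from y = 1 + x_1^3

# treat powers of x1 as extra features; the model is still linear in theta
X = np.column_stack([np.ones_like(x1), x1, x1**2, x1**3])  # x_0, x_1, x_2=x_1^2, x_3=x_1^3
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)   # ≈ [1, 0, 0, 1]
```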