Multiple linear regression studies the relationship between one dependent variable and two or more independent variables. It describes how the quantity of one phenomenon varies in response to changes in the quantities of several others, and it is a statistical method for building a mathematical model of the quantitative relationship among multiple variables.
Data: the underlying statistical concepts are not explained in detail here; look them up separately if needed. This article only shows how to use the model in a machine-learning workflow:
Link: https://pan.baidu.com/s/1Qv9OieI5R5zu-jbKU3bLZg?pwd=eyzh (extraction code: eyzh)
We use Boston housing price prediction as the example:
1. Load the data: D:\mlData\house_data.csv is the path where the file is stored; df.head() shows the specified number of leading records.
# 1. Read the data
import pandas as pd

df = pd.read_csv(r"D:\mlData\house_data.csv")
df.head(10)  # show the first ten records
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
| 5 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
| 6 | 0.08829 | 12.5 | 7.87 | 0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5 | 311 | 15.2 | 395.60 | 12.43 | 22.9 |
| 7 | 0.14455 | 12.5 | 7.87 | 0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5 | 311 | 15.2 | 396.90 | 19.15 | 27.1 |
| 8 | 0.21124 | 12.5 | 7.87 | 0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5 | 311 | 15.2 | 386.63 | 29.93 | 16.5 |
| 9 | 0.17004 | 12.5 | 7.87 | 0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5 | 311 | 15.2 | 386.71 | 17.10 | 18.9 |
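Before going further it is worth a quick sanity check on what was loaded. The sketch below is a minimal, optional addition (not part of the original article); it only assumes the DataFrame df created above and the 506-row table shown in the later output.

# Quick sanity check on the loaded data (optional)
print(df.shape)         # expected (506, 14): 13 features plus the MEDV target
print(df.dtypes)        # all columns should be numeric
print(df.isna().sum())  # count of missing values per column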
2. Feature preparation: ydata extracts the MEDV column (the target), and xdata is the DataFrame with the entire MEDV column dropped.
import matplotlib.pyplot as plt
ydata = df['MEDV']               # target: median home value
xdata = df.drop('MEDV', axis=1)  # features: everything except MEDV
xdata
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1 | 273 | 21.0 | 391.99 | 9.67 |
| 502 | 0.04527 | 0.0 | 11.93 | 0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1 | 273 | 21.0 | 396.90 | 9.08 |
| 503 | 0.06076 | 0.0 | 11.93 | 0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1 | 273 | 21.0 | 396.90 | 5.64 |
| 504 | 0.10959 | 0.0 | 11.93 | 0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1 | 273 | 21.0 | 393.45 | 6.48 |
| 505 | 0.04741 | 0.0 | 11.93 | 0 | 0.573 | 6.030 | 80.8 | 2.5050 | 1 | 273 | 21.0 | 396.90 | 7.88 |
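The columns sit on very different scales (TAX is in the hundreds, NOX is below one). Ordinary least squares does not require scaling, but if you later switch to a regularized or gradient-based model, standardizing the features is a common extra step. The sketch below is a hedged, optional addition using scikit-learn's StandardScaler; it is not part of the original workflow and only assumes the xdata built above.

# Optional: standardize the features (not required for ordinary least squares)
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
xdata_scaled = scaler.fit_transform(xdata)  # each column now has mean 0 and std 1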
3. Split the dataset: the data is typically divided into a training set and a test set at a ratio of 8:2.
# 3. Split the dataset (80% train / 20% test)
from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(xdata, ydata, test_size=0.2, random_state=33)
print(ytrain, ytest, xtrain, xtest)
Output (abridged): ytrain and xtrain hold 404 rows and ytest and xtest hold 102 rows; xtrain and xtest keep all 13 feature columns, while ytrain and ytest contain the corresponding MEDV values (Name: MEDV, dtype: float64).
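Printing the full frames is noisy; checking the shapes is usually enough to confirm the 80/20 split. A minimal check, reusing only the variables created above:

# Confirm the 80/20 split
print(xtrain.shape, xtest.shape)  # (404, 13) (102, 13)
print(ytrain.shape, ytest.shape)  # (404,) (102,)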
4. Train the model: the fitted intercept comes out to 33.046064463200565.
# Model training -- estimator
# Import the linear regression class
from sklearn.linear_model import LinearRegression

# Create the regression object
lr = LinearRegression()

# Fit the model: learns the weights w1..wn and the intercept w0
# y = w0 + w1*x1 + w2*x2 + ... + wn*xn
lr.fit(xtrain, ytrain)

# Weight coefficients w1..wn
lr.coef_
# Intercept w0
lr.intercept_
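The intercept is a single number, but lr.coef_ is an array of 13 weights in the same order as the columns of xtrain. Pairing them with the column names makes the fitted equation easier to read; the small sketch below is a convenience addition that only assumes the lr and xtrain defined above.

# Show each weight next to the feature it multiplies
import pandas as pd

coefs = pd.Series(lr.coef_, index=xtrain.columns)
print(coefs)
print("intercept:", lr.intercept_)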
5. Model prediction: run the trained model on the test data.
# Model prediction on the test set
y_predict = lr.predict(xtest)
y_predict
# True values for comparison
ytest
Output (abridged): ytest is a Series named MEDV with 102 entries (dtype: float64), holding the true median values for the test rows.
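To eyeball the quality of the predictions before computing a metric, it helps to line the predicted and true values up side by side. A minimal sketch (not in the original article) that reuses only y_predict and ytest from the step above:

# Compare predictions with the true MEDV values for a few test rows
import pandas as pd

comparison = pd.DataFrame({'actual': ytest.values, 'predicted': y_predict})
print(comparison.head(10))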
6. Model evaluation
# Model evaluation: mean squared error (MSE)
from sklearn.metrics import mean_squared_error
mse=mean_squared_error(ytest,y_predict)
mse
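MSE is expressed in squared units of MEDV, which is hard to read on its own. Two common companions are the root mean squared error (same units as the target) and the R² score; both are standard scikit-learn/NumPy calls, and the optional sketch below reuses only the mse, ytest, and y_predict defined above.

# Additional evaluation metrics (optional)
import numpy as np
from sklearn.metrics import r2_score

rmse = np.sqrt(mse)              # error in the same units as MEDV
r2 = r2_score(ytest, y_predict)  # 1.0 would be a perfect fit
print("RMSE:", rmse, "R^2:", r2)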