Machine Learning | Machine Learning: Multiple Linear Regression Case Study

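Regression that relates one dependent variable to two or more independent variables is called multiple regression; when the relationship is modeled as linear, it is multiple linear regression. It describes how the quantity of one phenomenon or thing changes in step with the quantities of several others, and it is a statistical method for building a mathematical model, linear or nonlinear, of the quantitative relationship among multiple variables.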
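To make the definition concrete, here is a minimal sketch of the linear case, y = w0 + w1*x1 + w2*x2, fitted by ordinary least squares with NumPy. The data is synthetic, and the coefficients 1.5, 2.0 and -3.0 are made up for illustration; none of this comes from the house-price file.

```python
import numpy as np

# Synthetic data (an assumption for illustration): two features, known coefficients
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 2))
true_w0, true_w = 1.5, np.array([2.0, -3.0])
y = true_w0 + X @ true_w + rng.normal(scale=0.1, size=n_samples)

# Prepend a column of ones so the intercept w0 is estimated with the weights
A = np.column_stack([np.ones(n_samples), X])

# Ordinary least squares: minimize ||A w - y||^2
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated [w0, w1, w2]:", np.round(w_hat, 2))
```

The estimates should land close to the coefficients used to generate the data, which is the same idea that the scikit-learn estimator applies to the 13 house-price features later on.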

Data:
Link: https://pan.baidu.com/s/1Qv9OieI5R5zu-jbKU3bLZg?pwd=eyzh (extraction code: eyzh)
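The underlying concepts are not explained in depth here; look them up as needed. This post only shows how to use the model in practice.
We take Boston house-price prediction as the example: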
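1. Load the data: "D:\mlData\house_data.csv" is the path where the file is stored, and the argument to df.head() sets how many records are shown.
# 1. Read the data
import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r"D:\mlData\house_data.csv")
df.head(10)  # show the first ten records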

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
5 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
6 0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.60 12.43 22.9
7 0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.90 19.15 27.1
8 0.21124 12.5 7.87 0 0.524 5.631 100.0 6.0821 5 311 15.2 386.63 29.93 16.5
9 0.17004 12.5 7.87 0 0.524 6.004 85.9 6.5921 5 311 15.2 386.71 17.10 18.9
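Before going further it can help to confirm that the file loaded with the expected columns and no missing values. The sketch below is only an illustration: it rebuilds the first three records from the preview above as an in-memory CSV, and it assumes the real file is comma-separated with a header row, which the post does not state.

```python
import io
import pandas as pd

# First three records copied from the preview above.
# The comma-separated layout is an assumption about the real file.
csv_text = """CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296,15.3,396.90,4.98,24.0
0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.90,9.14,21.6
0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
"""
sample = pd.read_csv(io.StringIO(csv_text))

expected_columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Columns that should be present but are not
missing_columns = [c for c in expected_columns if c not in sample.columns]
# Number of missing values per column
null_counts = sample.isna().sum()

print("shape:", sample.shape)
print("missing columns:", missing_columns)
print("total missing values:", int(null_counts.sum()))
```

On the real data the same three checks can be run directly on df.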
2. Feature preparation: ydata holds the "MEDV" column (the target), and xdata is the data with the whole "MEDV" column dropped.
import matplotlib.pyplot as plt

ydata = df['MEDV']               # target: the MEDV column
xdata = df.drop('MEDV', axis=1)  # features: every column except MEDV
xdata

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88
506 rows × 13 columns
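One detail worth seeing in isolation is that drop returns a new DataFrame and leaves the original untouched, which is why df still contains MEDV afterwards. A small sketch on a toy frame follows; the three rows reuse RM, LSTAT and MEDV values from the preview, and the name toy is hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical name) with two features and the target
toy = pd.DataFrame({
    "RM": [6.575, 6.421, 7.185],
    "LSTAT": [4.98, 9.14, 4.03],
    "MEDV": [24.0, 21.6, 34.7],
})

y = toy["MEDV"]               # target column as a Series
X = toy.drop("MEDV", axis=1)  # axis=1 means "drop a column", not a row

print(list(X.columns))        # feature columns only
print("MEDV" in toy.columns)  # the original frame is unchanged
```

toy.drop(columns="MEDV") is an equivalent spelling that some readers find easier to scan than axis=1.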
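3. Split the dataset: the data is usually divided into a training set and a test set at a ratio of 8:2.
# 3. Split the dataset
from sklearn.model_selection import train_test_split

# test_size=0.2 gives the 8:2 split; random_state fixes the shuffle so the split is reproducible
xtrain, xtest, ytrain, ytest = train_test_split(xdata, ydata, test_size=0.2, random_state=33)
print(ytrain, ytest, xtrain, xtest)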

229 31.5
296 27.1
425 8.3
491 13.6
418 8.8
...
146 15.6
66 19.4
216 23.3
391 23.2
20 13.6
Name: MEDV, Length: 404, dtype: float64
122 20.5
400 5.6
423 13.4
447 12.6
44 21.2
...
165 25.0
106 19.5
470 19.9
149 15.4
110 21.7
Name: MEDV, Length: 102, dtype: float64
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \
229 0.44178 0.0 6.20 0 0.504 6.552 21.4 3.3751 8 307
296 0.05372 0.0 13.92 0 0.437 6.549 51.0 5.9604 4 289
425 15.86030 0.0 18.10 0 0.679 5.896 95.4 1.9096 24 666
491 0.10574 0.0 27.74 0 0.609 5.983 98.8 1.8681 4 711
418 73.53410 0.0 18.10 0 0.679 5.957 100.0 1.8026 24 666
.. ... ... ... ... ... ... ... ... ... ...
146 2.15505 0.0 19.58 0 0.871 5.628 100.0 1.5166 5 403
66 0.04379 80.0 3.37 0 0.398 5.787 31.1 6.6115 4 337
216 0.04560 0.0 13.89 1 0.550 5.888 56.0 3.1121 5 276
391 5.29305 0.0 18.10 0 0.700 6.051 82.5 2.1678 24 666
20 1.25179 0.0 8.14 0 0.538 5.570 98.1 3.7979 4 307
PTRATIO B LSTAT
229 17.4 380.34 3.76
296 16.0 392.85 7.39
425 20.2 7.68 24.39
491 20.1 390.11 18.07
418 20.2 16.45 20.62
.. ... ... ...
146 14.7 169.27 16.65
66 16.1 396.90 10.24
216 16.4 392.80 13.51
391 20.2 378.38 18.76
20 21.0 376.57 21.02
[404 rows x 13 columns]
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \
122 0.09299 0.0 25.65 0 0.581 5.961 92.9 2.0869 2 188
400 25.04610 0.0 18.10 0 0.693 5.987 100.0 1.5888 24 666
423 7.05042 0.0 18.10 0 0.614 6.103 85.1 2.0218 24 666
447 9.92485 0.0 18.10 0 0.740 6.251 96.6 2.1980 24 666
44 0.12269 0.0 6.91 0 0.448 6.069 40.0 5.7209 3 233
.. ... ... ... ... ... ... ... ... ... ...
165 2.92400 0.0 19.58 0 0.605 6.101 93.0 2.2834 5 403
106 0.17120 0.0 8.56 0 0.520 5.836 91.9 2.2110 5 384
470 4.34879 0.0 18.10 0 0.580 6.167 84.0 3.0334 24 666
149 2.73397 0.0 19.58 0 0.871 5.597 94.9 1.5257 5 403
110 0.10793 0.0 8.56 0 0.520 6.195 54.4 2.7778 5 384
PTRATIO B LSTAT
122 19.1 378.09 17.93
400 20.2 396.90 26.77
423 20.2 2.52 23.29
447 20.2 388.52 16.44
44 17.9 389.39 9.55
.. ... ... ...
165 14.7 240.16 9.81
106 20.9 395.67 18.66
470 20.2 396.90 16.29
149 14.7 351.85 21.45
110 20.9 393.49 13.00
[102 rows x 13 columns]
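The sizes in the output (404 training rows, 102 test rows) follow from test_size=0.2 applied to 506 rows, and random_state makes the shuffle repeatable. The sketch below checks both points on placeholder arrays with the same number of rows; the arrays stand in for the real data and carry no meaning.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 506 rows, the same row count as the house-price data
X = np.arange(506 * 2).reshape(506, 2)
y = np.arange(506)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=33
)

# Same seed, same inputs: the split comes out identical
_, _, _, y_test_again = train_test_split(X, y, test_size=0.2, random_state=33)

print(len(y_train), len(y_test))
print(np.array_equal(y_test, y_test_again))
```

Omitting random_state gives a different split on each run, which makes the later MSE value hard to compare between runs.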

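4. Train the model: the fitted intercept is 33.046064463200565.
# Model training (estimator)
# Inputs: xtrain, ytrain
# Import the linear regression class
from sklearn.linear_model import LinearRegression

# Create the regression object
lr = LinearRegression()
# Fit the model to learn the weights and the intercept
lr.fit(xtrain, ytrain)
# Model form: y = w0 + w1*x1 + w2*x2 + ... + wn*xn
# Weights w1..wn, one per feature column
lr.coef_
# Intercept w0
lr.intercept_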
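To see how coef_ and intercept_ relate to the model form y = w0 + w1*x1 + ... + wn*xn, the following sketch fits LinearRegression on synthetic data and rebuilds the predictions by hand. The data and its generating coefficients are assumptions made for illustration, not values from the house-price model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption): three features with known coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 4.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

lr = LinearRegression()
lr.fit(X, y)

# coef_ holds one weight per feature; intercept_ is the constant term w0
manual = lr.intercept_ + X @ lr.coef_

print(lr.coef_.shape)
print(np.allclose(manual, lr.predict(X)))
```

With the house-price data, pd.Series(lr.coef_, index=xtrain.columns) would pair each weight with its feature name, assuming xtrain is still a DataFrame as in the steps above.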

5. Predict: run the trained model on the test data.
# Model prediction
y_predict = lr.predict(xtest)
y_predict
# Actual values
ytest

122 20.5
400 5.6
423 13.4
447 12.6
44 21.2
...
165 25.0
106 19.5
470 19.9
149 15.4
110 21.7
Name: MEDV, Length: 102, dtype: float64
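Predictions come back as a plain array without the row index, so lining them up with the actual values makes the errors easier to read. The sketch below uses the first five actual values shown above; the predicted numbers are hypothetical placeholders, since the post does not show the real predictions.

```python
import pandas as pd

# First five actual values and their row labels, taken from the output above
actual = pd.Series([20.5, 5.6, 13.4, 12.6, 21.2],
                   index=[122, 400, 423, 447, 44], name="MEDV")

# HYPOTHETICAL predictions, invented only to illustrate the comparison
predicted = [21.0, 7.0, 13.0, 15.0, 22.0]

# Reattach the row labels so each prediction sits next to its actual value
comparison = pd.DataFrame({
    "actual": actual,
    "predicted": pd.Series(predicted, index=actual.index),
})
comparison["residual"] = comparison["actual"] - comparison["predicted"]

# Row label with the largest absolute error
worst = comparison["residual"].abs().idxmax()
print(comparison)
print("largest error at row", worst)
```

In the real workflow the same frame could be built from ytest and y_predict, using ytest.index for the labels.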

6. Evaluate the model
# Model evaluation: MSE (mean squared error)
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(ytest, y_predict)
mse
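MSE is the mean of the squared differences between actual and predicted values. The sketch below computes it by hand with NumPy on four made-up values, and also derives RMSE and R², two related metrics that the post does not use but that are often reported alongside MSE.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_pred

# Mean squared error: average of the squared residuals
mse = np.mean(residuals ** 2)

# Root mean squared error: same units as the target
rmse = np.sqrt(mse)

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)
```

Because MEDV is a price figure, RMSE is usually easier to interpret than MSE: it is expressed in the same units as the target rather than in squared units.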

