Python regression output functions: Python regression (Part 3)

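4. The linear regression model
Advantages: fast; no tuning parameters; easy to interpret and understand.
Disadvantages: compared with more complex models, its predictive accuracy is usually lower, because it assumes a linear relationship between the features and the response. When the true relationship is nonlinear, a linear regression model cannot fit the data well.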
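The linear model is: y = β0 + β1x1 + β2x2 + ... + βnxn, where
y is the response
β0 is the intercept
β1 is the coefficient of x1, and so on
In this case study: y = β0 + β1 × TV + β2 × Radio + β3 × Newspaper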
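To make the expression concrete, the sketch below evaluates y = β0 + β1x1 + ... + βnxn as a dot product with NumPy. The coefficient values and the helper name `predict_linear` are made up for illustration; they are not fitted to the advertising data.

```python
import numpy as np

def predict_linear(intercept, coefs, features):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for one observation."""
    coefs = np.asarray(coefs, dtype=float)
    features = np.asarray(features, dtype=float)
    return intercept + float(np.dot(coefs, features))

# Hypothetical values, chosen only to show the arithmetic.
beta0 = 1.0
betas = [0.5, 2.0, -1.0]   # one coefficient per feature
x = [10.0, 3.0, 4.0]       # TV, Radio, Newspaper spend

y_hat = predict_linear(beta0, betas, x)
print(y_hat)  # 1.0 + 5.0 + 6.0 - 4.0 = 8.0
```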
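(1) Using pandas to build X (the feature matrix) and y (the label column)
scikit-learn expects X to be a feature matrix and y to be a NumPy vector.
pandas is built on top of NumPy.
So X can be a pandas DataFrame and y can be a pandas Series; scikit-learn understands both structures.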
# `data` is assumed to be the advertising DataFrame loaded earlier in this series
# create a Python list of feature names
feature_cols = ['TV', 'Radio', 'Newspaper']
# use the list to select a subset of the original DataFrame
X = data[feature_cols]
# equivalent command to do this in one line
X = data[['TV', 'Radio', 'Newspaper']]
# print the first 5 rows
print(X.head())
# check the type and shape of X
print(type(X))
print(X.shape)
The output is:
      TV  Radio  Newspaper
0  230.1   37.8       69.2
1   44.5   39.3       45.1
2   17.2   45.9       69.3
3  151.5   41.3       58.5
4  180.8   10.8       58.4
<class 'pandas.core.frame.DataFrame'>
(200, 3)
# select a Series from the DataFrame
y = data['Sales']
# equivalent command that works if there are no spaces in the column name
y = data.Sales
# print the first 5 values
print(y.head())
The output is:
0    22.1
1    10.4
2     9.3
3    18.5
4    12.9
Name: Sales
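As a self-contained check of the X/y structure, the sketch below rebuilds the five rows printed above into a small DataFrame and confirms the types and shapes scikit-learn expects. The name `sample` is introduced here for illustration; the article's `data` holds all 200 rows.

```python
import pandas as pd

# The first five rows of the advertising data, as printed in the article.
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

feature_cols = ["TV", "Radio", "Newspaper"]
X = sample[feature_cols]   # 2-D feature matrix (DataFrame)
y = sample["Sales"]        # 1-D response vector (Series)

print(type(X).__name__, X.shape)   # DataFrame (5, 3)
print(type(y).__name__, y.shape)   # Series (5,)

# The underlying NumPy arrays are available when needed.
X_array = X.to_numpy()
y_array = y.to_numpy()
print(X_array.shape, y_array.shape)
```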
(2) Building the training and test sets
# build the training and test sets
# train_test_split lives in sklearn.model_selection since scikit-learn 0.18;
# the original post imported it from sklearn.cross_validation (removed in 0.20)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# default split is 75% for training and 25% for testing
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
The output is:
(150, 3)
(150,)
(50, 3)
(50,)
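Note: the results above were produced by train_test_split(), but for reasons I could not determine, importing it from sklearn.cross_validation raised an error with my installed version of sklearn:
ImportError                               Traceback (most recent call last)
<ipython-input-182-3eee51fcba5a> in <module>()
      1 ### build the training and test sets
----> 2 from sklearn.cross_validation import train_test_split
      3 #import sklearn.cross_validation
      4 X_train,X_test, y_train, y_test = train_test_split(X, y, random_state=1)
      5 # default split is 75% for training and 25% for testing
ImportError: cannot import name train_test_split
Fixes: 1. I reinstalled the sklearn package, after which the call worked without errors. (In scikit-learn 0.20 and later, sklearn.cross_validation no longer exists; import train_test_split from sklearn.model_selection instead.)
2. Write your own function to split the data randomly into training and test sets by hand. (I will attach that code at the end.)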
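Option 2 can be sketched as follows. This is not the author's code, just one plausible way to do it: shuffle the row indices with NumPy and slice off a test fraction. The function name `manual_train_test_split` and its parameters are hypothetical, and the arrays are synthetic placeholders.

```python
import numpy as np

def manual_train_test_split(X, y, test_fraction=0.25, seed=1):
    """Randomly split X and y into training and test parts."""
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))          # random row order
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic data with the same shape as the article's X and y.
X_demo = np.arange(600, dtype=float).reshape(200, 3)
y_demo = np.arange(200, dtype=float)

X_train, X_test, y_train, y_test = manual_train_test_split(X_demo, y_demo)
print(X_train.shape, y_train.shape)  # (150, 3) (150,)
print(X_test.shape, y_test.shape)    # (50, 3) (50,)
```

Because the shuffling differs from scikit-learn's, the rows selected will not match those chosen by train_test_split with random_state=1, so coefficients fitted on this split would differ slightly. The sketch also converts its inputs to NumPy arrays, so DataFrame column names are not preserved.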
(3) Linear regression with sklearn
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
# fit() returns the fitted estimator itself, so model and linreg are the same object
model = linreg.fit(X_train, y_train)
print(model)
print(linreg.intercept_)
print(linreg.coef_)
The output is:
LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
2.66816623043
[ 0.04641001  0.19272538 -0.00349015]
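To see what `intercept_` and `coef_` hold, the sketch below fits LinearRegression to synthetic, noise-free data generated from known coefficients and checks that the fit recovers them. The data and the "true" values are invented for the illustration and have nothing to do with the advertising dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 100, size=(200, 3))     # three made-up features

true_intercept = 3.0
true_coefs = np.array([0.05, 0.2, -0.01])
y_syn = true_intercept + X_syn @ true_coefs    # exact linear relation, no noise

model = LinearRegression().fit(X_syn, y_syn)
print(model.intercept_)   # close to 3.0
print(model.coef_)        # close to [0.05, 0.2, -0.01]
```

With real data the relation is never exact, so the fitted values are least-squares estimates rather than a recovery of some known truth.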
# pair the feature names with the coefficients
# (in Python 3, zip returns an iterator, so wrap it in list to see the pairs)
list(zip(feature_cols, linreg.coef_))
The output is:
[('TV', 0.046410010869663267),
('Radio', 0.19272538367491721),
('Newspaper', -0.0034901506098328305)]
y = 2.668 + 0.0464 × TV + 0.193 × Radio - 0.00349 × Newspaper
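The fitted equation can be applied by hand. The sketch below plugs a hypothetical advertising budget (the spend figures are invented) into the coefficients reported above; the function name `predict_sales` is also an illustration, not part of scikit-learn.

```python
# Intercept and coefficients as reported by linreg in the article.
INTERCEPT = 2.66816623043
COEFS = {
    "TV": 0.046410010869663267,
    "Radio": 0.19272538367491721,
    "Newspaper": -0.0034901506098328305,
}

def predict_sales(spend):
    """Apply y = b0 + b1*TV + b2*Radio + b3*Newspaper to one budget."""
    return INTERCEPT + sum(COEFS[name] * spend[name] for name in COEFS)

# Hypothetical budget, in the same units as the dataset's columns.
budget = {"TV": 100.0, "Radio": 25.0, "Newspaper": 25.0}
print(round(predict_sales(budget), 2))  # 12.04
```

Calling linreg.predict on a one-row DataFrame with the same three columns should give the same number. The signs also carry the interpretation: holding the other two channels fixed, one more unit of Radio spend is associated with about 0.193 more units of Sales, while the Newspaper coefficient is slightly negative.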
