Python regression output functions: Python regression (Part 3)


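4. The linear regression model
Advantages: fast to fit; no tuning parameters; easy to interpret; easy to understand.
Disadvantages: compared with more complex models, its predictive accuracy is often lower, because it assumes a linear relationship between the features and the response. When the true relationship is nonlinear, a linear regression model cannot fit the data well.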
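The limitation described above can be seen in a small experiment. The following is only a sketch on synthetic data (the arrays x, y_linear and y_curved are made up for illustration and are not the advertising data set): the same LinearRegression model is fitted once to an exactly linear response and once to a quadratic one, and the R^2 scores are compared.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the advertising data).
x = np.linspace(-3, 3, 61).reshape(-1, 1)   # one feature, 61 samples
y_linear = 2.0 * x.ravel() + 1.0            # exactly linear response
y_curved = x.ravel() ** 2                   # purely quadratic response

# R^2 on the training data: 1.0 is a perfect fit, 0.0 is no better than
# always predicting the mean of y.
r2_linear = LinearRegression().fit(x, y_linear).score(x, y_linear)
r2_curved = LinearRegression().fit(x, y_curved).score(x, y_curved)

print("R^2 on linear data:    %.3f" % r2_linear)
print("R^2 on quadratic data: %.3f" % r2_curved)
```

Because x is symmetric around zero, the best straight line through the quadratic data is flat, so the second score ends up close to zero while the first is essentially 1.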
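Linear model expression: y = β0 + β1×x1 + β2×x2 + ... + βn×xn, where
y is the response
β0 is the intercept
β1 is the coefficient of x1, and so on
In this case study: y = β0 + β1×TV + β2×Radio + β3×Newspaper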
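(1) Use pandas to build X (the feature matrix) and y (the label column)
scikit-learn expects X to be a feature matrix and y to be a NumPy vector.
pandas is built on top of NumPy.
Therefore X can be a pandas DataFrame and y can be a pandas Series; scikit-learn understands both structures.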
# create a Python list of feature names
feature_cols = ['TV', 'Radio', 'Newspaper']
# use the list to select a subset of the original DataFrame
X = data[feature_cols]
# equivalent command to do this in one line
X = data[['TV', 'Radio', 'Newspaper']]
# print the first 5 rows
print(X.head())
# check the type and shape of X
print(type(X))
print(X.shape)
The output is:
      TV  Radio  Newspaper
0  230.1   37.8       69.2
1   44.5   39.3       45.1
2   17.2   45.9       69.3
3  151.5   41.3       58.5
4  180.8   10.8       58.4
<class 'pandas.core.frame.DataFrame'>
(200, 3)
# select a Series from the DataFrame
y = data['Sales']
# equivalent command that works if there are no spaces in the column name
y = data.Sales
# print the first 5 values
print(y.head())
The output is:
0    22.1
1    10.4
2     9.3
3    18.5
4    12.9
Name: Sales
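To check that scikit-learn really treats pandas objects and NumPy arrays the same way, here is a minimal sketch. It assumes only the five rows printed above (the full data set has 200 rows), and the name sample is a stand-in introduced here so it does not clash with the article's data variable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# The five rows shown in the outputs above (not the full 200-row data set).
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

X = sample[["TV", "Radio", "Newspaper"]]   # DataFrame, shape (5, 3)
y = sample["Sales"]                        # Series, shape (5,)

# Fit once on the pandas objects and once on the underlying NumPy arrays.
model_pandas = LinearRegression().fit(X, y)
model_numpy = LinearRegression().fit(X.to_numpy(), y.to_numpy())

same = np.allclose(model_pandas.coef_, model_numpy.coef_)
print(type(X).__name__, X.shape, type(y).__name__, y.shape)
print("same coefficients:", same)
```

With only five rows the coefficients themselves mean little; the point is just that both input types lead to the same fit.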
(2) Build the training and test sets
# build the training and test sets
# train_test_split is in sklearn.model_selection; the older sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# default split is 75% for training and 25% for testing
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
The output is:
(150, 3)
(150,)
(50, 3)
(50,)
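The 150/50 shapes follow from the default test_size of 0.25. As a small sketch on random stand-in arrays with the same shapes as X and y (X_demo and y_demo are made-up names, not the advertising data), the default split and an explicit 80/20 split look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same shapes as the article's X and y.
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 3)
y_demo = rng.rand(200)

# Default: test_size=0.25, so 200 rows become 150 training and 50 test rows.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

# Explicit 80/20 split; a fixed random_state makes the shuffle reproducible.
X_tr80, X_te20, y_tr80, y_te20 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1)

print(X_tr.shape, X_te.shape)      # (150, 3) (50, 3)
print(X_tr80.shape, X_te20.shape)  # (160, 3) (40, 3)
```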
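Note: the result above comes from train_test_split(). For reasons I could not work out, however, the sklearn version I had installed raised an error on the import:
ImportError                               Traceback (most recent call last)
<ipython-input-182-3eee51fcba5a> in <module>()
      1 ### build the training and test sets
----> 2 from sklearn.cross_validation import train_test_split
      3 #import sklearn.cross_validation
      4 X_train,X_test, y_train, y_test = train_test_split(X, y, random_state=1)
      5 # default split is 75% for training and 25% for testing
ImportError: cannot import name train_test_split
Two ways to deal with it:
1. Reinstall the sklearn package. After I did that, the import worked without errors. (From scikit-learn 0.18 on, train_test_split is provided by sklearn.model_selection, and sklearn.cross_validation was removed in 0.20, so on current versions import it from sklearn.model_selection.)
2. Write your own function that randomly splits the data into training and test sets by hand. (I will attach that code at the end.)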
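For the second option, a hand-written split could look roughly like the sketch below. This is not the author's own function; the name manual_train_test_split and its parameters test_fraction and seed are assumptions made here for illustration, and it will not pick the same rows as scikit-learn's train_test_split.

```python
import numpy as np

def manual_train_test_split(X, y, test_fraction=0.25, seed=1):
    """Shuffle row positions and split them into training and test parts."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(X))            # random order of row positions
    n_test = int(round(len(X) * test_fraction))  # number of test rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic data with the article's shapes: 200 rows, 3 features.
X_demo = np.arange(600, dtype=float).reshape(200, 3)
y_demo = np.arange(200, dtype=float)

X_train, X_test, y_train, y_test = manual_train_test_split(X_demo, y_demo)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Shuffling the row positions once and slicing the result guarantees that every row lands in exactly one of the two sets.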
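(3) Linear regression with sklearn
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
# fit() returns the fitted estimator itself, so model and linreg are the same object
model = linreg.fit(X_train, y_train)
print(model)
print(linreg.intercept_)   # intercept β0
print(linreg.coef_)        # coefficients β1, β2, β3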
The output is:
LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
2.66816623043
[ 0.04641001  0.19272538 -0.00349015]
# pair the feature names with the coefficients
list(zip(feature_cols, linreg.coef_))
The output is:
[('TV', 0.046410010869663267),
('Radio', 0.19272538367491721),
('Newspaper', -0.0034901506098328305)]
The fitted model is therefore: y = 2.668 + 0.0464×TV + 0.193×Radio - 0.00349×Newspaper
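As a worked example of using the fitted equation, the sketch below plugs the first row of the data set (TV=230.1, Radio=37.8, Newspaper=69.2) into the intercept and coefficients printed above. The numbers are copied from the article's output; the variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from the fitted model's printed output.
intercept = 2.66816623043
coef = np.array([0.046410010869663267,    # TV
                 0.19272538367491721,     # Radio
                 -0.0034901506098328305]) # Newspaper

# First row of the data set; its actual Sales value is 22.1.
x_row = np.array([230.1, 37.8, 69.2])

# y = β0 + β1×TV + β2×Radio + β3×Newspaper
predicted_sales = intercept + coef @ x_row
print("predicted sales: %.2f" % predicted_sales)   # predicted sales: 20.39
```

With a fitted estimator, linreg.predict(X_test) performs the same calculation for every row of X_test at once.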
