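Stepwise Regression Analysis in R
Stepwise regression uses the AIC (Akaike information criterion) as its selection rule: at each step it drops or adds the variable that gives the smallest AIC. The R functions used for stepwise regression are step(), drop1() and add1().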
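For reference, the AIC value that step() reports for a linear model can be written as follows. This is a sketch based on how extractAIC() behaves for lm fits, with n the number of observations, RSS the residual sum of squares and p the number of estimated coefficients (intercept included). The symbols n and p are notation introduced here, not taken from the original post.

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2p
```

A smaller RSS lowers the first term, while every extra coefficient adds 2 to the second, so a variable is only worth keeping if it reduces RSS enough to pay for itself.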
1. Load the data. First, fit a multiple linear regression to the data.
tdata<-data.frame(
x1=c( 7, 1,11,11, 7,11, 3, 1, 2,21, 1,11,10),
x2=c(26,29,56,31,52,55,71,31,54,47,40,66,68),
x3=c( 6,15, 8, 8, 6, 9,17,22,18, 4,23, 9, 8),
x4=c(60,52,20,47,33,22, 6,44,22,26,34,12,12),
Y =c(78.5,74.3,104.3,87.6,95.9,109.2,102.7,72.5,
93.1,115.9,83.8,113.3,109.4)
)
tlm<-lm(Y~x1+x2+x3+x4,data=tdata)
summary(tlm)
Results of the multiple linear regression
Call:
lm(formula = Y ~ x1 + x2 + x3 + x4, data = tdata)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1750 -1.6709  0.2508  1.3783  3.9254

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  62.4054    70.0710   0.891   0.3991
x1            1.5511     0.7448   2.083   0.0708 .
x2            0.5102     0.7238   0.705   0.5009
x3            0.1019     0.7547   0.135   0.8959
x4           -0.1441     0.7091  -0.203   0.8441
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.446 on 8 degrees of freedom
Multiple R-squared:  0.9824,  Adjusted R-squared:  0.9736
F-statistic: 111.5 on 4 and 8 DF,  p-value: 4.756e-07
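Looking at the output, none of the regression coefficients is significant at the 0.05 level, even though the overall F-test is highly significant.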
2. Stepwise regression analysis
tstep<-step(tlm)
summary(tstep)
Start:  AIC=26.94
Y ~ x1 + x2 + x3 + x4

       Df Sum of Sq    RSS    AIC
- x3    1    0.1091 47.973 24.974
- x4    1    0.2470 48.111 25.011
- x2    1    2.9725 50.836 25.728
<none>              47.864 26.944
- x1    1   25.9509 73.815 30.576

Step:  AIC=24.97
Y ~ x1 + x2 + x4

       Df Sum of Sq    RSS    AIC
<none>               47.97 24.974
- x4    1      9.93  57.90 25.420
- x2    1     26.79  74.76 28.742
- x1    1    820.91 868.88 60.629
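Interpretation: with x1, x2, x3 and x4 all in the model as predictors, the AIC is 26.94.
Dropping x3 gives a model with AIC 24.974;
dropping x4 gives a model with AIC 25.011;
……
Because dropping x3 yields the smallest AIC, R removes x3 automatically.
Once x3 is gone, dropping any further variable would increase the AIC, so the stepwise procedure stops and the current model is taken as the best one.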
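The AIC values in the table can be reproduced by hand, which makes it clearer what step() is comparing. The sketch below repeats the data from section 1 so that it runs on its own. step_aic() is a hypothetical helper written for this illustration (it is not part of R), and it assumes the n*log(RSS/n) + 2p form that extractAIC() uses for lm fits.

```r
# Data repeated from section 1 so the snippet is self-contained
tdata <- data.frame(
  x1 = c( 7, 1,11,11, 7,11, 3, 1, 2,21, 1,11,10),
  x2 = c(26,29,56,31,52,55,71,31,54,47,40,66,68),
  x3 = c( 6,15, 8, 8, 6, 9,17,22,18, 4,23, 9, 8),
  x4 = c(60,52,20,47,33,22, 6,44,22,26,34,12,12),
  Y  = c(78.5,74.3,104.3,87.6,95.9,109.2,102.7,72.5,
         93.1,115.9,83.8,113.3,109.4)
)
full <- lm(Y ~ x1 + x2 + x3 + x4, data = tdata)

# Hypothetical helper: AIC on the scale printed by step()
step_aic <- function(fit) {
  n   <- length(residuals(fit))   # number of observations
  rss <- sum(residuals(fit)^2)    # residual sum of squares
  p   <- length(coef(fit))        # estimated coefficients, intercept included
  n * log(rss / n) + 2 * p
}

step_aic(full)                        # about 26.94, the "Start" value
step_aic(update(full, . ~ . - x3))    # about 24.97, the "- x3" row
extractAIC(full)                      # built-in equivalent: c(edf, AIC)
```

Note that the generic AIC() function includes additional constant terms from the log-likelihood, so its numbers differ from these by a fixed offset; the ranking of models fitted to the same data is unaffected.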
Call:
lm(formula = Y ~ x1 + x2 + x4, data = tdata)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0919 -1.8016  0.2562  1.2818  3.8982

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  71.6483    14.1424   5.066 0.000675 ***
x1            1.4519     0.1170  12.410 5.78e-07 ***
x2            0.4161     0.1856   2.242 0.051687 .
x4           -0.2365     0.1733  -1.365 0.205395
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.309 on 9 degrees of freedom
Multiple R-squared:  0.9823,  Adjusted R-squared:  0.9764
F-statistic: 166.8 on 3 and 9 DF,  p-value: 3.323e-08
The coefficients are more significant than before, but x2 and x4 are still not satisfactory (p = 0.052 and p = 0.205 respectively).
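The call step(tlm) above only removes variables, because no scope was given. As a complementary sketch, step() can also start from an intercept-only model and add variables, using its scope and direction arguments. The object names null_fit and fwd below are made up for this illustration, and the data are repeated so the snippet runs on its own.

```r
# Data repeated from section 1 so the snippet is self-contained
tdata <- data.frame(
  x1 = c( 7, 1,11,11, 7,11, 3, 1, 2,21, 1,11,10),
  x2 = c(26,29,56,31,52,55,71,31,54,47,40,66,68),
  x3 = c( 6,15, 8, 8, 6, 9,17,22,18, 4,23, 9, 8),
  x4 = c(60,52,20,47,33,22, 6,44,22,26,34,12,12),
  Y  = c(78.5,74.3,104.3,87.6,95.9,109.2,102.7,72.5,
         93.1,115.9,83.8,113.3,109.4)
)

# Start from the intercept-only model
null_fit <- lm(Y ~ 1, data = tdata)

# Allow terms to be added up to the full model and dropped again if needed;
# trace = 0 suppresses the step-by-step printout
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4),
            direction = "both",
            trace = 0)

# Variables retained by the search
attr(terms(fwd), "term.labels")
```

On this data set the search ends with the same three predictors (x1, x2, x4) as the backward run, although the two directions are not guaranteed to agree in general.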
3. Refining the stepwise result
drop1(tstep)
Output:
Single term deletions

Model:
Y ~ x1 + x2 + x4
       Df Sum of Sq    RSS    AIC
<none>               47.97 24.974
x1      1    820.91 868.88 60.629
x2      1     26.79  74.76 28.742
x4      1      9.93  57.90 25.420
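Dropping x4 would raise the AIC from 24.974 to 25.420, the smallest increase among the three variables, and x4 is also the least significant coefficient, so it is the natural candidate to remove next.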
4. Refit the multiple regression without x4
tlm<-lm(Y~x1+x2,data=tdata)
summary(tlm)
Output:
Call:
lm(formula = Y ~ x1 + x2, data = tdata)

Residuals:
   Min     1Q Median     3Q    Max
-2.893 -1.574 -1.302  1.363  4.048

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.57735    2.28617   23.00 5.46e-10 ***
x1           1.46831    0.12130   12.11 2.69e-07 ***
x2           0.66225    0.04585   14.44 5.03e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.406 on 10 degrees of freedom
Multiple R-squared:  0.9787,  Adjusted R-squared:  0.9744
F-statistic: 229.5 on 2 and 10 DF,  p-value: 4.407e-09
All coefficients are now highly significant.
The fitted regression equation is therefore y = 52.57735 + 1.46831 x1 + 0.66225 x2.
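add1() was listed at the start but not used above. As a sketch of how it could serve as a final check, the snippet below asks whether putting x3 or x4 back into the two-variable model would be justified, using an F-test. The object names final and chk are made up for this illustration, and the data are repeated so the snippet runs on its own.

```r
# Data repeated from section 1 so the snippet is self-contained
tdata <- data.frame(
  x1 = c( 7, 1,11,11, 7,11, 3, 1, 2,21, 1,11,10),
  x2 = c(26,29,56,31,52,55,71,31,54,47,40,66,68),
  x3 = c( 6,15, 8, 8, 6, 9,17,22,18, 4,23, 9, 8),
  x4 = c(60,52,20,47,33,22, 6,44,22,26,34,12,12),
  Y  = c(78.5,74.3,104.3,87.6,95.9,109.2,102.7,72.5,
         93.1,115.9,83.8,113.3,109.4)
)
final <- lm(Y ~ x1 + x2, data = tdata)

# Try adding each omitted variable back, one at a time.
# The first row (<none>) is the current model; the other rows are candidates.
chk <- add1(final, scope = ~ x1 + x2 + x3 + x4, test = "F")
print(chk)

# Coefficients of the final model, matching the equation above
round(coef(final), 5)
```

Neither candidate reaches significance at the 0.05 level under the F-test, which supports stopping at x1 and x2. By AIC alone the three-variable model is marginally lower (24.974 versus 25.420), so the choice here trades a slightly higher AIC for a model in which every coefficient is significant.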