This article covers boosting in Scikit-Learn ensemble machine learning, and I hope it will be helpful to you.
Table of Contents
- Introduction
- GradientBoostingRegressor
- GradientBoostingClassifier
- AdaBoostRegressor
- AdaBoostClassifier
- References
Introduction

Boosting is an ensemble learning technique in which we train the estimators sequentially rather than training all of them in parallel. We try to build a number of fast and simple models (weak learners that are only slightly better than random guessing) and then combine the results of all the weak estimators to make the final prediction. We have already discussed another ensemble learning approach in the bagging & random forests tutorial; feel free to go through it if you would like to learn about it.
Scikit-learn provides two different boosting algorithms for classification and regression problems:
- Gradient Tree Boosting (Gradient Boosted Decision Trees)
  - It builds learners iteratively: each new weak learner is trained on the errors made by the current ensemble on the samples it predicts incorrectly. It starts with a single learner and then adds learners one at a time, trying to minimize the loss with every new tree. It uses decision trees as the weak estimators. A minimal sketch of this idea appears right after this list. Scikit-learn provides two classes that implement Gradient Tree Boosting for classification and regression problems:
  - GradientBoostingClassifier
  - GradientBoostingRegressor
- Adaptive Boosting (AdaBoost)
  - It iteratively fits a list of weak estimators on modified versions of the data, then combines the results of all estimators through a weighted vote to produce the final result. At each iteration, higher weights are assigned to the samples that were predicted incorrectly in the previous iteration, and the weights of correctly predicted samples are reduced, so the model focuses on the samples it gets wrong. Initially all samples receive the same weight (1 / n_samples). It also lets us specify which estimator is used as the weak learner. Scikit-learn provides two classes that implement Adaptive Boosting for classification and regression problems:
  - AdaBoostClassifier
  - AdaBoostRegressor
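To make the gradient boosting idea above concrete, here is a minimal hand-rolled sketch that fits each new tree on the residuals of the current ensemble. It is only an illustration under simplifying assumptions (squared-error loss, a fixed learning rate, a synthetic dataset), not how scikit-learn implements its boosting classes internally.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
learning_rate = 0.1
n_trees = 50
## Start from a constant prediction (the mean), then repeatedly fit a small
## tree on the residuals and add a shrunken version of its prediction.
prediction = np.full(y.shape, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
print("Final training MSE : %.3f" % np.mean((y - prediction) ** 2))
Each new tree corrects part of the error left by the previous ones, which is why the training loss keeps shrinking as trees are added.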
We will start by importing the necessary libraries.
import numpy as np
import pandas as pd
import sklearn
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(precision=3)
%matplotlib inline
Load the Datasets

For our purposes, we will load the two datasets mentioned below.
- Digits dataset: it has 8x8 size images of the digits 0-9. We will use the digits data for the classification tasks below.
- Boston housing dataset: it has information about various attributes of houses, such as the average number of rooms, the per-capita crime rate by town, and so on. We will use it for the regression tasks.
Both datasets are part of sklearn's datasets module. We can load them by calling the load_digits() and load_boston() methods. Each returns a dictionary-like Bunch object that can be used to retrieve the features and the target.
from sklearn.datasets import load_boston, load_digits
digits = load_digits()
X_digits, Y_digits = digits.data, digits.target
print('Dataset Size : ', X_digits.shape, Y_digits.shape)
Dataset Size :  (1797, 64) (1797,)
boston = load_boston()
X_boston, Y_boston = boston.data, boston.target
print('Dataset Size : ', X_boston.shape, Y_boston.shape)
Dataset Size :(506, 13) (506,)
GradientBoostingRegressor

GradientBoostingRegressor is available as part of sklearn's ensemble module. We will train the default model on the Boston housing data and then tune the model by trying various hyperparameter settings to improve its performance. We will also compare it with other regression estimators to check its performance relative to other machine learning models.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)
from sklearn.ensemble import GradientBoostingRegressor
grad_boosting_regressor = GradientBoostingRegressor()
grad_boosting_regressor.fit(X_train, Y_train)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort='auto',
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)
Y_preds = grad_boosting_regressor.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test R^2 Score : %.3f'%grad_boosting_regressor.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training R^2 Score : %.3f'%grad_boosting_regressor.score(X_train, Y_train))
[33.731 26.108 48.711 18.784 31.065 43.077 25.474  9.03  18.201 29.294
 22.577 19.06  15.871 24.611 19.605]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.3 32.2 24.1 18.5 13.5 27.
 23.1]
Test R^2 Score : 0.812
Training R^2 Score : 0.979
Important attributes of GradientBoostingRegressor

Below are some of the important attributes of GradientBoostingRegressor that can provide useful information once the model has been trained.
- feature_importances_ - It returns an array of floats representing the importance of each feature in the dataset.
- estimators_ - It returns the trained estimators.
- oob_improvement_ - It returns an array of size (n_estimators,). Each value in the array represents the improvement in loss on the out-of-bag samples relative to the previous iteration.
- loss_ - It returns the loss function as an object.
print("Feature Importances : ", grad_boosting_regressor.feature_importances_)
Feature Importances :[1.551e-02 3.064e-04 1.025e-03 4.960e-05 3.208e-02 4.883e-01 1.295e-02
5.908e-02 1.602e-03 1.187e-02 2.461e-02 5.472e-03 3.472e-01]
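These importances are easier to read when paired with the dataset's feature names. A small sketch, assuming the boston Bunch loaded earlier (and its feature_names attribute) is still in scope:
## Pair each importance value with its feature name and sort, most important first.
feature_importances = pd.Series(grad_boosting_regressor.feature_importances_, index=boston.feature_names)
print(feature_importances.sort_values(ascending=False))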
print("Estimators Shape: ", grad_boosting_regressor.estimators_.shape)
grad_boosting_regressor.estimators_[:2]
Estimators Shape:(100, 1)
array([[DecisionTreeRegressor(criterion='friedman_mse', max_depth=3, max_features=None,
                              max_leaf_nodes=None, min_impurity_decrease=0.0,
                              min_impurity_split=None, min_samples_leaf=1,
                              min_samples_split=2, min_weight_fraction_leaf=0.0,
                              presort='auto',
                              random_state=RandomState(MT19937) at 0x7F0EF00E7780,
                              splitter='best')],
       [DecisionTreeRegressor(criterion='friedman_mse', max_depth=3, max_features=None,
                              max_leaf_nodes=None, min_impurity_decrease=0.0,
                              min_impurity_split=None, min_samples_leaf=1,
                              min_samples_split=2, min_weight_fraction_leaf=0.0,
                              presort='auto',
                              random_state=RandomState(MT19937) at 0x7F0EF00E7780,
                              splitter='best')]], dtype=object)
print("Loss : ", grad_boosting_regressor.loss_)
Loss :  <sklearn.ensemble._gb_losses.LeastSquaresError object at 0x7f0eac9d4b00>
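Because the ensemble is built one tree at a time, we can also watch the test score improve as stages are added. A short sketch using the standard staged_predict method, which yields the ensemble's predictions after each boosting stage:
from sklearn.metrics import r2_score
## Track the test R^2 after every 20th boosting stage.
for stage, stage_preds in enumerate(grad_boosting_regressor.staged_predict(X_test), start=1):
    if stage % 20 == 0:
        print("Stage %3d : Test R^2 = %.3f" % (stage, r2_score(Y_test, stage_preds)))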
Fine-tuning the model with a grid search over various hyperparameters

Below is a list of common hyperparameters that we need to tune to get the best fit for our data. We will try various hyperparameter settings on the train/test split of the data to find the best fit: one where training and test accuracy are almost the same, or the difference between them is very small.
- learning_rate - It shrinks the contribution of each tree. There is a trade-off between learning_rate and n_estimators.
- n_estimators - The number of base estimators whose results are combined to produce the final prediction. default=100
- max_depth - The maximum depth of an individual tree. We need to find the best value. default=3
- min_samples_split - The number of samples required to split an internal node. It accepts int (0 - n_samples) or float (0.0 - 0.5] values. A float is interpreted as ceil(min_samples_split * n_samples) samples. default=2
- min_samples_leaf - The minimum number of samples required at a leaf node. It accepts int (0 - n_samples) or float (0.0 - 0.5] values. A float is interpreted as ceil(min_samples_leaf * n_samples) samples. default=1
- criterion - The cost function that the algorithm tries to minimize. It currently supports mse (mean squared error) and mae (mean absolute error). default=friedman_mse
- max_features - The number of features to consider when looking for a split. It accepts int (0 - n_features), float (0.0 - 0.5], string (sqrt, log2, auto) or None as values. default=None
  - None - If None is given, n_features is used as the value.
  - sqrt - sqrt(n_features) features are used for splits.
  - auto - sqrt(n_features) features are used for splits.
  - log2 - log2(n_features) features are used for splits.
- validation_fraction - The proportion of the training data to set aside as a validation set for early stopping. It accepts float (0.0, 1.0). default=0.1
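Note that validation_fraction only takes effect when early stopping is enabled via n_iter_no_change. A small sketch of what that looks like (the parameter values here are only illustrative):
## Early stopping: hold out 10% of the training data as a validation set and stop
## adding trees once the validation score has not improved for 10 rounds.
early_stop_gbr = GradientBoostingRegressor(n_estimators=500, validation_fraction=0.1,
                                           n_iter_no_change=10, tol=1e-4, random_state=1)
early_stop_gbr.fit(X_train, Y_train)
print('Trees actually fit : ', early_stop_gbr.n_estimators_)
print('Test R^2 Score : %.3f'%early_stop_gbr.score(X_test, Y_test))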
We will try various values of the above hyperparameters below, using 3-fold cross-validation on the data to find the best estimator for our dataset.
%%time
from sklearn.model_selection import GridSearchCV
n_samples = X_boston.shape[0]
n_features = X_boston.shape[1]
params = {'n_estimators': np.arange(100, 301, 50),
          'max_depth': [None, 3, 5,],
          'min_samples_split': [2, 0.3, 0.5, n_samples//2, ],
          'min_samples_leaf': [1, 0.3, 0.5, n_samples//2, ],
          'criterion': ['friedman_mse', 'mae'],
          'max_features': [None, 'sqrt', 'auto', 'log2', 0.3, 0.7, n_features//2, ],
         }
grad_boost_regressor_grid = GridSearchCV(GradientBoostingRegressor(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
grad_boost_regressor_grid.fit(X_train,Y_train)
print('Train R^2 Score : %.3f'%grad_boost_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%grad_boost_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%grad_boost_regressor_grid.best_score_)
print('Best Parameters : ',grad_boost_regressor_grid.best_params_)
Fitting 3 folds for each of 3360 candidates, totalling 10080 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done10 tasks| elapsed:27.6s
[Parallel(n_jobs=-1)]: Done 134 tasks| elapsed:29.9s
[Parallel(n_jobs=-1)]: Done 854 tasks| elapsed:38.2s
[Parallel(n_jobs=-1)]: Done 1662 tasks| elapsed:44.9s
[Parallel(n_jobs=-1)]: Done 2958 tasks| elapsed:55.1s
[Parallel(n_jobs=-1)]: Done 4542 tasks| elapsed:1.1min
[Parallel(n_jobs=-1)]: Done 5503 tasks| elapsed:1.8min
[Parallel(n_jobs=-1)]: Done 6281 tasks| elapsed:2.8min
[Parallel(n_jobs=-1)]: Done 7186 tasks| elapsed:3.5min
[Parallel(n_jobs=-1)]: Done 8124 tasks| elapsed:4.2min
[Parallel(n_jobs=-1)]: Done 9301 tasks| elapsed:5.3min
[Parallel(n_jobs=-1)]: Done 10080 out of 10080 | elapsed:5.9min finished
Train R^2 Score : 0.997
Test R^2 Score : 0.776
Best R^2 Score Through Grid Search : 0.891
Best Parameters :  {'criterion': 'friedman_mse', 'max_depth': None, 'max_features': None, 'min_samples_leaf': 1, 'min_samples_split': 0.3, 'n_estimators': 150}
CPU times: user 5.14 s, sys: 254 ms, total: 5.39 s
Wall time: 5min 53s
Printing the first few cross-validation results
cross_val_results = pd.DataFrame(grad_boost_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 3360
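The cv_results_ dictionary holds one entry per parameter combination, so we can also sort it to inspect the best-ranked settings. A short sketch using the standard GridSearchCV result columns:
## Show the five best parameter combinations according to the mean cross-validated score.
top5 = cross_val_results.sort_values('rank_test_score').head()
print(top5[['params', 'mean_test_score', 'std_test_score', 'mean_fit_time']])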
Comparing the performance of Gradient Boosting with Bagging, Random Forest, Extra Trees, Decision Tree and Extra Tree
from sklearn import ensemble, tree
## Gradient Boosting Regressor with Default Params
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))
## Gradient Boosting Regressor with the hyperparameters tuned above
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1, **grad_boost_regressor_grid.best_params_)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))
## Random Forest Regressor with Default Params
rforest_regressor = ensemble.RandomForestRegressor(random_state=1)
rforest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_regressor.__class__.__name__,
rforest_regressor.score(X_train, Y_train),rforest_regressor.score(X_test, Y_test)))
## Extra Trees Regressor with Default Params
extra_forest_regressor = ensemble.ExtraTreesRegressor(random_state=1)
extra_forest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_regressor.__class__.__name__,
extra_forest_regressor.score(X_train, Y_train),extra_forest_regressor.score(X_test, Y_test)))
## Bagging Regressor with Default Params
bag_regressor = ensemble.BaggingRegressor(random_state=1)
bag_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_regressor.__class__.__name__,
bag_regressor.score(X_train, Y_train),bag_regressor.score(X_test, Y_test)))
## Decision Tree with Default Parameters
dtree_regressor = tree.DecisionTreeRegressor(random_state=1)
dtree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_regressor.__class__.__name__,
dtree_regressor.score(X_train, Y_train),dtree_regressor.score(X_test, Y_test)))
## Extra Tree Regressor with Default Parameters
extra_tree_regressor = tree.ExtraTreeRegressor(random_state=1)
extra_tree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_regressor.__class__.__name__,
extra_tree_regressor.score(X_train, Y_train),extra_tree_regressor.score(X_test, Y_test)))
GradientBoostingRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
GradientBoostingRegressor : Train Accuracy : 1.00, Test Accuracy : 0.78
RandomForestRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
ExtraTreesRegressor : Train Accuracy : 1.00, Test Accuracy : 0.83
BaggingRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
DecisionTreeRegressor : Train Accuracy : 1.00, Test Accuracy : 0.44
ExtraTreeRegressor : Train Accuracy : 1.00, Test Accuracy : 0.51
GradientBoostingClassifier

GradientBoostingClassifier is available as part of sklearn's ensemble module. We will train the default model on the digits data and then tune the model by trying various hyperparameter settings to improve its performance. We will also compare it with other classification estimators to check its performance relative to other machine learning models.
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :(1437, 64) (360, 64) (1437,) (360,)
from sklearn.ensemble import GradientBoostingClassifier
grad_boosting_classif = GradientBoostingClassifier()
grad_boosting_classif.fit(X_train, Y_train)
GradientBoostingClassifier(criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_iter_no_change=None, presort='auto',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
Y_preds = grad_boosting_classif.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%grad_boosting_classif.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%grad_boosting_classif.score(X_train, Y_train))
[3 3 4 4 1 3 1 0 7 4 0 6 5 1 6]
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
Test Accuracy : 0.956
Test Accuracy : 0.956
Training Accuracy : 1.000
Important attributes of GradientBoostingClassifier

GradientBoostingClassifier has the same set of attributes as GradientBoostingRegressor. Note that for a multiclass problem like digits, estimators_ has shape (n_estimators, n_classes), because one tree is fit per class at every boosting stage.
print("Feature Importances Shape: ", grad_boosting_classif.feature_importances_.shape)
grad_boosting_classif.feature_importances_[:10]
Feature Importances Shape:(64,)
array([0., 0.001, 0.011, 0.003, 0.003, 0.059, 0.004, 0.001, 0.001,
0.002])
print("Estimators Shape : ", grad_boosting_classif.estimators_.shape)
Estimators Shape :(100, 10)
print("Loss : ", grad_boosting_classif.loss_)
Loss :  <sklearn.ensemble._gb_losses.MultinomialDeviance object at 0x7f0eac97d4a8>
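Since this is a classifier, we can also look at per-class probabilities instead of only hard predictions. A small sketch using the standard predict_proba method:
## Probability of each of the 10 digit classes for the first three test samples.
probs = grad_boosting_classif.predict_proba(X_test[:3])
print("Probabilities Shape : ", probs.shape)  ## (3, 10)
print(probs.argmax(axis=1))                   ## most likely class per sample
print(Y_test[:3])                             ## actual labels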
Fine-tuning the model with a grid search over various hyperparameters

Almost all of the parameters of GradientBoostingClassifier are the same as those of GradientBoostingRegressor.
%%time
n_samples = X_digits.shape[0]
n_features = X_digits.shape[1]
params = {'n_estimators': [100, 200],
          'max_depth': [None, 2, 5,],
          'min_samples_split': [2, 0.5, n_samples//2, ],
          'min_samples_leaf': [1, 0.5, n_samples//2, ],
          'criterion': ['friedman_mse', 'mae'],
          'max_features': [None, 'sqrt', 'log2', 0.5, n_features//2,],
         }
grad_boost_classif_grid = GridSearchCV(GradientBoostingClassifier(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
grad_boost_classif_grid.fit(X_train,Y_train)
print('Train Accuracy : %.3f'%grad_boost_classif_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grad_boost_classif_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%grad_boost_classif_grid.best_score_)
print('Best Parameters : ',grad_boost_classif_grid.best_params_)
Fitting 3 folds for each of 540 candidates, totalling 1620 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done10 tasks| elapsed:7.6s
[Parallel(n_jobs=-1)]: Done64 tasks| elapsed:21.5s
[Parallel(n_jobs=-1)]: Done 154 tasks| elapsed:40.8s
[Parallel(n_jobs=-1)]: Done 280 tasks| elapsed:1.3min
[Parallel(n_jobs=-1)]: Done 442 tasks| elapsed:1.9min
[Parallel(n_jobs=-1)]: Done 640 tasks| elapsed:2.7min
[Parallel(n_jobs=-1)]: Done 874 tasks| elapsed: 13.0min
[Parallel(n_jobs=-1)]: Done 1144 tasks| elapsed: 30.8min
[Parallel(n_jobs=-1)]: Done 1450 tasks| elapsed: 48.8min
[Parallel(n_jobs=-1)]: Done 1620 out of 1620 | elapsed: 59.1min finished
Train Accuracy : 1.000
Test Accuracy : 0.972
Best Accuracy Through Grid Search : 0.978
Best Parameters :  {'criterion': 'mae', 'max_depth': 5, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}
CPU times: user 50.9 s, sys: 236 ms, total: 51.2 s
Wall time: 59min 56s
Printing the first few cross-validation results
cross_val_results = pd.DataFrame(grad_boost_classif_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 540
Comparing the performance of Gradient Boosting with Bagging, Random Forest, Extra Trees, Decision Tree and Extra Tree
from sklearn import ensemble, tree
## Gradient Boosting Classifier with Default Params
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))
## Gradient Boosting Classifier with the hyperparameters tuned above
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1, **grad_boost_classif_grid.best_params_)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))
## Random Forest Classifier with Default Params
rforest_classif = ensemble.RandomForestClassifier(random_state=1)
rforest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_classif.__class__.__name__,
rforest_classif.score(X_train, Y_train),rforest_classif.score(X_test, Y_test)))
## Extra Trees Classifier with Default Params
extra_forest_classif = ensemble.ExtraTreesClassifier(random_state=1)
extra_forest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_classif.__class__.__name__,
extra_forest_classif.score(X_train, Y_train),extra_forest_classif.score(X_test, Y_test)))
## Bagging Classifier with Default Params
bag_classif = ensemble.BaggingClassifier(random_state=1)
bag_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_classif.__class__.__name__,
bag_classif.score(X_train, Y_train),bag_classif.score(X_test, Y_test)))
## Decision Tree with Default Parameters
dtree_classif = tree.DecisionTreeClassifier(random_state=1)
dtree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_classif.__class__.__name__,
dtree_classif.score(X_train, Y_train),dtree_classif.score(X_test, Y_test)))
## Extra Tree Classifier with Default Parameters
extra_tree_classif = tree.ExtraTreeClassifier(random_state=1)
extra_tree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_classif.__class__.__name__,
extra_tree_classif.score(X_train, Y_train),extra_tree_classif.score(X_test, Y_test)))
GradientBoostingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.96
GradientBoostingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.97
RandomForestClassifier : Train Accuracy : 1.00, Test Accuracy : 0.94
ExtraTreesClassifier : Train Accuracy : 1.00, Test Accuracy : 0.95
BaggingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.94
DecisionTreeClassifier : Train Accuracy : 1.00, Test Accuracy : 0.83
ExtraTreeClassifier : Train Accuracy : 1.00, Test Accuracy : 0.83
AdaBoostRegressor

AdaBoostRegressor is available as part of sklearn's ensemble module. We will train the default model on the Boston housing data and then tune the model by trying various hyperparameter settings to improve its performance. We will also compare it with other regression estimators to check its performance relative to other machine learning models.
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :(404, 13) (102, 13) (404,) (102,)
from sklearn.ensemble import AdaBoostRegressor
ada_boost_regressor = AdaBoostRegressor()
ada_boost_regressor.fit(X_train, Y_train)
AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
                  n_estimators=50, random_state=None)
Y_preds = ada_boost_regressor.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test R^2 Score : %.3f'%ada_boost_regressor.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training R^2 Score : %.3f'%ada_boost_regressor.score(X_train, Y_train))
[18.221 27.061 47.212 18.138 31.677 39.125 27.313 11.922 17.933 26.785
26.249 20.919 17.408 27.167 18.951]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.3 32.2 24.1 18.5 13.5 27.
 23.1]
Test R^2 Score : 0.834
Training R^2 Score : 0.912
Important attributes of AdaBoostRegressor

Below are some of the important attributes of AdaBoostRegressor that can provide useful information once the model has been trained.
- base_estimator_ - It returns the base estimator from which the whole strong estimator consisting of weak estimators was created.
- feature_importances_ - It returns an array of floats representing the importance of each feature in the dataset.
- estimators_ - It returns the trained estimators.
print("Base Estimator : ", ada_boost_regressor.base_estimator_)
Base Estimator :  DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=None,
                                        max_leaf_nodes=None, min_impurity_decrease=0.0,
                                        min_impurity_split=None, min_samples_leaf=1,
                                        min_samples_split=2, min_weight_fraction_leaf=0.0,
                                        presort=False, random_state=None, splitter='best')
print("Feature Importances : ", ada_boost_regressor.feature_importances_)<