贝叶斯优化python包_《用贝叶斯优化进行超参数调优》贝叶斯优化python包

TPE CMAES 网格搜索随机搜索贝叶斯优化
用贝叶斯优化进行超参数调优
@QI ZHANG · JUL 12, 2019 · 7 MIN READ
超参数调优一直是机器学习里比较intractable的问题，繁多的超参数以及指数型爆炸的参数空间，往往让人无从下手。调参是一个很枯燥的过程，而且最后也不一定有很好的reward。很多的机器学习工程师也会戏称自己是”调参民工”，”炼丹师”……
超参数(Hyper-parameters)：Hyper-parameters are parameters that are not directly learnt within estimators.
超参数的优化可以看做是这样一个方程：
???=argmin??∈??(??)x?=arg?minx∈Xf(x)
其中??(??)f(x) 表示目标函数，用于衡量误差，可以是MSE, RMSE, MAE等(如果是accuracy等指标可以将其添加负号求最小值)，???x?是使得??(??)f(x)最小的参数组合，理论上我们的目标就是要找到这个???x?，但是要找到这样一个全局最优解几乎是不可能的。首先这个??(??)f(x)是个黑盒子，我们没法直接进行优化求解。事实上，我们每次为了得到??(??)f(x)，需要经过模型训练和评估，非常耗时，尤其对于一些深度学习模型，这个过程会特别漫长。我们只能在有限的计算资源和时间内，得到一些相对的局部最优解。
一般的调参方法有下面几种：
手动调参(Manual Search)
网格搜索(Grid Search)
随机搜索(Randomized Search)
贝叶斯优化(Bayesian Optimization)
1. 手动调参
对于手动调参，会对模型最重要的一些参数基于经验进行调节。比如lightgbm的叶num_leaves, learning_rate, feature_fraction, lambda_l1,lambda_l2, min_data_in_leaf等。
2. Randomized Search vs Grid Search
Grid Search会对定义的参数空间进行暴力搜索。网格搜索速度慢，但在搜索整个搜索空间方面效果很好。Randomized Search是从定义的参数空间中进行采样，然后训练。随机搜索很快，但可能会错过搜索空间中的重要点。

文章图片

事实上，调参的时候，不需要遍历模型的所有参数。事实上，影响效果的往往只有其中的几个参数，一般对这些参数进行Randomized Search或者Grid Search即可。具体可以查看模型文档，或相关文献。
# Comparing randomized search and grid search for hyperparameter estimation
import numpy as np
from time import time
from scipy.stats import randint as sp_randint
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
# get some data
digits = load_digits()
X, y = digits.data, digits.target
# build a classifier
clf = RandomForestClassifier(n_estimators=20)
# Utility function to report best scores
def report(results, n_top=3):
for i in range(1, n_top + 1):
candidates = np.flatnonzero(results[‘rank_test_score‘] == i)
for candidate in candidates:
print("Model with rank: {0}".format(i))
print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
results[‘mean_test_score‘][candidate],
results[‘std_test_score‘][candidate]))
print("Parameters: {0}".format(results[‘params‘][candidate]))
print("")
# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
"max_features": sp_randint(1, 11),
"min_samples_split": sp_randint(2, 11),
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}
# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
n_iter=n_iter_search, cv=5)
start = time()
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates"
" parameter settings." % ((time() - start), n_iter_search))
report(random_search.cv_results_)
# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
"max_features": [1, 3, 10],
"min_samples_split": [2, 3, 10],
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}
# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=5)
start = time()
grid_search.fit(X, y)
print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
% (time() - start, len(grid_search.cv_results_[‘params‘])))
report(grid_search.cv_results_)
Output:
RandomizedSearchCV took 5.55 seconds for 20 candidates parameter settings.
Model with rank: 1
Mean validation score: 0.939 (std: 0.024)
Parameters: {‘bootstrap‘: False, ‘criterion‘: ‘entropy‘, ‘max_depth‘: None, ‘max_features‘: 7, ‘min_samples_split‘: 3}
Model with rank: 2
Mean validation score: 0.933 (std: 0.022)
Parameters: {‘bootstrap‘: False, ‘criterion‘: ‘gini‘, ‘max_depth‘: None, ‘max_features‘: 6, ‘min_samples_split‘: 6}
Model with rank: 3
Mean validation score: 0.930 (std: 0.031)
Parameters: {‘bootstrap‘: True, ‘criterion‘: ‘gini‘, ‘max_depth‘: None, ‘max_features‘: 6, ‘min_samples_split‘: 6}
GridSearchCV took 16.95 seconds for 72 candidate parameter settings.
Model with rank: 1
Mean validation score: 0.937 (std: 0.019)
Parameters: {‘bootstrap‘: False, ‘criterion‘: ‘entropy‘, ‘max_depth‘: None, ‘max_features‘: 10, ‘min_samples_split‘: 2}
Model with rank: 2
Mean validation score: 0.935 (std: 0.020)
Parameters: {‘bootstrap‘: False, ‘criterion‘: ‘gini‘, ‘max_depth‘: None, ‘max_features‘: 10, ‘min_samples_split‘: 2}
Model with rank: 3
Mean validation score: 0.930 (std: 0.029)
Parameters: {‘bootstrap‘: False, ‘criterion‘: ‘entropy‘, ‘max_depth‘: None, ‘max_features‘: 10, ‘min_samples_split‘: 3}
1.1 Grid Search
from sklearn.model_selection import GridSearchCV
def func_grid_search(model, X_train, y_train, param_grid, cv=5, n_jobs=-1):
"""
model: model instance
para_grid: dict type, params searching grid.
para_grid = {‘n_estimators‘:[100,200,500],
‘max_depth‘: [5, 8, 10, 15],
‘max_features‘: [0.80, 0.90, 0.95],
‘min_samples_split‘:[2, 5, 8],
‘min_samples_leaf‘: [1, 3, 5]}
"""
gs = GridSearchCV(model, param_grid, cv=cv, n_jobs=n_jobs)
gs.fit(X_train, y_train.ravel())
print(gs.best_params_)
print(gs.best_score_)
# print(gs.cv_results_)
bst_model = gs.best_estimator_
return bst_model
1.2 Randomized Search
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint
def func_random_search(model, X_train, y_train, param_dist, n_iter=20, cv=5, n_jobs=-1):
"""
# parameters for GridSearchCV
# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, 5],
"max_features": sp_randint(1, 11),
"min_samples_split": sp_randint(2, 11),
"min_samples_leaf": sp_randint(1, 11),
"bootstrap": [True, False]
}
"""
rs = RandomizedSearchCV(model, param_dist, n_iter=n_iter, cv=cv, n_jobs=n_jobs)
rs.fit(X_train, y_train.ravel())
print(rs.best_params_)
print(rs.best_score_)
# print(rs.cv_results_)
bst_model = rs.best_estimator_
return bst_model
Grid Search太慢了，并不是很实用，一般会对参数空间先调一个粗粒度的格点搜索，然后根据结果进行细粒度的调整。而Randomized Search在迭代一定的次之后也可能实现较好的效果，值得更多的尝试。

文章图片

3. Bayesian Optimization
Grid Search和Randomized Search虽然可以让整个调参过程自动化，但它们无法从之前的调参结果中获取信息，可能会尝试很多无效的参数空间。而贝叶斯优化，会对上一次的评估结果进行追踪，建立一个概率模型，反应超参数在目标函数上表现的概率分布，即??(score|hyperparameters)P(score|hyperparameters)，可用于指导下一次的参数选择。
贝叶斯优化可以更好的trade off Exploration&Exploitation，而且适用于随机、非凸、不连续方程的优化。具体过程可用一句话概括为：对目标函数建立概率模型，通过这个概率模型得到最promising的参数组合，用于最终目标函数的评估。
Sequential Model-Based Optimization(SMBO) 是贝叶斯优化更具体的表现形式，可认为它们是等价的。一般会有以下几个过程：
给定要搜索的超参数空间
定义一个目标函数用于评估优化
建立目标函数的surrogate model
建立一个评估surrogate model，作为选择超参数的标准(选择方程)
获取score和hyperparameter的样本用于更新 surrogate model
贝叶斯优化模型主要的区分是代理函数(surrogate function)的差异。surrogate model一般有 Gaussian Process, Random Forest和Tree Parzen Estimator (TPE) 这几种。常见的框架有Spearmint, Hyperopt, SMAC, MOE, BayesianOptimization, skopt等，它们的对比如下表：
librarysurrogate function
Spearmint
Gaussian Process
Hyperopt
Tree Parzen Estimator (TPE)
SMAC
Random Forest
Optunity包括多种超参数调优的方法：
下面以Hyperopt为例说明贝叶斯优化的具体应用。
对于一个优化问题我们可以分为4个部分：
优化的目标函数
优化的参数空间
超参数优化方程，建立代理函数(surrogate function)，并用其决定下一次尝试的参数组合
Trials: 记录每次尝试的loss、参数及更多额外信息(可DIY)，可以记录整个迭代的过程，用于回测。
空间定义：
from hyperopt import STATUS_OK, fmin, tpe, Trials, hp
import xgboost as xgb
import logging
from timeit import default_timer as timer
import os
from functools import partial
import pandas as pd
import numpy as np
MAX_EVALS = 100 # 迭代次数
NFOLDS = 5 # K-FOLD
FOLDS = None # 自定义的FOLDS，优先级高于NFOLDS
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
XGB_SPACE = {
‘booster‘: ‘gbtree‘,
‘random_state‘: 2019,
‘eval_metric‘: ‘rmse‘,
‘n_jobs‘: -1,
‘learning_rate‘: 0.05,
‘subsample‘: hp.uniform(‘subsample‘, 0.1, 1.0),
‘colsample_bytree‘: hp.uniform(‘colsample_bytree‘, 0.1, 1.0),
‘max_depth‘: hp.quniform(‘max_depth‘, 5, 30, 1),
‘gamma‘: hp.uniform(‘gamma‘, 0.0, 2.0),
‘min_child_weight‘: hp.uniform(‘min_child_weight‘, 0.0, 5.0),
‘reg_alpha‘: hp.uniform(‘reg_alpha‘, 0.0, 3.0),
‘reg_lambda‘: hp.uniform(‘reg_lambda‘, 0.0, 3.0)
}
定义优化的目标函数：
def objective_base(params,
train_set,
folds=None,
nfold=5,
writetoFile=True):
"""
Objective function for Gradient Boosting Machine Hyperparameter Optimization
Args:
folds: This argument has highest priority over other data split arguments.
Return:
"""
# Keep track of evals
global _ITERATION
_ITERATION += 1
# Make sure parameters that need to be integers are integers
for parameter_name in [
‘num_leaves‘, ‘max_depth‘, ‘bagging_freq‘, ‘min_data_in_leaf‘,
‘min_samples_split‘, ‘min_samples_leaf‘
]:
if parameter_name in params:
params[parameter_name] = int(params[parameter_name])
start = timer()
logging.info(f"{_ITERATION} ITERATION")
logging.info(f"params:\n{params}")
cv_dict = xgb.cv(params,
train_set,
num_boost_round=5000,
nfold=nfold,
stratified=False,
folds=folds,
early_stopping_rounds=100,
as_pandas=False,
verbose_eval=10,
seed=0,
shuffle=False)
# Extract the min rmse, Loss must be minimized
loss = np.min(cv_dict[‘test-rmse-mean‘])
# Boosting rounds that returned the lowest cv rmse
n_estimators = int(np.argmin(cv_dict[‘test-rmse-mean‘])+1)
run_time = timer() - start
# Write to the csv file (‘a‘ means append)
if writetoFile:
random_datetime = str(int(time.time()))
hyper_base_path = os.path.join(BASE_DIR, ‘hyperopt_output‘)
trial_file = os.path.join(hyper_base_path, ‘trials.csv‘)
trial_file_rename = os.path.join(hyper_base_path,
‘trials_%s.csv‘ % random_datetime)
if not os.path.exists(hyper_base_path):
os.makedirs(hyper_base_path)
print(
"No trial file directory exists, will be created..."
)
if os.path.exists(trial_file) and _ITERATION == 1:
print("Trial file exists, will be renamed...")
os.rename(trial_file, trial_file_rename)
assert os.path.exists(
trial_file
) == False, "Trial file still exists, rename failed..."
# File to save first results
of_connection = open(trial_file, ‘w‘)
writer = csv.writer(of_connection)
# Write the headers to the file
writer.writerow(
[‘loss‘, ‘params‘, ‘iteration‘, ‘estimators‘, ‘train_time‘])
of_connection.close()
of_connection = open(trial_file, ‘a‘)
writer = csv.writer(of_connection)
writer.writerow([loss, params, _ITERATION, n_estimators, run_time])
# Dictionary with information for evaluation
return {
‘loss‘: loss,
‘params‘: params,
‘iteration‘: _ITERATION,
‘estimators‘: n_estimators,
‘train_time‘: run_time,
‘status‘: STATUS_OK
}
定义前处理和后处理模块：
def build_train_set(X_train, y_train):
isX_df = isinstance(X_train, pd.DataFrame)
isY_sr = isinstance(y_train, pd.Series)
isY_df = isinstance(y_train, pd.DataFrame)
if isY_df:
raise TypeError(
f"y_train is df, with the shape {y_train.shape}, which is not supportable now."
)
if isX_df ^ isY_sr:
raise TypeError(f"X_train and y_train have different types!")
if isX_df:
train_set = xgb.DMatrix(X_train.values, y_train.values)
else:
train_set = xgb.DMatrix(X_train, y_train)
return train_set
def post_hyperopt(bayes_trials, train_set, folds=None, nfold=5):
# get best params
bayes_results = pd.DataFrame(bayes_trials.results)
bayes_results = bayes_results.sort_values(by=‘loss‘)
bayes_results.reset_index(drop=True, inplace=True)
best_params = bayes_results.loc[0, ‘params‘]
# get best loss and trees
best_params[‘learning_rate‘] = 0.01
# Perform n_folds cross validation
cv_dict = xgb.cv(best_params,
train_set,
num_boost_round=5000,
folds=folds,
nfold=nfold,
stratified=False,
shuffle=False,
early_stopping_rounds=100,
as_pandas=False,
verbose_eval=10,
seed=2019)
# Extract the min rmse, Loss must be minimized
loss = np.min(cv_dict[‘test-rmse-mean‘])
# Boosting rounds that returned the lowest cv rmse
n_estimators = int(np.argmin(cv_dict[‘test-rmse-mean‘]) + 1)
best_params[‘n_estimators‘] = n_estimators
logging.info(f"best loss: {loss}, best n_estimators: {n_estimators}")
logging.info(f"best params: {best_params}")
return best_params, loss
定义主函数：
def main_tuning_with_bo(X_train,
y_train,
max_evals=MAX_EVALS,
folds=FOLDS,
nfold=NFOLD):
# Keep track of results
bayes_trials = Trials()
# Global variable
global _ITERATION
_ITERATION = 0
TRAIN_SET = build_train_set(X_train, y_train)
SPACE = XGB_SPACE
func_objective = partial(objective_base,
train_set=TRAIN_SET,
folds=folds,
nfold=nfold,
writetoFile=True)
# Run optimization
best = fmin(fn=func_objective,
space=SPACE,
algo=tpe.suggest,
max_evals=max_evals,
trials=bayes_trials,
rstate=np.random.RandomState(2019))
best_params, loss = post_hyperopt(bayes_trials,
train_set=TRAIN_SET,
folds=folds,
nfold=nfold)
return best_params, loss
Reference
【贝叶斯优化python包_《用贝叶斯优化进行超参数调优》】原文：https://www.cnblogs.com/cx2016/p/12899500.html

贝叶斯优化python包_《用贝叶斯优化进行超参数调优》

推荐阅读

mysql回收站功能 mysql收回所有权限

kb4523205更新了什么

如何开发安卓系统，如何开始android开发

回归分析点预测与区间预测的联系

电子信息类就业前景分析,大学专业电子信息类就业前景

什么叫特殊韵母

飞机有喇叭吗?为什么没有声音飞机有喇叭吗?为什么?

红酒在多少度时喝最好

vc精华液的作用和功效 vc的作用和功效

安卓苹果内存机制,为什么比安卓更快?原来是这样

威能锅炉显示5s是什么怎么回事,遇到这种故障解决方法步骤

兰博基尼毒药能上路吗

里番acg全彩资源站

孕妇奶粉哪家好？推荐几款好喝的孕妇奶粉

吸烟|不能饮酒，不能吸烟，不能熬夜，养生专家的话有多少值得相信？

安卓3d开发,本书作者专门出版unity3.x书籍

缅甸花梨木价格暴跌缅甸黄花梨一斤多少钱

运动女鞋哪种牌子好？想买一双跑步锻炼的鞋子？

电动车合格证可以补办吗

天猫618活动真的便宜吗 2021年天猫618预售便宜还是618便宜