Grid Search

Model Parameters

Model parameters are defined by the training data: they are estimated by fitting the model to the training set, and they generally cannot be set by hand. Common model parameters include:

  • the coefficients of linear and nonlinear models
  • the weights of a neural network
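A minimal sketch of the idea above: the coefficients of a linear model are learned from the training data by `fit()`, never written by hand (the data here is synthetic, for illustration only).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data following y = 2*x0 + 3*x1 exactly
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1]

# fit() estimates the model parameters (coefficients) from the data
model = LinearRegression().fit(X, y)
print(model.coef_)       # learned parameters, close to [2. 3.]
print(model.intercept_)  # close to 0.0
```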

Model Hyper-Parameters

Model hyperparameters are defined independently of the training set, so they cannot be learned from it; they must be chosen before training. Common hyperparameters include:

  • the model's learning rate
  • the number of hidden layers and the number of neurons per layer in a neural network
  • the number of decision trees in a random forest
  • the value of k in k-fold cross-validation
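In contrast to the learned coefficients above, hyperparameters such as `C` and `penalty` are fixed at construction time, before the model ever sees the training set. A small sketch:

```python
from sklearn.linear_model import LogisticRegression

# Hyperparameters are passed to the constructor, not learned by fit()
model = LogisticRegression(C=0.01, penalty='l2')
print(model.get_params()['C'])  # 0.01
```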

Grid Search

Almost every model has several hyperparameters, so an intuitive way to find good values is to try different combinations of these hyperparameters and compare the results.
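This idea can be sketched as a pair of nested loops over every combination of values; the `score` function below is a hypothetical stand-in for a cross-validated evaluation.

```python
from itertools import product

penalties = ['l1', 'l2']
Cs = [0.01, 1, 100]

def score(penalty, C):
    # Placeholder scoring function for illustration:
    # pretend C=1 gives the best result.
    return -abs(C - 1)

# Try every (penalty, C) combination and keep the best-scoring one
best_score, best_params = float('-inf'), None
for penalty, C in product(penalties, Cs):
    s = score(penalty, C)
    if s > best_score:
        best_score, best_params = s, {'penalty': penalty, 'C': C}

print(best_params)  # {'penalty': 'l1', 'C': 1}
```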

Python Implementation

Below, we get a feel for Grid Search by finding the best regularization penalty and regularization strength C for a logistic regression model.

# Import the estimator and the grid search utility
from sklearn import linear_model
from sklearn.model_selection import GridSearchCV

# Create logistic regression object
# ('liblinear' is chosen because it supports both the 'l1' and 'l2' penalties)
logistic = linear_model.LogisticRegression(solver='liblinear')
# Create a list of all of the different penalty values that you want to test and save them to a variable called 'penalty'
penalty = ['l1', 'l2']
# Create a list of all of the different C values that you want to test and save them to a variable called 'C'
C = [0.0001, 0.001, 0.01, 1, 100]
# Now that you have two lists each holding the different values that you want to test, use the dict() function to combine them into a dictionary.
# Save your new dictionary to the variable 'hyperparameters'
hyperparameters = dict(C=C, penalty=penalty)
# Fit your model using grid search (X and Y are the training features and labels)
clf = GridSearchCV(logistic, hyperparameters, cv=5, verbose=0)
best_model = clf.fit(X, Y)
# Print the parameters that gave the best results:
print('Best Parameters:', clf.best_params_)
# You can also print the best penalty and C value individually from best_model.best_estimator_.get_params()
print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])
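To run the listing end to end you need training data for `X` and `Y`; a self-contained sketch using a synthetic dataset (`make_classification` here only stands in for real data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data standing in for real X, Y
X, Y = make_classification(n_samples=200, n_features=10, random_state=0)

# 'liblinear' supports both the 'l1' and 'l2' penalties
logistic = LogisticRegression(solver='liblinear')
hyperparameters = {'C': [0.0001, 0.001, 0.01, 1, 100], 'penalty': ['l1', 'l2']}

# 5-fold cross-validated search over all 10 combinations
clf = GridSearchCV(logistic, hyperparameters, cv=5, verbose=0)
best_model = clf.fit(X, Y)
print('Best Parameters:', clf.best_params_)
```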