GridSearchCV performance measure effect

March 2019

I have an assignment and it asks me to:

Improve the performance of the models from the previous step with hyperparameter tuning and select a final optimal model using grid search based on a metric (or metrics) that you choose. Choosing an optimal model for a given task (comparing multiple regressors on a specific domain) requires selecting performance measures, for example, R² (coefficient of determination) and/or RMSE (root mean squared error) to compare model performance.
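For reference, this is how I understand those two measures are computed with scikit-learn (a minimal sketch; the y_true and y_pred arrays are just placeholders, not data from the assignment):

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 2.5, 4.0, 7.1])   # placeholder true targets
y_pred = np.array([2.8, 2.9, 4.2, 6.8])   # placeholder predictions

r2 = r2_score(y_true, y_pred)                        # coefficient of determination
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # root mean squared error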

I used this code for hyperparameter tuning:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

model_example = GradientBoostingRegressor()
parameters = {'learning_rate': [0.1, 1],
              'max_depth': [5, 10]}

# 2-fold cross-validated grid search, scored with R^2
model_best = GridSearchCV(model_example,
                          param_grid=parameters,
                          cv=2, scoring='r2').fit(X_train_new, y_train_new)
model_best.best_estimator_

The grid search found learning_rate=0.1 and max_depth=5. I chose scoring='r2' as the performance measure, but it doesn't seem to have any effect on my model's accuracy when I use this code to fit my best model:

# Refit with the best hyperparameters found by the grid search
my_best_model = GradientBoostingRegressor(learning_rate=0.1,
                                          max_depth=5).fit(X_train_new, y_train_new)
my_best_model.score(X_train_new, y_train_new)  # R^2 on the training data

Do you know what's wrong with my work?

CFD

1 answer


Try setting a random_state as a parameter of your GradientBoostingRegressor(). For example, GradientBoostingRegressor(random_state=1).

The model will then produce the same results on the same data. Without that parameter, there's an element of randomness that makes it difficult to compare different model fits.

Setting a random_state on your train_test_split will also help with this.
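For example, here is a minimal sketch of a fully reproducible setup (the X and y variable names and the 80/20 split are assumptions, not taken from the question):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Fix the split so the same rows land in train/test on every run
X_train_new, X_test_new, y_train_new, y_test_new = train_test_split(
    X, y, test_size=0.2, random_state=1)

parameters = {'learning_rate': [0.1, 1],
              'max_depth': [5, 10]}

# Fix the estimator's randomness too, so repeated grid searches agree
model_best = GridSearchCV(GradientBoostingRegressor(random_state=1),
                          param_grid=parameters,
                          cv=2, scoring='r2').fit(X_train_new, y_train_new)

print(model_best.best_params_)
print(model_best.best_estimator_.score(X_test_new, y_test_new))  # R^2 on held-out data

Scoring the refit model on a held-out test set (rather than on the training data, as in the question) also gives a fairer comparison between candidate models.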