1 year ago
#333287
user1093541
Why do I need to fit a GridSearchCV object to the data before I can get the best parameters and model?
My question is similar to the one here, but the answer there does not explain why we must fit to the data before getting the best parameters; it just states that we must. In my understanding, GridSearchCV picks the best model parameters using cross-validation and then returns this best model in the .best_estimator_ attribute. Then we can fit that model to our data. But shouldn't we be able to access the parameters it picked and the .best_estimator_ model before fitting to the data?
As an example, the code below works fine:
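import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
grid.fit(X_train, y_train)
best_model_params = grid.best_params_
y_pred = grid.predict(X_test)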
But the following code does not work:
logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
best_model_params = grid.best_params_  # fails here: grid has not been fit yet
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)
It gives AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'.
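To make the difference between the two snippets concrete, here is a minimal sketch that checks for the attribute before and after fitting. The synthetic data from make_classification and the variable names fitted_before and fitted_after are assumptions for illustration only; they stand in for the X_train and y_train used above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for X_train / y_train (an assumption, not the original data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)

# Before fit: the object only stores the estimator and the grid.
# No cross-validation has run, so there is no result to report yet.
fitted_before = hasattr(grid, 'best_params_')

# fit() is the step that actually runs cross-validation over every candidate.
grid.fit(X_train, y_train)

# After fit: the trailing-underscore attributes have been set.
fitted_after = hasattr(grid, 'best_params_')

print(fitted_before, fitted_after)
print(grid.best_params_)
```

In this sketch, constructing the GridSearchCV object only records the settings; the search itself, which needs data to score each candidate, happens inside fit.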
On a related note, if grid.best_estimator_ is the best LogisticRegression model (the model with the set of hyperparameters found to be best through cross-validation in GridSearchCV), then why do we fit the grid object instead of the grid.best_estimator_ object? E.g., if I could figure out how to access the best_estimator_ attribute before fitting, would the following code work?
logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
best_model = <somehow get the model GridSearchCV has picked>
best_model.fit(X_train,y_train)
y_pred = best_model.predict(X_test)
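For comparison, here is a hedged sketch of how the pieces relate once the grid has been fit. It assumes the default refit=True and uses synthetic data from make_classification in place of the real X_train and y_train; the names best_model, pred_via_grid, and pred_via_best are hypothetical and chosen only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the real training and test sets (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}

# refit=True is the default; it is spelled out here to make the behaviour visible.
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4, refit=True)
grid.fit(X_train, y_train)

# With refit=True, best_estimator_ is a LogisticRegression that has already been
# refit on all of X_train using the best parameters found by the search.
best_model = grid.best_estimator_

# grid.predict delegates to best_estimator_, so both calls give the same labels.
pred_via_grid = grid.predict(X_test)
pred_via_best = best_model.predict(X_test)

print(type(best_model).__name__)
print(np.array_equal(pred_via_grid, pred_via_best))
```

Under these assumptions, fitting the grid object covers both steps: it searches for the parameters and then trains the chosen model, so no separate best_model.fit call is needed afterwards.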
python
machine-learning
scikit-learn
cross-validation
gridsearchcv
0 Answers