1 year ago

#333287

user1093541

Why do I need to fit a GridSearchCV object to the data before I can get the best parameters and model?

My question is similar to the one here, but the answer there does not explain why we must fit to the data before getting the best parameters; it just states that we must. In my understanding, GridSearchCV picks the best model parameters using cross-validation and then returns this best model in the .best_estimator_ attribute. Then we can fit that model to our data. But shouldn't we be able to access the parameters it picked, and the .best_estimator_ model, before fitting to the data?
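For reference, the search that `grid.fit` runs is roughly equivalent to the manual loop below. This is a simplified sketch, not scikit-learn's actual implementation, which also handles custom scoring, parallelism, tie ranking and failed fits. The synthetic data from `make_classification` and `random_state=0` are assumptions added so the snippet runs on its own. Every candidate is scored by cross-validation on the training data, so there is no "best" configuration until data has been supplied.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid, cross_val_score

# Synthetic stand-in for X_train / y_train (an assumption for illustration).
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)

logreg = LogisticRegression(solver='liblinear', random_state=0)
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}

best_score, best_params = -np.inf, None
for candidate in ParameterGrid(params):
    # Each candidate is scored by 4-fold cross-validation on the data itself,
    # so the ranking cannot exist before the data is provided.
    model = clone(logreg).set_params(**candidate)
    score = cross_val_score(model, X_train, y_train, cv=4).mean()
    if score > best_score:  # keep the first candidate among ties
        best_score, best_params = score, candidate

# Refit the winning configuration on all the training data,
# analogous to GridSearchCV's default refit=True behaviour.
best_model = clone(logreg).set_params(**best_params).fit(X_train, y_train)
print(best_params, round(best_score, 3))
```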

As an example, the code below works fine:

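```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# X_train, y_train and X_test are assumed to be defined elsewhere
logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
grid.fit(X_train, y_train)
best_model_params = grid.best_params_
y_pred = grid.predict(X_test)
```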
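A self-contained variant of the working snippet, assuming synthetic data from `make_classification` split with `train_test_split` (neither is in the original) and `random_state=0` for reproducibility. It prints some of the attributes that `fit` creates from the cross-validation results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the real training/test sets (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(solver='liblinear', random_state=0)
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
grid.fit(X_train, y_train)

# These attributes are computed from the CV scores, so they only exist after fit().
print(grid.best_params_)
print(grid.best_score_)
print(len(grid.cv_results_['params']))  # one entry per candidate: 2 * 7 = 14
y_pred = grid.predict(X_test)
```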

But the following code does not work:

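```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
best_model_params = grid.best_params_  # fails: fit() has not been called yet
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)
```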

It raises AttributeError: 'GridSearchCV' object has no attribute 'best_params_'.
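A minimal reproduction of the failing order, again assuming synthetic data from `make_classification` and `random_state=0` (not in the original). It checks for the attribute with `hasattr` before and after calling `fit`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (assumption).
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)

grid = GridSearchCV(
    estimator=LogisticRegression(solver='liblinear', random_state=0),
    param_grid={'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)},
    cv=4,
)

# Before fit(): the object only stores its configuration, no results yet.
had_best_params_before_fit = hasattr(grid, 'best_params_')
print(had_best_params_before_fit)  # False
try:
    grid.best_params_
except AttributeError as exc:
    print('AttributeError:', exc)

# After fit(): the CV results, and hence the "best" entries, exist.
grid.fit(X_train, y_train)
print(hasattr(grid, 'best_params_'), hasattr(grid, 'best_estimator_'))  # True True
```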

On a related note, if grid.best_estimator_ is the best LogisticRegression model (the model with the set of hyperparameters found to be best through cross-validation in GridSearchCV), then why do we fit the grid object instead of the grid.best_estimator_ object? For example, if I could figure out how to access the best_estimator_ attribute before fitting, would the following code work?

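```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
best_model = <somehow get the model GridSearchCV has picked>
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)
```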
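For comparison, a sketch of the last snippet with the search fitted first, assuming synthetic data from `make_classification`, a `train_test_split`, and `random_state=0` (none of which are in the original). `sklearn.base.clone` copies the chosen hyperparameters without the learned coefficients, so `best_model` is refit from scratch. With the default `refit=True`, `grid.best_estimator_` has already been refit on all of `X_train` and `grid.predict` delegates to it, so the extra fit is redundant here.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the real training/test sets (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(solver='liblinear', random_state=0)
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
grid.fit(X_train, y_train)  # the search itself needs the data

# Only now is there a chosen configuration to copy; clone() keeps the
# hyperparameters but drops the fitted coefficients.
best_model = clone(grid.best_estimator_)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)

# Same predictions as the grid's own refit model (deterministic with random_state=0).
print(np.array_equal(y_pred, grid.predict(X_test)))
```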

Tags: python, machine-learning, scikit-learn, cross-validation, gridsearchcv
