How to retrieve best model from xgboost.train

2 years ago

#238276

user15325627

I'm learning how to use XGBClassifier to generate predictions, and I found out that xgboost.train is what XGBClassifier calls under the hood. So my first question is: is there any reason to favor one over the other, or are they not equivalent at all?

I had this code set up that gave me the best model at iteration 12:

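import xgboost as xgb
from sklearn.metrics import roc_auc_score

# x_train, y_train, x_test, y_test are assumed to be defined already.
m1 = xgb.XGBClassifier(max_depth = 5,
                       n_estimators = 20,
                       objective = 'binary:logistic',
                       use_label_encoder = False,
                       eval_metric = 'auc',
                       random_state = 1234)

# Stop once the eval-set AUC has not improved for 5 consecutive rounds.
m1.fit(x_train, y_train,
       eval_set = [(x_test, y_test)],
       eval_metric = 'auc',
       early_stopping_rounds = 5)

# Predicted probability of the positive class, scored by AUC.
pred1 = m1.predict_proba(x_test)[:,1]
roc_auc_score(y_test, pred1)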

I haven't tuned the parameters yet as I just wanted to make sure the code runs. Then I had the code below set up, hoping to get the same behavior as the one above:

train_params = {'objective': 'binary:logistic',
                'max_depth': 5,
                'eval_metric':'auc',
                'random_state':1234}

mat_train = xgb.DMatrix(data = x_train, label = y_train)
mat_test = xgb.DMatrix(data = x_test, label = y_test)

evals_result = {}
m2 = xgb.train(params = train_params,
               dtrain = mat_train,
               num_boost_round = 20,
               early_stopping_rounds = 5,
               evals = [(mat_test, 'eval')],
               evals_result = evals_result)

pred2 = m2.predict(mat_test)
roc_auc_score(y_test, pred2)

This run also reports iteration 12 as the best, but the predictions differ from the XGBClassifier ones because pred2 was actually computed with the model from the 17th iteration, the last one trained. I dug through the docs and found this note on the early_stopping_rounds argument:

The method returns the model from the last iteration (not the best one). Use custom callback or model slicing if the best model is desired.

I haven't been able to find a lot of resources on this topic, so I'm here to ask for some help so that I can generate predictions using the model iteration with the highest AUC value. Appreciate it!

python

xgboost

xgbclassifier

0 Answers
