June 30, 2025

Implementing Scikit-learn Hyperparameter Optimization with Optuna

scikitlearn

python

optuna

hyperparametertuning

machinelearning

mloptimization

Only Coders

@onlyCoders



After hours of fine-tuning a machine learning model, do you ever feel stuck in a loop, unable to squeeze out that last bit of accuracy? There is a way to automate the process and make it smarter. Optuna, a hyperparameter optimization framework, can save you time and effort. This blog post shows how to integrate Optuna with Scikit-learn to tune model hyperparameters and improve performance with less manual work.

What Is Optuna and Why Use It?

Optuna is an open-source framework that automates the search for good hyperparameters in machine learning models. Instead of manually adjusting the number of trees in a Random Forest or the learning rate of a neural network, you define a search space and Optuna's sampling algorithms propose the combinations to try.

Why should you care? Anyone who trains models knows that tuning by hand takes a lot of time and effort. Optuna's default sampler, the Tree-structured Parzen Estimator (TPE), uses the results of earlier trials to choose which hyperparameters to test next. This usually finds good settings in fewer trials than manual or purely random search.

Setting Up Your Environment

Make sure you have the right tools before coding. Install Optuna and Scikit-learn with this command:

pip install optuna scikit-learn

Once installation finishes, we can begin. I will use the popular Iris classification dataset and the RandomForestClassifier model.

Basic Example: Hyperparameter Optimization with Optuna

Let's start simple and see how Optuna optimizes a RandomForestClassifier. The goal is to find the hyperparameters (such as the number of trees, the maximum depth, and the minimum samples per split) that give the best accuracy on the Iris dataset.

Importing Libraries and Dataset

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Iris dataset
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

We load the Iris dataset and split it into training and test sets. The next steps define the optimization procedure.

Defining the Objective Function

Optuna is built around an objective function. Inside it, we define the hyperparameters to search, train the model, and return a performance measure (here, accuracy).

def objective(trial):
    # Define hyperparameter search space
    n_estimators = trial.suggest_int('n_estimators', 10, 200)
    max_depth = trial.suggest_int('max_depth', 1, 32)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
   
    # Initialize and train the model
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, min_samples_split=min_samples_split)
    model.fit(X_train, y_train)
   
    # Predict and evaluate accuracy
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred)

The function sets the search space for n_estimators, max_depth, and min_samples_split. The trial.suggest_* methods tell Optuna which range each hyperparameter may take, and Optuna then tries to maximize the accuracy the function returns.

Running the Optimization

Run the optimization procedure using Optuna:

# Create Optuna study
study = optuna.create_study(direction='maximize')

# Run optimization
study.optimize(objective, n_trials=50)

# Print the best hyperparameters and best score
print(f"Best Hyperparameters: {study.best_params}")
print(f"Best Accuracy: {study.best_value}")

The code above creates a study and tells it to maximize the objective's return value, which is accuracy. Optuna runs 50 trials with different hyperparameter combinations and reports the best one it found.

Visualizing the Optimization Process

Optuna's built-in visualization tools shine. You can plot the optimization history to see how the trials went, or check which hyperparameters influenced performance the most.

Here's how you can visualize the results (the plotting functions require Plotly, which you can install with pip install plotly):

import optuna.visualization as vis

# Visualize optimization history (each call returns a Plotly figure)
vis.plot_optimization_history(study).show()

# Visualize parameter importance
vis.plot_param_importances(study).show()

These plots show how the best score evolved across trials and which hyperparameters mattered most.

Advanced Techniques: Using Optuna with Cross-validation

For a more robust optimization, we can use cross-validation. Instead of relying on a single train/test split, StratifiedKFold lets us evaluate the model on several data splits. This reduces the risk of overfitting the hyperparameters to one particular split.

To apply cross-validation, change the objective function:

import numpy as np
from sklearn.model_selection import StratifiedKFold

def objective_with_cv(trial):
    # Hyperparameters to tune
    n_estimators = trial.suggest_int('n_estimators', 10, 200)
    max_depth = trial.suggest_int('max_depth', 1, 32)

    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)

    # Cross-validation setup
    skf = StratifiedKFold(n_splits=5)
    scores = []
    for train_idx, test_idx in skf.split(X_train, y_train):
        X_train_fold, X_test_fold = X_train[train_idx], X_train[test_idx]
        y_train_fold, y_test_fold = y_train[train_idx], y_train[test_idx]

        # Refit on this fold's training part and score on its held-out part
        model.fit(X_train_fold, y_train_fold)
        y_pred = model.predict(X_test_fold)
        scores.append(accuracy_score(y_test_fold, y_pred))

    # The trial's value is the mean accuracy across the five folds
    return np.mean(scores)

Cross-validation with StratifiedKFold keeps the class proportions roughly the same in every fold, so the performance estimate is more reliable than one from a single split.

Conclusion

So there you have it! Optuna simplifies hyperparameter optimization. It works with Scikit-learn to automate hyperparameter selection, which can improve performance with less effort. You can use it for a simple search against a held-out split or combine it with cross-validation for a more reliable estimate.

What is next? Try Optuna on your own models. Experiment with other algorithms, hyperparameters, and search settings. You will likely save time and get better results than with manual tuning.
