November 13, 2024

Building a Decision Tree Classifier in Python

Have you ever wondered how machines can make decisions the way humans do? A Decision Tree Classifier is one answer. It is an efficient machine learning algorithm that reaches a decision through a sequence of simple questions, much as a person would. The classifier works like a flowchart: each internal node tests a feature, each branch corresponds to an outcome of that test, and each leaf node represents an output (the predicted class).
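
To make the flowchart analogy concrete, here is a minimal hand-written sketch of what a small tree amounts to: a few nested checks on feature values. It uses flower measurements like the ones in the dataset introduced later in this post. The function name classify_flower and the thresholds are illustrative assumptions, not values learned from data; a trained classifier picks its own split points.

```python
def classify_flower(petal_length_cm: float, petal_width_cm: float) -> str:
    """Toy decision tree written by hand (thresholds are illustrative only)."""
    # Internal node 1: test one feature against a threshold
    if petal_length_cm < 2.5:
        return "setosa"  # leaf node: the predicted class
    # Internal node 2: test a second feature
    if petal_width_cm < 1.75:
        return "versicolor"  # leaf node
    return "virginica"  # leaf node


print(classify_flower(1.4, 0.2))
print(classify_flower(4.5, 1.5))
print(classify_flower(6.0, 2.5))
```

Each call follows one path from the root to a leaf, which is exactly how a fitted tree produces a prediction.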

 

In this post, I'll show how to build and visualize a decision tree using Python's scikit-learn library. Let's get started…

 

Setting Up the Environment

Before building the decision tree classifier, set up the environment by installing the following libraries:

pip install scikit-learn pandas matplotlib
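
A quick way to check that the installation worked is to import the three packages and print their versions. This is only a sanity-check sketch; the versions printed depend on your environment.

```python
import matplotlib
import pandas
import sklearn  # scikit-learn is imported under the name "sklearn"

# Print the name and version of each installed package
installed = (sklearn, pandas, matplotlib)
for module in installed:
    print(module.__name__, module.__version__)
```

If any of these imports raises ModuleNotFoundError, rerun the pip command above in the same Python environment.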

 

I'll use the Iris dataset for this tutorial. It is widely used in the machine learning community and contains measurements of Iris flowers from different species. I'll use these features to train a decision tree classifier that assigns Iris flowers to their species.

 

Loading and Understanding the Dataset

Once the environment is set up, let's load the dataset and get familiar with its structure. It contains 150 samples, each with 4 features (sepal length, sepal width, petal length, petal width) and a label for one of 3 species of Iris flower (Setosa, Versicolor, or Virginica).

from sklearn.datasets import load_iris
import pandas as pd

# Load the dataset
iris = load_iris()

# Convert to DataFrame for easy viewing
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target

# Display the first few rows
print(data.head())
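
To confirm the structure described above, the following sketch prints the table's shape, the species names, and the number of samples per species. The variable names species_names and species_counts are introduced here for illustration only.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target

# Rows and columns: the 4 feature columns plus the 'species' label column
print(data.shape)

# Map the integer labels (0, 1, 2) back to species names
species_names = pd.Series(iris.target_names[iris.target])
print([str(name) for name in iris.target_names])

# Number of samples per species
species_counts = species_names.value_counts()
print(species_counts)
```

The classes are balanced, with the same number of samples for each species, so plain accuracy is a reasonable metric for this dataset.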

 

Our goal is to classify each Iris flower into one of the three species based on its features.

 

Building the Decision Tree Classifier

Now that you understand the dataset, it's time to build the Decision Tree Classifier. First, split the dataset into training and testing sets so that you can evaluate the model on data it has not seen.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split the data into training and testing sets
X = data.drop('species', axis=1)
y = data['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

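# Initialize and train the Decision Tree model
# (random_state fixes how ties between equally good splits are broken, so runs repeat)
model = DecisionTreeClassifier(random_state=42)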
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy * 100:.2f}%')

 

In this code, we trained the model on 70% of the dataset and tested it on the remaining 30%. The accuracy score is the fraction of test samples the classifier labels correctly.
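
Accuracy is a single number, so it can help to also see which species get confused with which. The sketch below repeats the same split, checks the split sizes, and prints a confusion matrix using scikit-learn's confusion_matrix. Setting random_state on the classifier is an assumption added here so that repeated runs give the same tree.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))  # 105 training samples, 45 test samples

# random_state is an assumption for repeatable results
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# accuracy_score on the predictions matches model.score on the test set
acc = accuracy_score(y_test, y_pred)
print(f'Accuracy: {acc * 100:.2f}%')

# Rows are true species, columns are predicted species
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Entries on the diagonal of the matrix are correct predictions; anything off the diagonal is a misclassification.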

 

Visualizing the Decision Tree

The beauty of decision trees lies in their interpretability. Let's visualize the tree to see how the model makes decisions based on the features.

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Plot the decision tree
plt.figure(figsize=(20,10))
plot_tree(model, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()
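
If a plot is not convenient (for example, in a terminal session), the same tree can be printed as indented text rules with scikit-learn's export_text. This is a minimal sketch that retrains a tree on the same split; the random_state on the classifier is an assumption added for repeatable output.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# random_state is an assumption for repeatable results
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Each indented line is one test on a feature; "class:" lines are leaf nodes
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Reading the rules from top to bottom follows the same paths you would trace in the plotted tree.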

 

Conclusion

In this guide, we classified the famous Iris dataset using scikit-learn's Decision Tree Classifier. Visualising the decision tree helped us understand how the model reaches its decisions. Decision trees are easy to understand and use, which makes them a good starting point for machine learning novices. After learning the fundamentals, you can try different datasets or model settings to improve accuracy.
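
As one possible way to experiment with model settings, the sketch below compares a few values of the max_depth parameter. It uses 5-fold cross-validation (cross_val_score) instead of the single train/test split from earlier; that swap, the depth values tried, and the random_state are all assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Mean cross-validated accuracy for each depth limit (None means no limit)
results = {}
for depth in (1, 2, 3, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores = cross_val_score(model, X, y, cv=5)
    results[depth] = scores.mean()
    print(f'max_depth={depth}: mean accuracy {results[depth]:.3f}')
```

A tree of depth 1 has only two leaves, so it cannot separate all three species; allowing a little more depth lets it do much better.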
