November 13, 2024
Building a Decision Tree Classifier in Python
Have you ever wondered how machines can make decisions the way humans do? The Decision Tree Classifier is one answer. It is an efficient machine learning algorithm that makes decisions in a way that resembles human reasoning. The classifier works like a flowchart: each internal node tests a feature, each branch corresponds to an outcome of that test, and each leaf node represents an output class.
In this post, I'll show how you can build and visualize a decision tree using Python's scikit-learn library. Let's get started.
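To make the flowchart idea concrete, here is a minimal hand-written sketch of what a very small decision tree amounts to in code. The function name classify_iris and the thresholds (2.5 cm and 1.75 cm) are assumptions picked for illustration; a trained tree learns its own thresholds from the data.

```python
def classify_iris(petal_length_cm: float, petal_width_cm: float) -> str:
    """Toy flowchart: two internal nodes (the if tests) and three leaves (the returns)."""
    # Internal node 1: test the petal length (illustrative threshold)
    if petal_length_cm < 2.5:
        return "setosa"        # leaf
    # Internal node 2: test the petal width (illustrative threshold)
    if petal_width_cm < 1.75:
        return "versicolor"    # leaf
    return "virginica"         # leaf


print(classify_iris(1.4, 0.2))
print(classify_iris(4.5, 1.5))
print(classify_iris(6.0, 2.5))
```

Each if statement plays the role of an internal node, and each return plays the role of a leaf.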
Setting Up the Environment
Before building the decision tree classifier, set up the environment by installing the following libraries:
pip install scikit-learn pandas matplotlib
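If you want to confirm that the installation worked, a quick check like the one below may help. It only imports the three packages and prints their versions; the versions you see will depend on your environment.

```python
import matplotlib
import pandas
import sklearn

# Collect the installed versions of the three libraries used in this tutorial
versions = {
    "scikit-learn": sklearn.__version__,
    "pandas": pandas.__version__,
    "matplotlib": matplotlib.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```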
I'll use the Iris dataset for this tutorial. This dataset is widely used in the machine learning community and contains measurements of Iris flowers from several species. I'll use these features to train a decision tree classifier that sorts Iris flowers into their species.
Loading and Understanding the Dataset
Once the development environment is set up, let's load the dataset and get familiar with its structure. It contains 150 samples, each with 4 features (sepal length, sepal width, petal length, petal width) and a label for one of 3 Iris species (Setosa, Versicolor, or Virginica).
from sklearn.datasets import load_iris
import pandas as pd
# Load the dataset
iris = load_iris()
# Convert to DataFrame for easy viewing
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target
# Display the first few rows
print(data.head())
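Our goal is to classify each Iris flower into one of the three species according to its features. Note that the species column holds the integer codes 0, 1, and 2, which correspond to the names in iris.target_names.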
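To check the structure described above, a short sketch like this one prints the shape of the table and the number of samples per species. The variable name species_counts is my own; everything else reuses the loading code from this section.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target

# Rows and columns: 4 feature columns plus the species column
print(data.shape)

# Map the integer codes to species names, then count samples per species
code_to_name = dict(enumerate(iris.target_names))
species_counts = data['species'].map(code_to_name).value_counts()
print(species_counts)
```

The classes are balanced, so plain accuracy is a reasonable first metric for this dataset.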
Building the Decision Tree Classifier
Now that you understand the dataset, it's time to build the Decision Tree Classifier. First, split the dataset into training and testing sets so that you can evaluate the model on data it has not seen.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Split the data into training and testing sets
X = data.drop('species', axis=1)
y = data['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train the Decision Tree model
# (random_state fixes the tree's internal tie-breaking so results are repeatable)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate accuracy
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy * 100:.2f}%')
In this code, we trained the model on 70% of the dataset and tested it on the remaining 30%. The accuracy score tells you what fraction of the test samples the classifier labeled correctly.
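Accuracy is a single number, so it can hide which species get confused with each other. As a sketch of one way to look closer, the block below repeats the same split and training and then prints a confusion matrix and a per-class report using scikit-learn's confusion_matrix and classification_report. Fixing random_state on the classifier is my own assumption, added so the output is repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# random_state on the model is an assumption made for repeatable results
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true species, columns are predicted species
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score for each species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Off-diagonal entries in the matrix show which species were mistaken for which.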
Visualizing the Decision Tree
The beauty of decision trees lies in their interpretability. Now let's visualize the decision tree to see how the model makes its decisions from the data.
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
# Plot the decision tree
plt.figure(figsize=(20,10))
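# class_names is passed as a list because some scikit-learn releases reject a NumPy array here
plot_tree(model, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)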
plt.show()
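If a plot window is not convenient (for example on a remote server), the same tree can be inspected as text. This is a minimal sketch using scikit-learn's export_text and the feature_importances_ attribute. Limiting the tree with max_depth=2 and fitting on the full dataset are assumptions made only to keep the printed rules short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Assumption: a shallow tree on the full dataset, just to keep the output small
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# Text version of the flowchart: one line per test or leaf
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)

# How much each feature contributed to the splits (values sum to 1)
importances = dict(zip(iris.feature_names, model.feature_importances_))
for name, value in importances.items():
    print(f"{name}: {value:.3f}")
```

Features with an importance of 0 were never used in a split by this particular tree.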
Conclusion
In this guide, we classified the famous Iris dataset using scikit-learn's Decision Tree Classifier. Visualizing the decision tree helped us understand how the model decides. Decision trees are easy to understand and use, making them a good starting point for machine learning novices. After learning the fundamentals, you can try different datasets or model settings to see how they affect accuracy.
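As one possible starting point for experimenting with model settings, the sketch below compares a few values of the max_depth parameter using 5-fold cross-validation. The particular depths tried and the use of cross_val_score are my own choices for illustration, not a recommended tuning procedure.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Mean cross-validated accuracy for each depth setting (None means no limit)
results = {}
for depth in (1, 2, 3, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    results[depth] = scores.mean()
    print(f"max_depth={depth}: mean accuracy {scores.mean():.3f}")
```

A tree with max_depth=1 has only two leaves, so it cannot separate all three species; this is why its score stays well below the deeper trees.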