November 12, 2024
Implementing K-Nearest Neighbors (KNN) from Scratch
Have you ever wondered how a machine learning algorithm can classify data points simply by looking at the points around them? That is exactly what K-Nearest Neighbors (KNN) does. This algorithm classifies data based on proximity, and it is popular for its simplicity and effectiveness.
The most exciting part? In this article, I'll show you how to implement KNN from scratch, without installing any libraries.
Understanding the KNN Algorithm
KNN, as its name suggests, classifies a new data point based on the majority class of its K nearest neighbors. To find those neighbors, the algorithm measures the distance between the new point and every point already in the dataset.
Choosing K (the number of neighbors) and the distance metric are the two most important decisions for this classifier. In this guide I'll use Euclidean distance, a commonly used metric that measures the straight-line distance between two points.
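For intuition, here's a quick worked example with two made-up points, (2, 3) and (6, 3): distance = sqrt((2 - 6)^2 + (3 - 3)^2) = sqrt(16) = 4.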
Setting Up a Small Dataset
Instead of using a big dataset for this guide, I'm going to create a small dataset manually to keep things simple.
# Example dataset
dataset = [
    [2, 3, 'A'], [5, 4, 'A'], [9, 6, 'A'],  # Class A points
    [4, 7, 'B'], [8, 1, 'B'], [7, 2, 'B']   # Class B points
]

# New point to classify
new_point = [6, 3]
Here we have two classes of data points on a 2D plane. Each list holds two features (x and y) plus a class label ('A' or 'B'). Our goal is to classify new_point according to its neighbors.
Implementing KNN from Scratch
Step#1 Calculate Euclidean Distance
In this step, you calculate the Euclidean distance between the new point and each existing data point.
import math
# Function to calculate Euclidean distance
def euclidean_distance(point1, point2):
    return math.sqrt((point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2)
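As a quick sanity check, you can call the function on a point from our dataset and the new point; the expected value in the comment is my own hand calculation:

# Distance from [2, 3] to the new point [6, 3]
print(euclidean_distance([2, 3], new_point))  # 4.0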
Step#2 Find the K Nearest Neighbors
Next, you'll select the K nearest neighbors by sorting the dataset points by their distance to the new point.
# Function to get the k nearest neighbors
def get_nearest_neighbors(dataset, new_point, k):
    distances = []
    for point in dataset:
        distance = euclidean_distance(new_point, point)
        distances.append((point, distance))
    # Sort points by distance and return the nearest k neighbors
    distances.sort(key=lambda x: x[1])
    neighbors = [distances[i][0] for i in range(k)]
    return neighbors
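If you call the function on our dataset with k=3, you should see the three closest points to [6, 3]; the distances in the comment are my own hand calculations:

# Expected: [[5, 4, 'A'], [7, 2, 'B'], [8, 1, 'B']]
# (distances ~1.41, ~1.41, and ~2.83 respectively)
print(get_nearest_neighbors(dataset, new_point, 3))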
Step#3 Determine the Majority Class
Once you've found the K nearest neighbors, it's time to look at the majority class among them to classify the new data point.
# Function to make a prediction based on a majority vote
def predict_classification(neighbors):
    classes = [neighbor[2] for neighbor in neighbors]
    prediction = max(set(classes), key=classes.count)
    return prediction
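For example, feeding in the three neighbors found above (an illustrative call, not part of the main walkthrough), two 'B' labels outvote one 'A':

# Majority vote: two 'B' labels beat one 'A'
print(predict_classification([[5, 4, 'A'], [7, 2, 'B'], [8, 1, 'B']]))  # 'B'

One design note: with an even K and two classes, a tied vote is possible; using an odd K sidesteps that.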
Step#4 Testing the Algorithm
Now that you've implemented the KNN algorithm from scratch, it's time to test it on our small dataset with K=3 and see which class it predicts.
# Testing with k=3
k = 3
neighbors = get_nearest_neighbors(dataset, new_point, k)
predicted_class = predict_classification(neighbors)
print(f'The predicted class for point {new_point} is: {predicted_class}')
Let's analyze the results. After running the code, the KNN algorithm finds the 3 nearest neighbors, checks their class labels, and predicts the class of the new data point. Since our new point was [6, 3], it gets classified as Class 'B'.
Conclusion
Congratulations! You built K-Nearest Neighbors (KNN) from scratch without using any external libraries. In this guide, I covered the KNN basics: computing distances, finding the closest neighbors, and predicting via a majority vote. Now you can try alternative K values and bigger datasets to learn more.
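If you want a starting point for that experimentation, here's a minimal sketch reusing the functions above; on a dataset this tiny the prediction can change with K, which is exactly why choosing K matters:

# Compare predictions for a few K values
for k in [1, 3, 5]:
    neighbors = get_nearest_neighbors(dataset, new_point, k)
    print(f'k={k}: predicted class is {predict_classification(neighbors)}')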