November 12, 2024
Implementing K-Nearest Neighbors (KNN) from Scratch
Have you ever wondered how a machine learning algorithm can classify data points simply by looking at the points around them? That is exactly what K-Nearest Neighbors (KNN) does. This algorithm classifies data based on proximity, and it is popular for its simplicity and effectiveness.
The most exciting part? In this article, I'll show you how to implement KNN from scratch, without installing any libraries.
Understanding the KNN Algorithm
KNN, as its name suggests, classifies a new data point based on the majority class of its K nearest neighbors. To find those neighbors, the algorithm measures the distance between the new point and every point already in the dataset.
Choosing K (the number of neighbors) and the distance metric are the two most important decisions for this classifier. In this guide I'll use Euclidean distance, a commonly used metric that measures the straight-line distance between two points.
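For intuition, here's a quick worked example with two made-up points, (2, 3) and (6, 3): distance = sqrt((2 - 6)^2 + (3 - 3)^2) = sqrt(16) = 4.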
Setting Up a Small Dataset
Instead of using a big dataset for this guide, I'm going to create a small dataset manually to keep things simple.
# Example dataset
dataset = [
    [2, 3, 'A'], [5, 4, 'A'], [9, 6, 'A'],  # Class A points
    [4, 7, 'B'], [8, 1, 'B'], [7, 2, 'B']   # Class B points
]

# New point to classify
new_point = [6, 3]
Here we have two classes of data points on a 2D plane. Each list holds two features (x and y) plus a class label ('A' or 'B'). Our goal is to classify new_point according to its neighbors.
Implementing KNN from Scratch
Step#1 Calculate Euclidean Distance
In this step, you calculate the Euclidean distance between the new point and each existing data point.
import math
# Function to calculate Euclidean distance
def euclidean_distance(point1, point2):
    return math.sqrt((point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2)
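As a quick sanity check, you can call the function on a point from our dataset and the new point; the expected value in the comment is my own hand calculation:

# Distance from [2, 3] to the new point [6, 3]
print(euclidean_distance([2, 3], new_point))  # 4.0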
Step#2 Find the K Nearest Neighbors
Next, you'll select the K nearest neighbors by sorting the dataset points by their distance to the new point.
# Function to get the k nearest neighbors
def get_nearest_neighbors(dataset, new_point, k):
    distances = []
    for point in dataset:
        distance = euclidean_distance(new_point, point)
        distances.append((point, distance))
    # Sort points by distance and return the nearest k neighbors
    distances.sort(key=lambda x: x[1])
    neighbors = [distances[i][0] for i in range(k)]
    return neighbors
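If you call the function on our dataset with k=3, you should see the three closest points to [6, 3]; the distances in the comment are my own hand calculations:

# Expected: [[5, 4, 'A'], [7, 2, 'B'], [8, 1, 'B']]
# (distances ~1.41, ~1.41, and ~2.83 respectively)
print(get_nearest_neighbors(dataset, new_point, 3))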
Step#3 Determine the Majority Class
Once you've found the K nearest neighbors, it's time to look at the majority class among them to classify the new data point.
# Function to make a prediction based on a majority vote
def predict_classification(neighbors):
    classes = [neighbor[2] for neighbor in neighbors]
    prediction = max(set(classes), key=classes.count)
    return prediction
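For example, feeding in the three neighbors found above (an illustrative call, not part of the main walkthrough), two 'B' labels outvote one 'A':

# Majority vote: two 'B' labels beat one 'A'
print(predict_classification([[5, 4, 'A'], [7, 2, 'B'], [8, 1, 'B']]))  # 'B'

One design note: with an even K and two classes, a tied vote is possible; using an odd K sidesteps that.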
Step#4 Testing the Algorithm
Now that you've implemented the KNN algorithm from scratch, it's time to test it on our small dataset with K=3 and see which class it predicts.
# Testing with k=3
k = 3
neighbors = get_nearest_neighbors(dataset, new_point, k)
predicted_class = predict_classification(neighbors)
print(f'The predicted class for point {new_point} is: {predicted_class}')
Let's analyze the results. After running the code, the KNN algorithm finds the 3 nearest neighbors, checks their class labels, and predicts the class of the new data point. Since our new point was [6, 3], it gets classified as Class 'B'.
Conclusion
Congratulations! You built K-Nearest Neighbors (KNN) from scratch without using any external libraries. In this guide, I covered the KNN basics: computing distances, finding the closest neighbors, and predicting via a majority vote. Now you can try alternative K values and bigger datasets to learn more.
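If you want a starting point for that experimentation, here's a minimal sketch reusing the functions above; on a dataset this tiny the prediction can change with K, which is exactly why choosing K matters:

# Compare predictions for a few K values
for k in [1, 3, 5]:
    neighbors = get_nearest_neighbors(dataset, new_point, k)
    print(f'k={k}: predicted class is {predict_classification(neighbors)}')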