October 31, 2024
Evaluating ML Models: Accuracy, Precision, Recall, and F1 Score Explained
Have you ever wondered why your highly accurate machine learning model isn't doing what you want it to do? Evaluating a model is how you find out how well it really works, and relying on accuracy alone can be misleading. In this article, I'll discuss four of the most important metrics for evaluating classification models: accuracy, precision, recall, and F1 score. Knowing them will help you choose the right measure for your specific use case.
Basics of Model Evaluation
Model evaluation is an important part of machine learning because it lets data scientists see how well their models work on data they haven't seen before. Which metrics you use depends on the type of problem (classification or regression). Let's look at accuracy, precision, recall, and F1 score, the main evaluation measures for classification tasks.
But before diving into these metrics, let's understand the key terms you'll use in their formulas:
- TP (true positives) = positive cases that the model correctly predicted as positive.
- TN (true negatives) = negative cases that the model correctly predicted as negative.
- FP (false positives) = cases predicted as positive that are actually negative.
- FN (false negatives) = cases predicted as negative that are actually positive.
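The four terms above can be counted directly from a list of true labels and a list of predictions. The snippet below is a minimal sketch: the helper name `confusion_counts` and the toy labels are made up for illustration, and it assumes the positive class is labeled 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted != positive and actual != positive:
            tn += 1  # predicted negative, actually negative
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


# Toy labels, invented for this example
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```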
Accuracy
Accuracy is one of the easiest measures to understand. You can calculate it by dividing the number of correct predictions by the total number of predictions.
Formula:
Accuracy = (TP+TN)/(TP+TN+FP+FN)
Accuracy can give you a quick idea of how well a model is doing, but it can also be misleading, especially when the dataset is imbalanced. For example, if 95% of the instances in a dataset belong to the same class, a model that always predicts the majority class reaches 95% accuracy without learning anything about the minority class.
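The 95% case is easy to reproduce by hand. The sketch below uses invented labels (95 negatives, 5 positives) and a "model" that always predicts the majority class. It is an illustration of the pitfall, not a real classifier.

```python
# Imbalanced toy dataset: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class
y_pred = [0] * len(y_true)

correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
accuracy = correct / len(y_true)

# How many of the actual positives did it find?
positives_found = sum(
    1 for actual, predicted in zip(y_true, y_pred) if actual == 1 and predicted == 1
)

print(f"Accuracy: {accuracy:.2f}")           # Accuracy: 0.95
print(f"Positives found: {positives_found}")  # Positives found: 0
```

The accuracy looks strong, yet the model never identifies a single positive case.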
Precision
Precision measures how many of the model's positive predictions are correct. It answers the question, "How many of the cases that were predicted to be positive turned out to be positive?"
Formula:
Precision = (TP)/(TP+FP)
For example, in medical diagnosis, high precision is very important because false positives can cause unnecessary stress and follow-up procedures.
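As a quick worked example of the formula, here is a small sketch with hypothetical screening numbers (50 patients flagged, 40 of whom really have the condition). The helper `precision_from_counts` is an illustrative name, and returning 0.0 when nothing is predicted positive is a convention chosen here, since the ratio is undefined in that case.

```python
def precision_from_counts(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # assumed convention: no positive predictions were made
    return tp / predicted_positive


# Hypothetical numbers: 50 patients flagged, 40 correctly, 10 incorrectly
tp, fp = 40, 10
print(f"Precision: {precision_from_counts(tp, fp):.2f}")  # Precision: 0.80
```

In this scenario, 10 of the 50 flagged patients would go through follow-up procedures they did not need.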
Recall
Recall, which is also called sensitivity or true positive rate, measures how well a model can find all the actual positive cases. It answers the question, "Of all the real positive cases, how many were correctly predicted?"
Formula:
Recall = (TP)/(TP+FN)
High recall is important for tasks like fraud detection, where missing a fraudulent transaction can cost a lot of money.
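The same kind of check works for recall. The numbers below are hypothetical (100 fraudulent transactions, of which the model catches 70), and `recall_from_counts` is an illustrative helper that returns 0.0 when there are no actual positives, which is an assumed convention.

```python
def recall_from_counts(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # assumed convention: the data contains no positive cases
    return tp / actual_positive


# Hypothetical numbers: 100 fraudulent transactions, 70 caught, 30 missed
tp, fn = 70, 30
print(f"Recall: {recall_from_counts(tp, fn):.2f}")  # Recall: 0.70
print(f"Missed fraud cases: {fn}")                  # Missed fraud cases: 30
```

Note that false positives do not appear in this formula at all, so recall says nothing about how many legitimate transactions were flagged by mistake.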
F1 Score
The F1 score combines precision and recall into a single measure by taking their harmonic mean. Because the harmonic mean is pulled toward the smaller of the two values, F1 is high only when both precision and recall are high. This makes it useful when you need a single number to judge how well a model works.
Formula:
F1 = (2*Precision*Recall)/(Precision+Recall)
When datasets are imbalanced, the F1 score is especially useful. It ignores true negatives, so a large majority class cannot inflate it the way it inflates accuracy.
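To see why the harmonic mean matters, compare it with a plain average for a lopsided model. This sketch uses made-up values (precision 1.0, recall 0.1), and `f1_from` is an illustrative helper rather than a library function.

```python
def f1_from(precision, recall):
    """F1 = 2 * precision * recall / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both values are zero
    return 2 * precision * recall / (precision + recall)


# A model that is always right when it predicts positive,
# but finds only 10% of the actual positives
precision, recall = 1.0, 0.1

arithmetic_mean = (precision + recall) / 2
f1 = f1_from(precision, recall)

print(f"Arithmetic mean: {arithmetic_mean:.2f}")  # Arithmetic mean: 0.55
print(f"F1 score: {f1:.2f}")                      # F1 score: 0.18
```

The plain average hides the poor recall, while the F1 score stays close to the weaker of the two values.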
Understanding with a Coding Example
Let's use a simple example to see how to use Python to calculate these measures.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a Random Forest model
model = RandomForestClassifier(random_state=42)  # fixed seed so the results are reproducible
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Print the results
print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')
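To connect the scikit-learn helpers back to the formulas, the sketch below computes each metric by hand from the confusion matrix and compares it with the library result. The labels are invented for illustration. For binary labels 0 and 1, `confusion_matrix(...).ravel()` returns the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels, invented for this example
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary 0/1 labels the flattened matrix is ordered TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Metrics computed directly from the formulas in this article
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
manual_precision = tp / (tp + fp)
manual_recall = tp / (tp + fn)
manual_f1 = 2 * manual_precision * manual_recall / (manual_precision + manual_recall)

print(f"Accuracy:  {manual_accuracy:.2f} vs {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {manual_precision:.2f} vs {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {manual_recall:.2f} vs {recall_score(y_true, y_pred):.2f}")
print(f"F1 Score:  {manual_f1:.2f} vs {f1_score(y_true, y_pred):.2f}")
```

Each line should print the same value twice, which confirms that the library functions implement the formulas given above.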
Conclusion
Overall, evaluating machine learning models is about more than measuring accuracy. Precision, recall, and F1 score reveal aspects of model performance that accuracy hides, particularly on imbalanced data. Understand these metrics so you can choose the right model and make sure it fits your application's demands. Never forget that the right metric can make or break your model!