
November 14, 2024

Sentiment Analysis with Naive Bayes in Python


 

Have you ever wondered how companies sift through thousands of pieces of customer feedback to gauge how well their products are received? This is where sentiment analysis comes into play. It classifies text as expressing positive or negative sentiment.

 

In this guide, I'll explain Naive Bayes, a simple and widely used algorithm for text classification tasks such as sentiment analysis. I'll walk through a detailed coding example that uses Python and its scikit-learn library to classify reviews as positive or negative.

 

Understanding the Naive Bayes Algorithm

The Naive Bayes classifier uses Bayes' theorem to estimate the probability of an event from prior knowledge of related events. In text classification, it calculates the likelihood that a review is positive or negative based on the words it contains.

Naive Bayes assumes that all features (words) are independent of one another given the class. This assumption rarely holds for real language, but it simplifies the computation, and the classifier still works well for text classification tasks such as spam detection and sentiment analysis.
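To make the idea concrete, here is a minimal from-scratch sketch of the calculation. It is not how scikit-learn implements it internally. The four toy reviews, the function names, and the add-one smoothing value (alpha=1.0) are all assumptions made for illustration. For each class, it adds the log of the class prior to the log-probabilities of the individual words, which is exactly where the independence assumption is used.

```python
import math
from collections import Counter

# Hypothetical toy training data: (review text, label)
train = [
    ("great movie loved it", "positive"),
    ("great acting great story", "positive"),
    ("boring movie hated it", "negative"),
    ("terrible acting boring story", "negative"),
]

def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        words = text.split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def log_scores(text, class_counts, word_counts, vocab, alpha=1.0):
    """Return log P(class) + sum of log P(word | class) for each class."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.split():
            if word not in vocab:
                continue  # skip words never seen during training
            count = word_counts[label][word]
            # Add-alpha smoothing keeps unseen word/class pairs from zeroing the score
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        scores[label] = score
    return scores

def predict(text, class_counts, word_counts, vocab):
    """Pick the class with the highest log score."""
    scores = log_scores(text, class_counts, word_counts, vocab)
    return max(scores, key=scores.get)

class_counts, word_counts, vocab = train_naive_bayes(train)
print(predict("great story", class_counts, word_counts, vocab))
print(predict("boring acting", class_counts, word_counts, vocab))
```

Working in log space avoids multiplying many small probabilities together, which would underflow on long reviews.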

 

Loading and Preprocessing the Dataset

For this guide, I'll use a movie reviews dataset to perform sentiment analysis. The first step is to load the dataset and preprocess the text so that it is suitable for training a model. Preprocessing here means removing stopwords and transforming the text into a numerical format.

 

 

 

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

# Load dataset
data = pd.read_csv('IMDB Dataset.csv')

# Split data into training and testing sets
X = data['review']
y = data['sentiment'].map({'positive': 1, 'negative': 0})

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Vectorize text data (convert text to numerical data)
vectorizer = CountVectorizer(stop_words='english')
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)

 

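In this code, I split the dataset into training and testing sets (70% and 30%) and used CountVectorizer to convert the text reviews into word-count vectors that the Naive Bayes model can process. Note that the vectorizer is fitted on the training set only and then applied to the test set, so no information from the test data leaks into training.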

 

Training the Naive Bayes Classifier

With loading and preprocessing done, let's train the Naive Bayes classifier. I'll use MultinomialNB from scikit-learn, a Naive Bayes variant designed for count features, which makes it a common choice for text classification.

 

 

 

from sklearn.naive_bayes import MultinomialNB

# Initialize and train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train_vect, y_train)

 

The MultinomialNB classifier is trained on the vectorized training data. It learns which words are associated with positive reviews and which are associated with negative reviews.

 

Evaluating Model Performance

After training, it's time to evaluate the model's performance by testing it on unseen data.

 

 

 

from sklearn.metrics import accuracy_score, classification_report

# Make predictions on the test data
y_pred = model.predict(X_test_vect)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

# Classification report
print(classification_report(y_test, y_pred))

 

Here I used the accuracy score to show the percentage of correct predictions and the classification report to get a detailed breakdown of precision, recall, and F1-score for each class, which shows how well the model classifies positive and negative reviews.
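To see where the numbers in the classification report come from, the sketch below computes precision, recall, and F1 for the positive class by hand from a confusion matrix. The ten true and predicted labels are invented for illustration and are not results from the movie reviews model.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels for ten reviews (1 = positive, 0 = negative)
true_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_labels = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

# For binary labels 0 and 1, ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(true_labels, pred_labels).ravel()

precision = tp / (tp + fp)  # of the reviews predicted positive, how many really are
recall = tp / (tp + fn)     # of the truly positive reviews, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Manual:  precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(
    "sklearn: "
    f"precision={precision_score(true_labels, pred_labels):.3f} "
    f"recall={recall_score(true_labels, pred_labels):.3f} "
    f"f1={f1_score(true_labels, pred_labels):.3f}"
)
```

Both lines should print the same values. Precision and recall can differ noticeably even when accuracy looks fine, which is why the report is worth reading alongside the accuracy score.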

 

Conclusion 

In this Python guide, you've learned how to use Naive Bayes for sentiment analysis. We classified movie reviews as positive or negative by preprocessing the text data, training a MultinomialNB model, and evaluating it on a held-out test set. Sentiment analysis can help you understand customer feedback, and the same approach applies to other datasets and text classification tasks. Try Naive Bayes on your own reviews dataset to see how it performs!
