November 11, 2024
Linear Regression in Python: Predicting House Prices
Curious how you can predict the price of a house from its features, such as how many rooms it has or where it is located? This is where linear regression comes in. Linear regression predicts a continuous value by fitting a linear relationship between the independent variables (the features) and the dependent variable (the target), which makes it a natural fit for predicting house prices.
So stay with me. In this article, I'll show you how to build a linear regression model for predicting house prices using Python and scikit-learn.
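To make the idea concrete before bringing in any libraries, here is a minimal sketch of linear regression with a single feature, using the standard least-squares formulas. The room counts and prices are made-up numbers for illustration, not values from a real dataset, and the helper name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example data: number of rooms and price (in thousands)
rooms = [4, 5, 6, 7, 8]
prices = [18, 21, 25, 30, 31]

slope, intercept = fit_line(rooms, prices)
print(f"price = {intercept:.2f} + {slope:.2f} * rooms")

# Predict the price of a hypothetical house with 6.5 rooms
predicted = intercept + slope * 6.5
print(f"Predicted price for 6.5 rooms: {predicted:.2f}")
```

With these numbers the fitted line is price = 4 + 3.5 * rooms, so each extra room adds 3.5 (thousand) to the predicted price. scikit-learn does the same kind of fit, only with many features at once.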
Setting Up the Environment
You now have a brief overview of what linear regression is, right? So, before building the model, let's set up the environment. For this, you need to install these libraries:
pip install scikit-learn pandas numpy matplotlib
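If you want to confirm that the installation worked, a small check like the following can help. It is only a sketch: the helper name installed_versions is hypothetical, and it relies on importlib.metadata from the Python standard library (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The four packages installed by the pip command above
required = ["scikit-learn", "pandas", "numpy", "matplotlib"]
versions = installed_versions(required)

for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED means the pip command did not run in the same Python environment you are using now.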
Loading and Understanding the Dataset
You've set up the development environment. Now, let's load the dataset and quickly look at the target variable and features:
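import numpy as np
import pandas as pd
# load_boston was removed in scikit-learn 1.2, so read the original data file directly
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record spans two lines in the file: 11 values, then 3 (the last one is the price)
features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
data = pd.DataFrame(features, columns=feature_names)
data['PRICE'] = raw_df.values[1::2, 2]
# Display first 5 rows
print(data.head())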
For this tutorial, I've used the Boston Housing Dataset. It includes features such as RM (average number of rooms per dwelling), LSTAT (percentage of lower-status population), and DIS (weighted distance to employment centers). The target variable we want to predict is PRICE, the median home value in thousands of dollars.
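Before modeling, it can help to check how each feature relates to the target. The sketch below shows one way to do that with pandas. The six rows are made-up values for illustration only, not rows from the real dataset, and the variable name sample is hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; they are NOT taken from the real dataset
sample = pd.DataFrame({
    "RM": [6.5, 6.4, 7.2, 7.0, 5.9, 6.0],
    "LSTAT": [5.0, 9.1, 4.0, 2.9, 17.1, 12.4],
    "DIS": [4.1, 5.0, 5.0, 6.1, 4.9, 6.0],
    "PRICE": [24.0, 21.6, 34.7, 33.4, 18.9, 22.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) for every column
print(sample.describe())

# Pearson correlation of each feature with the target,
# sorted from most negative to most positive
correlations = sample.corr()["PRICE"].drop("PRICE").sort_values()
print(correlations)
```

On the real data, you would call the same two methods on data instead of sample. A strongly positive or negative correlation suggests that a feature carries useful linear signal for the model.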
Building the Linear Regression Model
Now that you're familiar with the dataset, you can start building the linear regression model. But first, you have to separate the data into training and testing sets. The training set is what the model learns from; the testing set is held back so you can check the model on data it has never seen.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Features (X) and target (y)
X = data.drop('PRICE', axis=1)
y = data['PRICE']
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)
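In the code above, we split the dataset into 80% training data and 20% testing data (test_size=0.2), and random_state=42 makes the split reproducible. Calling fit then trains the model on the 80% training data, which means it learns one coefficient per feature plus an intercept.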
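To see what training actually produces, the following sketch repeats the same split-and-fit steps on synthetic data where the true coefficients are known. The data, the coefficient values, and the noise level are all assumptions chosen for illustration; they have nothing to do with the housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 100 samples, 3 features,
# generated from known coefficients plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_coef = np.array([5.0, -2.0, 0.5])
true_intercept = 10.0
y = true_intercept + X @ true_coef + rng.normal(scale=0.1, size=100)

# Same 80/20 split as in the tutorial
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)

model = LinearRegression()
model.fit(X_train, y_train)

# The fitted values should land close to the ones used to generate the data
print("Learned coefficients:", model.coef_)
print("Learned intercept:", model.intercept_)
```

On the housing model, the same two attributes (model.coef_ and model.intercept_) hold what was learned, and you could pair each coefficient with its column name, for example with zip(X.columns, model.coef_).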
Evaluating the Model's Performance
After training the model, it's time to put it to the test and see how accurate it is. We'll use R-squared and Mean Squared Error (MSE) to measure how close the predictions are to the actual prices.
from sklearn.metrics import mean_squared_error, r2_score
# Predict on test set
y_pred = model.predict(X_test)
# Calculate performance metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
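The Mean Squared Error is the average of the squared differences between the predicted and actual prices, so lower is better. R-squared is the share of the variance in the prices that the model explains, so a value closer to 1 means a better fit.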
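To show what these two metrics compute, here is a minimal sketch that calculates them by hand in plain Python. The actual and predicted values are made up for illustration, and the function names manual_mse and manual_r2 are hypothetical. The formulas follow the usual definitions of MSE and R-squared, which are the ones mean_squared_error and r2_score use.

```python
import math


def manual_mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def manual_r2(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up actual and predicted values for illustration
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

error = manual_mse(actual, predicted)
print(f"MSE: {error}")                                # 0.5
print(f"RMSE: {math.sqrt(error):.4f}")                # 0.7071
print(f"R-squared: {manual_r2(actual, predicted)}")   # 0.9
```

Two of the four predictions are off by 1, so the squared errors sum to 2 and the MSE is 2 / 4 = 0.5. Taking the square root (RMSE) puts the error back into the same units as the target, which is often easier to interpret.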
Conclusion
So, that's how to create a scikit-learn linear regression model to predict house prices. You partitioned the data into training and testing sets, trained the model, and assessed its performance, which is the same workflow you can reuse for other regression models in Python. Now it's time to explore linear regression on your own by experimenting with different features or datasets.