November 04, 2024

Implementing Linear Regression in Java: A Hands-On Approach

java

machine learning

linear regression

Noah Smith

@noah-smith

Share what you learn in this blog to prepare for your interview, create your forever-free profile now, and explore how to monetize your valuable knowledge.

Do you know how Netflix shows us suggestions for what you want to watch next or how Google knows how much your house is worth? Linear regression is a simple but strong tool for these kinds of predictions. I'll use a hands-on method to implement linear regression in Java in this blog post. This will help you understand its fundamental concepts and write its code from scratch.

Understanding Linear Regression

Linear regression is a way to use statistics to think about how a dependent variable and one or more independent variables are related. The main idea is to figure out which linear equation best explains this relationship. Let's express it in the formula y=mx+by = mx + by=mx+b, where:

yyy is the predicted value (dependent variable).
mmm is the slope of the line (coefficient of the independent variable).
xxx is the independent variable.
bbb is the y-intercept.

With linear regression you can minimize the gaps between the predicted and actual values. You can also consider certain assumptions like, linearity, independence of errors, homoscedasticity (constant variance), and normality of error terms to ensure your model's stability.

Setting Up the Environment

You must first set up the envirement before implementation. Install the Java Development Kit (JDK) and an Integrated Developemnt Environment (IDE) like IntelliJ IDEA or Eclipse. In the following simple linear regression code we'll use Java's built-in libraries. However, you can add Apache Commons Math to your project for complex mathematic functions.

Implementing Linear Regression in Java

Data Preparation

Let's start by preparing a simple dataset. We'll create two arrays representing our independent variable (X) and the dependent variable (Y). Here's a snippet of how to do that:

public class LinearRegression {
    public static void main(String[] args) {
        // Sample dataset
        double[] X = {1, 2, 3, 4, 5};
        double[] Y = {2, 3, 5, 7, 11};
    }
}

Creating the Linear Regression Model

Now, we'll implement the calculations needed to find the slope and intercept. First, we need to calculate the means of X and Y, then the covariance and variance.

Here's how you can do that in Java:

public class LinearRegression {
    public static void main(String[] args) {
        double[] X = {1, 2, 3, 4, 5};
        double[] Y = {2, 3, 5, 7, 11};
        
        double meanX = calculateMean(X);
        double meanY = calculateMean(Y);
        
        double slope = calculateSlope(X, Y, meanX, meanY);
        double intercept = meanY - (slope * meanX);
        
        System.out.println("Slope: " + slope);
        System.out.println("Intercept: " + intercept);
    }

    public static double calculateMean(double[] data) {
        double sum = 0;
        for (double num : data) {
            sum += num;
        }
        return sum / data.length;
    }

    public static double calculateSlope(double[] X, double[] Y, double meanX, double meanY) {
        double numerator = 0;
        double denominator = 0;
        for (int i = 0; i < X.length; i++) {
            numerator += (X[i] - meanX) * (Y[i] - meanY);
            denominator += Math.pow(X[i] - meanX, 2);
        }
        return numerator / denominator;
    }
}

Predicting New Values

Once we have the slope and intercept, we can predict new values. Let's add a method to make predictions based on our linear model:

public static double predict(double x, double slope, double intercept) {
    return slope * x + intercept;
}

// Example prediction
double predictedValue = predict(6, slope, intercept);
System.out.println("Predicted value for X=6: " + predictedValue);

Testing the Model

We've set up the environment, created the model, predicted the value, and now it is time to test the model. So, to test the model's performance, we need to calculate the Mean Squared Error (MSE) on our test dataset. MSE helps us know how close our predicted values are to the actual values.

public static double calculateMSE(double[] Y, double[] predictions) {
    double sum = 0;
    for (int i = 0; i < Y.length; i++) {
        sum += Math.pow(Y[i] - predictions[i], 2);
    }
    return sum / Y.length;
}

// Example usage
double[] predictions = {predict(X[0], slope, intercept), predict(X[1], slope, intercept), 
                        predict(X[2], slope, intercept), predict(X[3], slope, intercept), 
                        predict(X[4], slope, intercept)};
double mse = calculateMSE(Y, predictions);
System.out.println("Mean Squared Error: " + mse);

Conclusion

I've ended up covering all the basics of linear regression in Java. This will let's you investigate machine learning techniques and apply them to more complicated datasets. So, why not try new datasets or better regression techniques from today?

518 views

Please Login to create a Question

Posts

Questions

Blogs

Jobs