November 21, 2024

Detecting Anomalies in Network Traffic Using Autoencoders

Tags: python, machine learning, network traffic, anomalies

Ethan Kim

@ethan-kim


Want to use deep learning to detect anomalies in network traffic? Autoencoders are well suited to the task. In this post, I will show you how to build an autoencoder in Python and use it to flag unusual activity on a network.

What are Autoencoders?

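An autoencoder is an artificial neural network trained to reproduce its input at its output. It has three parts: an encoder that compresses the input into a lower-dimensional representation, the bottleneck that holds this compressed representation, and a decoder that reconstructs the original input from it. Because the bottleneck is smaller than the input, the network cannot simply copy the data and must instead learn a compact representation of its most important structure.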
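In symbols (a standard formulation, with notation chosen here for illustration): an input vector x with d features is encoded into a bottleneck code z with k < d dimensions and decoded into a reconstruction x̂. Training adjusts the encoder weights θ and decoder weights φ to minimize the mean squared reconstruction error over N training samples, which is the `mean_squared_error` loss used with Keras later in this post.

```latex
z = f_\theta(x), \quad x \in \mathbb{R}^{d},\; z \in \mathbb{R}^{k},\; k < d, \qquad \hat{x} = g_\phi(z) \in \mathbb{R}^{d}

\mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{d} \sum_{j=1}^{d} \left( x_{ij} - \hat{x}_{ij} \right)^{2}
```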

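For anomaly detection, the autoencoder is trained on normal data only, so it learns to reconstruct typical patterns accurately. When it receives an input that does not fit those patterns, the reconstruction is poor and the reconstruction error (the difference between the input and its reconstruction) rises. That error therefore works as an anomaly score. Training needs only normal traffic, not labeled examples of attacks, so no anomaly labels are required.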
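The idea can be seen without a neural network at all. A linear autoencoder trained with squared error learns the same subspace as principal component analysis (PCA), so the sketch below uses PCA via NumPy's SVD as a stand-in for the Keras model built later. This is an illustrative substitute, not the method used in this post, and the data is synthetic: "normal" points lie close to a 2-D plane inside a 5-D space, while the anomaly sits off that plane. The helper names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a linear autoencoder in closed form via PCA (top-k principal directions)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by explained variance
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:k].T  # shape (d, k): encoder weights; W.T acts as the decoder
    return mean, W

def linear_ae_error(X, mean, W):
    """Mean absolute reconstruction error per sample (the same score as Step 5)."""
    Z = (X - mean) @ W        # encode: project onto the k-dim bottleneck
    X_hat = Z @ W.T + mean    # decode: map back to the original space
    return np.mean(np.abs(X - X_hat), axis=1)

# Synthetic "normal" data: 500 points near a 2-D plane in 5-D space, plus small noise
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))
X_normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 5))

mean, W = fit_linear_autoencoder(X_normal, k=2)
normal_scores = linear_ae_error(X_normal, mean, W)

# A point off the plane: its off-plane component cannot pass through the bottleneck
off_plane = np.linalg.svd(basis)[2][-1]  # unit vector orthogonal to both basis rows
anomaly = (mean + 3.0 * off_plane).reshape(1, -1)
anomaly_score = linear_ae_error(anomaly, mean, W)[0]

print(f"max normal score: {normal_scores.max():.4f}, anomaly score: {anomaly_score:.4f}")
```

The anomaly's score comes out far above every normal score, because the bottleneck only has room for the two directions that normal data actually uses.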

Why Use Autoencoders for Network Traffic Anomaly Detection?

Normal network traffic tends to follow regular, repeating patterns, which makes it a good fit for this approach. You train an autoencoder to learn these "normal" patterns; when an attack produces traffic that deviates from them, the model reconstructs it poorly and the reconstruction error is large.

Because the model only learns what normal traffic looks like, it can in principle flag both known and previously unseen anomalies, including zero-day attacks, without any labeled attack data. This matters in network security, where new threats keep appearing.

Step-by-Step Guide to Detecting Anomalies Using Autoencoders

Step 1: Data Collection

You can start with any network traffic dataset. This example uses NSL-KDD, a widely used benchmark of network connection records in which each record is labeled either as normal traffic or as a specific attack type.
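NSL-KDD's `KDDTrain+.txt` has no header row: each line holds 41 connection features, the class label (`normal` or an attack name), and a difficulty score. The loader below is a sketch that assumes this 43-column layout. The feature names follow the original KDD Cup 99 feature list, while `difficulty`, `is_attack` and `load_nsl_kdd` are names chosen here for convenience.

```python
import pandas as pd

# The 41 KDD Cup 99 feature names, in file order
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
# "difficulty" is a name chosen here for the last column
COLUMNS = FEATURES + ["label", "difficulty"]

def load_nsl_kdd(path):
    """Load an NSL-KDD file (e.g. KDDTrain+.txt) and add a binary is_attack column."""
    df = pd.read_csv(path, header=None, names=COLUMNS)
    df["is_attack"] = df["label"] != "normal"
    return df

# Usage (file path assumed):
# df = load_nsl_kdd("KDDTrain+.txt")
# print(df["is_attack"].value_counts())
```

Named columns make it easier to see which fields are categorical (`protocol_type`, `service`, `flag`) and therefore need encoding before scaling.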

Step 2: Data Preprocessing

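Before training, the data has to be preprocessed: the categorical columns (protocol type, service and flag) must be converted to numbers, every feature scaled to a common range, and the data split into training and test sets. Because the autoencoder should learn only what normal traffic looks like, the training set should contain normal records only.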

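import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Load dataset (no header row: 41 features, then the label, then a difficulty score)
data = pd.read_csv('KDDTrain+.txt', header=None)
labels = data[41]
features = data.drop(columns=[41, 42])

# One-hot encode the categorical columns (protocol_type, service, flag)
features = pd.get_dummies(features, columns=[1, 2, 3]).astype('float32')

# Separate normal traffic from attacks
is_normal = (labels == 'normal').to_numpy()
X_normal = features[is_normal].to_numpy()
X_attack = features[~is_normal].to_numpy()

# Train on normal traffic only; test on held-out normal traffic plus all attacks
X_train, X_test_normal = train_test_split(X_normal, test_size=0.2, random_state=42)
X_test = np.vstack([X_test_normal, X_attack])
y_test = np.concatenate([np.zeros(len(X_test_normal)), np.ones(len(X_attack))])  # 1 = attack

# Normalize: fit the scaler on the training data only, then apply it to the test data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)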
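MinMaxScaler rescales each feature with x' = (x - min) / (max - min), using the minimum and maximum of the data it was fitted on. The NumPy sketch below reimplements that formula for illustration; `fit_min_max` and `apply_min_max` are hypothetical helpers and the toy arrays are made up. It shows why the scaler is fitted on training data only (test statistics would otherwise leak into training) and why unseen values can land outside [0, 1].

```python
import numpy as np

def fit_min_max(train):
    """Learn per-feature minimum and range from the training data only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant features
    return col_min, col_range

def apply_min_max(X, col_min, col_range):
    """Scale features with statistics learned from the training set."""
    return (X - col_min) / col_range

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test = np.array([[20.0, 25.0]])  # first feature exceeds the training maximum

col_min, col_range = fit_min_max(train)
print(apply_min_max(train, col_min, col_range))  # every value in [0, 1]
print(apply_min_max(test, col_min, col_range))   # first value is 2.0, outside [0, 1]
```

Values above 1 are one reason attack traffic tends to reconstruct badly: the sigmoid output layer used below can never produce them.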

Step 3: Building the Autoencoder Model

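The autoencoder has two parts: an encoder that compresses the input through layers of 64, 32 and 16 units, and a decoder that mirrors it back up to the input size. The 16-unit layer is the bottleneck. The output layer uses a sigmoid activation because the scaled training features lie in the [0, 1] range. Here is how to build it with Keras: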

from keras.models import Model
from keras.layers import Input, Dense

# Input layer
input_dim = X_train.shape[1]
input_layer = Input(shape=(input_dim,))

# Encoder layers
encoded = Dense(64, activation='relu')(input_layer)
encoded = Dense(32, activation='relu')(encoded)
encoded = Dense(16, activation='relu')(encoded)

# Decoder layers
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(input_dim, activation='sigmoid')(decoded)

# Autoencoder model
autoencoder = Model(inputs=input_layer, outputs=decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# summary() prints the layer table itself and returns None, so no print() is needed
autoencoder.summary()
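To see what these layers compute, here is a NumPy stand-in for the same architecture: hidden layers of 64, 32, 16, 32 and 64 units with ReLU, and a sigmoid output layer, using random, untrained weights. It shows that the output has the same shape as the input and how the parameter count reported by `summary()` is built up (each Dense layer has inputs × units weights plus one bias per unit). The input size of 41 and the helper names are arbitrary choices for illustration; the real `input_dim` depends on the preprocessing.

```python
import numpy as np

LAYER_SIZES = [64, 32, 16, 32, 64]  # hidden layers, as in the Keras model

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_autoencoder(input_dim, seed=0):
    """Random (untrained) weights and zero biases for each Dense layer."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim] + LAYER_SIZES + [input_dim]
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """ReLU on hidden layers, sigmoid on the output layer."""
    h = X
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        h = sigmoid(z) if i == len(params) - 1 else relu(z)
    return h

def count_params(params):
    """Total weights plus biases, as summed in a Keras model summary."""
    return sum(W.size + b.size for W, b in params)

params = init_autoencoder(input_dim=41)
X = np.random.default_rng(1).random((8, 41))  # 8 samples already scaled to [0, 1]
X_hat = forward(params, X)
print(X_hat.shape)           # (8, 41): same shape as the input
print(count_params(params))  # 10617 for an input size of 41
```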

Step 4: Training the Autoencoder

Train the model on normal traffic so that it learns to reconstruct normal patterns. The input and the target are the same array, because the model's only job is to reproduce its input.

# Train the autoencoder to reproduce its own input (input = target);
# validation_split holds out the last 10% of X_train to monitor overfitting
history = autoencoder.fit(X_train, X_train, epochs=50, batch_size=256, validation_split=0.1, shuffle=True)
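Passing the same array as input and target is what makes this training unsupervised. The sketch below makes that explicit with a small linear autoencoder trained by plain full-batch gradient descent in NumPy on synthetic data. It is a simplified stand-in (no biases, no activations, hand-written gradients; the helper name is hypothetical), whereas the Keras model uses the Adam optimizer with mini-batches of 256.

```python
import numpy as np

def train_linear_autoencoder(X, k, lr=0.05, epochs=500, seed=0):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(d, k))  # encoder: d -> k
    W_dec = rng.normal(scale=0.1, size=(k, d))  # decoder: k -> d
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc            # bottleneck codes
        X_hat = Z @ W_dec        # reconstructions; the target is X itself
        diff = X_hat - X
        losses.append(np.mean(diff ** 2))
        grad_out = 2.0 * diff / diff.size       # d(loss)/d(X_hat)
        grad_dec = Z.T @ grad_out               # d(loss)/d(W_dec)
        grad_enc = X.T @ (grad_out @ W_dec.T)   # d(loss)/d(W_enc)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Synthetic data with 2-D structure embedded in 6-D space
rng = np.random.default_rng(42)
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 6))
_, _, losses = train_linear_autoencoder(X, k=2)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```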

Step 5: Detecting Anomalies

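After training, compute the reconstruction error for every test sample. Samples whose error exceeds a threshold are flagged as anomalies. A fixed value such as 0.02 only suits one particular dataset and scaling; a more robust choice is a high percentile (for example the 95th) of the errors on the normal training data.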

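import numpy as np

def reconstruction_errors(model, X):
    """Mean absolute reconstruction error for each sample (row) of X."""
    reconstructed = model.predict(X)
    return np.mean(np.abs(X - reconstructed), axis=1)

# Derive the threshold from errors on the (normal) training data:
# at the 95th percentile, roughly 5% of normal training samples exceed it
train_errors = reconstruction_errors(autoencoder, X_train)
threshold = np.percentile(train_errors, 95)

# Flag test samples whose reconstruction error exceeds the threshold
test_errors = reconstruction_errors(autoencoder, X_test)
anomalies = test_errors > threshold
print(f"Threshold: {threshold:.4f}")
print(f"Detected {np.sum(anomalies)} anomalies out of {len(anomalies)} samples")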
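Counting flagged samples says little on its own. If you keep the true labels for the test rows (in NSL-KDD, any label other than `normal` is an attack), you can measure how well a threshold separates the two classes. A minimal NumPy sketch, assuming `scores` holds per-sample reconstruction errors and `is_attack` holds aligned boolean labels; `evaluate_threshold` is a hypothetical helper, not part of any library, and the toy scores are made up.

```python
import numpy as np

def evaluate_threshold(scores, is_attack, threshold):
    """Precision, recall and false-positive rate when flagging scores above threshold."""
    scores = np.asarray(scores, dtype=float)
    is_attack = np.asarray(is_attack, dtype=bool)
    flagged = scores > threshold
    tp = np.sum(flagged & is_attack)    # attacks correctly flagged
    fp = np.sum(flagged & ~is_attack)   # normal traffic wrongly flagged
    fn = np.sum(~flagged & is_attack)   # attacks missed
    tn = np.sum(~flagged & ~is_attack)  # normal traffic correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr

# Toy example: three normal samples, then three attacks
scores = [0.010, 0.015, 0.030, 0.025, 0.050, 0.080]
is_attack = [False, False, False, True, True, True]
precision, recall, fpr = evaluate_threshold(scores, is_attack, threshold=0.02)
print(precision, recall, fpr)  # precision 0.75, recall 1.0, false-positive rate about 0.333
```

With the real pipeline, `scores` would be the per-sample reconstruction errors on the test set from Step 5. Trying several thresholds shows the trade-off between catching more attacks and raising more false alarms.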
