November 21, 2024
Detecting Anomalies in Network Traffic Using Autoencoders
Want to use deep learning to detect anomalies in network traffic? Autoencoders are a good fit for this job. In this post, I will show you how to build an autoencoder in Python and use it to flag unusual activity on a network.
What are Autoencoders?
Autoencoders are neural networks trained to reproduce their input at their output, and in doing so they learn a compact, efficient representation of the data. The network has three parts: an encoder that compresses the input into a lower-dimensional representation, the bottleneck that holds this compressed representation, and a decoder that reconstructs the input from it.
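To make the three parts concrete, here is a minimal NumPy sketch of one forward pass through an encoder, a bottleneck, and a decoder. The weights are random and untrained, and the sizes (41 input features, 32 hidden units, a 16-unit bottleneck) are assumptions chosen to echo the model built later. The point is only to show how the data shrinks to the bottleneck and grows back to its original width.

```python
import numpy as np

# Illustrative sketch: random, untrained weights. The sizes are assumptions
# that loosely mirror the Keras model built later in this post.
rng = np.random.default_rng(0)
input_dim, hidden, bottleneck = 41, 32, 16

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Encoder weights: input -> hidden -> bottleneck
W1, b1 = rng.normal(0, 0.1, (input_dim, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(0, 0.1, (hidden, bottleneck)), np.zeros(bottleneck)
# Decoder weights: bottleneck -> hidden -> input
W3, b3 = rng.normal(0, 0.1, (bottleneck, hidden)), np.zeros(hidden)
W4, b4 = rng.normal(0, 0.1, (hidden, input_dim)), np.zeros(input_dim)

def encode(x):
    # Compress each record down to the bottleneck width
    return relu(relu(x @ W1 + b1) @ W2 + b2)

def decode(z):
    # Expand the compressed code back to the original feature count
    return sigmoid(relu(z @ W3 + b3) @ W4 + b4)

x = rng.random((5, input_dim))  # a batch of 5 records scaled to [0, 1]
z = encode(x)                   # compressed codes
x_hat = decode(z)               # reconstructions
print(x.shape, z.shape, x_hat.shape)  # (5, 41) (5, 16) (5, 41)
```

Training would adjust these weights so that `x_hat` ends up close to `x` for typical records.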
An autoencoder used for anomaly detection is trained only on normal data, so it learns to reconstruct typical patterns well. When it encounters an anomaly, its reconstruction error rises, and that error becomes the anomaly signal. Because training uses only regular traffic rather than labeled attack examples, no anomalous data is needed to build the model.
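The reconstruction-error idea can be checked without a deep learning framework. The sketch below swaps in PCA (computed with NumPy's SVD) as a stand-in for a trained linear autoencoder. A linear autoencoder with a k-unit bottleneck and squared-error loss learns the same subspace as the top k principal components, so the behavior is analogous. The data is synthetic and purely illustrative.

```python
import numpy as np

# PCA used as a stand-in for a trained *linear* autoencoder (assumption for
# illustration; the model in this post is a nonlinear Keras network).
rng = np.random.default_rng(42)

# Synthetic "normal" traffic: 500 records in 10 dimensions lying close to a
# 3-dimensional subspace, plus a little noise.
basis = rng.normal(size=(3, 10))
normal = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 10))

# "Train" on normal data only: learn the mean and the top-3 principal directions.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:3]  # plays the role of the encoder weights (3 x 10)

def reconstruct(x):
    codes = (x - mean) @ components.T  # encode: 10 -> 3
    return codes @ components + mean   # decode: 3 -> 10

def reconstruction_error(x):
    # Mean squared error per record
    return np.mean((x - reconstruct(x)) ** 2, axis=1)

# An anomalous record that does not follow the normal structure.
anomaly = rng.normal(size=(1, 10)) * 3.0

normal_err = reconstruction_error(normal)
anomaly_err = reconstruction_error(anomaly)
print(f"mean normal error: {normal_err.mean():.4f}")
print(f"anomaly error:     {anomaly_err[0]:.4f}")
```

The anomaly's error comes out orders of magnitude above the typical normal error, because the bottleneck can only represent directions it saw during training.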
Why Use Autoencoders for Network Traffic Anomaly Detection?
Network traffic tends to follow regular, predictable patterns, which makes it a good match for autoencoder-based anomaly detection. You train an autoencoder to learn what "normal" traffic looks like. During an attack, the model reconstructs the abnormal traffic poorly, which produces a large reconstruction error.
Because they need no labeled attack data for training, autoencoders can flag both known and previously unseen anomalies, potentially including zero-day threats. This makes them useful for network security, where new threats keep appearing.
Step-by-Step Guide to Detecting Anomalies Using Autoencoders
Step 1: Data Collection
Start with any network traffic dataset. This example uses the NSL-KDD dataset, which contains both normal and attack traffic records.
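The NSL-KDD files, such as KDDTrain+.txt, have no header row. Each line holds 41 features, followed by the attack label and a difficulty level. The sketch below attaches readable column names (the feature names commonly used for KDD Cup 99 and NSL-KDD) and derives a boolean `is_normal` flag. The `load_nsl_kdd` helper is hypothetical, and it is exercised on two made-up rows rather than the real file.

```python
import io
import pandas as pd

# Feature names as commonly used for KDD Cup 99 / NSL-KDD (41 features),
# followed by the attack label and the NSL-KDD difficulty level.
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes",
    "dst_bytes", "land", "wrong_fragment", "urgent", "hot",
    "num_failed_logins", "logged_in", "num_compromised", "root_shell", "su_attempted",
    "num_root", "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
    "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
    "label", "difficulty",
]

def load_nsl_kdd(source):
    """Hypothetical helper: read an NSL-KDD file (no header) and flag normal rows."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    df["is_normal"] = df["label"] == "normal"
    return df

# Two made-up rows in the NSL-KDD layout, used only to exercise the loader.
fake_rows = [
    ["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal", "21"],
    ["0", "tcp", "private", "S0"] + ["0"] * 37 + ["neptune", "21"],
]
sample = io.StringIO("\n".join(",".join(row) for row in fake_rows))
df = load_nsl_kdd(sample)
print(df.shape)                  # (2, 44)
print(df["is_normal"].tolist())  # [True, False]
```

With the real file you would pass the path instead, for example `load_nsl_kdd('KDDTrain+.txt')`.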
Step 2: Data Preprocessing
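Before training, the data needs preprocessing: drop the label columns, encode the categorical features, normalize the values, and split the records into training and testing sets. Because the autoencoder should learn only normal behavior, keep just the normal records for training.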
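import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Load dataset (no header row: 41 feature columns, then the label, then a difficulty score)
data = pd.read_csv('KDDTrain+.txt', header=None)
labels = data[41]
# Drop the label and difficulty columns, then one-hot encode the
# categorical features (protocol_type, service, flag)
features = pd.get_dummies(data.drop(columns=[41, 42]), columns=[1, 2, 3]).astype(float)
is_normal = (labels == 'normal').to_numpy()
# Split data into train and test sets, keeping the labels for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    features.to_numpy(), is_normal, test_size=0.2, random_state=42)
# Train on normal traffic only
X_train = X_train[y_train]
# Normalize: fit the scaler on the training set, then apply it to both sets
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)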
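NSL-KDD mixes categorical columns (protocol_type, service, flag) with numeric ones, and a test split can contain category values that never appeared during training. One way to handle both in a single, reusable step is scikit-learn's `ColumnTransformer`. It combines a `OneHotEncoder(handle_unknown="ignore")`, which encodes unseen categories as all zeros, with a `MinMaxScaler`. The sketch below assumes named columns (as in a frame loaded with column names), uses only a small numeric subset, and runs on a made-up toy frame.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

categorical = ["protocol_type", "service", "flag"]
numeric = ["duration", "src_bytes", "dst_bytes"]  # small subset, for illustration only

preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", MinMaxScaler(), numeric),
    ],
    sparse_threshold=0,  # always return a dense array, which Keras expects
)

# Made-up training records (stand-ins for normal traffic)
train = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp"],
    "service": ["http", "domain_u", "smtp"],
    "flag": ["SF", "SF", "S0"],
    "duration": [0, 2, 10],
    "src_bytes": [200, 45, 1000],
    "dst_bytes": [5000, 45, 0],
})
# A made-up test record whose protocol was never seen during fitting
test = pd.DataFrame({
    "protocol_type": ["icmp"],
    "service": ["http"],
    "flag": ["SF"],
    "duration": [5], "src_bytes": [600], "dst_bytes": [2500],
})

X_train = preprocess.fit_transform(train)  # fit encoders and scaler on training data only
X_test = preprocess.transform(test)        # reuse the fitted transforms on test data
print(X_train.shape, X_test.shape)         # (3, 10) (1, 10)
```

Fitting the transformer on training data only keeps test-set statistics out of the model, and the fitted object can be reused unchanged on new traffic.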
Step 3: Building the Autoencoder Model
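The autoencoder has two parts: an encoder and a decoder. The encoder narrows the input through layers of 64, 32, and 16 units, and the decoder mirrors that shape back to the original number of features. Here is how to build it with Keras.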
from keras.models import Model
from keras.layers import Input, Dense
# Input layer
input_dim = X_train.shape[1]
input_layer = Input(shape=(input_dim,))
# Encoder layers
encoded = Dense(64, activation='relu')(input_layer)
encoded = Dense(32, activation='relu')(encoded)
encoded = Dense(16, activation='relu')(encoded)
# Decoder layers
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(input_dim, activation='sigmoid')(decoded)
# Autoencoder model
autoencoder = Model(inputs=input_layer, outputs=decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')
# summary() prints the layer table itself (it returns None)
autoencoder.summary()
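The "Param #" column in the summary follows from a simple rule: a Dense layer with n_in inputs and n_out units holds n_in * n_out weights plus n_out biases. The hypothetical helpers below apply that rule to the 64-32-16-32-64 layout, which is a quick way to sanity-check the model size for a given `input_dim`.

```python
def dense_params(n_in, n_out):
    # Weights plus one bias per unit
    return n_in * n_out + n_out

def autoencoder_params(input_dim, hidden=(64, 32, 16)):
    # Encoder widths, then the mirrored decoder widths, then the output layer:
    # e.g. [input_dim, 64, 32, 16, 32, 64, input_dim]
    widths = [input_dim, *hidden, *reversed(hidden[:-1]), input_dim]
    return sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))

print(autoencoder_params(10))  # 6618
```

Only the first and last layers depend on `input_dim`, so one-hot encoding many categories mostly grows those two layers.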
Step 4: Training the Autoencoder
Train the model on normal network traffic so that it learns to reconstruct normal patterns. The input and the target are the same data.
# Train the autoencoder to reproduce its input; hold out 10% of the normal
# training data for validation instead of validating on the (mixed) test set
history = autoencoder.fit(X_train, X_train, epochs=50, batch_size=256, validation_split=0.1, shuffle=True)
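To show what `fit()` does with `epochs`, `batch_size`, and `shuffle=True`, here is a hypothetical NumPy stand-in. It trains a *linear* autoencoder (no activations, no biases) with plain mini-batch gradient descent on mean squared error. Keras uses the Adam optimizer and the nonlinear layers above, so this sketch illustrates the training loop only. The data is synthetic.

```python
import numpy as np

def train_linear_autoencoder(X, bottleneck=3, epochs=50, batch_size=64, lr=0.05, seed=0):
    """Hypothetical sketch of autoencoder training: the target is the input itself."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, bottleneck))
    W_dec = rng.normal(0, 0.1, (bottleneck, d))
    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)  # shuffle=True: new record order every epoch
        for start in range(0, n, batch_size):
            xb = X[order[start:start + batch_size]]
            z = xb @ W_enc                         # encode
            x_hat = z @ W_dec                      # decode
            grad_out = 2 * (x_hat - xb) / xb.size  # d(MSE)/d(x_hat)
            grad_dec = z.T @ grad_out
            grad_enc = xb.T @ (grad_out @ W_dec.T)
            W_dec -= lr * grad_dec
            W_enc -= lr * grad_enc
        # Record the full-data MSE at the end of each epoch (like history.history['loss'])
        losses.append(float(np.mean((X @ W_enc @ W_dec - X) ** 2)))
    return W_enc, W_dec, losses

rng = np.random.default_rng(7)
# Illustrative "normal" data: 500 records in 10 dimensions near a 3-D subspace
basis = rng.normal(size=(3, 10))
X_normal = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 10))
W_enc, W_dec, losses = train_linear_autoencoder(X_normal)
print(f"loss after epoch 1: {losses[0]:.4f}, after epoch {len(losses)}: {losses[-1]:.4f}")
```

Plotting the per-epoch loss, as Keras stores it in `history.history`, is a simple way to check that training has converged before you pick a threshold.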
Step 5: Detecting Anomalies
After training, compute the reconstruction error for each test record. Records whose error exceeds a threshold are flagged as anomalies.
import numpy as np
# Predict the test set using the trained autoencoder
predictions = autoencoder.predict(X_test)
# Calculate reconstruction error (mean absolute error per record)
reconstruction_error = np.mean(np.abs(X_test - predictions), axis=1)
# Set threshold for anomaly detection (0.02 is an illustrative value; tune it for your data)
threshold = 0.02
# Detect anomalies based on reconstruction error
anomalies = reconstruction_error > threshold
print(f"Detected {np.sum(anomalies)} anomalies out of {len(anomalies)} samples")
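A fixed value like 0.02 depends heavily on how the data was scaled. A common alternative is to derive the threshold from the reconstruction errors on the normal training data, for example their 99th percentile, and then measure precision and recall if you have ground-truth labels for the test set. The sketch below does this with scikit-learn's `precision_score` and `recall_score`. The error values are randomly generated stand-ins, `percentile_threshold` is a hypothetical helper, and the 99th percentile is an assumed starting point to tune.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def percentile_threshold(train_errors, pct=99.0):
    # Flag anything above the pct-th percentile of errors seen on normal training data
    return float(np.percentile(train_errors, pct))

rng = np.random.default_rng(1)
# Made-up reconstruction errors: normal records cluster low, attacks run higher.
train_errors = rng.gamma(shape=2.0, scale=0.005, size=5000)
test_errors = np.concatenate([
    rng.gamma(2.0, 0.005, size=900),  # normal test records
    rng.gamma(2.0, 0.05, size=100),   # attack test records
])
y_true = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])  # 1 = anomaly

threshold = percentile_threshold(train_errors, pct=99.0)
y_pred = (test_errors > threshold).astype(int)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(f"threshold={threshold:.4f}, precision={precision:.3f}, recall={recall:.3f}")
```

Raising the percentile trades recall for fewer false alarms. Lowering it does the opposite.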