Network security has become increasingly critical due to the growing frequency and sophistication of cyber-attacks. To enhance intrusion detection capabilities, emerging solutions leverage machine learning (ML) models.
However, acquiring sufficient high-quality training data remains a significant challenge. To address it, we explore the use of generative neural networks to create realistic synthetic network traffic data.
This paper introduces \textit{NetGlyphizer}, a novel approach for learning a discrete representation of network traffic using Vector Quantized Variational Autoencoders (VQ-VAE). The model transforms network flows into sequences of discrete tokens, referred to as \textit{NetGlyphs}. This representation enables Transformer models to generate new \textit{NetGlyph} sequences, which can then be decoded back into network traffic. We evaluate the approach on a dataset containing both benign and malicious traffic flows.
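The discretization step described above rests on the nearest-neighbor lookup at the core of VQ-VAE quantization. The following is a minimal sketch under our own assumptions, not the NetGlyphizer implementation: the encoder output is given as a list of latent vectors, the codebook is fixed rather than learned, and the function names `quantize` and `dequantize` are hypothetical. Each latent vector is replaced by the index of its nearest codebook entry under squared Euclidean distance, producing the kind of integer token sequence a Transformer could then model.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Hypothetical helper: map each latent vector to the index of its
    nearest codebook entry (one discrete token per latent vector).
    On ties, min() keeps the lowest index."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical helper: look tokens back up in the codebook; a decoder
    network would then map these vectors to traffic features."""
    return [list(codebook[k]) for k in tokens]


# Toy example with an assumed 3-entry, 2-dimensional codebook.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.9, 0.1], [0.1, 0.2], [0.2, 0.8]]
tokens = quantize(latents, codebook)
print(tokens)  # [1, 0, 2]
```

In a full VQ-VAE the codebook is learned jointly with the encoder and decoder, typically using a straight-through gradient estimator and a commitment loss; the sketch omits training entirely and only shows how continuous latents become discrete tokens.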
Compared with a baseline that uses a continuous representation, our model reconstructs the data more accurately and better preserves the original distribution. Additionally, conditional generation enables the production of labeled network traffic flows for a specified traffic class. The results demonstrate that our approach effectively preserves protocol compliance and usage, making it a promising solution for labeled network traffic generation.