RESSI 2025 : Rendez-vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information

21-23 May 2025, Domaine de l'Orangerie à Lanniron (Brittany, France)

sciencesconf.org:ressi2025:615165

NetGlyphizer: Labeled Network Traffic Generation Using Representation Learning and Transformers

Gabin Noblet 1, 2, Cédric Lefebvre 3, Philippe Owezarski 4, William Ritchie 3

1 : LAAS-CNRS (CNRS : UPR8001)

2 : Custocy

3 : Custocy

4 : LAAS-CNRS (Centre National de la Recherche Scientifique)

Network security has become increasingly critical due to the growing frequency and sophistication of cyber-attacks. To enhance intrusion detection capabilities, emerging solutions leverage machine learning (ML) models.
However, acquiring sufficient high-quality training data remains a significant challenge. To address it, we explore the use of generative neural networks to create realistic synthetic network traffic data.
This paper introduces NetGlyphizer, a novel approach that learns a discrete representation of network traffic using a Vector Quantized Variational Autoencoder (VQ-VAE). The model transforms network flows into sequences of discrete tokens, referred to as NetGlyphs. This representation makes it possible to use Transformer models to generate new NetGlyph sequences, which can then be decoded back into network traffic. The approach is evaluated on a dataset comprising both benign and malicious traffic flows.
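To make the idea of a discrete representation concrete, the sketch below shows the nearest-neighbour codebook lookup at the core of vector quantization in general. It is not the authors' implementation: the function names, the toy codebook, and the two-dimensional latent vectors are all hypothetical, and a real VQ-VAE learns both the encoder and the codebook during training.

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent and every code: shape (n, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map token indices back to their codebook vectors (the decoder's input)."""
    return codebook[tokens]

# Hypothetical toy values: a 3-entry codebook and 3 encoder outputs.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]])

tokens = quantize(latents, codebook)           # discrete token sequence
reconstructed = dequantize(tokens, codebook)   # quantized latents
```

In this toy setting, each index in `tokens` plays the role of one discrete token; a sequence model only ever sees these integer indices, never the continuous latents.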
Compared with a method based on a continuous representation, our model reconstructs the data more accurately and better preserves the original distribution. In addition, conditional generation makes it possible to produce labeled network traffic flows for a specified traffic class. The results show that our approach preserves protocol compliance and usage patterns, making it a promising solution for labeled network traffic generation.
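The abstract does not say how the class label conditions the generator. One common option for token-based sequence models, shown here purely as an assumption, is to prefix each token sequence with a class token so that generation can later be steered by choosing the prefix. The label names, id layout, and helper functions below are hypothetical.

```python
# Hypothetical label vocabulary; the actual traffic classes are not listed in the abstract.
CLASS_TOKENS = {"benign": 0, "malicious": 1}
NUM_CLASSES = len(CLASS_TOKENS)

def to_conditioned_sequence(label: str, glyphs: list[int]) -> list[int]:
    """Prefix a token sequence with its class token.

    Token ids are shifted by NUM_CLASSES so they never collide with class ids.
    """
    return [CLASS_TOKENS[label]] + [g + NUM_CLASSES for g in glyphs]

def from_conditioned_sequence(seq: list[int]) -> tuple[str, list[int]]:
    """Recover the label and the original token ids from a conditioned sequence."""
    id_to_label = {v: k for k, v in CLASS_TOKENS.items()}
    return id_to_label[seq[0]], [t - NUM_CLASSES for t in seq[1:]]

seq = to_conditioned_sequence("malicious", [4, 0, 7])
label, glyphs = from_conditioned_sequence(seq)
```

Under this scheme, a model trained on such sequences would be prompted with only the class token at generation time, and every sequence it completes carries its label by construction.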

Type: RESSI 2025
Topics: Thesis