Researchers from CESNET and CTU FIT Release the Largest Dataset for Threat Detection and Network Traffic Prediction
New dataset from a real academic environment includes over 800,000 time series, enabling advanced cybersecurity research and testing of AI models
Prague, 31 July 2025 – A research team from the Administration and Security Tools Department at CESNET and the Faculty of Information Technology of the Czech Technical University in Prague (CTU FIT) has released the most comprehensive dataset of its kind. Collected in a real academic network, the dataset contains over 800,000 time series capturing anonymized network traffic of personal computers, servers, routers, and entire institutions. This makes it the most realistic and comprehensive publicly available dataset for applying artificial intelligence to network traffic prediction, anomaly detection, and network management.
Anomaly Detection in Everyday Life and Network Traffic – and Why It Matters
Anomaly detection is part of our daily lives, often without our realizing it: a suspicious payment from another country, an unusual amount flagged by a banking system, irregularities in health data recorded by a smartwatch, or a sudden change in online shopping behavior that may indicate account abuse. In all these cases, anomaly detection identifies deviations from normal behavior that may signal a risk. The same principle applies in cybersecurity, where anomalies in network traffic often indicate threats, errors, or critical changes in device behavior.
Anomaly detection plays a crucial role in network administration and security. Modern attacks on infrastructure, such as Distributed Denial of Service (DDoS) attacks, the spread of malware, or the exploitation of compromised devices, often blend into normal traffic and evade traditional rule-based detection.
“Anomaly detection allows us to identify previously unknown threats by revealing changes in device communication behavior,” explains Josef Koumar, lead author of the dataset. “Anomalies can also reveal misconfigurations, device overloads, or other operational issues,” adds Koumar. Detecting such deviations accurately and quickly contributes significantly to the resilience and reliability of digital infrastructure.
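To make the idea of "deviations from normal behavior" concrete, the following is a minimal sketch of one of the simplest possible detectors: a rolling z-score over a single traffic time series. It is a generic illustration only, not the method used by the dataset authors, and it does not use the CESNET TS-Zoo library. The function name flag_anomalies, the 24-point window, the threshold of three standard deviations, and the synthetic hourly traffic values are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(series, window=24, threshold=3.0):
    """Return the indices of points that deviate from the preceding
    `window` points by more than `threshold` standard deviations.
    Hypothetical helper for illustration only (rolling z-score)."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]  # recent "normal" behavior to compare against
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


# Synthetic hourly traffic volume (assumed values) with one sudden spike.
traffic = [100 + (hour % 5) for hour in range(48)]
traffic[40] = 1000
print(flag_anomalies(traffic))  # → [40]
```

With hourly data, a 24-point window compares each hour against the previous day. A practical detector would also need to account for daily and weekly traffic patterns and for many series at once, which this sketch deliberately ignores.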
Largest Real-World Dataset of Its Kind Enables More Advanced AI-Based Threat Detection
The team of researchers from CESNET and the Faculty of Information Technology of the Czech Technical University in Prague (Josef Koumar, Karel Hynek, Tomáš Čejka, and Pavel Šiška) published the dataset in the prestigious journal Scientific Data, part of the Nature Portfolio. It contains over 800,000 time series aggregated from real, anonymized network traffic of devices, networks, and institutions observed on the backbone lines of the Czech national academic network operated by CESNET.
Unlike the laboratory-generated datasets previously available to and commonly used by the research community, this dataset captures large-scale, diverse traffic from a real-world network. This unprecedented release significantly advances research capabilities in network security and management. It supports the development of highly accurate AI models for anomaly detection and enables robust, comprehensive testing under real-world conditions. As a result, detection outcomes, such as identifying DDoS attacks or suspicious behavior of infected devices, are much more reliable.
The dataset’s impact is further strengthened by the release of the open-source library CESNET TS-Zoo, which simplifies working with the dataset and facilitates sharing methodology through benchmarks. Combining a realistic dataset with an open-source tool makes research methods more transparent and experiments more reproducible, leading to higher-quality, verifiable results across the research ecosystem.
“Our goal was to provide the community with a realistic dataset for developing and testing algorithms that can protect networks even in an era where most network traffic is encrypted,” says Koumar. “The dataset enables better detection of unknown threats because it is based on a real, complex environment. This makes the results more trustworthy than results on existing datasets. We hope it contributes to developing more secure and smarter infrastructure, not only in academia,” he concludes.
Details about the dataset:
https://www.nature.com/articles/s41597-025-04603-x
Open-source library for working with the dataset:
https://github.com/CESNET/cesnet-tszoo/tree/main
https://cesnet.github.io/cesnet-tszoo/
The CESNET association, founded in 1996 by Czech universities and the Academy of Sciences of the Czech Republic, provides advanced IT services for science, research, innovation, and education. It operates and develops the national academic computer network, ensures secure access to a portfolio of services, and offers environments for high-performance computing and data storage as well as communication tools for individuals and teams.
In addition to universities, CESNET’s services are used by students, academic staff, research organizations, researchers, public sector institutions, and non-profit organizations.
Research and development in the field of information and communication technologies are an integral part of CESNET’s activities. The association is also an active partner in international research infrastructures, such as the pan-European GÉANT network, the European Grid Infrastructure (EGI.eu), and the European Open Science Cloud (EOSC).