<?xml version="1.0" encoding="iso-8859-2"?>
<zprava jazyk="en" cislo="33/2006">
<nazev>Traffic Scanner</nazev>
<autor>Petr Kobierský, Jan Kořenek, Andrej Hank</autor>
<datum>30.11.2006</datum>

<h1>Abstract</h1>
<p>Intrusion detection system is an integrated software/hardware tool
capable of detecting unauthorised access to computer systems and
malicious network traffic such as viruses, trojan horses and
worms. This technical report presents the system architecture of
the Traffic Scanner which is a hardware accelerated IDS based on
Field-Programmable Gate Arrays (FPGAs). The designed system
supports rules described in Snort-compatible format and can be
configured using a web interface. System uses an architecture
based on non-deterministic finite automaton for fast pattern
matching. Using this approach, throughput up to 3.2~Gbps is
achieved on the COMBO6X card for all rules from Snort database.</p>

<p><b>Keywords:</b> Snort, acceleration, FPGA</p>

<h1 id="introduction">Introduction</h1>
<p>Network intrusion detection system (NIDS) is an integrated tool capable of
detecting intrusions or malicious traffic. NIDS can analyse data packets
that travel over the actual network and examine packets to verify their
purpose (malicious or benign). Software, or appliance hardware in some cases,
resides in one or more systems connected to a network, and is used to
analyse data such as network packets.</p>
<p>Snort is an open source network intrusion detection system utilizing a
rule-driven language, which combines the benefits of signature, protocol
and anomaly based inspection methods. The Snort database contains thousands
of rules which describe most of the known viruses and attacks. In Snort,
more than 80% of the CPU time is consumed by the string matching task
<cite href="Perf"/>. As network traffic speeds increase faster then PC performance,
PC-based solutions can not continue to process all traffic.</p>
<p>The system performance can be increased by moving Snort time-critical paths
from software to hardware. The technical report describes NIDS architecture
which consist of COMBO6X acceleration card and Snort running on host PC.
The acceleration card removes packets from incoming traffic and sends only
malicious packets to Snort.</p>
<p>The COMBO6X card contains powerful FPGA Virtex-II Pro. Using FPGAs, large
ruleset can be matched at multigigabit speed <cite href="Clark"/>, <cite href="Baker"/>. The proposed
architecture uses core generator to transform Snort rules to a circuit
described in VHDL language and matching all patterns from ruleset.</p>
<p>In the third chapter, an architecture of the whole system is described and
mapping of the Snort rules to FPGA configuration is shown. The fourth
chapter contains experimental results and the fifth chapter is about
Traffic scanner configuration interface. Finally the characteristics of
created system are conclude.</p>

<h1 id="system-architecture">System Architecture</h1>
<p>System architecture consists of several layers shown on the next picture.
Physical layer is represented by COMBO6X and SFPRO cards equipped with FPGA
chips suitable for proccessing network traffic. Behaviour of hardware cards
is determined by a firmware layer which is responsible for packet processing.
Malicious packets are separated from regular traffic at firmware layer. For
this purpose software described in firmware paragraph is used.</p>
<p>Next layers include drivers that provides access to physical layer for
operating system. Thanks to them COMBO6X card act as ordinary NIC card in
operating system. Card is used as standard network interface which ensures
transparency for using Snort IDS software. All basic software operations on
COMBO6X card, especially input/output operations within registers and
memory, are realized through libcombo and libcompat libraries designed for
COMBO6 family cards.</p>
<p>For managing and debugging purposes an 'ids_ctl' tool was created. It serves
mainly higher layers for probe initialization and firmware loading. It can
also be used for debugging purposes by advanced users. The user interface
layer is represented by 'ids' and 'ids_lkm' scripts which are used for
standard Traffic Scanner control. 'ids_lkm' is responsible for loading
kernel modules and 'ids' for initialization and more convenient firmware
loading. A simple GUI layer which simplify these operations has also been
created.</p>

<obr src="layers">Traffic scanner system layers</obr>
<p>Traffic Scanner supports two ways of plugging it into network topology.
Four 1~Gbps span ports monitoring or an in-line mode can be used. In-line
connection works as a T-splitter and allows duplex line. Both methods are
outlined in Figure~2.</p>

<obr src="network_placement">Traffic scanner network placement</obr>

<h2 id="traffic-scanner-platform">Traffic Scanner platform</h2>
<p>Target platform for Traffic Scanner is the COMBO6X card which was developed in
the scope of the <a href="http://www.liberouter.org">Liberouter project</a>. It is an universal platform that
can be used in several applications. Card is equipped with 3 SSRAMS, TCAM,
DRAM, PCI communication core and FPGA which can be configured for specific
application. Thanks to an extension connector, the COMBO6X card can be used
with several add-on cards with different network interfaces. Currently only
the SFPRO add-on card with SFP cages is supported. Both card cards contain
a powerful FPGA Virtex II Pro chip with many resources which are needed to
support thousands of IDS rules. New generation of COMBO cards will have more
powerful FPGAs and will be able to support many more IDS rules and other
IDS functionality like DOS detecting, TCP stream reassembly and filtering.</p>

<obr src="c6x-sfpro">Traffic scanner on COMBO6x and SFPRO add-on card</obr>

<h2 id="traffic-scanner-firmware">Traffic Scanner firmware</h2>
<p>Both FPGAs on COMBO6X platform must be configured to realize specified
function. FPGAs are configured by loading firmware into FPGA
configuration memory (SSRAM). This firmware can be changed anytime
without replacing whole platform.</p>
<p>Traffic Scanner firmware architecture is composed of classification unit,
pattern match unit and several other units which are chained for pipelined
packet processing. Incoming packets from four network interfaces are
entering the <i>Input Buffer (IBUF)</i> block. For each incoming packet the CRC
and packet length is checked and only valid packets pass through. Valid
packets are processed by <i>Header Field Extractor (HFE)</i> which is assigned
to each network interface. HFE extracts IPv4 header fields like source and
destination address, ports, TCP Flags and much more. These fields can be
used in Traffic scanner rules for detecting suspicious traffic. HFE also
marks the start of packet payload where pattern matching is performed.</p>

<obr src="hw-arch">Traffic scanner firmware</obr>
<p>The core of a Traffic Scanner consists of a <i>Classification Unit</i> and <i>Pattern
Match Unit</i> which are configured for accelerating specified Snort ruleset.
Classification unit compare extracted header fields to stored IDS rules.
The main purpose of this unit is to restrict pattern matching~-- some
patterns are searched only in specific flows not in every one. Depending on
Classification unit results, specified patterns are searched in packet
payload. Currently only string searching is supported but pattern match
approach can also be used for regular expression matching; this will be
supported in later Traffic Scanner versions. Whole packet is stored in
<i>Software Output Buffer (SWOBUF)</i> and depending on a packet analysis result
is either dropped or exported via PC bus for software processing.</p>

<h2 id="pattern-match-unit">Pattern match unit</h2>
<p>From several pattern match approaches the NFA hardware generation
was chosen as a most promising. This approach offers high throughput with
possibility of searching both strings and regular expressions. High
throughput compared to software solutions can be achieved thanks to
simultaneous search of all strings in contents. Process of mapping
patterns to hardware is composed of several phases which will be described
below.</p>

<blockquote>
<ul>
<li>
<p>Converting patterns to NFA</p></li>
<li>
<p>NFA optimizations</p></li>
<li>
<p>Converting NFA to Extended NFA</p></li>
<li>
<p>Mapping Extended NFA to FPGA logic</p></li></ul></blockquote>

<obr src="nfa">NFA before optimalization</obr>

<obr src="opt_nfa">NFA after optimalization</obr>

<obr src="enfa">Extended NFA</obr>
<p>Patterns extracted from IDS rules are translated using generally known
algorithms to NFA representation. This NFA can be further optimized.
Optimization algorithms put focus on state minimization which is performed
mainly through prefix sharing. Classical NFA accepts a char (8 bits) in
each transitions. If we map this NFA to hardware we are able to achieve
throughput up to 800 Mb/s with firmware running at 100 MHz clock frequency.
The throughput can be increased by using Extended NFA (ENFA) that accept
multiple chars in one clock cycle. In this case, throughput between 1 and
10~Gbps can be reached. Finally, ENFA is mapped into FPGA logic. Flip-flop
is generated for every state and every transition is coded into AND-OR logic
gates as show next figure. The number of FPGA gates used depends on number
of patterns and their lengths. This logic utilization is also linearly
increasing with number of chars which are accepted per clock cycle.</p>

<obr src="nfa_hw">FSM hardware mapping</obr>

<h1 id="experimental-results">Experimental Results</h1>
<p>Increasing system throughput affects requirements for chip capacity. This
influence can be reduced by several optimizations focused on Snort patterns.
Thanks to these optimizations all rules from default Snort installation can
fit into FPGA logic with theoretical throughput 6.4~Gbps. Relation
between used FPGA logic and maximal throughput for Snort rules and randomly
generated strings can be seen on the next graph.</p>

<obr src="test1">FPGA logic utilization for all patterns from Snort database and for
randomly generated patterns</obr>
<p>Further throughput tests over whole system were done. The aim of this test
was to find out a real throughput of proposed architecture. A Spirent AX4000
network traffic generator which can generate 2~Gbps traffic with different
datagram size distribution was used for testing. During the test Traffic
Scanner was able to process whole 2~Gbps network traffic for short datagrams
as well as long datagrams. Throughput 2~Gbps is not a Traffic Scanner limit.
Probe is able to archive throughput about 3.2~Gbps and there is one important
note: this throughput does not depend on the number of IDS rules. This
statement is not valid for software based IDS system where throughput rapidly
decreases with increasing number of rules. Results of throughput test can be
seen on the following graph. Measured throughput is exactly same as a theoretical
so Traffic Scanner is able to process packets on full network line speed.</p>

<obr src="test2">Theoretical throughput and throughput of proposed IDS</obr>
<p>The last test focuses on comparing Traffic Scanner accelerated Snort and
classical non-accelerated solution. Both probes were configured for
detecting virus and worms threats. IDS sensors monitored the same 1~Gbps
network span port from Masaryk University network for 24 hours. During this test,
both solutions detected the same number and type of malicious packets.
However Traffic Scanner filtered out 99.98 percent of non-malicious traffic
so only 0.02 percent of all traffic had to be processed by Snort as as shown
on the next graph. This pre-filtration is very useful because Snort and
other software based IDS solutions cannot process more than 100--300~Mbps.</p>

<obr src="comparing_graph">Comparison graph between classical Snort and Traffic Scanner
Accelerated Snort</obr>

<h1 id="configuration-interface">Configuration Interface</h1>
<p>Many rules can fit into the FPGA if core generator is used to generate and
optimize the Pattern Match unit but the ruleset cannot be changed simply.
If user wants to change the ruleset, new Pattern Match unit has to be
generated and new FPGA configuration bitstream has to be created by
synthesis tools. As users can't have all Traffic Scanner source codes and
licenses to hardware synthesis tools, an issue of firmware distribution has
to be solved.</p>
<p>One solution is to distribute prepared firmware versions which accelerate
different types of rulesets (exploits, virus, p2p detecting, company
network policy infringement etc.). After a new security threads is discovered
and rule is written, firmware is generated and then distributed
worldwide to Traffic Scanner sensors in similar way like anti-virus vendors
distribute their updates.</p>
<p>Many system administrators need to create their custom IDS configuration
for their specific network needs. For this reason a configuration web
interface was created. Philosophy of this interface is clear. Traffic
Scanner users can upload their custom ruleset to server and then start
firmware generation process. After a while user receives a email
notification with download link to custom firmware. This firmware can be
easily loaded into COMBO6X platform using Traffic Scanner tools.</p>

<obr src="webids">Traffic scanner Web interface</obr>
<p>Several compilation parameters can be set by user.</p>

<blockquote>
<ul>
<li>
<p>Path to Snort rules</p></li>
<li>
<p>String length truncation</p></li>
<li>
<p>Target throughput</p></li>
<li>
<p>Email address</p></li></ul></blockquote>
<p>The parameters influence the number of rules which can fit into FPGA.
If higher throughput is needed, number of supported rules decreases.
One possible way how to reach higher throughput with the same number of
rules is truncation of search patterns. However this optimization can
influence the number of packets exported to host computer, because
very short patterns can be found in many packets. Currently design
generation support three throughput rates~-- 800~Mbps, 1.6~Gbps and
3.2~Gbps.</p>

<h1 id="conclusion">Conclusion</h1>
<p>This technical report describes an architecture and configuration of an IDS
system based on the COMBO6X card. The proposed architecture uses core generator
to transform Snort ruleset to optimized circuit described in VHDL and FPGA
configuration. Using NFA approach for fast pattern matching, throughput
up to 3.2~Gbps is achieved for all rules from the Snort database.</p>
<p>The core generator produces a VHDL description which is highly optimised for
an input ruleset but it has to be run whenever the ruleset is changed. To
simplify work with Traffic scanner, a web configuration interface was
implemented. Using the configuration interface, any rules in Snort format
can be transformed to FPGA configuration and downloaded to FPGA on COMBO6X
card.</p>
<p>The performance of Traffic scanner was evaluated by Spirent AX4000 which
can generate 2~Gbps traffic with different datagram size distribution.
All generated traffic was processed by the Traffic scanner without losing any
packet. Measurement on real network was also performed. Using the
Traffic scanner, 99.98 percent of a non-malicious traffic were pre-filtered
on COMBO6X card and only 0.02 percent had to be processed by Snort on a
host computer.</p>

<seznamknih>
<kniha id="Web">Traffic scanner web pages: <a href="http://www.liberouter.org/ids.php">http://www.liberouter.org/ids.php</a>, 2006</kniha>
<kniha id="Perf">Markatos, Antonatos, Polychronakis and Anagnostakis. Exclusion-based signature matching for intrusion detection. In: <i>IASTED International Conference on Communication and Computer Network (CCN02)</i>, 2002.</kniha>
<kniha id="Clark">Clark and Schimmel. Efficient Reconfigurable Logic Circuits
for Matching Complex Network Intrusion Detection Patterns. In: Proc. <i>Field
    Programmable Logic and Applications</i>, 13th International Conference,
    Lisbon, Portugal, 2003, p.~956--959</kniha>
<kniha id="Baker">Zachary K. Baker and Viktor K. Prasanna. Time and Area
Efficient Pattern Matching on FPGAs. In: Proc. <i>12th ACM/SIGDA international
symposium on Field programmable gate arrays</i>, New York: ACM Press, 2004, p.~223--232.</kniha>
</seznamknih>
</zprava>
