HAMOC – Hardware-Accelerated Monitoring Center
CESNET technical report 9/2010
Pavel Čeleda, Radek Krejčí, Jan Barienčík, Martin Elich, Vojtěch Krmíček
CESNET, z.s.p.o.
Received 7. 12. 2010
Abstract
This technical report describes the Hardware-Accelerated Monitoring Center (HAMOC) platform based on the COMBOv2 card family. Our research concentrates on how to use hardware acceleration with already available and well-known monitoring applications. A set of network monitoring tools was tuned and tested with COMBOv2 hardware adapters so that they can process 10 Gb/s traffic at line rate. The HAMOC performance is evaluated and typical deployment use cases are shown.
Keywords: HAMOC, HANIC, COMBOv2, FPGA, PCAP, NetFlow, DPI.
1 Introduction
Hardware acceleration is an essential part of monitoring tools in high-speed networks and in environments where functionality must be guaranteed in worst-case scenarios such as DDoS attacks. On the other hand, hardware or software acceleration mechanisms often require special operating system versions or hacks to the monitoring tools. Such an approach discourages most network administrators and researchers from using hardware acceleration.
In our research we concentrate on how to use hardware acceleration in a user-friendly and easy manner. To avoid non-standard tools and extensions, we modified our hardware acceleration framework to be compatible with most current network monitoring applications. We added support to the PCAP library for receiving and sending packets with the COMBOv2 card family. To help users deploy hardware acceleration, a set of use cases describes how to use the HAMOC platform.
2 System Architecture
The HAMOC platform, shown in Figure 1, takes advantage of cooperation between software tools and hardware-accelerated network interface cards. We use our own hardware-accelerated COMBO cards developed by the Liberouter project. The COMBO hardware accelerators use Field-Programmable Gate Array (FPGA) technology, which provides sufficient computing power. This approach enables, e.g., smart load balancing that effectively uses the full power of all CPU cores. Performing other time-critical operations in hardware is also beneficial. The main field of HAMOC applications is network monitoring.
![[Image]](hamoc_architecture.png)
Figure 1. HAMOC System Architecture.
Network monitoring can be performed by generic network devices such as routers or switches, by specialized hardware appliances (firewalls, IDS/IPS systems) or by commodity PC-based probes with appropriate software tools. Network monitoring extensions in generic network devices are limited by computing power, which is mainly reserved for the device's primary purpose (e.g., packet forwarding). During an attack, when any information about network traffic is important and can be critical, no computing power is left for network monitoring. In contrast, specialized hardware appliances have sufficient computing power to work under attack, but their high price and lack of flexibility discourage deployment. Commodity PC-based probes have the advantage of flexibility and low cost, but they have performance issues in high-speed networks [1].
HAMOC is based on a commodity PC platform. The lack of computing power for high-speed network applications is solved by the COMBO hardware accelerator, which performs time-critical operations. The FPGA technology allows the firmware to be changed flexibly according to the demands of particular tasks. This is further supported by NetCOPE, a general platform for rapid development of network applications [2].
The evolution of the COMBO card firmware led to the implementation of what is currently HAMOC's crucial advantage: hardware-accelerated distribution of network traffic among many processor cores [4]. This feature enables load balancing across all available CPU cores and significantly increases the computing power of the HAMOC probe on high-speed networks.
3 Hardware-Accelerated Network Interface Card
The main advantage and the heart of the HAMOC platform is the COMBO hardware accelerator. This section describes the hardware cards used to accelerate HAMOC applications, together with the firmware variants and their specifics.
3.1 HAMOC Hardware
The COMBO hardware accelerator is actually a “sandwich” consisting of a PCI Express x8 mother card with a connected interface card. The HAMOC platform uses the second generation of the COMBO cards developed by the Liberouter project, the COMBOv2 card family. The COMBO-LXT [3] is a COMBOv2 family mother card equipped with the Xilinx Virtex-5 XC5VLX155T FPGA, QDRII RAM and a socket for DDR2 SODIMM memory.
The main interface card used across all HAMOC applications is the two-port 10 GbE COMBOI-10G2. Specific applications also support the four-port 1 GbE COMBOI-1G4 and the new four-port 10 GbE COMBOI-10G4TXT interface cards.
![[Image]](combov2-10g2.jpg)
Figure 2. HAMOC's main hardware platform – COMBO-LXT155 mother card with COMBOI-10G2 interface card intended for 2x10 Gb/s networks.
![[Image]](combov2-10g4.jpg)
Figure 3. COMBO-LXT155 mother card with COMBOI-10G4TXT interface card intended for 4x10 Gb/s networks.
3.2 Connecting HAMOC Box Into the Network
There are three ways to insert the HAMOC box into the network: connect it to a mirror port of a network device (e.g., a router), use a network TAP (Test Access Port) or insert it into a line as a repeater.
- Connecting at a Mirror Port
- The simplest way to connect the HAMOC box is to mirror traffic from your router via a mirror port (also known as a SPAN port).
- Connecting via a Network TAP
- Another way to connect the HAMOC box to your network is to use a network TAP, e.g., an optical splitter, see Figure 4.
- Inserting HAMOC Box in a Line
- In this case the HAMOC works as a T-splitter, see Figure 5: when inserted into a network link, the traffic is passed directly to the original destination and a separate copy of link data is processed by the probe in parallel. From the network perspective, the probe can be classified as a repeater that is invisible at both the network and link layer.
![[Image]](conn-tap.png)
Figure 4. HAMOC box connected via a network TAP.
![[Image]](conn-inline.png)
Figure 5. HAMOC box inserted in the network as a repeater.
3.3 HAMOC Firmware
Depending on the specific purpose and usage pattern, one of the available HAMOC firmware designs may be used. All designs are based on the NetCOPE platform, which provides common core units for the specific firmware. The NetCOPE development kit significantly simplifies the development of new applications on COMBO cards.
NetCOPE firmware provides an abstraction layer that contains common modules required in networking application firmware (e.g., network communication, PCI/PCI Express communication, etc.). Developers then only need to implement and add their own modules, translate the design and load the firmware into the card. The basic example is the NIC (Network Interface Card) firmware, see Figure 6.
![[Image]](hanic_fw_arch.png)
Figure 6. NIC firmware architecture – basic NetCOPE firmware instantiation.
Three HAMOC firmware designs based on the NetCOPE platform are currently used:
- NIFIC – NIC with hardware packet filter (Section 3.3.1)
- HANIC – Hardware-Accelerated Network Interface Card (Section 3.3.2)
- Flexible FlowMon – NetFlow/IPFIX probe (Section 3.3.3)
3.3.1 NIFIC
The NIC with hardware packet filter (NIFIC) is intended to serve as a stateless firewall (IP packet filter), a tool for network flow inspection, an intelligent hub, etc. NIFIC processes network traffic at full line speed without any packet loss: it handles the full throughput from Ethernet port to Ethernet port (20 Gb/s on a two-port 10 GbE interface card) for all packet lengths. Furthermore, it can select specified traffic and forward it to software applications (with a throughput of 10 Gb/s for all packet lengths) according to up to 2048 filtration rules. A significant feature of NIFIC is the ability to change filtration rules without any packet loss.
The filtration rule format is similar to the OpenBSD PF or FreeBSD ipfw language. Traffic can be filtered on the basis of the following packet fields:
- Source and destination MAC address.
- Source and destination IP address (currently only IPv4 is supported).
- Protocol.
- Source and destination port (TCP or UDP).
- TCP flags.
- Input interface number.
Packets entering NIFIC filtration core can be
- redirected to the Ethernet interface(s),
- redirected to the software interface(s) (software application) or
- discarded.
NIFIC also supports packet replication (a packet can be sent to several output interfaces) and packet cropping for output to host PC software.
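The stateless matching described above can be illustrated with a minimal Python sketch. It is not the NIFIC rule language or its firmware logic: the `Packet` and `Rule` classes, the action names and the first-match-wins evaluation order are assumptions made for illustration, and only a subset of the listed fields is modelled.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str   # e.g. "tcp" or "udp"
    src_port: int
    dst_port: int
    in_iface: int


@dataclass(frozen=True)
class Rule:
    # Hypothetical action names: "to_eth", "to_sw" or "discard".
    action: str
    # A field left as None matches any value.
    src_net: Optional[str] = None
    dst_net: Optional[str] = None
    protocol: Optional[str] = None
    dst_port: Optional[int] = None
    in_iface: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        if self.src_net is not None and ip_address(pkt.src_ip) not in ip_network(self.src_net):
            return False
        if self.dst_net is not None and ip_address(pkt.dst_ip) not in ip_network(self.dst_net):
            return False
        if self.protocol is not None and pkt.protocol != self.protocol:
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        if self.in_iface is not None and pkt.in_iface != self.in_iface:
            return False
        return True


def classify(pkt: Packet, rules: List[Rule], default: str = "discard") -> str:
    """Return the action of the first matching rule (assumed semantics)."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


rules = [
    Rule("to_sw", protocol="tcp", dst_port=80),  # send HTTP to software
    Rule("to_eth", src_net="10.0.0.0/8"),        # forward internal hosts
]
web = Packet("10.1.2.3", "192.0.2.1", "tcp", 40000, 80, 0)
print(classify(web, rules))
```

Because every packet is classified on its own, without any per-connection memory, the filter stays stateless, which is what allows such a decision to be taken at line rate in hardware.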
3.3.2 HANIC
The Hardware-Accelerated Network Interface Card (HANIC) is based on a simple NIC design, but in addition it distributes network traffic among the available CPU cores. HANIC processes network traffic at full line speed without any packet loss for all packet lengths. It works on a two-port 10 GbE interface card with 10 Gb/s throughput into the software application. In addition to static packet sampling, packet cropping can be used to process all network traffic from both 10 GbE interfaces (20 Gb/s).
Data distribution among CPU cores is based on a CRC hash computed from the following fields:
- source and destination IP address (both IPv4 and IPv6 support),
- source and destination port (TCP or UDP),
- protocol,
- IP version and
- input interface.
This ensures that all data belonging to a specific IP flow is processed on the same core (by the same application instance). This is highly beneficial, e.g., for flow information export.
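The idea of hash-based distribution can be sketched in a few lines of Python. This is only an illustration: the CRC-32 polynomial of `zlib.crc32`, the byte layout of the key, the modulo mapping and the function name `flow_to_core` are assumptions, not the actual HANIC implementation.

```python
import zlib
from ipaddress import ip_address


def flow_to_core(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 protocol: int, in_iface: int, n_cores: int) -> int:
    """Map a packet's flow fields to a core index using a CRC hash."""
    src = ip_address(src_ip)
    dst = ip_address(dst_ip)
    # Concatenate the fields listed above into one byte string.
    key = (
        src.packed + dst.packed
        + src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big")
        + bytes([protocol, src.version, in_iface])
    )
    return zlib.crc32(key) % n_cores


# Every packet of the same flow yields the same key, hence the same core.
first = flow_to_core("192.0.2.10", "198.51.100.7", 40000, 443, 6, 0, 8)
second = flow_to_core("192.0.2.10", "198.51.100.7", 40000, 443, 6, 0, 8)
print(first == second)
```

In this sketch the key is built from the fields in a fixed order, so the two directions of a connection may map to different cores.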
More information about network traffic distribution among CPU cores can be found in [4].
3.3.3 Flexible FlowMon
The Flexible FlowMon design was originally intended as a standalone NetFlow/IPFIX probe. This probe is now a part of the HAMOC application family. The Flexible FlowMon probe is a passive network monitoring device. It collects data about IP flows and exports them to external collectors in the NetFlow (version 5 and 9) and IPFIX (IP Flow Information eXport) formats.
There are two firmware versions of Flexible FlowMon. Flexible FlowMon LT is a lightweight version that parses IP packet headers and prepares an IP flow record from every single packet. These records are forwarded, together with a computed hash, to software applications that provide the flow cache and aggregate the flow records preprocessed by the hardware accelerator. The second version, the full Flexible FlowMon firmware, provides the flow cache directly in hardware and speeds up the data processing. More information about the Flexible FlowMon probe can be found in [5].
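The division of work in the LT variant, with per-packet records produced in hardware and a flow cache kept in software, can be sketched as follows. The record layout `(flow_key, length, timestamp)`, the `FlowRecord` fields and the function name `aggregate` are hypothetical and chosen only to illustrate the aggregation step; expiration and export of flows are left out.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Tuple


@dataclass
class FlowRecord:
    packets: int
    octets: int
    first_ns: int
    last_ns: int


def aggregate(per_packet: Iterable[Tuple[Hashable, int, int]]) -> Dict[Hashable, FlowRecord]:
    """Merge per-packet records (flow_key, length, timestamp_ns) into flows."""
    cache: Dict[Hashable, FlowRecord] = {}
    for flow_key, length, ts_ns in per_packet:
        rec = cache.get(flow_key)
        if rec is None:
            # First packet of the flow creates a new cache entry.
            cache[flow_key] = FlowRecord(1, length, ts_ns, ts_ns)
        else:
            # Later packets only update the counters and timestamps.
            rec.packets += 1
            rec.octets += length
            rec.first_ns = min(rec.first_ns, ts_ns)
            rec.last_ns = max(rec.last_ns, ts_ns)
    return cache


records = [
    (("192.0.2.1", "198.51.100.2", 40000, 80, 6), 64, 1000),
    (("192.0.2.1", "198.51.100.2", 40000, 80, 6), 1500, 1800),
    (("192.0.2.9", "198.51.100.2", 5353, 53, 17), 90, 1200),
]
flows = aggregate(records)
print(len(flows))
```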
3.4 HAMOC Software
As a base for the HAMOC software we provide Linux Kernel Modules (LKMs) for our COMBO cards. The LKMs serve as drivers that provide an interface for communication with the firmware loaded in the COMBO card's FPGA.
COMBO drivers provide network data either through standard Linux network interfaces or through special fast Straight ZEro copy (SZE) data interfaces. Standard network interfaces are the basic way to receive and send data through the COMBO card, but they are extremely slow for high-speed network traffic because all data passes through the Linux TCP/IP stack. SZE interfaces, on the other hand, are extremely fast thanks to Direct Memory Access (DMA) transfers from and to the hardware through ring buffers provided by the COMBO drivers. In this case, however, users and developers have to access the data in the SZE-specific form. For this purpose we provide the libsze2 library with an API for accessing the data.
The third possibility is to use the PCAP library (see Section 4), which combines both previously described methods. By modifying the internals of the PCAP library we have added support for the SZE data interfaces of the COMBO cards. Developers can use the standard PCAP API, which now also provides access to the SZE interfaces, while access to other standard network interfaces through the PCAP API is preserved. The modified PCAP library is available in the Liberouter RPM repository as libpcap-sze.
The top level of the HAMOC software architecture can be called the user application layer. It consists of a set of third-party tools (e.g., tcpdump, tcpreplay, Wireshark, Snort, etc.) tuned to work with the COMBO card. These standard tools make the COMBO hardware accelerators more user-friendly.
![[Image]](hanic_sw_arch.png)
Figure 7. HAMOC software architecture.
4 Packet Capture Library
There are many tools for capturing packets and further processing network data. The best known are tcpdump and Wireshark. These tools are based on the PCAP library (also known as libpcap), a portable C/C++ library for network traffic capture. The PCAP library is a part of the tcpdump project. It is available for most current operating systems (Windows, Linux, Mac OS X, etc.).
As mentioned in Section 3.4, we have modified the standard PCAP library to support the COMBO card family. Currently we provide the libpcap-sze-1.1.1 RPM package, the latest stable release of the PCAP library with our modification to support SZE interfaces. The library is available in the Liberouter RPM repositories. We also provide recompiled tools (tcpdump, tcpreplay, etc.), because the default builds of these tools lack some features needed to speed up the processing of captured data on high-speed networks.
4.1 Timestamps for Packet Capture
The standard PCAP file format strictly uses microsecond timestamp precision. This limitation was overcome in the pcap-ng file format [6], which allows the timestamp precision to be specified explicitly, from seconds down to theoretically 10^-128 of a second. On an Ethernet network with a speed of 10 Gb/s, the shortest difference between the timestamps of two packets can be 67 nanoseconds. The microsecond precision of standard PCAP is therefore not sufficient, as Figure 8 illustrates.
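The 67 ns figure follows from standard Ethernet framing: a minimum-size 64-byte frame is accompanied on the wire by an 8-byte preamble (including the start-of-frame delimiter) and a 12-byte inter-frame gap. A short Python sketch of the arithmetic (the function and constant names are ours):

```python
PREAMBLE_SFD_BYTES = 8     # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum idle time between frames, in byte times


def min_packet_spacing_ns(frame_bytes: int = 64,
                          line_rate_bps: int = 10_000_000_000) -> float:
    """Time between the starts of two back-to-back frames, in nanoseconds."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return wire_bits * 1e9 / line_rate_bps


spacing = min_packet_spacing_ns()
print(f"{spacing:.1f} ns")  # 67.2 ns for minimum-size frames at 10 Gb/s
```

At this spacing roughly 14 minimum-size packets fit into a single microsecond, so they would all receive the same timestamp in the standard PCAP format.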
![[Image]](Wireshark-uts.png)
Figure 8. Insufficient precision of microsecond timestamps.
With low-cost network interface cards (NICs), the timestamps are filled in by the PCAP library with a value provided by the OS kernel in microsecond precision. This approach needs additional computing power and is quite inaccurate. It is inadequate for precise timestamping in high-speed networks because of the long delay between the receipt of a packet on the network interface and its processing (and timestamping) in software. This delay is evident, e.g., when using NICs connected via a USB port.
Some NICs, including the COMBO card family, provide timestamping directly in hardware, which avoids the delay between receiving and timestamping. COMBO cards in addition provide timestamps with nanosecond precision. To generate high-precision timestamps we use the COMBOL-GPS add-on card, which can be connected to a GPS receiver and use it as a source of a highly precise PPS (pulse per second) signal. The PPS signal, together with the clock signal from the precise crystal integrated on the COMBOL-GPS card, is used to generate timestamps with nanosecond precision. The current system time can be used to initialize the system, but we recommend obtaining the current time from a Network Time Protocol (NTP) server.
![[Image]](timestamp_hw.png)
Figure 9. Hardware timestamps generation.
When processing packets with nanosecond timestamps, the PCAP library by default crops the precision to microseconds to preserve compatibility with the standard PCAP file format used by all PCAP-based applications. For cases where high-precision timestamps are needed, we have prepared the sze2pcap tool, which stores packets in the pcap-ng file format with nanosecond timestamps (see Figure 10). These files can then be processed only by applications supporting nanosecond timestamp precision (e.g., Wireshark or tshark). tcpdump does not yet support pcap-ng with nanosecond timestamp precision.
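The effect of cropping can be shown with a small Python sketch. The timestamps are made-up values (an arbitrary base time with packets spaced 67 ns apart), and `crop_to_microseconds` is a hypothetical helper that only mimics the loss of precision, not the PCAP library code.

```python
def crop_to_microseconds(ts_ns: int) -> int:
    """Drop the sub-microsecond part of a nanosecond timestamp."""
    return ts_ns // 1000


# Five back-to-back packets, 67 ns apart (hypothetical capture).
base_ns = 1_291_680_000_000_000_000
timestamps_ns = [base_ns + i * 67 for i in range(5)]

cropped_us = [crop_to_microseconds(ts) for ts in timestamps_ns]

# All nanosecond timestamps differ, but they collapse to one microsecond value.
print(len(set(timestamps_ns)), len(set(cropped_us)))
```

After cropping, neither the inter-packet gaps nor the exact spacing between these packets can be recovered from the file, which is why a nanosecond-capable format is needed for precise timing analysis.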
![[Image]](timestamp_sw.png)
Figure 10. High-precision timestamps – pcap-ng file format.
4.2 Remote Packet Capture
Most packet capture tools are based on the PCAP library. These tools run locally and process packets received through the PCAP library. In specific cases (strict privilege management, a better user interface of the tool on a different platform, a missing graphical environment on the probe, etc.) it can make more sense to process the data at a remote analysis center.
For sending captured data to a remote analysis center, the PCAP library offers the Remote Packet Capture (RPCAP) feature. RPCAP is implemented for Linux (and other *NIX systems) as well as for Windows versions of the PCAP library.
The scheme of RPCAP usage is shown in Figure 11. An RPCAP daemon (rpcapd) runs on the probe and captures network data. Wireshark, tcpdump or another PCAP-based tool, using a PCAP library compiled with the RPCAP enhancement enabled, runs at the analysis center. The RPCAP daemon captures data (optionally filtered according to specified rules) and sends it to the remote analysis center. The connection is initiated by the client side. When the IP address of the probe is specified, libpcap can list the probe's network interfaces available for capture.
![[Image]](rpcap_schema.png)
Figure 11. Remote packet capture scheme.
RPCAP itself does not provide any mechanism to secure the data transferred between client and server (including user authentication data). To prevent interception and compromise of the data, we recommend using an encrypted channel set up with OpenVPN or OpenSSH.
5 Network Traffic Monitoring Applications
One of the main goals of HAMOC is to provide a set of well-known third-party network tools tuned to work with COMBOv2, as discussed in the previous parts of this report. Network administrators can therefore easily use hardware acceleration with their applications without any special knowledge. In this section, we provide an overview of the possible HAMOC applications divided into three groups: packet capture and replay applications, NetFlow/IPFIX generation applications and deep packet inspection applications.
5.1 Packet Capture and Replay
Packet capture and replay tools are used for detailed examination of network links and for experiments with them. However, capturing all traffic from high-speed networks (10 Gb/s and more) is usually beyond the capabilities of standard hardware and software tools. In this case, the hardware-accelerated HAMOC platform gives common packet capture and replay tools the ability to monitor high-speed links at full rate.
5.1.1 tcpdump
Tcpdump is a command-line packet analyzer that allows users to capture network traffic and analyze network behavior. It is developed together with the PCAP library (libpcap), on which it is based. Its main features are:
- Displaying description of the contents of packets in console.
- Saving and reading packets from PCAP files.
- Advanced filtering capabilities based on the Berkeley Packet Filter capture filter.
The main contribution of the HAMOC platform is guaranteed capture of network traffic on high-speed lines, compared to the standard tcpdump tool running over a standard NIC. The standard approach provides no guarantees about the captured traffic and is usually unable to capture a complete traffic trace at line rates of around 10 Gb/s and more.
5.1.2 tcpreplay
Tcpreplay is a set of command-line tools for replaying network traffic from PCAP files (e.g., files captured by the tcpdump tool). It supports several methods of injecting packets back into the network (with adjustable speeds) and a wide range of options for rewriting the headers of injected packets.
The hardware-accelerated HAMOC version of the tcpreplay toolset allows users to replay captured network traffic at higher network speeds.
5.1.3 Wireshark
Wireshark is an alternative to the tcpdump tool with an advanced graphical front-end and more options for sorting and filtering information. It also allows capturing network traffic and storing it in PCAP files. It supports a wide range of network types and protocols, the nanosecond timestamp format, etc. Wireshark also provides a console version called tshark.
The main advantage of Wireshark running on the HAMOC platform is its ability to collect complete traffic data with no packet loss at full line rate, similarly to tcpdump running on the HAMOC platform.
5.2 NetFlow/IPFIX Generation
The second group of applications is oriented towards IP flow monitoring. IP flow monitoring (represented by NetFlow or the newer IPFIX export format) is a widely used approach for monitoring high-speed networks. Hardware acceleration provided by the HAMOC platform is well suited for this purpose. The platform either provides specialized firmware performing NetFlow/IPFIX generation or provides hardware acceleration to third-party tools generating NetFlow data, such as nProbe or fprobe. NfSen is an example of a NetFlow collector suitable for deployment with the HAMOC platform.
5.2.1 Flexible FlowMon
Flexible FlowMon can generate NetFlow/IPFIX data from two 10 Gb/s lines at full rate. It is fully reliable and generates complete flow statistics in every network state, e.g., during a (D)DoS attack. Standard solutions for NetFlow generation, such as network routers or software solutions, are not able to generate complete and accurate NetFlow statistics in such cases and are therefore not suitable for, e.g., network security monitoring. Details of the Flexible FlowMon firmware were discussed in Section 3.3.3.
5.2.2 INVEA FlowMon, nprobe, fprobe
Although the HAMOC platform provides a finely tuned solution for NetFlow/IPFIX generation, third-party tools can also be used to generate NetFlow data with the support of HAMOC hardware acceleration.
INVEA FlowMon is a software tool for generating NetFlow and IPFIX flows. It is a complex solution with a wide set of additional plugins for NetFlow data processing and analysis.
fprobe is a software tool, based on the libpcap library, for generating NetFlow flows. Similarly, nProbe is a software probe for generating both NetFlow and IPFIX flows, with a wide set of added features.
All these applications can be used with HAMOC hardware acceleration to achieve reliable full-rate processing and generation of NetFlow or IPFIX (in the case of nProbe and INVEA FlowMon) data.
5.2.3 NfSen
NfSen represents a widely used open-source NetFlow collector. It is responsible for the NetFlow data acquisition, storing and basic visualization. NfSen is fully compatible with the HAMOC platform NetFlow exporters.
5.3 Deep Packet Inspection
The third group covers applications suitable for hardware-accelerated Deep Packet Inspection (DPI) and pattern matching. All standard solutions performing DPI are limited by the maximum line speed at which they are able to process all incoming packets.
Hardware-accelerated HAMOC DPI improves packet processing in two ways: (i) It distributes the processed packets among multiple CPUs (see Section 3.3.2) and therefore increases the maximum line rate that can be processed. (ii) It performs packet filtering in hardware (see Section 3.3.1), so only a subset of the traffic is passed to software and a higher line speed can be handled. The following sections present several possible applications of hardware-accelerated DPI.
5.3.1 Snort
Snort is an open source network intrusion detection and intrusion prevention system (IDS/IPS). As it combines the benefits of signature, protocol and anomaly-based inspection, it is one of the most widely deployed IDS/IPS tools worldwide. It works in the following modes:
- Sniffer – displays network traffic.
- Packet Logger – saves displayed traffic to file.
- Network Intrusion Detection System – IDS.
- Inline Mode – Intrusion Prevention System – IPS.
HAMOC's ability to distribute network traffic among multiple CPUs makes it possible to run up to eight parallel instances of Snort (see Section 3.3.2) and also to inspect only a particular part of the network traffic by defining filtering rules. These improvements substantially increase Snort performance.
5.3.2 Bro, Suricata, OpenDPI, dsniff
Snort is one example of an IDS/IPS application suitable for the HAMOC platform, but any other network security application performing DPI can be used with HAMOC hardware acceleration.
Bro is an open-source network IDS performing DPI. It passively monitors network traffic and looks for suspicious traffic patterns. Bro first parses network traffic to extract its application-level semantics and then executes event-oriented analyzers that compare the activity with patterns deemed troublesome.
Suricata is an IDS/IPS system. It is multi-threaded and has native IPv6 support. It's capable of loading existing Snort rules and signatures.
OpenDPI is a software application for traffic classification based on deep packet inspection.
dsniff is a collection of network security tools for network auditing and penetration testing, based on DPI.
5.3.3 ngrep, httpry
HAMOC hardware acceleration is also suitable for other network applications that inspect packet payloads and extract particular patterns.
ngrep provides functions similar to the GNU grep tool, but applies them to the network layer. It works on top of the libpcap library and allows extended regular or hexadecimal expressions to be matched against the data payloads of packets. HAMOC hardware acceleration distributes network traffic among multiple CPUs so that multiple ngrep instances can run in parallel, which substantially improves processing speed.
httpry is an application similar to ngrep, but specialized for displaying and logging HTTP traffic only. It is not intended to perform analysis itself, but to capture, parse and log HTTP traffic for later analysis.
6 System Performance
Performance is one of the most important attributes of the system. The performance of the HAMOC platform can be viewed from two perspectives: the performance of the HAMOC hardware and firmware, and the performance of applications running on the platform. Hardware performance is described in the first part of this section and application performance in the second part.
6.1 Hardware Performance
The performance of the HANIC hardware describes how much network data can be transferred to the point where applications are able to read it. Some limitations are given by the design itself. The COMBOI-10G2 interface card has two 10 GbE ports, which gives two situations: either only one of the ports is in use, or both ports are used.
When only one 10 GbE port is used, the hardware is able to transfer all packets of all lengths to software (see Figure 12 and Figure 13). The CPU load during data transfer is shown in Figure 12. The peak CPU load is under 40 % and decreases with increasing packet length.
![[Image]](hanic-single-port-throughput.png)
Figure 12. HANIC single port throughput including CPU load.
![[Image]](hanic-single-port-throughput-frames.png)
Figure 13. HANIC single port frame throughput.
The second situation is more complicated and reveals some hardware limitations that did not affect single-port performance. The major limitation is the interface over which data are transferred. The COMBO-LXT card is connected to the motherboard through a PCI Express x8 slot in version 1.3.
The specified maximum transfer rate of a generation 1 PCI Express system is 2.5 Gb/s per lane. The COMBO-LXT card uses 8 lanes, so the throughput should be 20 Gb/s (2.5 x 8), but this is the theoretical raw throughput and other factors have to be considered. First, data are transferred in 8B/10B encoding, which adds a 25 % overhead relative to the size of the data (10 bits are transmitted for every 8 data bits). This leaves 16 Gb/s out of 20 Gb/s for data transfers. Second, a PCI Express system transfers data in the payload of Transaction Layer Packets (TLPs). The TLP overhead varies between 20 and 28 bytes depending on the use of 32-bit or 64-bit addressing and the optional 4 bytes for an end-to-end cyclic redundancy check. There are also other sources of overhead, such as the Link Protocol and the Flow Control Protocol [7].
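The overhead budget above can be turned into a rough upper bound with a short Python calculation. The 128-byte TLP payload size is an assumption chosen for illustration (the report does not state the payload size used), and the estimate ignores the link-layer and flow-control overheads mentioned above.

```python
def pcie_gen1_payload_rate_gbps(lanes: int = 8,
                                payload_bytes: int = 128,
                                tlp_overhead_bytes: int = 20) -> float:
    """Estimate the usable payload rate of a PCI Express generation 1 link."""
    raw_gbps = 2.5 * lanes              # 2.5 Gb/s per lane
    encoded_gbps = raw_gbps * 8 / 10    # 8B/10B: 8 data bits per 10 line bits
    # Share of each TLP that carries payload rather than header or CRC.
    tlp_efficiency = payload_bytes / (payload_bytes + tlp_overhead_bytes)
    return encoded_gbps * tlp_efficiency


best_case = pcie_gen1_payload_rate_gbps(tlp_overhead_bytes=20)
worst_case = pcie_gen1_payload_rate_gbps(tlp_overhead_bytes=28)
print(f"{worst_case:.2f} to {best_case:.2f} Gb/s")
```

Under these assumptions the usable rate lies between roughly 13.1 and 13.8 Gb/s, well below the 20 Gb/s needed for two fully loaded 10 GbE ports.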
The performance of a PCI Express system is greatly affected by system parameters and by the specifications of the machine, such as its chipset. The COMBO-LXT card uses an 8-lane Virtex-5 FPGA Integrated Endpoint Block for PCI Express designs with a Bus Master Direct Memory Access reference design. According to the performance results presented in [7], the burst performance is 13.8 Gb/s for writes to operating memory and 10.7 Gb/s for reads from it. This burst performance is not sustainable for longer periods of time, and it was measured on a machine with the Intel E5000P chipset. The document also mentions performance on a machine with the Intel 965 value chipset, where the results were only 8.36 Gb/s for writes and 10.74 Gb/s for reads.
Figure 14 and Figure 15 show how the COMBOv2 card with the HANIC design behaves when transferring data from both ports. This is essentially a test of how well it performs in PCI Express transfers (these results were measured on a machine with the Intel E5000P chipset). Its performance is between 10 and 11 Gb/s, which is not sufficient for fully saturated traffic on both ports.
![[Image]](hanic-dual-port-throughput.png)
Figure 14. HANIC dual port throughput including CPU load.
![[Image]](hanic-dual-port-throughput-frames.png)
Figure 15. HANIC dual port frame throughput.
6.2 Application Performance
Another way of measuring system performance is to focus on the performance of applications running on the HAMOC platform. An example of such an application is the monitoring of IPv6 tunnels with packet decapsulation done in software. Application performance was measured by a throughput test in which packets from a 10 Gb/s Ethernet link were processed. The measurements ran on a 2.0 GHz quad-core CPU, and CPU load was monitored in addition to throughput.
Throughput was measured for Teredo [8] and 6to4 [10] packets (the throughput of ISATAP [9] packets is the same as that of 6to4 packets). In the first scenario a single instance of the FlowMon exporter with the input plug-in loaded processed the packets. This setup was unable to process all packets at small packet lengths, even with one CPU core fully loaded. In the second scenario packets were distributed to four instances of the FlowMon exporter, each running on a different CPU core, which provided more computing power for processing (see Figure 16). All packets of all feasible lengths were processed with medium to low CPU load on every core (see Figure 17).
The results confirm the benefits of packet distribution in the HANIC design [4], which provides an easy way to scale application performance on multiprocessor systems almost linearly.
![[Image]](plugin-throughput.png)
Figure 16. FlowMon IPv6 plugin throughput – RFC 2544 10 Gb/s compliant test.
![[Image]](plugin-cpu.png)
Figure 17. FlowMon IPv6 plugin CPU load – RFC 2544 10 Gb/s compliant test.
7 System Evaluation
This chapter evaluates the HAMOC platform in a real networking environment. We have deployed and evaluated the HAMOC platform in two use cases. The first one (see Section 7.1) is focused on long-term monitoring of a CESNET backbone link, including monitoring of IPv6 tunnels. The second one (see Section 7.2) is a network security monitoring solution deployed at the Masaryk University campus, which performs detailed inspection of the monitored traffic and examines the presence of malware.
7.1 Backbone Monitoring Use Case
This use case focuses on monitoring 10 Gb/s backbone links. A large amount of network data is processed during monitoring, which is aimed mainly at network transfers rather than at datagram payloads. It is also desirable to keep the measured data for a longer period of time. This requires a monitoring technology that describes network traffic in a small volume of data while still providing a wide range of information about it. The most widely used technologies today are NetFlow and IPFIX. The traffic is described by flows, which allows NetFlow/IPFIX statistics to be stored in a fraction of the original traffic size.
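The size reduction comes from aggregation: all packets that share the same key are collapsed into one record holding counters and timestamps. The following Python sketch illustrates the idea with a 5-tuple key (addresses, ports, protocol), an assumption made for brevity. It is a simplified illustration, not the FlowMon exporter; the type and function names are hypothetical, and a real exporter additionally expires flows on timeouts and exports them in NetFlow or IPFIX format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class FlowKey(NamedTuple):
    """Fields that identify a flow in this sketch (assumed 5-tuple)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


class Packet(NamedTuple):
    """Minimal per-packet metadata needed for flow accounting."""
    key: FlowKey
    timestamp: float
    length: int


@dataclass
class FlowRecord:
    """Aggregated statistics kept instead of the packets themselves."""
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def aggregate(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Collapse packets into one record per flow key."""
    flows: Dict[FlowKey, FlowRecord] = {}
    for pkt in packets:
        record = flows.get(pkt.key)
        if record is None:
            record = flows[pkt.key] = FlowRecord(pkt.timestamp, pkt.timestamp)
        record.first_seen = min(record.first_seen, pkt.timestamp)
        record.last_seen = max(record.last_seen, pkt.timestamp)
        record.packets += 1
        record.octets += pkt.length
    return flows
```

A flow of thousands of packets is thus stored as a single fixed-size record, regardless of how many bytes were transferred.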
This use case is based on flow monitoring. The basic architecture is shown in Figure 18. The monitoring setup uses the HANIC firmware (see Section 3.3.2) with hardware packet distribution, which makes it possible to run multiple applications on the same network data and to distribute them over multiple processors [4]. Figure 18 shows an example of processing data from the same DMA channel: two exporters with different input plug-ins are subscribed to the same channel. Exporter 1 generates NetFlow statistics from regular traffic, and Exporter 2 generates NetFlow statistics from traffic hidden inside IPv6 tunnels (its input plug-in handles packet detection and decapsulation). An unlimited number of applications can be subscribed to one DMA channel, and an application can subscribe to one, some or all DMA channels. This way, a performance-intensive application can run in multiple instances on different processor cores, each subscribed only to its share of DMA channels, while applications that do not require such computing power can subscribe to all DMA channels and run on one core.
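The subscription model can be illustrated with a small Python sketch that assigns DMA channels to application instances. The channel count of eight, the instance names and both helper functions are hypothetical; the sketch only shows the bookkeeping and does not reflect the actual HANIC or FlowMon configuration interface.

```python
from typing import Dict, List


def split_channels(channels: int, instances: int) -> List[List[int]]:
    """Assign each DMA channel to exactly one instance, round-robin."""
    if not 1 <= instances <= channels:
        raise ValueError("need between 1 and `channels` instances")
    return [list(range(i, channels, instances)) for i in range(instances)]


def build_subscriptions(channels: int,
                        heavy_instances: int) -> Dict[str, List[int]]:
    """Map instance names to the DMA channels they subscribe to.

    The "heavy" application runs in several instances that share the
    channels between them; the "light" application runs once and
    subscribes to every channel.
    """
    subscriptions = {
        f"heavy-{i}": chans
        for i, chans in enumerate(split_channels(channels, heavy_instances))
    }
    subscriptions["light"] = list(range(channels))
    return subscriptions


if __name__ == "__main__":
    # Assumed example: 8 DMA channels, 4 instances of the heavy application.
    for name, chans in build_subscriptions(8, 4).items():
        print(f"{name}: channels {chans}")
```

Each channel appears in exactly one heavy instance, so no packet is processed twice by that application, while the light application still sees all traffic.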
It is reasonable to divide the traffic between two exporters for each link direction. All these exporters can send NetFlow statistics to one or more collectors, or they can send some statistics to one collector and other statistics to another. This behavior is configurable by simple filters passed to the FlowMon exporter.
![[Image]](hanic_flowmon.png)
Figure 18. Using FlowMon to monitor backbone links.
7.2 Campus Monitoring Use Case
The second use case focuses on security monitoring of the campus network. In this case, we do not need to process as much traffic as on the backbone link. Therefore, we can focus on a more detailed inspection of the campus traffic and run multiple analyses in parallel.
We use hash-based packet distribution to spread the campus traffic across the individual CPU cores (see Figure 19). The distributed network traffic is processed by three installed applications:
- The NetFlow exporter (see Section 5.2) performs long-term NetFlow/IPFIX monitoring. Campus traffic is exported in NetFlow format to the collector, where the data is stored for long-term archiving. A set of NetFlow security tools processes the exported data on the collector side and alerts network operators to current network threats, attacks or malware in real time.
- Snort (see Section 5.3) is deployed as a network IDS and performs DPI on the campus traffic. Whenever a malicious traffic pattern appears in the content of the inspected packets, the network operators are alerted.
- Wireshark (see Section 5.1) is installed in “on demand” mode. Network operators can start capturing campus traffic to PCAP files at their request. They can dump the complete campus traffic or select only suspicious traffic and store it for further detailed analysis.
![[Image]](hanic_snort_simple.png)
Figure 19. Using FlowMon, Snort and Wireshark applications to monitor campus network.
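Hash-based distribution can be sketched as follows: a hash is computed over the header fields that identify a flow, and the result selects the DMA channel, so all packets of one flow reach the same core. The hash function actually used by the HANIC firmware is not described in this section. CRC-32, the choice of fields and the symmetric ordering of endpoints (so that both directions of a connection map to the same channel) are assumptions made for illustration, as is the function name `channel_for_flow`.

```python
import ipaddress
import zlib


def channel_for_flow(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                     protocol: int, channels: int) -> int:
    """Pick a DMA channel for a packet from its flow-identifying fields.

    Illustrative only: endpoints are sorted before hashing so that both
    directions of a connection yield the same channel (an assumption).
    """
    if channels < 1:
        raise ValueError("at least one channel is required")
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    low, high = sorted((a, b))
    key = (low[0] + low[1].to_bytes(2, "big") +
           high[0] + high[1].to_bytes(2, "big") +
           bytes([protocol]))
    return zlib.crc32(key) % channels


if __name__ == "__main__":
    forward = channel_for_flow("10.0.0.1", "192.0.2.7", 40000, 80, 6, 4)
    reverse = channel_for_flow("192.0.2.7", "10.0.0.1", 80, 40000, 6, 4)
    print(forward == reverse)  # both directions share one channel
```

Keeping a flow on one core matters for stateful tools such as a flow exporter or an IDS, which would otherwise need to share per-flow state between cores.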
8 Conclusion
In this technical report, we have described a platform for network traffic observation at 10 Gb/s and higher speeds. Instead of developing new, proprietary tools, we use well-known applications such as tcpdump, Wireshark and Snort. Network administrators and researchers do not need to design new hardware or develop new firmware to use hardware acceleration in their applications; they can use standard network monitoring applications with hardware preprocessing. The HAMOC platform provides data preprocessing (e.g. filtering and statistics counting) and traffic distribution among the processor cores directly in hardware. This approach makes it possible to handle 10 Gb/s traffic and to overcome the performance issues typical of software-only approaches in high-speed networks.
9 Acknowledgments
This work is supported by the Research Intent of the Czech Ministry of Education MSM6383917201.
References
| [1] | HEYDE, A. Investigating the performance of Endace DAG monitoring hardware and Intel NICs in the context of lawful interception, 2008 [cit. 11/2010]. Available online. |
| [2] | MARTÍNEK, T.; KOŠEK, M. NetCOPE: Platform for rapid development of network applications. In Proceedings of 2008 IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop, p. 219-224. IEEE Computer Society, 2008. ISBN 978-1-4244-2276-0 |
| [3] | NOVOTNÝ, J.; ŽÁDNÍK, M. COMBOv2 - Hardware Accelerators for High-Speed Networking. 2010 [cit. 11/2010]. Available online. |
| [4] | PUŠ, V.; DEDEK, T.; MARTÍNEK, T. Hardware-accelerated distribution of network traffic among many processor cores. Technical report 15/2010, Praha: CESNET, 2010. |
| [5] | ŽÁDNÍK, M.; ŠPRINGL, P.; ČELEDA, P. Flexible FlowMon. Technical report 36/2007, Praha: CESNET, 2007. |
| [6] | DEGIOANNI, L.; RISSO, F.; VARENNI, G. PCAP Next Generation Dump File Format. [cit. 11/2010] Available online. |
| [7] | GOLDHAMMER, A.; AYER, J. Jr. Understanding Performance of PCI Express Systems, 2008 [cit. 11/2010] Available online. |
| [8] | HUITEMA, C. Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs). RFC 4380, IETF, February 2006. |
| [9] | TEMPLIN, F.; GLEESON, T.; THALER, D. Intra-Site Automatic Tunnel Addressing Protocol (ISATAP). RFC 5214, IETF, March 2008. |
| [10] | CARPENTER, B.; MOORE, K. Connection of IPv6 Domains via IPv4 Clouds. RFC 3056, IETF, February 2001. |