STU_BURST: Hardware support for quantification of traffic burstiness

CESNET technical report number 27/2007
also available in PDF, PostScript, and XML formats.

Stanislav Hotmar, Sven Ubik
6.12.2007

1   Abstract

Network link load is usually expressed as average values over certain time periods. Short time periods reveal fluctuations that are not visible in long-term averages, and even short load peaks can affect the throughput of additional traffic. It is therefore interesting to look at short-term load dynamics. One way to quantify traffic dynamics independently of any time period is to create a distribution of packet burst sizes. In this report we describe the design and implementation of a firmware unit that computes such a distribution in real time on the COMBO6X programmable FPGA card. Several parameters of the computed distribution are configurable, providing various views on Internet traffic dynamics.

Keywords: Network traffic dynamics, passive monitoring, programmable hardware

2   Network traffic dynamics

Network link load is usually measured as the average load over certain time periods. The measured values depend on the length of the period used: a shorter period usually produces higher peaks and deeper drops, while a longer period tends to smooth out short-term fluctuations. One way to quantify traffic dynamics independently of any time period is to create a distribution of packet burst sizes. Such measurements can be used, for example, to estimate optimal sizes of router interface buffers.

We define a burst as a sequence of consecutive packets in which every interframe gap between packets of the burst is smaller than or equal to a specified constant, and which ends with an interframe gap greater than that constant.

For precise packet burst monitoring we created a firmware unit called STU_BURST (Statistical Unit for Bursts) for the COMBO6X hardware monitoring card. This is a PCI-X card for a PC with 1 Gb/s SFP optical transceivers for packet capture and a Xilinx Virtex-II Pro FPGA for traffic processing. An advantage of this card is that its hardware interface is available to developers, who can create their own firmware.

We defined the following requirements for the STU_BURST unit:

3   Design description

We decided to add quantification of traffic burstiness to the existing NIFIC firmware (Network Interface Card with Filtration). An alternative was to add our monitoring to the SCAMPI firmware, but that was only available for the older COMBO6 card with a 32-bit/33 MHz PCI interface. Our monitoring does not use any functions of the NIFIC firmware; we only added burstiness monitoring in parallel to NIFIC so that users can use both functions at the same time. Packets come from the four input ports to the original NIFIC firmware as well as to the RTS (Relative Timestamp) and STU_BURST units, as illustrated in Figure 1. While the RTS and STU_BURST units are designed for full line-rate processing at all packet sizes on all four input ports simultaneously (4x 1 Gb/s), the NIFIC design, which performs more advanced processing, cannot keep up at full speed. For full line-rate processing it is therefore safer to disable the input buffers in the NIFIC design, so that packets are processed only by the RTS and STU_BURST units.

There are four instances of the RTS and STU_BURST units, one for each of the four input ports. For each packet, the RTS unit passes the packet length and packet timestamp to the STU_BURST unit. The packet length is 14 bits long, which is enough for packets of up to 16383 bytes. The packet timestamp is 64 bits long: the first 32 bits are used for seconds and the other 32 bits for the fraction of a second. We call it a relative timestamp because it uses the unit's own system time rather than real wall-clock time; the STU_BURST unit only needs to know the difference between the timestamps of every two subsequent packets. Both the packet length and the timestamp are computed in the RTS unit using counters incremented by the 125 MHz clock of the GMII bus at the RTS input. The rest of the firmware runs on a 100 MHz clock driven by an on-board oscillator, and the RTS unit passes packet lengths and timestamps to the STU_BURST unit in this 100 MHz clock domain. The STU_BURST unit is also connected to the Local Bus (LB), which is used for configuring the unit and reading the measured results.
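To compare a timestamp difference with a packet length, both must be in the same unit. The following Python sketch shows one way to express the difference of two 64-bit relative timestamps in byte times on a 1 Gb/s link (8 ns per byte). It assumes that "first 32 bits" means the upper half of the word and that the other 32 bits are a binary fraction in units of 2^-32 s; neither detail is specified above, and the helper names are hypothetical.

```python
from fractions import Fraction

TICKS_PER_SECOND = 2**32        # assumption: lower 32 bits are a binary fraction of a second
BYTES_PER_SECOND = 125_000_000  # 1 Gb/s = 125 MB/s, i.e. 8 ns per byte on the wire


def split_timestamp(ts: int) -> tuple[int, int]:
    """Split a 64-bit relative timestamp into (seconds, fraction) fields."""
    return ts >> 32, ts & 0xFFFFFFFF


def delta_in_byte_times(ts_a: int, ts_b: int) -> Fraction:
    """Time from ts_a to ts_b expressed in byte times at 1 Gb/s.

    Only the difference is used, so the arbitrary origin of the
    relative clock cancels out.
    """
    seconds = Fraction(ts_b - ts_a, TICKS_PER_SECOND)
    return seconds * BYTES_PER_SECOND
```

Fraction keeps the conversion exact, so no rounding error accumulates before the result is compared with integer byte counts.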

[Figure]

Figure 1: Integration of traffic dynamics monitoring into firmware

The structure of the STU_BURST unit is shown in Figure 2. The BURST_SAMPLER unit computes the interframe gap (IFG) between every two subsequent packets A and B using the formula:

$$\mathrm{IFG} = ts_B - ts_A - len_A$$

where ts_A is the relative timestamp of packet A, ts_B is the relative timestamp of packet B and len_A is the length of packet A. The interframe gap is measured from the CRC of one packet to the Ethernet header of the next packet; that is, it includes the 7-byte preamble and the 1-byte start-of-frame delimiter. If the IFG is less than or equal to IFG_MAX, the value specified by the user to denote the end of a burst, then packet B is considered a continuation of the current burst. The BURST_SAMPLER unit maintains the number of packets and bytes in the current burst. If the IFG is greater than IFG_MAX, then packet B is considered the beginning of a new burst and packet A was the last packet of the previous burst. In this case the BURST_SAMPLER unit passes the number of packets and bytes in the burst that has just finished to the BURST_CLASSIFIER unit.
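The BURST_SAMPLER behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch, not the VHDL: it assumes timestamps already expressed in byte times, so that ts_B - ts_A - len_A is directly an IFG in bytes, and that IFG and IFG_MAX are compared in the same units. Unlike the hardware, which closes a burst only when the next packet arrives, the sketch also flushes the burst still open when the input ends. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def sample_bursts(packets: Iterable[tuple[int, int]],
                  ifg_max: int) -> Iterator[tuple[int, int]]:
    """Yield (packets, bytes) for every finished burst.

    `packets` holds (timestamp, length) pairs in arrival order, with
    timestamps in byte times (an assumption of this sketch).
    """
    prev_ts = prev_len = None
    burst_packets = burst_bytes = 0
    for ts, length in packets:
        if prev_ts is not None:
            ifg = ts - prev_ts - prev_len      # IFG = ts_B - ts_A - len_A
            if ifg > ifg_max:                  # gap too long: the previous burst ended
                yield burst_packets, burst_bytes
                burst_packets = burst_bytes = 0
        burst_packets += 1
        burst_bytes += length
        prev_ts, prev_len = ts, length
    if burst_packets:                          # flush the burst still open at end of input
        yield burst_packets, burst_bytes
```

For example, three 64-byte frames spaced 20 bytes apart followed by a 100-byte frame after a long pause give `[(3, 192), (1, 100)]` with `ifg_max=20`.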

The BURST_CLASSIFIER unit is configured by the user with 256 burst-size bins, whose limits are given in bytes. The bins can all be equally sized, but they can also be specified arbitrarily, for example on a logarithmic scale.
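As an illustration of the two kinds of configuration, the following Python sketch generates 256 strictly increasing bin limits, either equally spaced or on a logarithmic scale. It assumes that each configured value is the exclusive upper limit of its bin (as in the 100-byte example in Section 4.2.1) and that every value must fit the 16-bit bin configuration register; the function names are hypothetical.

```python
MAX_LIMIT = 0xFFFF  # the bin configuration register is 16 bits wide (Table 3)


def linear_limits(step: int, n_bins: int = 256) -> list[int]:
    """Equally sized bins: step, 2*step, ..., n_bins*step."""
    return [step * (i + 1) for i in range(n_bins)]


def log_limits(first: int, last: int, n_bins: int = 256) -> list[int]:
    """Logarithmically spaced, strictly increasing limits from first to last."""
    ratio = (last / first) ** (1 / (n_bins - 1))
    limits: list[int] = []
    for i in range(n_bins):
        value = round(first * ratio ** i)
        if limits and value <= limits[-1]:
            value = limits[-1] + 1  # rounding collapsed two limits; keep them distinct
        limits.append(value)
    return limits
```

A logarithmic scale from 64 bytes to 65535 bytes gives fine resolution for small bursts while still covering the whole 16-bit range.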

For every finished burst, the BURST_CLASSIFIER unit updates the packet, byte and burst counters of the bin into which the burst is classified. Bursts are classified into bins in two sequential steps. In the first step, the burst size is compared with 7 limits that divide the 256 bins into 8 parts. These limits are precomputed after the user uploads the bin configuration, and the comparisons with all 7 limits take place in parallel. In the second step, the burst size is compared one by one with the bin limits within the part identified in the first step. This solution is a compromise between comparing the size with all 256 limits sequentially, which would be slow, and comparing it with all of them in parallel, which would require more FPGA resources.
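The two-step search can be sketched in Python as follows. The sketch assumes that bin i holds burst sizes from the limit of bin i-1 (or 0) up to, but not including, the limit of bin i, and that bin 255 also absorbs every larger burst, which is consistent with the laboratory results in Section 5. In the FPGA the seven coarse comparisons run in parallel; here they are an ordinary loop. The names are hypothetical.

```python
N_BINS = 256
N_PARTS = 8
BINS_PER_PART = N_BINS // N_PARTS  # 32 bins in each part


def coarse_limits(limits: list[int]) -> list[int]:
    """The 7 precomputed limits at which parts 1..7 begin."""
    return [limits[k * BINS_PER_PART - 1] for k in range(1, N_PARTS)]


def classify(size: int, limits: list[int], coarse: list[int]) -> int:
    """Return the bin index for a burst of `size` bytes."""
    # Step 1 (parallel in the FPGA): how many part boundaries has the size reached?
    part = sum(1 for c in coarse if size >= c)
    first = part * BINS_PER_PART
    # Step 2 (sequential): scan at most 31 limits inside the selected part.
    for i in range(first, first + BINS_PER_PART - 1):
        if size < limits[i]:
            return i
    # Last bin of the part; for part 7 this is bin 255, which takes all larger bursts.
    return first + BINS_PER_PART - 1
```

The worst case is 31 sequential comparisons instead of 255, at the cost of only 7 parallel comparators.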

The packet and byte counters are increased by the number of packets and bytes in the burst, as provided by the BURST_SAMPLER unit, while the burst counter is incremented by one. Packet and byte counters are 64 bits long and burst counters are 32 bits long. All counters are duplicated in two banks. At any time, one bank is active and counts the packets, bytes and bursts in the 256 bins, while the other bank is inactive. When the user wants to read the measured results, the inactive bank becomes active and the formerly active bank becomes available for reading. In this way the user can read the results safely at any pace, and the values from all counters are valid for the same moment in time. Separate counters are provided for each input port, so the total number of counters is bins × characteristics × banks × input ports = 256 × 3 × 2 × 4 = 6144. The counters are implemented in FPGA BlockRAM.
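The double-bank scheme can be modelled by the following Python class. It is a sketch only: the text does not say whether a bank is cleared when it becomes active, so the model assumes that counting restarts from zero, which is consistent with the advice to read results regularly to avoid overflow. The class and method names are hypothetical.

```python
class BankedCounters:
    """Per-bin packet, byte and burst counters duplicated in two banks."""

    def __init__(self, n_bins: int = 256) -> None:
        self.n_bins = n_bins
        self.banks = [self._empty_bank(), self._empty_bank()]
        self.active = 0  # index of the bank currently counting

    def _empty_bank(self) -> dict[str, list[int]]:
        return {name: [0] * self.n_bins for name in ("packets", "bytes", "bursts")}

    def count_burst(self, bin_index: int, packets: int, nbytes: int) -> None:
        """Add one finished burst to the active bank."""
        bank = self.banks[self.active]
        bank["packets"][bin_index] += packets
        bank["bytes"][bin_index] += nbytes
        bank["bursts"][bin_index] += 1

    def switch_for_reading(self) -> dict[str, list[int]]:
        """Swap banks and return the frozen one; all its values belong to one moment."""
        frozen = self.active
        self.active ^= 1
        self.banks[self.active] = self._empty_bank()  # assumption: counting restarts at zero
        return self.banks[frozen]
```

Because new bursts only touch the active bank, a slow reader never sees a mixture of old and new values.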

The counters that can overflow earliest are the burst counters, in the case where the traffic consists entirely of 64-byte packets (with 20-byte interframe gaps) and each packet is counted as a separate burst. At the full line rate of 1 Gb/s, the capacity of a 32-bit burst counter is then exceeded after

$$t_{\mathrm{overflow}} = \frac{2^{32}}{10^9 / \left(8 \cdot (64 + 20)\right)} = \frac{2^{32} \cdot 8 \cdot 84}{10^9} \approx 2886\ \mathrm{s}.$$

Therefore, to make sure that the counters do not overflow, the user should read the results at least once every 2886 seconds. The capacity of the 64-bit byte counters is sufficient for more than 4,000 years at 1 Gb/s.
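The overflow figures can be checked with a few lines of Python. The only inputs are the 1 Gb/s line rate, the minimum slot of a 64-byte frame plus a 20-byte gap, and the counter widths given above; the constant names are ours.

```python
LINE_RATE_BPS = 1_000_000_000        # 1 Gb/s
BYTES_PER_SECOND = LINE_RATE_BPS // 8
MIN_SLOT = 64 + 20                   # minimum frame plus gap, in bytes
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Worst case for the burst counter: every 64-byte frame is a separate burst.
bursts_per_second = BYTES_PER_SECOND / MIN_SLOT
burst_counter_overflow_s = 2**32 / bursts_per_second

# Byte counter: counted bytes cannot exceed the line rate, so this is a
# lower bound on the time until a 64-bit byte counter wraps around.
byte_counter_overflow_years = 2**64 / BYTES_PER_SECOND / SECONDS_PER_YEAR

print(f"32-bit burst counter: {burst_counter_overflow_s:.0f} s")
print(f"64-bit byte counter: {byte_counter_overflow_years:.0f} years")
```

The 64-bit packet counters have an even larger margin, since the packet rate is at most about 1.49 million per second.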

[Figure]

Figure 2: Structure of STU_BURST unit

The required FPGA resources are summarized in Table 1, along with the total resources available in the FPGA.

                RTS    STUB    NIFIC+RTS+STUB    FPGA capacity
Slices          145    1369    13180             23616
Flip-flops      251    1414    13524             47232
4-input LUTs    64     2150    15728             47232
BlockRAMs       3      14      130               232

Table 1: Required FPGA resources

4   Use description

4.1   Initialization

Before we can start measurements, we need to load drivers, upload firmware into the card and initialize firmware units.

The drivers can be loaded by a script included with the NIFIC package:

# nific_lkm -l

The firmware can be uploaded using a script which is also included with the NIFIC package, but we need to point it to the directory where the STU_BURST firmware is available (rather than the original NIFIC firmware):

# nific -f <directory_with_STU_BURST_firmware>

If we do not need the NIFIC functionality, it is safer to disable the input buffers in the NIFIC design. They are disabled by default when the firmware is uploaded, but to be sure we can disable them manually using the set_ibuf.sh script provided with the STU_BURST firmware:

# set_ibuf.sh 0

4.2   Low-level access

We can communicate with the STU_BURST unit directly by reading and writing registers in its memory space. Alternatively, we can use the more user-friendly stub shell script, which is described in Section 4.3.

Each of the four instances of the STU_BURST unit, one per input port, has its own address space, which starts at the address given in Table 2. The STU_BURST registers within this address space are described in Table 3. The csbus utility included in the NIFIC package can be used to read values from and write values to these addresses.

Interface    Base address
c6eth0       0x00140000
c6eth1       0x00141000
c6eth2       0x00142000
c6eth3       0x00143000

Table 2: Base addresses for STU_BURST units for input ports

Offset (hex)  Size  Read/write  Purpose
00            8b    R/W         Control register
                                - bit 0 - Enable (1=start measurement)
                                - bit 1 - Read (1=start reading results)
                                - bit 2 - Write (1=start writing configuration)
                                - bit 7 - Reset (1=reset)
04            8b    W           Acknowledgement register
                                - bit 0 - Read acknowledgement (1=go to next value)
                                - bit 1 - Write acknowledgement (1=go to next value)
08            16b   R/W         IFG_MAX (maximum IFG within a burst)
0C            16b   W           Bin configuration (write bin size here)
10            32b   R           Burst counters
14            32b   R           Byte counters - low part
18            32b   R           Byte counters - high part
20            32b   R           Packet counters - low part
24            32b   R           Packet counters - high part

Table 3: STU_BURST registers
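A register address is simply the base address of the port's STU_BURST instance from Table 2 plus the register offset from Table 3. The small Python helper below makes this explicit; the dictionary keys and the function name are hypothetical and not part of the NIFIC package.

```python
BASE_ADDRESSES = {
    "c6eth0": 0x00140000,
    "c6eth1": 0x00141000,
    "c6eth2": 0x00142000,
    "c6eth3": 0x00143000,
}

REGISTERS = {
    "control": 0x00,
    "ack": 0x04,
    "ifg_max": 0x08,
    "bin_config": 0x0C,
    "bursts": 0x10,
    "bytes_low": 0x14,
    "bytes_high": 0x18,
    "packets_low": 0x20,
    "packets_high": 0x24,
}


def register_address(interface: str, register: str) -> int:
    """Absolute address of an STU_BURST register for the given input port."""
    return BASE_ADDRESSES[interface] + REGISTERS[register]
```

For example, `f"{register_address('c6eth1', 'ifg_max'):08X}"` gives the eight hex digits `00141008`.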

4.2.1   Bin configuration

A typical sequence of commands for bin configuration is as follows:

# base=0x001400 # c6eth0 base address 0x00140000 without the two offset digits
# csbus "$base"00 00000080 # reset
# csbus "$base"00 00000004 # start writing configuration
#
# csbus "$base"0C 00000064 # 100 byte size for bin 0
# csbus "$base"04 00000002 # go to next value
#
# csbus "$base"0C 000000C8 # 200 byte size for bin 1
# csbus "$base"04 00000002 # go to next value, etc.
#
# csbus "$base"00 # should be 0, otherwise configuration failed
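Writing 256 limits by hand is tedious, so the sequence above can be generated. The Python sketch below only builds csbus command lines in the same form as the example (address, then an 8-digit hexadecimal value); it assumes that the limits must be strictly increasing and fit the 16-bit register, and the function name is hypothetical.

```python
def bin_config_commands(base: int, limits: list[int]) -> list[str]:
    """csbus command lines that upload `limits` following Section 4.2.1."""
    if len(limits) != 256:
        raise ValueError("exactly 256 bin limits are expected")
    if any(not 0 < x <= 0xFFFF for x in limits):
        raise ValueError("each limit must fit the 16-bit bin register")
    if any(a >= b for a, b in zip(limits, limits[1:])):
        raise ValueError("limits must be strictly increasing")

    def write(offset: int, value: int) -> str:
        return f"csbus 0x{base + offset:08X} {value:08X}"

    cmds = [write(0x00, 0x80),   # reset
            write(0x00, 0x04)]   # start writing configuration
    for limit in limits:
        cmds.append(write(0x0C, limit))  # bin size
        cmds.append(write(0x04, 0x02))   # write acknowledgement: go to next value
    cmds.append(f"csbus 0x{base:08X}")   # read control register; 0 means success
    return cmds
```

The generated lines can be written to a file and executed as a shell script.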

4.2.2   Maximum IFG within burst

The maximum IFG at which two subsequent packets are still considered part of the same burst can be configured with the following commands (the IFG is specified without the preamble and the start-of-frame delimiter):

# base=0x001400
# max_ifg=0F # set maximum IFG to 15 bytes (hexadecimal 0F)
# csbus "$base"08 000000"$max_ifg"
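Because csbus takes hexadecimal values, converting the IFG by hand is error-prone (for example, 15 bytes is 0x0F). A tiny Python helper, hypothetical and shown only for illustration, builds the command for a decimal IFG given without the preamble and start-of-frame delimiter:

```python
def ifg_max_command(base: int, ifg_bytes: int) -> str:
    """csbus command that sets IFG_MAX (16-bit register at offset 0x08)."""
    if not 0 <= ifg_bytes <= 0xFFFF:
        raise ValueError("IFG_MAX must fit in 16 bits")
    return f"csbus 0x{base + 0x08:08X} {ifg_bytes:08X}"
```

For instance, `ifg_max_command(0x00143000, 20)` returns `csbus 0x00143008 00000014`.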

4.2.3   Starting measurement

The measurement can be started by the following commands:

# base=0x001400
# csbus "$base"00 00000001 # enable (bit 0): start measurement

4.2.4   Reading results

A typical sequence of commands to read measured results is as follows:

# base=0x001400
# csbus "$base"00 00000003 # start reading results and continue measurement
#
# bursts=$(csbus "$base"10)
# bytes_low=$(csbus "$base"14)
# bytes_high=$(csbus "$base"18)
# packets_low=$(csbus "$base"20)
# packets_high=$(csbus "$base"24)
# csbus "$base"04 00000001 # read acknowledgement (bit 0): go to next value, etc.
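The whole read-out loop can be expressed in Python as below. `read_reg` and `write_reg` are hypothetical callables (for instance, thin wrappers around csbus) that take offsets relative to the port's base address. Following Table 3, the read acknowledgement is bit 0 of the acknowledgement register, and the sketch assumes that each acknowledgement advances all five result registers to the next bin, as the command sequence suggests.

```python
from typing import Callable

CTRL, ACK = 0x00, 0x04
BURSTS, BYTES_LO, BYTES_HI, PKTS_LO, PKTS_HI = 0x10, 0x14, 0x18, 0x20, 0x24


def combine64(low: int, high: int) -> int:
    """Join two 32-bit register halves into one 64-bit counter value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def read_histogram(read_reg: Callable[[int], int],
                   write_reg: Callable[[int, int], None],
                   n_bins: int = 256) -> list[tuple[int, int, int]]:
    """Return (bursts, bytes, packets) for every bin of one STU_BURST instance."""
    write_reg(CTRL, 0x03)  # bit 0 enable + bit 1 read: swap banks, keep measuring
    rows = []
    for _ in range(n_bins):
        bursts = read_reg(BURSTS)
        nbytes = combine64(read_reg(BYTES_LO), read_reg(BYTES_HI))
        packets = combine64(read_reg(PKTS_LO), read_reg(PKTS_HI))
        rows.append((bursts, nbytes, packets))
        write_reg(ACK, 0x01)  # read acknowledgement (bit 0): advance to the next bin
    return rows
```

A real `read_reg` would run csbus on the base address plus the offset and parse its hexadecimal output; that wrapper is omitted because the exact csbus output format is not described here.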

4.3   Using stub script

To make communication with the STU_BURST unit more user-friendly, we created a shell script called stub. The script configures the bin limits and the maximum IFG within a burst, starts the measurement, and periodically reads the results from the card and prints them to standard output. It finishes after the specified time or when stopped by Ctrl-C.

You can start the stub script without arguments to see online help:

# stub
Usage: stub <interface> <ifg> <duration> [interval] [class_file]
    <interface>   c6eth0 to c6eth3
    <ifg>         maximum interframe gap within a burst in bytes
    <duration>    duration of measurement in seconds
    [interval]    interval of reading results in seconds (default 300)
    [class_file]  file with configuration of burst bins (default class.cfg)

The IFG is specified without the preamble and the start-of-frame delimiter. For example:

# stub c6eth0 20 300 60 class100.cfg

The above command measures for 300 seconds and prints a histogram of burst sizes every 60 seconds; the output starts like this:

# Interface     = c6eth0
# IFG           = 20
# Duration      = 300
# Interval      = 60
# Class file    = ./class100.cfg
# Start: 22:56:02:546495000
# Reading 1 : 22:56:07:554752000
1        0       0       0       0       (0 - 99)
1        1       0       0       0       (100 - 199)
1        2       0       0       0       (200 - 299)
etc.

5   Laboratory measurements

We first tested the STU_BURST unit in simulation; more details can be found in [Hot07]. Then we ran a stress test, sending full-rate 1 Gb/s traffic to all four ports simultaneously in 1518-byte and 64-byte packets from an Ixia 250 hardware packet generator. The unit correctly counted all packets as one burst and operated flawlessly.

In the next test, we checked whether bursts are correctly classified into bins according to their sizes. We first configured all bins with a step of 100 bytes and sent bursts with sizes on the boundaries between bins. We did not test all 256 bins, only selected bins at the beginning, in the middle and at the end of the histogram. IFG_MAX was set to 20 bytes, slightly more than the 12-byte IFG of back-to-back packets. The measured results are given in Table 4. For instance, the second and third rows show that a 99-byte burst was correctly placed in bin 0, while a 100-byte burst was correctly placed in bin 1.
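The expected bins for the 100-byte configuration can be recomputed with a binary search over the limits. This Python sketch uses `bisect_right` from the standard library rather than the FPGA's two-step search, and assumes that the last bin also collects bursts larger than all limits.

```python
from bisect import bisect_right

limits = [100 * (i + 1) for i in range(256)]  # 100-byte steps, as in the test


def expected_bin(size: int) -> int:
    """Index of the first limit strictly greater than size, capped at bin 255."""
    return min(bisect_right(limits, size), 255)


for size in (64, 99, 100, 3_199, 3_200, 25_499, 100_000):
    print(f"{size:>7} B -> bin {expected_bin(size)}")
```

The printed bins match the expected-bin column of the equally-sized test.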

We then configured unequal bin sizes and again sent several test bursts with sizes on the boundaries between bins, as shown in Table 5.

IXIA 250 - sent                              STU_BURST - measured
Bursts  Burst size  Total bytes   Exp. bin   Bursts  Total bytes  Meas. bin
100     64 B        6,400 B       0          100     64 B         0
100     99 B        9,900 B       0          100     99 B         0
100     100 B       10,000 B      1          100     100 B        1
100     3,199 B     319,900 B     31         100     3,199 B      31
100     3,200 B     320,000 B     32         100     3,200 B      32
100     25,499 B    2,549,900 B   254        100     25,499 B     254
100     100,000 B   10,000,000 B  255        100     100,000 B    255

Table 4: Checking precision of burst quantification - equally-sized bins

IXIA 250 - sent                              STU_BURST - measured
Bursts  Burst size  Total bytes   Exp. bin   Bursts  Total bytes  Meas. bin
100     554 B       55,400 B      0          100     554 B        0
100     555 B       55,500 B      1          100     555 B        1
100     4,607 B     460,700 B     31         100     4,607 B      31
100     4,608 B     460,800 B     32         100     4,608 B      32
100     15,757 B    1,575,700 B   254        100     15,757 B     254
100     15,758 B    1,575,800 B   255        100     15,758 B     255
100     100,000 B   10,000,000 B  255        100     100,000 B    255

Table 5: Checking precision of burst quantification - variable-sized bins

In the final test, we checked the precision of end-of-burst recognition. We sent packets with various interframe gaps and measured the minimum value of IFG_MAX at which all packets were still counted as one burst. Ideally, the sent IFG and the required IFG_MAX should be equal. As the measured results in Table 6 show, we needed to set IFG_MAX to approximately 3 bytes more than the theoretical value. When we sent packets between two ports of the generator connected back-to-back, we found that the generator exhibits certain fluctuations in packet dispatch times, which could be one of the reasons why IFG_MAX had to be set slightly higher. Another reason could be imprecision in timing within the RTS unit. In real measurements, the 3-byte difference is completely negligible. We can even set IFG_MAX to a higher value, because interframe gaps of just a few bytes are too short to be used for inserting another packet and therefore have no practical significance.

IXIA 250     STU_BURST
Sent IFG     Required IFG_MAX
12 B         15 B
100 B        103 B
1,000 B      1,003 B
10,000 B     10,003 B
65,530 B     65,534 B

Table 6: Measuring precision of burst end recognition

6   Conclusion

We designed and implemented firmware for real-time monitoring of traffic burstiness on the COMBO6X hardware card. The measured results provide information about traffic dynamics independently of the sampling period that is otherwise used to compute the average load. The firmware counts the number of packets, bytes and bursts in 256 burst-size bins specified arbitrarily by the user. We tested our implementation for precision and performance with a hardware packet generator, as well as with live network traffic. The firmware can operate at full line rate (1 Gb/s) at all packet sizes on all four input ports simultaneously. We plan to port the firmware to a 10 Gb/s hardware packet processing platform that we are currently developing.

References

[Hot07] Hotmar S.: Implementace VHDL modulu pro sledování dynamiky síťového provozu [Implementation of a VHDL module for monitoring network traffic dynamics], MSc Thesis, Computer Science Department, Faculty of Electrical Engineering, Czech Technical University, May 2007.