FPGA-based Packet Header Anonymization

CESNET technical report number 16/2006
also available in PDF, PostScript, and XML formats.

Sven Ubik, Petr Žejdl, Jiří Halák
27.12.2006

1   Abstract

Packet header anonymization is very important for passive network monitoring, which observes characteristics of user traffic and for sharing captured tracefiles, which are valuable resource for networking research.

In this paper we present design and implementation of our FPGA-based packet header anonymization operating in real time and preventing sensitive information from getting to the monitoring PC and beyond.

2   Purpose of packet header anonymization

In passive network monitoring we process real user traffic. This approach has important advantages over active network monitoring which uses artificially injected test traffic. Particularly, test traffic can affect user traffic and it cannot be used to detect phenomena present only in user traffic, such as security attacks and dynamic characteristics. Captured packet traces are useful resource for networking research in workload characterization, traffic engineering and security application design.

When processing user traffic, we have to secure user privacy. For the purposes mentioned above, we can remove most of the packet payload, but we need to keep packet headers and certain patterns from the payload, such as security attacks indicators. But packet headers still include sensitive information, particularly in the form of source and destination IP addresses.

The purpose of packet header anonymization is to remove all sensitive information to protect user privacy, while keeping as much of the original traffic characteristics as possible.

3   Related work

Anonymization can be done on a reconstructed stream, which is then again split into packets [anon-stream]. This type of anonymization can do advanced processing of higher-layer protocols, such HTTP. However, this approach is compute-intensive and it looses many of the original traffic dynamics (packet spaces, bad CRCs, IP options, etc.).

A table-based approach to IP address anonymization was implemented in TCPdpriv. A cryptography-based scheme to provide consistency of anonymization across different traces using the same cryptography key was described in [anon-crypt]. However, software implementation is slow for gigabit speeds and it requires that sensitive information is temporarily stored in a monitoring PC.

Hardware implementation on a network processor using precomputed mapping trees [np-anon] can remove sensitive information in the monitoring hardware, but it still requires a lot of instructions and several memory accesses per packet. The measured throughput for 40-byte UDP datagrams was 65000 packets per second, which is approx. 50 Mb/s at a wire level.

4   Requirements

We defined the following set of requirements to be fulfilled by our implementation:

5   Implementation

We decided to use a COMBO6 card developed by the Liberouter project with SCAMPI firmware as a platform to implement hardware anonymization. The COMBO6 card is a PCI card for PC equipped with Gigabit Ethernet ports and the Virtex II FPGA circuit to process packets arriving from the network. The SCAMPI firmware implements advanced packet processing including packet classification, sampling and statistics. An important advantage of COMBO6 cards and SCAMPI firmware for our implementation was that we had access to VHDL source code of the SCAMPI firmware.

We added hardware anonymization to the SCAMPI firmware as part of the LOBSTER project. The goal of LOBSTER is to enhance the SCAMPI architecture and to deploy it for passive monitoring in a European scale. In the following section we describe the architecture and properties of our implementation.

We divided anonymization tasks into two layers (or tiers) first-tier anonymization implemented in hardware of the monitoring adapter and second-tier anonymization implemented in software.

First-tier hardware anonymization works at packet level. It can do various types of anonymization of any fields in IP, ICMP, TCP and UDP headers. Packets with anonymized headers are retrieved from the monitoring adapter to the PC. Second-tier software anonymization works at flow level. Individual flows are reconstructed from packets. This allows anonymization of information in higher-layer protocols, such as replacing HTTP header fields or URLs in payload. In this report we concentrate on the first-tier hardware anonymization.

5.1   Position of anonymization in packet processing

Anonymization is a kind of data transformation. Therefore, we designed a universal Transformation Unit (TU) for general packet transformations, which can be used to anonymize selected information inside packets. The TU unit can be used as a standalone IP core when we provide required data to its input signals.

The SCAMPI firmware consists of several units. The position of the TU unit in the SCAMPI firmware is illustrated in Figure, which includes only units important to our discussion.

[Figure]

Figure 1: Position of anonymization in packet processing

Input packets go to the Header Field Extractor (HFE), which parses packet headers, stores them in internal data structures and prepares packet parameters, which are pointers to important sections of the packet, such as positions of selected header fields. A table of these packet parameters is added in front of each packet. HFE operation is programmable and we modified it such that the packet parameters include pointers to all important sections of the packet that we want to anonymize.

Parsed headers come to the Lookup Processor (LUP), which sorts packets into classes based on header fields and assigns each packet a 32-bit control word, which consists of three parts:

Due to limitations in hardware design we could not use a wider control word. Therefore, we reused an 8-bit STATID part of the control word indicating the STU number to also indicate a class of packets for anonymization purposes. Different anonymization can be applied to packets in different classes. Packets which have been assigned some non-zero STU number are passed to the TU unit. Packets which have been assigned zero STU number do not go to the TU unit and are not transformed.

5.2   Transformation Unit (TU)

The TU unit is designed as a small processor interpreting programs in a simple instruction set. These programs describe what anonymization should be performed for each header field for packets in each class.

The TU unit can do the following types of header field transformations:

Each of the above types of anonymization can be applied to any 16-bit header field in the packet. Two flags in the TU instruction (see below) can be used to mask out left or right half of the 16-bit header field and request that the anonymization function applies to an 8-bit header field only. Each header field can be identified by an offset (in 16-bit words) from one of the following packet parameters:

For example, you can identify source IPv4 address by using PP_IPv4 packet parameter and offsets 6 and 7.

5.2.1   TU instructions

The anonymization is specified as a sequence of instructions. Each instruction is 32 bits wide and it is structured in the following format:

|   |                   |                   |                   |
|1 0|9 8 7 6 5 4 3 2 1 0|9 8 7 6 5 4 3 2 1 0|9 8 7 6 5 4 3 2 1 0|
|J|  IC |   | HL|   PP  |  OFFS |            IMM16              |

From left to right, the instruction includes the following fields:

Programming of the TU unit is started by writing 0 to address 0x130010, which disables the TU unit operation. The TU program is then stored beginning at address 0x131000 and its end is marked by 0 value stored after the last instruction. The TU unit operation is then started again by writing 1 to address 0x130010.

If we want to use table-based hashing, we also need to upload content of hashing tables. There are eight tables, each including 256 rows of 16-bit patterns. The tables are beginning at address 0x132000.

An example of the TU program is shown in section.

5.2.2   TU operation

The TU unit structure is shown in Figure. Anonymization programs are stored in the instruction memory (1). A packet class assigned by LUP is used to select a corresponding anonymization program (2). Each instruction is then decoded (3), the required packet parameter (offset to some key header field) is retrieved (4), internally added with direct offset specified in the instruction and compared with the current packet position (5). The packet itself arrives through the input register in 16-bit chunks. These chunks are associated with the transformation description (6) and passed to the data pipeline. Different anonymization functions (7) require different number of clock cycles to complete. Therefore, the input data is passed to each function from the different stage of the pipeline such that all results arrive at the right time to the multiplexor (8), which selects the function that should be applied to the current header field.

[Figure]

Figure 2: Transformation Unit (TU)

5.2.3   C language API

A high-level programming API as a set of C language functions was developed for easy configuration of anonymization. Users do not have to configure the TU unit directly. The following functions are provided:

Predefined constants are available to refer to common protocols and their header fields for the scampi_add_transformation() function.

5.3   Prefix-preserving IP address mapping

The Prefix-Preserving Mapping Unit was designed for IP addresses, but it can operate on any 32-bit chunks of data. The advantage of prefix-preserving mapping is that is retains the full extent of the original traffic dynamics in IP address space. On the other hand, this property also implies vulnerability to attacks that can reveal original IP addresses.

The mapping between the original and anonymized IP address is based on swapping subtrees of a tree representation of IP addresses [anon-crypt]. The method is illustrated in Figure for 3-bit numbers. The set of all 3-bit numbers can be represented by leaf nodes or by paths going from the root node to the leaf nodes in a 3-level binary tree. When we take bits of a binary number from left to right, then the path from the root node goes left for 0 bit and goes right for 1 bit.

Now we can mark some nodes in our example by black color. When we swap bits in original IP addresses corresponding to the marked nodes, we get anonymized IP addresses and the mapping between original and anonymized IP addresses is prefix-preserving. We can mark any nodes in a tree to get prefix-preserving mapping, but certainly some markings produce low-quality anonymization, such as when very few nodes are marked or not marked. A common method is to mark nodes pseudorandomly or as a result of cryptographic function applied to a key.

[Figure]

Figure 3: Prefix-preserving mapping by swapping tree nodes

The flip or non flip action can be implemented in hardware by XOR operation of the original bit with 1 or 0 bit, respectively. The mapping and XOR patterns for our example are shown in Table. Note that the property of this method is that the first bit of an IP address is always either swapped or not swapped (it does not depend on the IP address) and that the last bit of an IP address is not used to select mapping.

 Original address   XOR pattern   Anonymized address 
000 110 110
001 110 111
010 110 100
011 110 101
100 100 000
101 100 001
110 101 011
111 101 010

Table 1: Prefix-preserving mapping by XOR operation

5.3.1   Memory organisation

To store marks of the whole tree, we need at least one bit for each non-leaf node. That is for 32-bit IP addresses we need at least 223 - 1 bits = approx. 512 MB of memory. For practical implementation it is more convenient to store each path from the root to a leaf as a separate 32-bit word which can be directly XORed with the original IP address to get the anonymized address. In this representation we would need 16 GB of memory. There is only 512 MB of memory on the COMBO card and we need some memory to store packets.

The volume of required memory can be reduced by storing only a part of the mapping tree and replicating it. This method was first proposed in [np-anon]. We can divide the 32-bit tree into four levels of 8-bit subtrees. There is one 8-bit subtree in the first level, 256 of 8-bit subtrees in the second level, 256*256 of 8-bit subtrees in the third level and 256*256*256 of 8-bit subtrees in the fourth level. We can anonymize the first byte (the most significant) of an IP address using the first level of 8-bit subtrees, the second byte by the second level of 8-bit subtrees and so forth.

In real packet traces we normally do not have all 232 IP addresses. The number of different IP addresses present is much smaller. Therefore, if we store only a subset of 8-bit subtrees and replicate them to the positions of remaining subtrees, chances are that we do not use many duplicates. In an extreme case, when we store only one 8-bit subtree in each of the four levels, the anonymization of each byte in an IP address is done independently of the values of previous bytes, which would result in weak anonymization. Therefore we should store more subtrees.

One 8-bit subtree includes 256 paths from the root to its leaves. But marking is the same for each two adjacent paths that differ only in the last step to the leaves. Therefore, one 8-bit subtree can be represented by 128 of 8-bit words.

In order to read a mapping path for the whole IP address as fast as possible and to allow configuration of the number of stored 8-bit subtrees, we store the mapping tree in BRAMs inside FPGA as illustrated in Figure. We use two dual-port BRAMs, which can be used to read mapping paths in all four levels in one clock cycle. The data width of each BRAM is 8 bits, the address width is configurable and it is at least 11 bits.

Figure a) shows the case when both BRAMs have address width of 11-bits. Each BRAM can store 16 8-bit subtrees. Only seven bits from each byte are used to direct anonymization (see the description of the mapping tree above). Therefore, there are some spare bits in the BRAM s address space. These bits are connected to some of the bits of the previous byte in the IP address. In this way anonymization of one byte is influenced at least by a part of the value of the previous byte. Spare bits of the first BRAM are connected to fixed values. Each additional available address bit doubles the number of stored 8-bit subtrees.

Figure b) shows the case when the address space of the BRAM for the two lower levels is larger than for the two upper levels. This is desirable because the lower levels need more 8-bit subtrees if we do not want to increase replication. More bits from the second and third bytes are used to influence anonymization of next bytes. In practical implementation we use at least 2 BRAMs for the upper levels and 6 BRAMs for the lower layers.

A consequence of using separate BRAMs is that the part of the first BRAM is not used and subtrees for each level can be replicated from only one BRAM. An alternative solution shown in Figure c) is to use one larger BRAM. We can use the whole address space of this BRAM to select subtrees for all four levels, but we need to read them in four clock cycles one after another. As speed was our primary goal we use two separate dual-port BRAMs at the cost of replicating subtrees only within each two levels (option b).

[Figure]

Figure 4: Storing mapping trees in BRAMs inside FPGA

The mapping tree must be loaded into BRAMs before enabling the TU unit. We connected BRAMs to the local bus in the SCAMPI design. The data width of the local bus is 16 bits. Lower 8 bits are sent to the first BRAM and the upper 8 bits are sent to the second BRAM.

A pseudo-random mapping tree can be generated by the build_tree.pl script (no input is required), for example:

$ build_tree.pl -g tree

The script generates the tree.hex file, which can be used to load BRAMs and the tree.sim file, which can be used to simulate IP address anonymization. For example:

$ build_tree.pl -s tree 10 20 30 40
  10.20.30.40 (0x281E140A) -> 63.198.30.128 

The mapping tree is stored in 32-bit words in the tree.hex file. For example, in order to load the following paths (hex numbers):

First BRAM: 11, 12, 13, 14
Second BRAM: 21, 22, 23, 24, 25, 26,\\
  27, 28

the content of the tree.hex file will be as follows:

22122111
24142313
26002500
28002700

6   Examples of use

We can first use the scampi_load.sh script to load firmware into the FPGA for the specified card type:

$ scampi_load.sh tu mtx xcv2000

We can then use the scampi_init utility to initialize all firmware units. In the following example we set header filtering to forward all packets to the sampling unit SAU0 (which is by default configured to pass every packet in the deterministic mode):

$ scampi_init -a 0 src net 0.0.0.0/0

Finally, we can use the init-tu script to configure the TU unit to do the required anonymization (which is specified inside the init-tu script) and to load the prefix-preserving mapping tree from the tree.hex file:

$ tu-init tree

Suppose that we want to do the following anonymization:

[Figure]

Figure 5: Setup to test packet header anonymization

The required anonymization can be configured by the init-tu script consisting of the following set of TU instructions:

csbus 130010 0                  # Disable TU, allow programming
csbus 131000 B0560000           # Hash source IP (first part)
csbus 131004 B0570000           # Hash source IP (second part)
                                # Keep destination IP (first part)
csbus 131008 90590000           # Randomize destination IP (second part)
csbus 13100C 80709426           # Set source port to 9876 (0x2694)
csbus 131010 A0711234           # XOR destination port with 0x1234
csbus 131014 0                  # End of program
csbus 130014 12345678           # Init LFSR (random number generator)
csbus 132000 40302010           # Load hash table
csbus 132004 80706050           # dtto
csbus 130010 1                  # Enable TU

The csbus utility reads and writes values from and to specified locations in memory space of COMBO card firmware. Alternatively, we can use the C language API to program the TU unit:

scampi_reset_classification(DIRECT);
scampi_add_transformation(0, TU_TRANSFORMATION_HASH, TU_PROTO_IP, 6, 0);
scampi_add_transformation(0, TU_TRANSFORMATION_HASH, TU_PROTO_IP, 7, 0);
scampi_add_transformation(0, TU_TRANSFORMATION_RANDOM, TU_PROTO_IP, 9, 0);
scampi_add_transformation(0, TU_TRANSFORMATION_MAP, TU_PROTO_UDP, 0, 0x2694);
scampi_add_transformation(0, TU_TRANSFORMATION_XOR, TU_PROTO_UDP, 1, 0x1234);
scampi_set_filters();

We used the setup shown in Figure to test the configuration. PC1 sends UDP packets to PC2. These packets are captured by PC3. Before we enabled anonymization, the packet capture looked as follows:

01:00:00.000000 IP 10.0.1.2.2000 > 10.0.1.1.2000: UDP, length: 1000
01:00:00.000000 IP 10.0.1.2.2000 > 10.0.1.1.2000: UDP, length: 1000
01:00:00.000000 IP 10.0.1.2.2000 > 10.0.1.1.2000: UDP, length: 1000
01:00:00.000000 IP 10.0.1.2.2000 > 10.0.1.1.2000: UDP, length: 1000
01:00:00.000000 IP 10.0.1.2.2000 > 10.0.1.1.2000: UDP, length: 1000

After we enabled anonymization, the capture of the same packets looked as follows:

01:00:00.000000 IP 0.32.48.96.9876 > 10.0.68.143.13250: UDP, length: 1000
01:00:00.000000 IP 0.32.48.96.9876 > 10.0.200.43.13250: UDP, length: 1000
01:00:00.000000 IP 0.32.48.96.9876 > 10.0.42.77.13250: UDP, length: 1000
01:00:00.000000 IP 0.32.48.96.9876 > 10.0.131.131.13250: UDP, length: 1000
01:00:00.000000 IP 0.32.48.96.9876 > 10.0.95.134.13250: UDP, length: 1000

7   Performance and resources

Packets are passed through the TU unit at the rate of 16 bits per clock cycle. We need additional 20 clock cycles per packet (9 cycles to retrieve packet parameters, 9 cycles to fill the data pipeline and 1 cycle to initialise the instruction pipeline). Throughput depends on the packets size and it can be computed by the following formula:

throughput = (length + gap) * 16 * clockrate / (length + gap + 2*delay)

where:

The TU unit on its own was synthesized including all constraints at 100 MHz clock rate. Computed throughput depending on the packet size for this clock rate is shown in Figure. Throughput for the worst case of 64-byte packets is 1.08 Gb/s, which is sufficient to process Gigabit Ethernet traffic at line rate.

We tested the TU unit with the COMBO card consisting of the COMBO6 mainboard and the COMBO4MTX interface card. We used SCAMPI phase 1 design version 1_02_07 as the basis for our modifications. We changed HFE program to produce more packet parameters and we integrated our TU unit. However, the SCAMPI design can run at 50 MHz only and the TU unit integrated in the SCAMPI design must also run at 60 MHz. Some units also add more per-packet overhead. Therefore, throughput of the whole modified SCAMPI design, which we measured by a hardware packet generator was lower than the computed throughput of the TU unit alone and is also indicated in Figure.

[Figure]

Figure 6: Throughput of the TU unit and the modified SCAMPI design

The consumption of FPGA resources for the TU unit alone and for the whole SCAMPI design with the integrated TU unit is shown in Table.

TU unit Modified SCAMPI design
Used Percentage Used Percentage
Slices 956 7% 6238 44%
Flip-flops 1116 4% 6496 23%
4-input LUTs 987 4% 8001 28%
BRAMs 13 14% 46 48%

Table 2: Consumption of FPGA resources

8   Conclusion

We implemented easily configurable FPGA-based packet header anonymization that removes sensitive information from packet headers in real time on the monitoring adapter before it can get to the host PC. Anonymization functions can be different for various classes of packets and can include prefix-preserving IP address mapping, which preserves original dynamics in IP address space. The implemented TU unit to perform anonymization can run at full Gigabit Ethernet speed, although the whole modified SCAMPI design currently runs at lower speed. We plan to integrate our TU unit in the newer version of design for the COMBO card, which will permit to operate at full Gigabit Ethernet speed.

References

[anon-stream] Pang R., Paxson V.: A High-Level Programming Environment for Packet Trace Anonymization and Transformation, SIGCOMM 2003, August 25-29, 2003, Karlsruhe, Germany.
[anon-crypt] Fan J., Xu J., Ammar M. H.: Crypto-PAn: Cryptography-based Prefix-preserving Anonymization. Software, available online.
[np-anon] Ramaswamy R., Weng N., Wolf T. An IXA-Based Network Measurement Node. Proc. of Intel IXA University Summit, 2004. Available online.
další weby:fond rozvojemetacentrumCzechLightpřenosyvideoservereduroameduID.cz