Traffic Shaping Precision Test

CESNET technical report number 1/2001
also available in PDF, PostScript, and XML formats.

Sven Ubik
16.5.2001

1   Introduction

The goal of this test was to verify the precision of traffic shaping as implemented by Cisco routers used in our CESNET2 network and to check the influence of traffic shaping on the router load. The precision indicates how much we can rely on the shape of the processed traffic complying with the traffic shaping specification. Precision is an important parameter in large-scale diffserv-based networks with many aggregation points, where incorrectly shaped traffic can cause congestion within the diffserv domain. The work was done as a part of the TF-NGN project Support of Delay and Jitter Requirements.

Traffic shaping is specified by two parameters: CIR (Committed Information Rate) and Bc (Committed Burst). The shaped traffic can be sent at the maximum rate of CIR measured over any interval (called a shaping interval) equal to or greater than Bc/CIR. Within the shaping interval, the instantaneous output rate can be higher, possibly up to the output link rate. The input rate can be up to the input link rate (if it is not limited by another traffic conditioning mechanism) at any time. Packets are buffered until they can be sent in compliance with the traffic shaping specification, provided that there is enough buffer space. Packets that arrive when the buffer is full are discarded. The buffer should accommodate at least the number of packets corresponding to Bc bits at any possible packet size. Cisco extends the basic traffic shaping specification by adding the Be (Excess Burst) parameter, which allows up to Be additional bits to be sent out during the shaping interval under certain conditions. We disabled this functionality by setting Be=0.
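The buffering behaviour described above can be illustrated with a minimal sketch of an interval-based shaper. It is a simplified model written for this illustration, not Cisco's implementation: the function name, its parameters, the packet-count buffer limit and the rule that a packet may be sent whenever the remaining credit is positive (with any overshoot deducted from the next interval) are all assumptions. Link serialization time is ignored.

```python
from collections import deque

def shape(arrivals, cir_bps, bc_bits, buffer_pkts):
    """Simplified interval-based shaper (illustrative model only).

    arrivals: list of (arrival_time_s, size_bits), sorted by time.
    Returns (departures, dropped), where departures is a list of
    (departure_time_s, size_bits) and dropped counts discarded packets.
    """
    tc = bc_bits / cir_bps            # shaping interval Bc/CIR in seconds
    queue = deque()                   # sizes of buffered packets
    departures, dropped = [], 0
    credit = 0.0                      # bits that may still be sent
    i, k = 0, 0                       # next arrival, current interval index
    while i < len(arrivals) or queue:
        start, end = k * tc, (k + 1) * tc
        # Assumption: credit is topped up by Bc per interval, capped at Bc.
        credit = min(credit + bc_bits, bc_bits)
        # Drain the backlog at the start of the interval.
        while queue and credit > 0:
            size = queue.popleft()
            credit -= size
            departures.append((start, size))
        # Handle packets arriving during this interval.
        while i < len(arrivals) and arrivals[i][0] < end:
            t, size = arrivals[i]
            i += 1
            if not queue and credit > 0:
                credit -= size        # sent immediately
                departures.append((t, size))
            elif len(queue) < buffer_pkts:
                queue.append(size)    # buffered until credit is available
            else:
                dropped += 1          # buffer full: packet discarded
        k += 1
    return departures, dropped
```

For example, feeding the model a 0.1 s burst of 1500-byte packets at 10 Mb/s with CIR=5Mb/s and Bc=500kb lets roughly half of the burst through in the first shaping interval and delays the rest to the next one, or drops part of it if the hypothetical buffer is too small.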

The test was performed on a Fast Ethernet interface of the Cisco 7500 router. We used this particular type of router because it is commonly used in our network and because it provides sufficient computing power for advanced traffic processing such as traffic shaping. We also performed some parts of the test on the Cisco 2600 router with very similar results (except for performance characteristics).

2   Measurement configuration

The measurement configuration is shown in Fig. 1. The Cisco 7500 router was equipped with an RSP8 processor, two Fast Ethernet ports on a FEIP2-DSW-2TX card and two Fast Ethernet modules on a VIP4-80 card. We used the RUDE program to generate an input stream of a specified rate and shape on WS1 and its companion program CRUDE to capture the output stream on WS2. Both workstations were running Debian Linux. RUDE/CRUDE are similar in operation to the mgen/drec [mgen] utilities, but they are much more precise and can run at higher speeds. For each received packet, CRUDE records the stream identifier, sequence number, departure and arrival times, and packet length in a binary or text log file.

The system clocks of both PCs were precisely synchronized using a PPS signal distributed from a GPS receiver to the PC serial ports. Nanokernel [nanokernel] was used to obtain the system clock state with finer granularity. We developed a utility that takes the text log file created by CRUDE as input and produces data and command files for the gnuplot [gnuplot] tool, which can then be used to plot graphs indicating lost packets, throughput, delay, jitter, delay distribution and jitter distribution. We discussed the factors affecting the precision of time synchronization in the proposed system and described the utility for computing QoS characteristics in [qofis2001].
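As a rough illustration of the kind of processing such a utility performs, the following sketch derives per-packet delay, jitter and a loss count from received-packet records. It is not the utility described in [qofis2001]: the record layout (sequence number, departure time, arrival time), the function name and the definition of jitter as the absolute difference between consecutive delays are assumptions made for this example, and the actual CRUDE log format is not reproduced here.

```python
def qos_stats(records):
    """Compute delays, jitter and lost-packet count (illustrative sketch).

    records: list of (seq, tx_time_s, rx_time_s) for received packets,
             in any order (hypothetical layout, not the CRUDE log format).
    Returns (delays, jitter, lost).
    """
    records = sorted(records)                      # order by sequence number
    # One-way delay requires synchronized sender and receiver clocks.
    delays = [rx - tx for _, tx, rx in records]
    # Assumed jitter definition: |delay[n] - delay[n-1]|.
    jitter = [abs(b - a) for a, b in zip(delays, delays[1:])]
    seqs = [seq for seq, _, _ in records]
    # Losses are inferred from gaps in the sequence numbers; packets lost
    # before the first or after the last received packet are not counted.
    expected = seqs[-1] - seqs[0] + 1 if seqs else 0
    lost = expected - len(set(seqs))
    return delays, jitter, lost
```

The resulting lists can be written out as two-column data files and plotted, or binned into histograms to obtain the delay and jitter distributions.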

[Figure]

Figure 1: Test configuration

3   Router configuration

The principal part of the router software configuration is shown below:

class-map match-all class1
  match access-group 100
!
policy-map ds
  class class1
    shape average 5000000 500000 0
!
interface FastEthernet6/1/0
  service-policy output ds
!
access-list 100 permit ip host 195.113.147.42 any

4   Throughput measurement

We wanted to make sure that the processor could route the testing traffic streams so that the results were not distorted by processor saturation. We also wanted to see how traffic shaping increased the processor load. We used a set of constant-rate streams, each consisting of packets of a different length. The test was performed twice. First, the stream was sent between two ports of the FEIP2-DSW-2TX card. Second, the stream was sent from a port of the FEIP2-DSW-2TX card to a port of the VIP4-80 card. The measured throughput for packets of different lengths and corresponding processor load without traffic shaping is summarized in the following tables:

FEIP2-DSW-2TX -> FEIP2-DSW-2TX

Packet size    Throughput    FEIP2-DSW-2TX load
1500 bytes     97.3 Mb/s      6 %
1024 bytes     96.2 Mb/s     59 %
 512 bytes     92.3 Mb/s     88 %
 256 bytes     86.3 Mb/s     99 %
 128 bytes     66.0 Mb/s     99 %
  64 bytes     32.9 Mb/s     99 %

Table 1: Throughput of forwarding to a port on FEIP2-DSW-2TX

FEIP2-DSW-2TX -> VIP4-80

Packet size    Throughput    FEIP2-DSW-2TX load    VIP4-80 load
1500 bytes     97.3 Mb/s     35 %                   7 %
1024 bytes     96.2 Mb/s     24 %                  11 %
 512 bytes     83.9 Mb/s     79 %                  21 %
 256 bytes     15.2 Mb/s     99 %                  34 %
 128 bytes      2.2 Mb/s     99 %                  45 %
  64 bytes      8.2 Mb/s     99 %                  42 %

Table 2: Throughput of forwarding to a port on VIP4-80

4.1   Observations:

5   Input streams for traffic shaping

We used three different input streams, shown in Figures 2 to 4, each with an average rate of 5Mb/s: (1) a constant-rate stream of 5Mb/s (equal to CIR); (2) a bursty stream generated as a sequence of bursts with a steady rate of 10Mb/s for 0.1s, each followed by a gap of 0.1s with no packets sent; and (3) a bursty stream generated as a sequence of bursts with a steady rate of 20Mb/s for 0.1s, each followed by a gap of 0.3s with no packets sent. For clarity, only the first 3 seconds of streams 2 and 3 are shown in Figures 3 and 4, respectively.
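The average rates of the two bursty streams follow directly from their burst and gap durations. The short sketch below checks this arithmetic; the function names are hypothetical and durations are given in whole milliseconds to keep the arithmetic exact.

```python
def mean_rate_mbps(burst_rate_mbps, burst_ms, gap_ms):
    """Average rate of an on/off stream over one burst-plus-gap period."""
    return burst_rate_mbps * burst_ms / (burst_ms + gap_ms)

def burst_volume_kb(burst_rate_mbps, burst_ms):
    """Number of kilobits sent in one burst (Mb/s times ms gives kb)."""
    return burst_rate_mbps * burst_ms

# Stream 2: 10 Mb/s for 0.1 s, then a 0.1 s gap.
stream2 = mean_rate_mbps(10, 100, 100)
# Stream 3: 20 Mb/s for 0.1 s, then a 0.3 s gap.
stream3 = mean_rate_mbps(20, 100, 300)
```

Both calls give 5 Mb/s, while a single burst carries 1000 kb in stream 2 and 2000 kb in stream 3.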

[Figure]

Figure 2: Input stream 1 - constant rate 5Mb/s

[Figure]

Figure 3: Input stream 2 - bursty 5Mb/s (10Mb/s for 0.1s, a gap for 0.1s)

[Figure]

Figure 4: Input stream 3 - bursty 5Mb/s (20Mb/s for 0.1s, a gap for 0.3s)

We generated each of the three streams twice - in 1500-byte packets and in 256-byte packets. Therefore, there were actually six different streams.

6   Measurement results

We set the traffic shaping parameters as follows: CIR=5Mb/s, Bc=500kb or 20kb, and Be=0. The first alternative of Bc=500kb corresponds to a shaping interval of 100ms; the second alternative of Bc=20kb corresponds to a shaping interval of 4ms. The latter is the smallest shaping interval allowed on this type of router and in this IOS version. We found that other Cisco routers and IOS versions allow different minimum shaping intervals, generally longer than 4ms.
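The two shaping intervals quoted above are simply Bc/CIR. As a quick check of the arithmetic (the function name is an assumption made for this example):

```python
def shaping_interval_ms(bc_bits, cir_bps):
    """Shaping interval Bc/CIR, expressed in milliseconds."""
    return 1000 * bc_bits / cir_bps

interval_500kb = shaping_interval_ms(500_000, 5_000_000)   # Bc=500kb
interval_20kb = shaping_interval_ms(20_000, 5_000_000)     # Bc=20kb
```

These evaluate to 100 ms and 4 ms, respectively.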

The RUDE/CRUDE package includes a script that can compute throughput at the UDP payload level with time granularity of 100ms, 10ms or 1ms. The first choice is optimal for Bc=500kb. However, it is not possible to use this script to compute throughput with 4ms granularity, which is needed to verify correctness of traffic shaping for Bc=20kb. With our utility, it is possible to measure all QoS characteristics with arbitrary time granularity.

Some of the throughput graphs show fluctuations caused by the low number of packets sent during the shaping interval. In the worst case of 1500-byte packets and a shaping interval of 4ms, theoretically 1.644 packets should be sent per shaping interval. When we compute throughput over a certain period of time (given by the time granularity), we count all packets whose last bit was received within that period, even though the first packet may have been partly received in the previous period and the last packet counted in the following period may have been partly received in this one. Moreover, it seems unlikely that the router can schedule individual packets at ideal times so that no more than Bc bits are sent during the shaping interval. At best, it would probably continue sending packets at some rate until Bc bits have been sent in the shaping interval.
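The counting rule described above, and the fluctuations it produces, can be sketched as follows. This is an illustrative reimplementation rather than our utility: the function name and the record layout are assumptions, and timestamps are taken as integer microseconds so that the bin boundaries are exact.

```python
def throughput_mbps(packets, granularity_us):
    """Throughput per time bin, counting each packet in the bin in which
    its last bit was received (illustrative sketch).

    packets: list of (last_bit_rx_time_us, size_bits), times >= 0.
    Returns a list of throughputs in Mb/s, one per bin starting at time 0.
    """
    if not packets:
        return []
    nbins = max(t for t, _ in packets) // granularity_us + 1
    bits = [0] * nbins
    for t, size in packets:
        bits[t // granularity_us] += size   # whole packet goes to one bin
    # bits per microsecond equals Mb/s
    return [b / granularity_us for b in bits]

# 12000-bit packets arriving evenly every 2.4 ms, i.e. exactly 5 Mb/s,
# evaluated with 4 ms granularity.
even_stream = [(2400 * n, 12000) for n in range(1, 7)]
bins = throughput_mbps(even_stream, 4000)
```

Even for this perfectly even 5 Mb/s stream, the bins alternate between one and two packets, so the computed throughput alternates between 3 and 6 Mb/s.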

Measured throughput, delay and delay variation characteristics are shown in the following figures. 1500-byte streams are depicted in solid lines, 256-byte streams are depicted in dashed lines. In some cases there are two separate graphs for 1500-byte and 256-byte streams, respectively.

[Figure]

Figure 5: Throughput - input stream 1, Bc=500kb, 100ms granularity

[Figure]

Figure 6: Throughput - input stream 1, Bc=500kb, 10ms granularity

[Figure]

Figure 7: Throughput - input stream 1 (1500-byte), Bc=20kb, 4ms granularity

[Figure]

Figure 8: Throughput - input stream 1 (256-byte), Bc=20kb, 4ms granularity

[Figure]

Figure 9: Throughput - input stream 2, Bc=500kb, 100ms granularity

[Figure]

Figure 10: Throughput - input stream 2 (1500-byte), Bc=20kb, 4ms granularity

[Figure]

Figure 11: Throughput - input stream 2 (256-byte), Bc=20kb, 4ms granularity

[Figure]

Figure 12: Throughput - input stream 3, Bc=500kb, 100ms granularity

[Figure]

Figure 13: Throughput - input stream 3, Bc=500kb, 10ms granularity

[Figure]

Figure 14: Throughput - input stream 3 (1500-byte), Bc=20kb, 4ms granularity

[Figure]

Figure 15: Throughput - input stream 3 (256-byte), Bc=20kb, 4ms granularity

[Figure]

Figure 16: Delay - input stream 3, Bc=500kb, 100ms granularity

[Figure]

Figure 17: Delay distribution - input stream 3 (1500-byte), Bc=500kb, 100ms granularity

[Figure]

Figure 18: Delay - input stream 3 (256-byte), Bc=20kb, 4ms granularity

6.1   Comments on individual figures and observations:

6.2   Influence of traffic shaping on the processor load

The following table shows the load of both processors when routing the input stream 1 (steady 5Mb/s) without and with traffic shaping:

Bc            Packet size    FEIP2-DSW-2TX load    VIP4-80 load
no shaping    1500 bytes      1 %                   0 %
no shaping     256 bytes      9 %                   2 %
500 kb        1500 bytes      2 %                   1 %
500 kb         256 bytes     16 %                   7 %
20 kb         1500 bytes      3 %                  29 %
20 kb          256 bytes     14 %                  99 %

Table 3: Influence of traffic shaping on processor load

6.3   Observations:

7   Conclusion

We found that, with the hardware and software used, traffic shaping works in most cases with precision sufficient for practical purposes: passing bursty traffic through a limited-bandwidth channel when the added delay and jitter are acceptable, and preventing congestion that could be caused by aggregation within a diffserv domain. However, certain issues should be considered. First, with input stream 2, traffic shaping did not work for Bc=500kb. The reason is not clear and requires further investigation. Second, with a very small shaping interval and small packets, the VIP4-80 is highly loaded. Finally, while the router correctly maintains the limit of Bc bits that can be sent out during the shaping interval, high peaks within the shaping interval can occur. These short peaks, however, should not cause congestion resulting from aggregation.

References

[nanokernel] D.L. Mills, P.-H. Kamp: The Nanokernel, Proc. of Precision Time and Time Interval (PTTI) Applications and Planning Meeting, Reston, VA, November 2000.
[qofis2001] Sven Ubik, Vladimír Smotlacha, Sampo Saaristo, Juha Laine: Low-Cost Precise QoS Measurement Tool, submitted to QofIS, the 2nd International Workshop on Quality of Future Internet Services.
[gnuplot] Thomas Williams, Colin Kelley, et al.: Gnuplot: An Interactive Plotting Program.
[mgen] Brian Adamson, Sean Gallavan: The Multi Generator (MGEN) Toolset, http://manimac.itd.nrl.navy.mil/MGEN