<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE zprava SYSTEM "techrep.dtd">

<zprava cislo="1/2001" jazyk="en">
<nazev>Traffic Shaping Precision Test</nazev>
<autor>Sven Ubik</autor>
<datum>16.5.2001</datum>

<h1>Introduction</h1>

<p>The goal of this test was to verify the precision of traffic shaping as
implemented by Cisco routers used in our CESNET2 network and to check the influence
of traffic shaping on the router load. The precision indicates
how much we can rely on the processed traffic complying
with the traffic shaping specification. Precision is an important parameter
in large-scale diffserv-based networks with many aggregation points, where
incorrectly shaped traffic can cause congestion within the diffserv domain.
The work was done as a part of the <a href="http://www.terena.nl/task-forces/tf-ngn">TF-NGN</a> project <a href="http://www.cnaf.infn.it/~ferrari/tfngn">
Support of Delay and Jitter Requirements</a>.</p>

<p>Traffic shaping is specified by two
parameters: CIR (Committed Information Rate) and Bc (Committed Burst).
The shaped traffic can be sent at the maximum rate of CIR measured over any
interval (called a shaping interval) equal to or greater than Bc/CIR. Within the
shaping interval, the instantaneous output rate can be higher, possibly up to
the output link rate. The input rate can be up to the input link rate (if it
is not limited by another traffic conditioning mechanism) at any time.
Packets are buffered until they can be sent in compliance with the traffic
shaping specification, provided that there is enough buffer space.
Packets that arrive when the buffer is full are discarded. The buffer should
accommodate at least the number of packets corresponding to Bc bits at any
possible packet size. Cisco extends the basic traffic shaping specification
by adding the Be (Excess Burst) parameter, which allows up
to Be extra bits to be sent during the shaping interval under certain conditions. We disabled
this functionality by setting Be=0.</p>
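
<p>The shaping rule described above can be sketched as a simple fluid model.
This is only an illustration under stated assumptions, not Cisco's implementation:
the function name shape and its parameters are hypothetical, traffic is counted in
bits rather than whole packets, arrivals are given per shaping interval, and Be=0.</p>

<p>
<pre>
```python
def shape(arrivals, bc_bits, buffer_bits):
    """Simulate an interval-based shaper with Be=0 (fluid model, assumption).

    arrivals: bits arriving in each shaping interval.
    Returns (sent, dropped): bits sent and bits discarded per interval.
    """
    backlog = 0
    sent, dropped = [], []
    for arrived in arrivals:
        offered = backlog + arrived
        out = min(offered, bc_bits)            # at most Bc bits per interval
        waiting = offered - out
        lost = max(0, waiting - buffer_bits)   # buffer overflow is discarded
        backlog = waiting - lost
        sent.append(out)
        dropped.append(lost)
    return sent, dropped


# A 2 Mb burst followed by three idle intervals, Bc = 500 kb, 1 Mb buffer.
sent, dropped = shape([2_000_000, 0, 0, 0], 500_000, 1_000_000)
print(sent)     # [500000, 500000, 500000, 0]
print(dropped)  # [500000, 0, 0, 0]
```
</pre>
</p>

<p>With a buffer of at least 1.5 Mb in this example, nothing would be dropped and
the burst would be spread over four shaping intervals.</p>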

<p>The test was performed on a Fast Ethernet interface of a Cisco 7500 router.
We used this particular type of router because it is commonly used in our
network and because it provides sufficient computing power for advanced traffic processing such as
traffic shaping. We also performed some parts of the test on a Cisco 2600
router with very similar results (except for performance characteristics).</p>

<h1>Measurement configuration</h1>

<p>The measurement configuration is shown in Fig. 1. The Cisco 7500 router
was equipped with an RSP8 processor, two Fast Ethernet ports on a 
FEIP2-DSW-2TX card and two Fast Ethernet modules on a VIP4-80 card. 
We used the <a href="http://www.atm.tut.fi/rude/">RUDE</a> 
program to generate an input stream of a specified rate and shape on WS1 
and a companion program CRUDE to capture an output stream on WS2.
Both workstations were running Debian Linux. RUDE/CRUDE are similar in
operation to the mgen/drec <cite href="mgen"/> utilities, but they are much more precise and can run
at higher speeds. For each received packet, CRUDE records the stream identifier,
sequence number, departure and arrival times and packet length in a binary
or text log file.</p>

<p>The system clocks of both PCs
were precisely synchronized using a PPS signal distributed from a GPS
receiver to the PC serial ports. Nanokernel <cite href="nanokernel"/> was used
to obtain the system clock state with finer granularity. We developed
a utility that takes as input the text log file created by CRUDE and produces
as output data and command files for the gnuplot <cite href="gnuplot"/> tool, which
can then be used to plot graphs indicating lost packets, throughput, delay,
jitter, delay distribution and jitter distribution. We presented a
discussion of various factors affecting the precision of time synchronization
in the proposed system and a description of the utility for computing QoS
characteristics in <cite href="qofis2001"/>.</p>

<obr src="conf">Test configuration</obr>

<h1>Router configuration</h1>

<p>The principal part of the router software configuration is shown below:</p>

<p>
<pre>
class-map match-all class1
  match access-group 100
!
policy-map ds
  class class1
    shape average 5000000 500000 0
!
interface FastEthernet6/1/0
  service-policy output ds
!
access-list 100 permit ip host 195.113.147.42 any
</pre>
</p>

<h1>Throughput measurement</h1>

<p>We wanted to make sure that the processor could route the test
traffic streams so that the results were not distorted by processor saturation.
We also wanted to see how traffic shaping increased
the processor load. We used a set of constant-rate streams, each consisting
of packets of a different length. The test was performed twice. First,
each stream was sent between two ports of the FEIP2-DSW-2TX card. Second,
each stream was sent from a port of the FEIP2-DSW-2TX card to a port of the
VIP4-80 card. The measured throughput for packets of different lengths and
the corresponding processor load without
traffic shaping are summarized in the following tables:</p>

<p>
<tab sloupce="lll">
<tr><th colspan="3">FEIP2-DSW-2TX -&gt; FEIP2-DSW-2TX</th></tr>
<tr><th>Packet size</th><th>Throughput</th><th>FEIP2-DSW-2TX load</th></tr>
<tr><td>1500 bytes</td><td>97.3 Mb/s</td><td>6 %</td></tr>
<tr><td>1024 bytes</td><td>96.2 Mb/s</td><td>59 %</td></tr>
<tr><td>512 bytes</td><td>92.3 Mb/s</td><td>88 %</td></tr>
<tr><td>256 bytes</td><td>86.3 Mb/s</td><td>99 %</td></tr>
<tr><td>128 bytes</td><td>66.0 Mb/s</td><td>99 %</td></tr>
<tr><td>64 bytes</td><td>32.9 Mb/s</td><td>99 %</td></tr>
<nazev>Throughput of forwarding to a port on FEIP2-DSW-2TX</nazev>
</tab></p>

<p>
<tab sloupce="llll">
<tr><th colspan="4">FEIP2-DSW-2TX -&gt; VIP4-80</th></tr>
<tr><th>Packet size</th><th>Throughput</th><th>FEIP2-DSW-2TX load</th><th>VIP4-80 load</th></tr>
<tr><td>1500 bytes</td><td>97.3 Mb/s</td><td>35 %</td><td>7 %</td></tr>
<tr><td>1024 bytes</td><td>96.2 Mb/s</td><td>24 %</td><td>11 %</td></tr>
<tr><td>512 bytes</td><td>83.9 Mb/s</td><td>79 %</td><td>21 %</td></tr>
<tr><td>256 bytes</td><td>15.2 Mb/s</td><td>99 %</td><td>34 %</td></tr>
<tr><td>128 bytes</td><td>2.2 Mb/s</td><td>99 %</td><td>45 %</td></tr>
<tr><td>64 bytes</td><td>8.2 Mb/s</td><td>99 %</td><td>42 %</td></tr>
<nazev>Throughput of forwarding to a port on VIP4-80</nazev>
</tab></p>

<h2>Observations:</h2>

<ul>
<li>The load of the RSP8 processor was negligible (indicated as 0%)</li>
<li>The VIP4-80 processor is more powerful than the FEIP2-DSW-2TX processor</li>
<li>Although the VIP4-80 processor is more powerful, the throughput
decreases more rapidly with decreasing packet size when sending packets from
a port of the FEIP2-DSW-2TX card to a port of
the VIP4-80 card than when sending packets between the ports of the
FEIP2-DSW-2TX card</li>
<li>For some reason, when sending from a port of the FEIP2-DSW-2TX card
to a port of the VIP4-80 card, the throughput is lower and the VIP4-80
processor load is higher for 128-byte packets than for 64-byte packets</li>
<li>We did not measure the throughput for sending packets from a port of the
VIP4-80 card to a port of the FEIP2-DSW-2TX card or between two
ports of the VIP4-80 card; these tests should be done for a more
comprehensive overview</li>
</ul>
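
<p>To compare packet sizes, the throughput figures in the first table can be
converted to packet rates. The sketch below assumes that 1 Mb/s means 1 000 000 bits
per second and that the stated packet size is the unit counted by the throughput
measurement; both are assumptions, and the helper name packets_per_second is
hypothetical.</p>

<p>
<pre>
```python
def packets_per_second(throughput_mbps, packet_bytes):
    """Convert a throughput in Mb/s to a packet rate for a fixed packet size."""
    return throughput_mbps * 1_000_000 / (packet_bytes * 8)


# Packet size in bytes mapped to throughput in Mb/s, taken from the table for
# forwarding between two ports of the FEIP2-DSW-2TX card.
same_card = {1500: 97.3, 1024: 96.2, 512: 92.3, 256: 86.3, 128: 66.0, 64: 32.9}

rates = {size: round(packets_per_second(mbps, size))
         for size, mbps in same_card.items()}
for size, pps in rates.items():
    print(f"{size:5d} bytes: {pps:6d} packets/s")
```
</pre>
</p>

<p>Under these assumptions the packet rate is about 64 000 packets per second
for both 128-byte and 64-byte packets, which would be consistent with a limit given by
per-packet processing rather than by the link rate.</p>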

<h1>Input streams for traffic shaping</h1>

<p>We used three different input streams, shown in Figures 2 to 4, each with an
average rate of 5Mb/s: a constant-rate stream (equal to CIR); a bursty stream
generated as a sequence of bursts with a steady rate of 10Mb/s for 0.1s
followed by gaps with no packets sent for 0.1s; and a bursty stream
generated as a sequence of bursts with a steady rate of 20Mb/s for 0.1s
followed by gaps with no packets sent for 0.3s. For clarity, only the
first 3 seconds of streams 2 and 3 are shown in Figures 3 and 4,
respectively.</p>

<obr src="inputStream1">Input stream 1 - constant rate 5Mb/s</obr>

<obr src="inputStream2">Input stream 2 - bursty 5Mb/s (10Mb/s for 0.1s, a gap for 0.1s)</obr>

<obr src="inputStream3">Input stream 3 - bursty 5Mb/s (20Mb/s for 0.1s, a gap for 0.3s)</obr>

<p>We generated each of the three streams twice - in 1500-byte packets and
in 256-byte packets. Therefore, there were actually six different streams.</p>
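
<p>The nominal long-term rate of the two bursty streams can be checked with a
short calculation. The helper name average_rate is hypothetical, and the durations are
given in milliseconds only to keep the arithmetic exact.</p>

<p>
<pre>
```python
def average_rate(burst_rate_bps, burst_ms, gap_ms):
    """Long-term average rate of an on/off stream, in bits per second."""
    return burst_rate_bps * burst_ms / (burst_ms + gap_ms)


stream2 = average_rate(10_000_000, 100, 100)  # 10 Mb/s for 0.1 s, gap 0.1 s
stream3 = average_rate(20_000_000, 100, 300)  # 20 Mb/s for 0.1 s, gap 0.3 s
print(stream2, stream3)  # 5000000.0 5000000.0
```
</pre>
</p>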

<h1>Measurement results</h1>

<p>We set the traffic shaping parameters as follows: CIR=5Mb/s,
Bc=500kb or 20kb and Be=0.
The first alternative of Bc=500kb corresponds to a shaping interval
of 100ms, the second alternative of Bc=20kb corresponds to a shaping interval
of 4ms. The latter is the smallest shaping interval allowed on this type of router
and in this IOS version. We found that other Cisco routers and IOS versions
allow different minimum shaping intervals, generally longer than 4ms.</p>
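
<p>The relation between Bc, CIR and the shaping interval used above is a simple
division. The following sketch reproduces the two values; the function names are
hypothetical.</p>

<p>
<pre>
```python
def shaping_interval_ms(cir_bps, bc_bits):
    """Shaping interval Bc / CIR, in milliseconds."""
    return 1000 * bc_bits / cir_bps


def bc_for_interval(cir_bps, interval_ms):
    """Bc in bits that yields the requested shaping interval."""
    return cir_bps * interval_ms // 1000


print(shaping_interval_ms(5_000_000, 500_000))  # 100.0
print(shaping_interval_ms(5_000_000, 20_000))   # 4.0
print(bc_for_interval(5_000_000, 4))            # 20000
```
</pre>
</p>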

<p>The RUDE/CRUDE package includes a script that can compute throughput at 
the UDP payload level with time granularity of 100ms, 10ms or 1ms. The first
choice is optimal for Bc=500kb. However, it is not possible to use this script to 
compute throughput with 4ms granularity, which is needed to verify 
correctness of traffic shaping for Bc=20kb. With our utility, it is
possible to measure all QoS characteristics with arbitrary time granularity.</p>
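
<p>Computing throughput with an arbitrary granularity amounts to binning packets
by arrival time. The sketch below is not the utility mentioned above; it is a minimal
illustration that assumes the log has already been parsed into pairs of arrival time
in integer microseconds and packet length in bytes. The name throughput_series is
hypothetical.</p>

<p>
<pre>
```python
from collections import defaultdict


def throughput_series(packets, granularity_us):
    """Return throughput in Mb/s for each consecutive period.

    packets: iterable of (arrival_us, length_bytes); a packet is counted in
    the period in which it was completely received.
    """
    bits = defaultdict(int)
    for arrival_us, length_bytes in packets:
        bits[arrival_us // granularity_us] += length_bytes * 8
    if not bits:
        return []
    # Bits per microsecond are numerically equal to Mb/s.
    return [bits[i] / granularity_us for i in range(max(bits) + 1)]


# Three 1500-byte packets, 4 ms granularity.
log = [(1000, 1500), (3000, 1500), (5000, 1500)]
print(throughput_series(log, 4000))  # [6.0, 3.0]
```
</pre>
</p>

<p>Integer timestamps are used so that a packet arriving exactly on a period
boundary is always assigned to the same period.</p>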

<p>Some of the throughput graphs show fluctuations caused by
a low number of packets sent during the shaping interval.
In the worst case of 1500-byte packets and the shaping interval of 4ms,
theoretically only about 1.67 packets (20000 bits of Bc divided by 12000 bits
per packet) should be sent per shaping interval. When we
compute throughput over a certain period of time (given by time granularity),
we consider all packets whose last bit was received within that period,
even though the first and last packet could be in part received in the
previous or following period, respectively. Moreover, it seems
unlikely that the router can schedule individual packets at ideal
times so that no more than Bc bits are sent during the shaping interval.
At best, it would probably continue sending packets at some rate until Bc bits
are sent in the shaping interval.</p>
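
<p>The effect of a low number of packets per shaping interval can be illustrated
with an idealized model. The sketch assumes a hypothetical shaper that sends only whole
packets and carries unused credit over, so that after i intervals no more than i times
Bc bits have been sent in total. Link-layer overhead is ignored, and the router's actual
scheduling is not known to follow this model.</p>

<p>
<pre>
```python
def packets_per_interval(bc_bits, packet_bytes, intervals):
    """Whole packets released in each interval by an idealized shaper.

    After i intervals the cumulative output never exceeds i * bc_bits.
    """
    packet_bits = packet_bytes * 8
    counts, sent_total = [], 0
    for i in range(1, intervals + 1):
        allowed_total = (i * bc_bits) // packet_bits
        counts.append(allowed_total - sent_total)
        sent_total = allowed_total
    return counts


print(packets_per_interval(20_000, 1500, 6))  # [1, 2, 2, 1, 2, 2]
```
</pre>
</p>

<p>In this model the throughput measured over 4ms periods alternates between
3Mb/s (one packet) and 6Mb/s (two packets), although the long-term average stays at
5Mb/s.</p>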

<p>Measured throughput, delay and delay variation characteristics are shown
in the following figures. 1500-byte streams are depicted in solid lines, 
256-byte streams are depicted in dashed lines. In some cases there are two
separate graphs for 1500-byte and 256-byte streams, respectively.</p>

<obr src="5Mb-500kb-100ms-throughput">Throughput - input stream 1, Bc=500kb, 100ms granularity</obr>

<obr src="5Mb-500kb-10ms-throughput">Throughput - input stream 1, Bc=500kb, 10ms granularity</obr>

<obr src="5Mb-20kb-4ms-1500B-throughput">Throughput - input stream 1 (1500-byte), Bc=20kb, 4ms granularity</obr>

<obr src="5Mb-20kb-4ms-256B-throughput">Throughput - input stream 1 (256-byte), Bc=20kb, 4ms granularity</obr>

<obr src="bursts-500kb-100ms-throughput">Throughput - input stream 2, Bc=500kb, 100ms granularity</obr>

<obr src="bursts-20kb-4ms-1500B-throughput">Throughput - input stream 2 (1500-byte), Bc=20kb, 4ms granularity</obr>

<obr src="bursts-20kb-4ms-256B-throughput">Throughput - input stream 2 (256-byte), Bc=20kb, 4ms granularity</obr>

<obr src="burstsSparse-500kb-100ms-throughput">Throughput - input stream 3, Bc=500kb, 100ms granularity</obr>

<obr src="burstsSparse-500kb-10ms-throughput">Throughput - input stream 3, Bc=500kb, 10ms granularity</obr>

<obr src="burstsSparse-20kb-4ms-1500B-throughput">Throughput - input stream 3 (1500-byte), Bc=20kb, 4ms granularity</obr>

<obr src="burstsSparse-20kb-4ms-256B-throughput">Throughput - input stream 3 (256-byte), Bc=20kb, 4ms granularity</obr>

<obr src="burstsSparse-500kb-100ms-delay">Delay - input stream 3, Bc=500kb, 100ms granularity</obr>

<obr src="burstsSparse-500kb-100ms-1500B-delayDistribution">Delay distribution - input stream 3 (1500-byte), Bc=500kb, 100ms granularity</obr>

<obr src="burstsSparse-20kb-4ms-256B-delay">Delay - input stream 3 (256-byte), Bc=20kb, 4ms granularity</obr>

<h2>Comments on individual figures and observations:</h2>

<ul>
<li>Fig. 5: A constant input stream is passed through as a constant output
    stream.</li>
<li>Fig. 6: Measuring throughput at a finer granularity than the configured shaping
    interval reveals that the constant input stream actually produces some
    fluctuations at the output (these fluctuations are really present in the
    stream; they are not a result of a low number of packets within a shaping
    interval as described before). However, these fluctuations are allowed
    within the shaping interval as long as no more than Bc bits are sent during
    the shaping interval.</li>
<li>Fig. 7, Fig. 8: With the minimum shaping interval of 4ms, a low number of
    packets per shaping interval causes high fluctuations for large packets.
    </li>
<li>Fig. 9: The 1500-byte stream experiences a transitional effect during the
    first 9.5 seconds. After that the stream is perfectly shaped. However, the
    256-byte stream is not shaped correctly.</li>
<li>Fig. 10: Traffic shaping works; the fluctuations are caused by the low number of
    packets per shaping interval.</li>
<li>Fig. 11: Traffic shaping works very well. Small fluctuations are caused by
    the low number of packets per shaping interval; large drops every 0.2s are
    probably caused by using an input stream with a slightly lower long-term
    bandwidth than the set CIR=5Mb/s.</li>
<li>Fig. 12: Traffic shaping works for both 1500-byte and 256-byte streams.
    Considering the relatively large shaping interval (and thus a large number of
    packets per shaping interval and lower requirements on the precision of
    scheduling in the router), the resulting fluctuations are relatively large,
    but probably acceptable for practical use.</li>
<li>Fig. 13: Fluctuations within a shaping period are allowed, but the peaks
    in this case are rather high.</li>
<li>Fig. 14: A case similar to that shown in Fig. 6, but the fluctuations within shaping
    intervals are quite large.
    Considering the very fine shaping interval, the resulting fluctuations are
    acceptable.</li>
<li>Fig. 15: Traffic shaping works very well; this case is similar to Fig. 11.</li>
<li>Fig. 16: The average delay per shaping interval cycles between a very low
    delay and several higher values. The reason is that the router passes some
    of the packets from each input burst to the output almost immediately (with
    only the packet processing delay) until Bc bits are sent out. The remaining packets
    of the input burst must be delayed and sent during the following shaping
    intervals to keep the maximum of Bc bits sent out in a shaping interval.
    </li>
<li>Fig. 17: The delay distribution of packets that passed through the router
    exhibits four significant peaks. The first column corresponds to packets
    sent during the same shaping period in which they arrived. The other three
    columns correspond to packets that were delayed and sent out during one of
    the next three shaping periods.</li>
<li>Fig. 18: With a fine shaping interval and small packets, the delay increases
    linearly within the period of input bursts as packets are sent in small
    quantities to produce a smoothly shaped stream.</li>
</ul>
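
<p>The four peaks described for Fig. 17 can be reproduced with a rough count of
how one burst of input stream 3 is split across shaping periods. The sketch assumes
that the burst starts at the beginning of a shaping period, that packets are 12000 bits
long (no link-layer overhead) and that a packet belongs to the period in which its
first bit may be sent. The name departure_periods is hypothetical.</p>

<p>
<pre>
```python
from collections import Counter


def departure_periods(burst_bits, packet_bytes, bc_bits):
    """Count packets of one burst by the shaping period in which they leave.

    Period 0 is the period in which the burst arrives.
    """
    packet_bits = packet_bytes * 8
    packets = burst_bits // packet_bits
    return Counter((k * packet_bits) // bc_bits for k in range(packets))


# One burst of stream 3: 20 Mb/s for 0.1 s is 2 Mb; Bc = 500 kb.
groups = departure_periods(2_000_000, 1500, 500_000)
print(sorted(groups.items()))  # [(0, 42), (1, 42), (2, 41), (3, 41)]
```
</pre>
</p>

<p>The burst is split into four groups of roughly equal size, one sent in the
period of arrival and three sent in the following periods.</p>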

<h2>Influence of traffic shaping on the processor load</h2>

<p>The following table shows the load of both processors when routing the
input stream 1 (steady 5Mb/s) without and with traffic shaping:</p>

<tab sloupce="llll">
<tr><th>Bc</th><th>Packet size</th><th>FEIP2-DSW-2TX load</th><th>VIP4-80 load</th></tr>
<tr><td>no shaping</td><td>1500 bytes</td><td>1 %</td><td>0 %</td></tr>
<tr><td>no shaping</td><td>256 bytes</td><td>9 %</td><td>2 %</td></tr>
<tr><td>500 kb</td><td>1500 bytes</td><td>2 %</td><td>1 %</td></tr>
<tr><td>500 kb</td><td>256 bytes</td><td>16 %</td><td>7 %</td></tr>
<tr><td>20 kb</td><td>1500 bytes</td><td>3 %</td><td>29 %</td></tr>
<tr><td>20 kb</td><td>256 bytes</td><td>14 %</td><td>99 %</td></tr>
<nazev>Influence of traffic shaping on processor load</nazev>
</tab>

<h2>Observations:</h2>

<ul>
<li>Load of the FEIP2-DSW-2TX processor increases more as a result of sending
shorter packets while the load of the VIP4-80 processor increases more as a
result of finer traffic shaping. This is not surprising, because traffic shaping
is performed by the output interface.</li>
<li>With the smallest possible shaping interval of 4ms and short packets,
the VIP4-80 processor becomes overloaded.</li>
</ul>

<h1>Conclusion</h1>

<p>We found that, with the hardware and software used,
traffic shaping works in most cases with precision sufficient for practical purposes:
passing bursty traffic through a limited-bandwidth channel when the added
delay and jitter are acceptable, and preventing congestion that could be caused
by aggregation within a diffserv domain. However, certain issues
should be considered. First, with input stream 2 in 256-byte packets, traffic
shaping did not work correctly for Bc=500kb. The reason is not clear and requires
further investigation. Second, with a very small shaping interval and small
packets, the VIP4-80 processor is highly loaded. Finally, while the router correctly
maintains the limit of Bc bits that can be sent out during the shaping
period, high peaks within the shaping period can occur. These short peaks,
however, should not cause congestion resulting from aggregation.
</p>

<seznamknih>
<kniha id="nanokernel">
  D.L. Mills, P.-H. Kamp: <i>The Nanokernel</i>, Proc. of Precision Time
  and Time Interval (PTTI) Applications and Planning Meeting, Reston, VA,
  November 2000.
</kniha>
<kniha id="qofis2001">
  Sven Ubik, Vladimír Smotlacha, Sampo Saaristo, Juha Laine: <i>Low-Cost
  Precise QoS Measurement Tool</i>, submitted to QofIS, the 2nd International
  Workshop on Quality of Future Internet Services.
</kniha>
<kniha id="gnuplot">
  Thomas Williams, Colin Kelley, et al.: <i>Gnuplot: An Interactive Plotting
  Program</i>
</kniha>
<kniha id="mgen">
  Brian Adamson, Sean Gallavan: <i>The Multi Generator (MGEN) Toolset</i>,
  <a href="http://manimac.itd.nrl.navy.mil/MGEN">http://manimac.itd.nrl.navy.mil/MGEN</a>
</kniha>
</seznamknih>

</zprava>

