<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE zprava SYSTEM "techrep.dtd">

<zprava cislo="1/2006" jazyk="en">

<nazev>Clock synchronization in Cesnet monitoring infrastructure</nazev>

<autor>Vladimir Smotlacha</autor>

<datum>31.12.2005</datum>


<h1>Introduction</h1>

<p>Monitoring the Cesnet~2 network is a major challenge: it has to provide
detailed information for effective network control and measurement.
We have designed and installed a monitoring infrastructure in the main network
nodes, i.e. nodes connected by two or more backbone lines. The
infrastructure consists of a set of PC boxes designed to provide both active
and passive measurement. The boxes can host PCI-X passive monitoring adapters
(Combo6 or, alternatively, DAG) in order to provide monitoring at full line
speed.
</p>



<h1>Matter of clock synchronization</h1>

<p>Measurement of some network parameters depends on the clocks at the
measurement sites. An example is one-way delay: the measurement is simple in
principle, as it is enough to know the sending time and the receiving time of
a packet. However, the transport capacity of links and network nodes is
steadily increasing and the value of one-way delay is decreasing.
Currently, one-way delay is in the order of hundreds of microseconds in LANs
and several milliseconds across the Cesnet~2 network. Meaningful measurement
of one-way delay requires timestamps whose accuracy is one order of
magnitude better than the expected delay. The accuracy of the
measurement depends on the relative accuracy of the two clocks that assign
the timestamps. Such dependence on the clock is called 'time sensitivity'.
</p>
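
<p>The relation between clock error and measured one-way delay can be
illustrated by a minimal sketch. The function names and the numbers (a 300 us
path, clock offsets of +20 us and -15 us) are assumptions chosen for
illustration only; they are not measured values.</p>

```python
def one_way_delay_us(send_ts_us, recv_ts_us):
    """One-way delay in microseconds from sender and receiver timestamps."""
    return recv_ts_us - send_ts_us


def measured_delay_us(true_delay_us, sender_offset_us, receiver_offset_us):
    """Delay as seen by two imperfect clocks.

    Each offset is (local clock - true time). The relative offset of the
    two clocks adds directly to the measured delay.
    """
    send_ts = 0 + sender_offset_us                # packet leaves at true time 0
    recv_ts = true_delay_us + receiver_offset_us  # and arrives true_delay later
    return one_way_delay_us(send_ts, recv_ts)


def required_accuracy_us(expected_delay_us):
    """Rule from the text: one order of magnitude better than the delay."""
    return expected_delay_us / 10


# Hypothetical numbers: a 300 us LAN path, clocks off by +20 us and -15 us.
print(measured_delay_us(300, sender_offset_us=20, receiver_offset_us=-15))  # 265
print(required_accuracy_us(300))                                            # 30.0
```

<p>In this hypothetical case the relative clock error of 35 us already
exceeds the 30 us accuracy that the rule above asks for on a 300 us path.</p>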

<p>Another reason for providing accurate time is the assignment of unique
timestamps to captured packets.</p>


<h1>Methods of clock synchronization</h1>


<p>The simplest method is to use NTP over the network. Although its accuracy
is in the order of hundreds of microseconds, this solution has a
drawback: when the clock of a box located in backbone node A is
synchronized by an NTP server located in another node B, the synchronization
is influenced by the backbone path between A and B. As a result, any
measurement that depends on the synchronized clocks of A and B
cannot give objective information about the path (or a part of it) between
A and B.
</p>
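
<p>The effect can be shown with the standard NTP on-wire calculation of offset
and round-trip delay from the four timestamps of one client/server exchange.
The following is a minimal sketch, not the ntpd implementation; the helper
simulate_exchange and all delay values are hypothetical.</p>

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimate from one exchange.

    t1: client send time, t2: server receive time,
    t3: server send time, t4: client receive time.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, fwd_delay, rev_delay, server_hold=0):
    """Hypothetical helper: build the four timestamps for a given path.

    true_offset is (server clock - client clock); fwd_delay is the
    client-to-server delay, rev_delay the server-to-client delay.
    """
    t1 = 0
    t2 = t1 + fwd_delay + true_offset
    t3 = t2 + server_hold
    t4 = t3 - true_offset + rev_delay
    return ntp_offset_and_delay(t1, t2, t3, t4)


# Perfectly aligned clocks, asymmetric path (values in microseconds).
offset, delay = simulate_exchange(true_offset=0, fwd_delay=2500, rev_delay=1500)
print(offset, delay)  # 500.0 4000
```

<p>The estimated offset differs from the true one by half of the path
asymmetry. With these numbers node A would shift its clock by 500 us, after
which both directions of the path appear to take 2000 us (half the round
trip), although the assumed one-way delays are 2500 us and 1500 us.</p>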

<p>To avoid this problem (which is inherent to NTP), we have to provide
all boxes with an accurate local source of time. We decided to install in each
node a GPS receiver that disciplines the clock of the measurement box. Based on
our previous experience, we installed Garmin receivers in the majority of
localities. The receiver has a PPS (pulse per second) signal output and the
manufacturer declares that the accuracy of the PPS signal is better than 1
microsecond.</p>


<h1>Utilized hardware</h1>

<h2>Measurement boxes</h2>

<p>All measurement boxes currently in operation are identical:</p>

<ul compact="1">
<li>mainboard Supermicro,</li>
<li>PCI-X bus,</li>
<li>Xeon 3 GHz,</li>
<li>1 GB RAM,</li>
<li>70 GB SCSI disk.</li>
</ul>


<p>Such a hardware configuration is powerful enough to host Combo6
or DAG cards up to 10~Gbps. Depending on the required capacity and
throughput of the disk system, more disks can be installed.</p>


<h2>GPS receiver options</h2>

<p>Depending on local conditions, we used several ways of providing the PPS
signal to the measurement box. In most localities, an NMEA signal (containing
the label of the current second) is also available.</p>


<dl>
<dt>Praha</dt>

<dd>the measurement box is fed by the signal of a high-quality Trimble
Acutime~2000, which is already distributed in the computer room and the
laboratory. We have two distribution units in cascade, each with 8 RS-232
outputs (Figure 1).</dd>


<obr src="card">PPS distribution unit</obr>


<dt>Brno, Plzen, Ceske Budejovice and Olomouc</dt>

<dd>a GPS receiver has been installed on the roof in each locality. The
solution is described in more detail in a later chapter.</dd>


<dt>Ostrava</dt>

<dd>a precise geodetic GPS receiver Topcon~GB-1000 had already been installed
in the computer room in Ostrava. Therefore, we decided to use its PPS signal
for clock synchronization.</dd>

<dt>Liberec</dt>
<dd>
the specific situation in the Liberec node made it impossible to install a GPS
receiver on the roof. The measurement box is synchronized only
by an NTP server located in Praha.</dd>
</dl>


<h1>GPS receivers installation</h1>

<p>It was unavoidable to install new GPS receivers in Brno, Plzen,
Ceske Budejovice, and
Olomouc. We used Garmin GPS~35 or GPS~18, which provide signals (PPS and serial)
in the RS-232 format. However, RS-232 is unsuitable for cables many tens
of meters long, so some kind of conversion is necessary. We looked for
a professional solution that extends the serial line up to hundreds of meters
and chose LD232 adapters manufactured by Papouch s.r.o. The conversion unit
also supplies power to the GPS receiver and provides galvanic isolation
between the measurement box and the receiver. Figure 2 shows the whole
installation set: the Garmin GPS~18 receiver, the first adapter in a
weather-proof box, and the second adapter. The two adapters are connected by
a standard UTP (STP) cable with 4 pairs of wires. When a structured cabling
system is available, it can be used.</p>

<obr src="pic1">GPS receiver and link adapter</obr>


<h1>Software issues</h1>

<p>Linux is our primary choice of operating system.
Precise clock synchronization by a PPS signal is possible with neither the
2.4.x nor the 2.6.x standard kernel, as it requires two kernel modifications:
the PPS API and nanosecond resolution of the internal clock. As nanosecond clock
resolution has not yet been implemented in 2.6.x kernels, we use the 2.4.29
kernel with the 'nanokernel' patch.</p>

<p>We installed NTP~4.2.0, the latest stable version. The Garmin receivers as
well as the Acutime~2000 are configured to operate in NMEA output mode, which
is processed by a driver in the NTP package. The NMEA format provides the label
of the current second and the state of the PPS signal. In Ostrava, no label
information is provided by the GPS receiver, so the label is obtained from
a main Cesnet NTP server.</p>

<p>Specification of the measurement software is beyond the scope of this report.</p>


<h1>Accuracy of clocks</h1>

<p>The following graphs show a typical 24-hour record of the synchronized clock
offset at all sites of the monitoring infrastructure. The offset was reported by
the 'ntpd' process. We can conclude that only exceptional offset values
exceed the interval +/- 30 us. A special case is Liberec, where we did
not succeed in installing a GPS receiver and therefore the box is
synchronized only via the network.</p>
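
<p>A check of how many offset values fall outside the +/- 30 us interval can
be sketched as follows. The sketch assumes that the offsets are taken from
ntpd 'loopstats' records, whose third field is the clock offset in seconds;
the report does not say which ntpd output was actually used, and the sample
records are invented for illustration.</p>

```python
def parse_loopstats_offsets_us(lines):
    """Return clock offsets in microseconds from ntpd loopstats lines.

    Assumed field layout: MJD, seconds past midnight, offset [s],
    frequency [ppm], followed by further statistics ignored here.
    """
    offsets = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:          # skip empty or truncated lines
            offsets.append(float(fields[2]) * 1e6)
    return offsets


def count_outliers(offsets_us, limit_us=30.0):
    """Number of offsets whose magnitude exceeds limit_us."""
    return sum(1 for value in offsets_us if abs(value) > limit_us)


# Invented sample records (not measured data).
sample = [
    "53735 120.000 0.000004000 13.778 0.000002000 0.013 4",
    "53735 136.000 -0.000012000 13.779 0.000003000 0.013 4",
    "53735 152.000 0.000041000 13.781 0.000009000 0.014 4",
]

print(count_outliers(parse_loopstats_offsets_us(sample)))  # 1
```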

<p>Although our monitoring boxes are identical, we can see differences in clock
offset values. The clock offset is influenced by:</p>

<ul compact="1">
<li>parameters of PPS signal,</li>
<li>parameters of crystal oscillator,</li>
<li>variation of ambient temperature (cooling system, air conditioning,
...).</li>
</ul>

<p>As the declared PPS signal accuracy of the Garmin GPS receivers is better
than 1~us, its influence should be negligible. However, we see similar graphs
with offset spikes in all localities with Garmin GPS~18 or GPS~35 (i.e. Brno,
Plzen, Ceske Budejovice, and Olomouc). In the case of the Trimble Acutime~2000
(Praha) or Topcon~GB-1000 (Ostrava), these spikes are significantly smaller
or missing altogether. This phenomenon should be verified and then clarified.</p>


<obr src="perf_praha">Clock offset - Praha</obr>

<obr src="perf_brno">Clock offset - Brno</obr>

<obr src="perf_plzen">Clock offset - Plzen</obr>

<obr src="perf_cesbud">Clock offset - Ceske Budejovice</obr>

<obr src="perf_olomouc">Clock offset - Olomouc</obr>

<obr src="perf_ostrava">Clock offset - Ostrava</obr>

<obr src="perf_liberec">Clock offset - Liberec</obr>



<h1>Ongoing tasks</h1>


<p>We are now ready to install two more boxes, in Usti nad Labem and
Hradec Kralove, but we are waiting for the LD232 adapters, whose
manufacturing has been delayed by several months. We also have to improve
the accuracy of clock synchronization in Liberec.</p>
</zprava>

