<?xml version="1.0" encoding="utf-8"?> 
<!DOCTYPE zprava SYSTEM "techrep.dtd" >
<zprava cislo="8/2005" jazyk="en">  
<nazev>High-level Modelling, Analysis and Verification on FPGA-based
  Hardware Design
    <footnote>The work presented in this paper has been supported by
the FP5 projects No. IST-2001-32603 and IST-2001-32404, and the CESNET
project 02/2003</footnote> 
</nazev>
<autor>Petr~Matoušek, Aleš~Smrčka, Tomáš~Vojnar</autor>
<datum>November 11, 2005</datum> 


<h1>Abstract</h1>

<p>
Implementation of network components in hardware is a trend in advanced
high-speed network technologies. Incoming packets can be analysed in fast
programmable cards using FPGA. Designing such a system is not easy and
requires a detailed analysis. In this paper, we discuss analysis and
verification of a non-trivial system-the network monitor and analyser Scampi
<cite href="Router" /> that has been developed within the Liberouter
project. The Scampi analyser is implemented in FPGA on a special
add-on card. It analyses packets incoming with the speed of several
Gbps. We created an abstract model of the design and verified several
safety properties. Our main task was to check if there is a risk of
buffer overflow and how to set the length of buffers to prevent
this. First, we made a timed analysis by hand and then we used automated 
tools - model-checkers Uppaal~<cite href="Uppaal00" /> and TReX
<cite href="TREX01" />. In
the following text, we show how to model such a complex system and some results
of our analysis and verification. We also propose a framework for modelling
and analysis of systems where the throughput of requests, their speed, and the
length of buffers are important. The proposed models can be reused 
when verifying and analysing of systems of the given kind.
</p>

<h1>Introduction</h1>

<p>
One of the current trends in the design of communication systems and embedded
applications is to move from a simple one-chip solution with a powerful
software in the background to a complex hardware implementation. Programmable
hardware is a suitable technology for such a move. FPGA-based hardware can
provide a similar functionality of a system as software implemented on general
microprocessors. In comparison to a software solution, programmable hardware is
very fast - it is common to communicate in gigabits per second. 
</p>

<p>
The use of programmable hardware is one of the cornerstones of the Scampi
project which the work presented here is a part of. Scampi is a European
project focused on the development of a scalable monitoring platform for the
Internet. Its aim is also to promote the usage of monitoring tools for
improving services and technologies. The project develops a network adapter
with an extensible monitoring architecture to support a secure and programmable
shared monitoring infrastructure. 
</p>

<p>
One of the key requirements put on Scampi and in general on the equipment used
in the communication infrastructure - is that it should be reliable and robust
to work even under conditions like DoS attacks. In the Scampi project, both the
traditional approaches ensuring a correct functionality - namely simulation
and testing - are used. Moreover, to achieve an as high as possible confidence
in the design, formal verification is extensively used too. The motivation is
that simulation and testing can reveal some errors in the design, but they can
never prove it to be correct. Moreover, even if formal verification is not
always successfully completed, it may find many bugs that tend to be different
from the ones found by simulation and testing which is due to a different way
of dealing with the state spaces of the considered system.
</p>

<p>
In Scampi, we apply formal verification on two different levels: (1) on the
level of the source code, i.e. VHDL code, and (2) on the level of suitable
abstract models. Its advantage is the precise coverage of the real system. The
disadvantage is that the system is too complex to be checked in this way
completely - we are only able to verify some selected components. To check the
key features of the entire system, we work with abstract models. 
</p>

<p>
The source code verification of Scampi is described, e.g.,
in~<cite href="TR-4-2004" />. In this paper, we discuss our experience
from high-level modelling, analysis, and verification of Scampi. Here, a
special focus is put on checking the throughput of the system - the
optimal length of buffers, timing, and  reliability - response time,
an appropriate behaviour under extreme conditions such as a DoS attack.
</p>

<p>
The Scampi design includes many different VHDL components - input buffers,
preprocessing units, a lookup processor, output buffers, a statistical unit,
etc. We divide the model of the system in three kinds of model components: a
model of the environment (generators), a model of buffers (queues, channels),
and a model of executive units. We show how the different model components may
be constructed and in the case of generators and buffers, we obtain general
templates that may be reused in different models of systems of the considered
kind.
</p>

<p>
We further discuss formal verification that we have performed over the obtained
model using the Uppaal~<cite href="Uppaal00" /> and
TReX~<cite href="TREX01" /> model checkers. Using TReX, we have
performed parametric verification allowing one to automatically synthesise
certain constraints on the parameters of the system necessary for its correct
behaviour. We also present some manual reasoning that we have initially
performed over the system and which we confirm and extend by automatic methods.
</p>

<p>
<b>Plan of the paper.</b> We start with an overview of the Scampi design.
Then, we show how we analysed the system manually. The next section concerns
modelling of the system - the creation of an abstract model with respect to the
properties we want to verify. We show models of basic parts of the system - the
environment, buffers, and executive units. Next, we discuss the results of a
fully automated analysis using Uppaal and TReX. Finally, we close by some
concluding remarks.
</p>

<!-- ==================================================================== -->
<h1>Scampi Design</h1>
<!-- ==================================================================== -->

<p>
Scampi is a name of the network adapter working at 10Gbps speed. The system
design consists of several components. In this paper we are interested in
front part of the system - input buffers, preprocessing units (header field
extractor), and searching units (lookup processor, processing units). 
</p>

<p>
Scampi adapter reads data from one input port and distributes them into four
independent paths working in parallel. Every path consists of several
executive components and several buffers. Here, we shortly explain  these
components and their functionality. A first part of the Scampi adapter is
depicted at <a href="#XFP">Figure</a>.
</p>

<obr id="XFP" src="combo-2xfp" size="10cm">The first part of the Scampi design</obr>

<p>
When an IP packet comes, it is forwarded to one of the paths and marked with
a time stamp generated by a Time Stamp Unit (TSU). The packet waits in
IO_BUF buffer (which is FIFO) to be processed by a Header Field Extractor
unit (HFE) at first. This component divides the packet into two parts - a
header and a body of the packet. The IP header is translated into a unified
header - an adjusted structure that stores the main information about the
packet including a source IP address, a destination IP address, MAC addresses,
port numbers, a VLAN id, etc. Unified headers are stored in UHFIFO
buffers. The body of the packet is stored in a Packet FIFO (PFIFO) and flows
by another path through the system. The searching core of the Scampi is a
Lookup processor component.   
</p>

<!-- ==================================================================== -->
<h2>Lookup Processor</h2>

<p>
Lookup processor (LUP) as showed in <a href="#FigLUP">Figure</a> is the core
of the Scampi design which classifies packets according to rules defined over IP
headers. LUP has four input buffers with unified headers (UHFIFO) and one
output. An output data is formed by an identification of the packet and a type
of the packet. Every matching rule is encoded into two parts. The first part is
a packet matching (parallel searching) encoded into the TCAM memory (Content
Address Memory), the second is a  sequential searching SSRAM memory performed
by Processing Units (PU). In this work we don't examine process of building 
recognition rules and matching a unified header - if interested in, see~<cite href="LUP" />.
</p>

<p>
Recognition works over the unified header - one part of the header is matched
in TCAM (we call it a MP part) and the other part is used for the sequential 
searching (a SP part). Unified header has the size of 1024 bits, the part for 
the parallel searching has 256 bits. CAM block sequentially loads MPs from all 
four inputs (UHFIFOs) and builds a block of data for the matching in the TCAM
memory. TCAM compares the data according to matching rules. This is done
simultaneously for all four inputs. We can say that there are four parallel
paths in the system. Every path has one UHFIFO, one FIFO of results (RFIFO)
and one PU. 
</p>

<obr id="FigLUP" src="lup">Structure of Lookup Processor</obr>

<p>
After TCAM comparison, there is a sequential searching done by PU. PU loads
the result from the matching process and uses it as an entry point of the
sequential program stored in the SSRAM. When the sequential searching
program is done, the PU takes the result from it. 
</p>

<p>
Every PU has to wait for a multiplexer MX to receive data from the
Lookup processor. The result of PU processing represents a numeric
information what to do with a packet - e.g. increment the counter of
the dangerous packet occurrences, forward the body of the packet to
the software, copy a input packet to every output (broadcast), or simply
release the packet from the Scampi system. 
</p>

<!-- ==================================================================== -->
<h1>First Analysis of the System</h1>
<!-- ==================================================================== -->

<p>
Here we will present our result of Scampi analysis made by
hand - a calculation of minimum packet throughput through the system. We
are only interested in the throughput caused by the Lookup
processor. Minimum throughput can be calculated from the number of
results per second and the minimum size of the packet. This
computation can be done because the system is synchronised by one
global clock. We must calculate the delay for the matching and the
sequential searching to obtain the result frequency. There are two
parts working in parallel in Lookup processor (<a href="#tLUP">Figure</a>). The
first is a matching part (CAM Block and TCAM) and, the second a sequential  
searching (four Processing Units).   
</p>

<!-- ==================================================================== -->
<h2>Rule Matching Analysis</h2>

<p>
Unified header stored in UHFIFO has the size of 1024 bits. The
matching part of the size 256 bits is loaded to the TCAM memory. TCAM
works in parallel for four input buffers. CAM Block loads four 256 bit
parts from four UHFIFOs and prepares them for the TCAM memory. 
We will concentrate in our computation only on one UHFIFO now. Data bus
width between UHFIFO and CAM Block is 32 bits, and we have to load 256
bits. So, it takes eight ticks to load the matching part of the unified
header from one input (UHFIFO). 
</p>

<obr id="tLUP" src="timedscheme">Lookup Processor with timing</obr>

<p>
In Scampi design two 32-bits blocks are loaded at the same time from
two UHFIFOs while other two UHFIFOs are waiting. Loading process from all
four ways takes exactly 16 time slots together. A data from the CAM
Block is loaded to the TCAM within the same time. It must be loaded 1024
bits via 64-bits data bus. This 1024-bits block is formed by four 256-bits
parts of four input buffers that will be processed in TCAM in parallel.
</p>

<p>
The loading into TCAM takes also 16 time slots. When the data is
loaded into the TCAM, the matching process starts. TCAM matches four
results, each of them takes 12 time slots. All four results are matched within
48 time slots. While doing this, the CAM Block is stopped and waits for the
matching process. Altogether the matching process with data loading takes 64
time slots. That means every 64 time slots CAM block provides four results for
four input queues.  
</p>

<p>
After this, four Processing Units process the second part of the
matching - a sequential searching. The sequential searching is done by 
execution of a program stored in the SSRAM. The program consists of several
instructions. It takes four time slots to load an instruction from the
SSRAM memory. The execution is done during next four time slots, so the
minimum delay of execution of one instruction takes eight time slots. With
eight instructions (at most) PU provides the same throughput like CAM
Block and TCAM. That is why there are supported programs with one
instruction at least and eight instruction at most. The four-paths output
is serialised by the Multiplexer, so that every 64 time slots it
forwards four results - one result takes an average 16 time slots.  
</p>

<!-- ==================================================================== -->
<h2>Throughput Analysis and Consequences</h2>

<p>
The minimum throughput of the Scampi system caused by the Lookup
Processor can be now calculated for the smallest supported packets
which have size of 64~bytes. The system is set to frequency 50~MHz,
time slot takes 20~ns. Rule matching is done within 16 time slots
which is 320~ns. The minimum throughput of the system is
<i>(64*8)/(320*10<sup>-9</sup>) = 1.6~Gbps</i>.
</p>

<p>
Since the Processing Unit provides the same speed as the TCAM
memory, the size of a result buffer between CAM block and PU is at most
one. We can not eliminate the whole buffer because of the difference  
in TCAM and PU timing. What cannot be easily done by hand is finding
the optimal length of UHFIFO buffers or to detect deadlock system
because of buffer overflow. 
</p>


<!-- ==================================================================== -->
<h1>Modelling the Scampi Design</h1>
<!-- ==================================================================== -->

<p>
Scampi design is a complex system that cannot be fully modelled and
analysed. Some parts are not interested from point of view of analysis of
the throughput of the system. In this section we present several models of
components that are important for further analysis of the system. We don't
present here a full design. Here, we concentrate on basic parts of the design
that are typical for many hardware designs - queues, packet generators,
processing units etc.  
</p>

<p>
In our approach we recognise three basic types of components that occur in
some form in many complex systems:
<ul>
  <li>FIFO queues (or buffers, channels) - they can be deterministic
  or non-deterministic, stochastic. We can model lossy queues, delayed
  queues  etc.</li>
  <li>executive components - e.g., processing units (lookup processors,
  preprocessing units), multiplexors etc.</li>
  <li>environment - e.g., generators of incoming requests (packets), output
  units (waiting for a result)</li>
</ul>
</p>

<!-- ==================================================================== -->
<h2>Modelling FIFO Queues</h2>

<p>
A FIFO queue (buffer, channel) is a  typical abstract data structure that
contains a sequence of stored data. FIFO queue is used to represent
transmitting channels, intermediate buffers between a processing unit and a
memory etc. We can have lossy queues where some data may be lost. These are
important to model communication channels. There are delayed queues where data
are delayed. Then we can model parametric queues where a symbolic constant
value - a parameter - defines the maximum length of the queue. In parametric
verification we are interested in values for which the parameter satisfies
expected properties. This approach is called parameter synthesis
<cite href="AHV93" />. In our abstract model we do not observe a type of data,
so we need only one value - the number of items in the queue.
</p>

<p>
<b>Simple FIFO queue.</b> Simple FIFO queue can be easily modelled in
abstract way consistent with the goal mentioned above as a finite
automaton with three states - empty, nonempty, and full. There is a
special variable <i>count</i> that represents a number of items currently
present in the queue. If we add a new item in the queue, counter <i>count</i> is
incremented. If it is removed from the queue by reading, counter is
decremented. The length of the queue is a constant value <i>FIFOsize</i>. A model 
of simple FIFO queue is depicted in~<a href="#squeue">Figure</a>. Notation
<i>x?</i> means
reading the queue into a variable x, <i>x!</i> means writing <i>x</i> in the queue
(handshake communication between two processes).
</p>

<obr id="squeue" src="sfifo2">Model of a simple FIFO queue</obr>

<p>
Notice a special state named <i>throwing</i>. This state is a so called 
observer - a
state that observes examined behaviour of the system. Here we are mostly
interested in queue overflow. If queue overflows a system moves to the state
<i>throwing</i>. In verification we define property "buffer will never overflow"
as a logical statement "the system never enters the state <i>throwing</i>",
<i>AG &vlnka; throwing</i> in CTL logic.   
</p>

<p>
<b>FIFO queue with delays.</b> Delayed FIFO is a model of FIFO with time where
it is guaranteed that every request is delayed at least <i>DELAY</i> time units
before released. Delayed FIFOs are modelled using Timed Automata
<cite href="Alur99" />. In <a href="#dqueue">Figure</a> there is a delayed FIFO queue with clock
<i>y</i>. Transitions that release an element of the queue are augmented with time
constraints allowing to release an item only if the constraint is
satisfied. The time of release is non-deterministic and must be greater then
<i>DELAY</i>, i.e., <i>y &gt;= DELAY</i>. If we want to define precise maximal delay of
the queue we have to add an invariant <i>y &lt;= MAX_DELAY</i> to each state where
data is released.
</p>

<obr id="dqueue" src="dfifo2">Model of a delayed FIFO queue</obr>

<p>
<b>Lossy FIFO queue.</b> Lossy FIFO queues can be timed or untimed. In
<a href="#lqueue">Figure</a> we consider untimed lossy FIFO queue. 
</p>

<obr id="lqueue" src="lfifo2">Model of a lossy FIFO queue</obr>

<p>
The model includes a special counter lossy period <i>lp</i> that counts a number
of successfully sent elements. If the count reaches constant <i>NLOSS</i> it
generates a loss that means it decrement the number of items of the queue
without sending it to the next process. In our model we have NLOSS equal to 5,
i.e., every fifth element is lost. Using this abstraction we can model queues
where we know the rate of losses. Distribution of losses is uniform, that means,
we don't model chunks of losses.  
</p>

<!-- ==================================================================== -->
<h2>Modelling Executive Components</h2>

<p>
Executive components by contrast to the buffers can not be modelled using some
<i>pattern</i> of the model. While creating the models we have to follow the
purpose of the verification process. In the Scampi system design we are
interested in timing of the components because the number of items in
the buffers depends only on the delay of the executive components.  
</p>

<p>
If executive components have an accurate timing plan, we distinguish two kinds
of stated in the model. The first is an urgent state that we use for
observers. The second type is a state which models delay of the system. This
state (e.g. <i>sending_result</i> on <a href="#CAM">Figure</a>) has an
incoming transition which resets a clock (<i>t := 0</i>), an invariant that
defines a time constraint over clocks for the state, and an outgoing transition
constrained by condition on a clock (<i>t == delay</i>).  
</p>

<p>
<b>Modelling CAM block and TCAM model.</b> The simple analysis of
comparison of delays of CAM block and TCAM shows that matching process in TCAM
takes more
time then preprocessing in CAM block. Our model, on <a href="#CAM">Figure</a>,
has two waiting states: <i>filling</i> when data from a UHFIFO is loaded and
the TCAM memory is filled with data (constant <i>FILL</i> is set to 16 time
slots), and <i>matching_result</i> state which waits for one result from
TCAM (constant <i>RESULT</i> is set to 12 time slots). The result is
subsequently sent via channel <i>rfifo_in</i> to the next component. When
the fourth result is forwarded, CAM block and TCAM start loading new four
parts of the unified header.   
</p>

<obr id="CAM" src="cam">CAM block and TCAM model</obr>

<p>
<b>Modelling Processing Unit.</b> Processing units (<a href="#PU">Figure</a>)
 are
the executive components that perform sequential searching for an
appropriate matching rule. The process is done by executing 
instructions stored in the SSRAM memory. Each PU waits for its time
slot to access SSRAM and loads instructions to be executed. Loading and
execution take together eight time slots (state <i>executing</i>). We
expect to have at most eight instructions to be executed, so the
transition is allowed when <i>t==seqsearch</i>. Then the result is sent to
multiplexer. 
</p>

<obr id="PU" src="pu">Processing unit</obr>

<p>
<b>Multiplexer.</b> Multiplexer subsequently checks every PU in every time slot
whether it already has a result to be forwarded. In our model we
abstract out of subsequent checking every PU because of complexity of
the model. However, we are able to compute the maximum delay caused by
subsequent polling - it is 4 time slots delay for four PUs. 
</p>

<p>
The model of Multiplexer (<a href="#MX">Figure</a>) has two interesting parts.
 The
first part reads the results of PUs (states <i>start</i>, <i>first_out</i>,
 <i>waiting</i>, <i>has_result</i>), the second part is a observer (formed by
  other states) which
is used to check the throughput and the delay of the system.  
</p>

<obr id="MX" src="mx">Model of the multiplexer</obr>

<p>
The forwarding part has two states in which the system is waiting for the
result in a PU (<i>start</i> and <i>waiting</i> states). If the PU has the
 result, it
is read via channel <i>out</i>.
</p>

<p>
For the performance checking, we are interested in the throughput of the system
which can be calculated from the size of an incoming packet in bits
(<i>size</i>),
the average delay of rule matching process in time slots (<i>delay</i>) and
time of a
time slot in seconds (<i>ts</i>): <i>throughput = size/(delay*ts)</i>. We
obtain the
average delay from the ratio <i>(delay for x results)/x</i>. To compute delay
 for
<i>x</i> results we can use the counter of outgoing results and a clock. When
 counter
reaches the maximum (user chosen value e.g. 1000), system fires the transition
to the <i>slow</i> state (the observer) only if the clock value is greater than
allowed delay. The performance checking of the throughput is based on manual
finding of the minimum value of delay i.e. model checking running several times
to find appropriate delay value where system does not reach the <i>slow</i>
 state.
</p>

<p>
The <a href="#wave">Figure</a> shows the dependence between the throughput,
calculated as average number of requests, and real incomming requests. An angle
of a dashed line expresses the throughput calculated from the delay of
wave of the results (<i>wave</i> is user chosen value of number of results).
</p>

<obr id="wave" src="wave">Wave of outgoing results</obr>

<!-- ==================================================================== -->
<h2>Modelling Environment</h2>

<p>
Environment of the system has to be added to the model in order to check how
the system responds to the environment. In systems with queues like our model
of packet analyser is, we have to model an input - a generator of incoming
requests (packets) and an output - a receiver of the result. 
</p>

<p>
We are not able to successfully model and verify the model with real
generator because of expressivity of modelling language and state explosion. We
define following properties of the generator:
<ul>
  <li>distribution of incoming requests - random, uniform, given by a
  specific   function</li>
  <li>time constraints on new requests - time span between two consequent
  requests, minimum and maximum delay between requests</li>
  <li>request properties - a size (minimal, maximal, average), a kind</li>
  <li>extreme conditions - flooding of requests, losses on the input</li>
</ul>
</p>

<p>
We are not aware of a verification tool that is able to express and
successfully verify above written condition completely. However, we can create
a set of generators such that every generator models only one of these
properties and then we can verify model with every item of this set. This kind
of abstraction provides one view on the system for a specific condition. Then,
if the property is satisfied for the given environment (a generator) we can
change/enriched environment (change kind of generator) and verify the property
under different input/output conditions. 
</p>

<obr id="generator" src="generator2">A model of generator</obr>

<p>
In Scampi project we created a simple model of input packets as <a href="#generator">Figure</a> shows. This model generates incoming packets for
every input
buffer under extreme conditions, that is, a new packet is generated
immediately when the previous packet was accepted. This is described by
sequence of states <i>sending0</i> to <i>sending3</i> when a new packet is
 generated to
every queue. We use so called committed states (with 'c' mark) where no time
progress is possible. That means in one moment (no time progress between) four
transitions are passed and four new packets are generated in every input
buffer.  
This behaviour performs an attack from the network (flooding), and we probe how
the system responds to a such condition.
</p>
 
<obr id="receiver" src="sreceiver">A model of a receiver</obr>

<p>
Similar analysis can be applied on the model of a receiver. In our case of
studying behaviour of simple FIFO queue we created a model of the receiver
that reads data whenever they are available, see <a href="#receiver">Figure</a>.
</p>
 
<p>
The receiver has three states - <i>start</i>, <i>waiting</i> and
 <i>release_state</i>. It
models a process that reads data every <i>read_time</i>. After reaching
<i>read_time</i> the control moves to <i>release_state</i> where the process
 stays
until a transition is allowed. This state is urgent - defined by letter
'u' - that guarantees the outgoing transition is taken immediately after a
corresponding guard is satisfied. In comparison to committed state 'c' time
progress is allowed there.  
</p>

<p>
In case of delayed FIFO queues it is important to mention that the
receiver should not put time constraints on transitions before data
are read. The receiver should read data from delayed FIFO queue
whenever they are available. Otherwise, deadlock will appear.
</p>

<!-- ==================================================================== -->
<h1>Verified Properties</h1>
<!-- ==================================================================== -->

<!-- ==================================================================== -->
<h2>Throughput Verification</h2>

<p>
This section discusses verified properties of the model obtained by model
checker Uppaal~<cite href="Uppaal00" /> on machine P4-2.8~GHz with 512~MB. Properties
are  written in form of CTL invariance formulas - remember <i>A</i> means over
 all paths, <i>G</i> means globally (always). 
</p>

<ul>
  <li><i>A G &vlnka; deadlock</i> - model checking time was 10.8~s.
     There is never deadlock in the system, which means that the system
     will always work. Model satisfies this property.</li>
  <li><i>A G &vlnka; (UHFIFO0.throwing or UHFIFO1.throwing or UHFIFO2.throwing
     or UHFIFO3.throwing)</i> - model checking time was &lt; 0.1~s. This
     property shows us whether the unified header is ever thrown away from the
     UHFIFO. Model not satisfies it which means that the input stream is
     faster than the output stream (this is the first result of the
     performance checking).</li>
  <li><i>A G &vlnka; (RFIFO0.throwing or RFIFO1.throwing or
     RFIFO2.throwing or RFIFO3.throwing)</i> - model checking time 2.3~s.
     This property is similar to the one above. We use this to get needed
     minimum size of RFIFO. This property still holds even if the size is set
     to only one item. It is obvious since the Processing Unit has the same
     speed as TCAM.</li>
  <li><i>A G &vlnka; MX.slow</i> - model checking time 2.2~s.
     This property expresses the throughput checking that the <i>wave</i>
     of the results from the multiplexer is sent at time corresponding
     to the average delay for one result. This property was
     satisfied for the constant <i>wave</i> set to 1024 and the constant
     <i>results</i> set at least to 16 time slots (which is 320ns at 50MHz).
     Today, system design supports IP packets with size between 64B
     and 2kB. If the input for the whole system is the stream of the
     smallest packets (64B), the throughput caused by Lookup processor can be
     theoretically 1.6Gbps at most <i>(64*8/320~ns)</i>.</li>
</ul>

<!-- ==================================================================== -->
<h2>Parametric Verification of a FIFO Queue</h2>

<p>
Parametric model of the FIFO queue has three processes - a generator, a
receiver and an simple FIFO queue. We considered three parameters: the
maximal length of the queue 
<i>FIFOsize</i>, the rate of incoming request <i>uh_time</i> and the rate of
reading data from the queue <i>read_time</i>. We used the model of simple
FIFO queue (see <a href="#squeue">Figure</a>) with an initial constraint
<i>FIFOsize &gt; 0</i>. Then, we obtained by TReX following results: 
<ul>
  <li>If <i>uh_time &gt;= read_time</i> there is not <i>throwing</i> state in
  the configuration space. The queue never overflows and the length of
  the buffer is not important. Verification takes only few seconds
  because there are only five symbolic configuration states.</li>
  <li>For <i>uh_item &lt; read_time</i> analysis did not finished. So
  we decided to set a fixed size of the queue <i>FIFOsize = 3</i> and then we
  tested whether the buffer overflows. It happened for <i>4~*~read_time~=
  uh_time</i>. In this case we know the length of the queue and are
  interested in time constraints at <i>throwing</i>. In other case we set a
  specific input condition (the rate of incoming requests) and we verify for
  what kind of time constraints the queue of given length will be overfilled.
  </li>
</ul>
</p>

<p>
Parametric verification of the FIFO queue is undecidable problem
<cite href="AHV93" />, however for specific initial constraints we can obtain
some results. These results help us to find a configuration of the
system where the verified property is not satisfied.
</p>

<!-- ==================================================================== -->
<h1>Conclusion</h1>
<!-- ==================================================================== -->

<p>
In this paper we introduced our experience with verification of the
behaviour of the Lookup processor and its close neighbourhood of the
Scampi project. We created models of several components of the system
and analysed its behaviour. We were especially focused on the
throughput of the system and a risk of buffers overflow. At first we
did a manual analysis of the system, then we used model checkers
Uppaal and TReX. Using Uppaal we were able to prove correctness
of the systems - deadlock detection, buffer overflow for specific time
conditions, maximal latence of the system etc. Using parametric
verification by TReX we observed constraints on the buffer length,
the rate of incoming packets and the rate of reading a buffer. We
detected several states where throwing appeared and found time
relations for these states. 
</p>

<p>
We noticed a typical structure of the hardware design and defined
three kinds of basic components that are present nearly at every
hardware design - FIFO queues, executive components and environment of
the system. We proposed patters for modelling and analysing these
components and discussed their behaviour.
</p>

<p>
However, there are still many challenges for the future research. We
noticed that system with global clocks and synchronized events can be
easily modelled and analysed even in untimed model checkers. This
makes possible to joint untimed verification of some parts of the
system and timed verification of important events. Another challenge
in the Scampi project is to set a common framework for analysis on the
level of the source code and high-level abstract analysis. At the
moment we did this as independent tasks.
</p>
  
<!-- ==================================================================== -->
<seznamknih>
  <kniha id="AHV93">
	R. Alur, T.A. Henzinger, and M.Y. Vardi.
	Parametric Real-time Reasoning.
	In <i>ACM Symposium on Theory of Computing</i>, pages 592-601, 1993.
  </kniha>
  <kniha id="Alur99">
	R. Alur.
	Timed Automata.
	In <i>Proceedings of 11th CAV</i>,
	volume 1633 of <i>LNCS</i>, pages 8-22, 1999.
  </kniha>
  <kniha id="LUP">
	D. Antoš, P. Zemčík, and J. Kořenek.
	Lookups in FPGA hardware accelerator for IPv6 routers.
	<i>CESNET Technical Report 2002.</i>
  </kniha>
  <kniha id="TREX01">
	A. Bouajjani, A. Collomb-Annichini, and M. Sighireanu.
	TREX: A tool for reachability analysis of complex systems.
	In <i>Proceedings of CAV</i>,
	volume 2102 of <i>LNCS</i>, pages 368-372, Springer Verlag, 2001.
  </kniha>
  <kniha id="TR-4-2004">
	J. Holeček, T. Kratochvíla, V. Řehák, D. Šafránek, and P. Šimeček.
	How to formalize FPGA hardware design.
	<i>CESNET Technical Report 4/2004.</i>
  </kniha>
  <kniha id="Router">
  	J. Novotný, O. Fučík, and D. Antoš.
	Project of IPv6 router with FPGA hardware accelerator.
	In Cheung P.Y.K., Constantinides G.A., and de~Souza~J.T., editors
	<i>Field-Programmable Logic and Applications, 13th International
	Conference FPL 2003</i>, pages 964-967, Berlin-Heidelberg, Springer,
	2003.
  </kniha>
  <kniha id="Uppaal00">
	P. Pettersson and K.G. Larsen.
	UPPAAL2k.
	<i>Bulletin of the European Association for Theoretical Computer
	Science</i>, 70:40-44, February 2000.
  </kniha>
</seznamknih>

</zprava>
