G3 System User Interface Prototype
CESNET
technical report number 9/2005
Tom Kosnar
23. 11. 2004
1 Abstract
The G3 system is an experimental network infrastructure monitoring system based on standard measurement methods (mainly SNMP) with non-standard measurement timing, specific data processing and, hopefully, an advanced user interface. In the future it should succeed the GTDMS-II monitoring system, which we currently use in the NREN of the Czech Republic, but first of all it should help us to gain new ideas and points of view on network infrastructure and on the importance of various types of information about it.
2 G3 System
There are several main objectives which the G3 system tries to meet:
Large scale and continuous network infrastructure monitoring. Ideally, network administrators need one monitoring and visualization system for the whole network infrastructure. They also need to run continuous measurements for months and years. This requirement led us to use relatively common technologies for primary data retrieval - mainly SNMP.
Measurement mechanisms and data processing should be able to keep and visualize some of the dynamics of flows, events and processes - within the limits of the primary data retrieval method (SNMP), of course.
Data processing mechanisms should ensure automated adaptation to real device reconfigurations. This means automated, continuous and flexible mapping between the technological structuring (given by SNMP) and the logical structuring (the human point of view) of measured devices.
Convergence to human-understandable information - this is the crucial factor of the G3 system design. From the network administrator's point of view, the user interface is the most important part of any system - the quality of such systems is always judged by the options and functionality of the user interface. I've tried to incorporate all the main specific features of the other G3 system parts (measurement, data processing) into the user interface prototype (even if only in basic form) to get early feedback and a chance to correct my premises. This report describes the key features of the user interface prototype.
3 G3 System User Interface - Key Features
3.1 Navigation
3.1.1 Time window of interest
Everything the G3 system user interface does refers to a time window, which is defined on a "From" - "To" basis. Time values can be set as absolute or relative. Relative values refer to "now". The syntax and acceptable values correspond to the Time::ParseDate Perl module. Some examples of typical setups follow:
| From | To |
|---|---|
| -6 day / -2 week / -12 month / -1 year | now |
| last mon 12:00 | last thu 7:00 |
| 2005/11/24 23:59 | 2005/11/25 4:12 |
| Jun 21 2005 12:00 | Aug 31 |
Table 1: Time window setup examples
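The relative values in the first row of the table can be illustrated with a minimal Python sketch. It is not the Time::ParseDate implementation and covers only the "now" and "-N unit" forms; the `resolve_relative` helper and the approximation of months and years in days are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical unit table; months and years are approximated in days.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def resolve_relative(expr: str, now: datetime) -> datetime:
    """Resolve 'now' or '-<count> <unit>' against a reference time."""
    expr = expr.strip().lower()
    if expr == "now":
        return now
    count_text, unit = expr.split()
    # int("-2") is negative, so the result lies in the past.
    return now + timedelta(days=int(count_text) * UNIT_DAYS[unit.rstrip("s")])


now = datetime(2005, 11, 25, 12, 0)
window = (resolve_relative("-2 week", now), resolve_relative("now", now))
```

Passing the reference time explicitly keeps the "From" and "To" values consistent with each other within one request.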
3.1.2 Structure of the navigation tree
Each network device is usually perceived as a logical structure consisting of several groups of objects. For example, groups such as "System" or "Interfaces" are the most common ones. Within each group there are several items/values that can be understood as identifying each object in the group. For example, items like "description" and "IP address" make sense for the "Interfaces" group. Users usually expect to navigate with the help of a tree-like structure consisting of a hierarchy and lists of such identifiers. The first problem is that the virtual item "description" of an "Interface", as users see it, usually consists of many different measured items (the technological perspective) whose values may be partially the same or similar and need sophisticated processing. The second problem is that users have their own ideas about what the structure of the navigation tree should look like. Therefore the user interface prototype of the G3 system was designed and implemented so that users can set up and interactively modify a template which defines the requested structure of the navigation tree.
Figure 1: Navigation tree template example 1.
Figure 2: Navigation tree template example 2.
3.1.3 Basic visualization styles of the navigation tree
There are two typical styles of working with the navigation tree. People prefer either controlled expand&collapse tree behavior or a tree that is fully expanded all the time. The first style is usually suitable for browsing all measured devices, especially when the number of objects is rather high; the second one is generally used with some object filtering. The examples above show the fully-expanded mode of the navigation tree. The following example shows the expand&collapse mode.
Figure 3: Navigation tree in expand&collapse mode
3.1.4 Basic filtering of navigation tree objects
The G3 system is designed to provide large scale measurements of the network infrastructure. Therefore a huge number of measured objects can be expected (tens of devices, thousands of interfaces). Administrators need a filtering mechanism to find all requested objects at once and quickly. The current implementation of the filtering part is very simple but seems to satisfy most typical requests. As a first attempt, I've decided to implement a multi-line way of setting the filtering conditions. Currently the lines of the text area are understood as blocks joined with logical OR when the filtering rule is constructed, and each line consists of one or more conditions (separated by the & character, implying logical AND). These conditions are applied to all potential navigation tree objects - those which are currently visible as well as hidden ones. The identification of every object consists of several identifying values - they belong to the corresponding navigation tree items ("location group", "topology group", etc.). A filtering condition may be applied to any of them or to selected ones only. Finally, conditions can be applied as substrings to match or as Perl regular expressions. This part of the user interface is still being tested, so no escaping of special characters is implemented.
Figure 4: Filter setup and condition example.
The conditions set up in the example above match all objects which contain either the substring "nix2" or the combination of substrings "geant" AND "pos" in values corresponding to any descriptive items. The conditions are applied as case insensitive with positive filter logic, according to the values of the other switches (on the left side of the filtering setup window). The check-box "auto-select matching objects" may be set to mark all matching objects automatically and thus make their further visualization faster (i.e. with "one click").
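The rule construction described above (lines joined by OR, conditions within a line joined by AND) can be sketched in Python as follows. This is an illustration only, not the G3 code: the function names are hypothetical, Python regular expressions stand in for Perl ones, and the sketch assumes that each condition may be satisfied by any one of the object's identifying values.

```python
import re


def parse_filter(text: str) -> list[list[str]]:
    """Each non-empty line is an OR block; '&' separates AND conditions."""
    blocks = []
    for line in text.splitlines():
        conditions = [part.strip() for part in line.split("&") if part.strip()]
        if conditions:
            blocks.append(conditions)
    return blocks


def matches(blocks, values, use_regex=False, ignore_case=True, positive=True):
    """Test one object's identifying values against the parsed rule."""
    flags = re.IGNORECASE if ignore_case else 0

    def condition_hits(condition):
        if use_regex:
            return any(re.search(condition, v, flags) for v in values)
        needle = condition.lower() if ignore_case else condition
        return any(needle in (v.lower() if ignore_case else v) for v in values)

    # OR across blocks, AND across the conditions inside one block.
    hit = any(all(condition_hits(c) for c in block) for block in blocks)
    return hit if positive else not hit


rule = parse_filter("nix2\ngeant & pos")
```

With this rule, an object described by the values "POS1/0" and "link to GEANT" matches through the second line, while switching `positive` to `False` inverts the result, which corresponds to negative filter logic.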
3.1.5 Additional navigation features
There are many additional features that network administrators may find very useful. They have one common attribute - they should help to speed up the administrators' work. I've tried to implement some of them. They are located in the "Navigation tree parameters" window.
Let's start with the last parameter, Object cache "time to live" in seconds. The system has to traverse the whole database (the appropriate part of the stored data from each measured device) when it constructs the navigation tree for the first time. First it creates an image of all measured objects and stores it in the cache in a generic, widely usable format. This image is then used as the source for navigation tree construction. The navigation tree is constructed according to the current parameters (given by the user - the template, for example). The source object image is kept in the cache for the configured time interval and remains the authoritative source for navigation tree construction until it expires.
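The caching behavior described above can be sketched as a small time-to-live cache. The class name, the `build_image` callback (standing in for the database traversal) and the injectable clock are assumptions for illustration; the actual G3 cache format is not described in this report.

```python
import time


class ObjectImageCache:
    """Keep one object image and rebuild it only after it expires."""

    def __init__(self, ttl_seconds, build_image, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.build_image = build_image  # hypothetical database traversal
        self.clock = clock
        self._image = None
        self._stored_at = None

    def get(self):
        now = self.clock()
        expired = (
            self._stored_at is None
            or now - self._stored_at >= self.ttl_seconds
        )
        if expired:
            # The expensive traversal runs only on first use or after expiry.
            self._image = self.build_image()
            self._stored_at = now
        return self._image
```

Every navigation tree construction would call `get()`, so template changes made within the time-to-live interval reuse the same image instead of traversing the database again.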
When the Suppress technological part of interface descriptions switch is on, the system tries to omit the technological part of interface descriptions when the navigation tree is being built. In conjunction with internal aggregations (described later), the number of visible interfaces goes down during that process. Visible interfaces are those which have some additional identifiers - for example HIP configuration or a specific description written by administrators. The rest of the interfaces are merged and hidden behind a single description: "...hidden (tech. description only)". The differences can be seen in the examples below.
Figure 5: Technological part of interface descriptions enabled
Figure 6: Technological part of interface descriptions suppressed
The Suppress "out of time" objects switch may be disabled to show objects which are invalid from the perspective of the configured time window (expired or "not yet born"). The result again depends on the current template setup. See the examples below. The "out of time" object visible in the first example is lost in the second one because it is merged with another object (valid from the time point of view) due to the removal of the "interface IP" item from the template.
Figure 7: Visible "out of time" object - grey check-box background
Figure 8: "Out of time" object lost after removing "interface IP" item from template - merged with description "GigabitEthernet1/0/2, Gi1/0/2, Hodonin"
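The notion of an "out of time" object can be expressed as a simple interval test. The sketch below assumes that each object carries the timestamps of its first and last measurement; these field names are hypothetical and not taken from the G3 data model.

```python
from datetime import datetime


def is_out_of_time(first_seen, last_seen, window_from, window_to):
    """True when the object's validity interval misses the time window."""
    expired = last_seen < window_from        # object ended before the window
    not_yet_born = first_seen > window_to    # object appears after the window
    return expired or not_yet_born


window_from = datetime(2005, 11, 1)
window_to = datetime(2005, 11, 30)
```

An object whose interval overlaps the window even partially counts as valid in this sketch.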
The next four switches are attempts to project additional summarized information, obtained by ad-hoc database traversal, into the navigation results. Providing this type of information aims to speed up the whole "network tuning" process. The operations behind such requests may slow down the system responses (IO and processing needs) when a large number of objects has to be checked (the configured time window size is a minor factor). Therefore the internal design limits these operations to only those objects which will become a visible part of the navigation tree (including multiple objects merged into one label). Unless you need overall summary information from the whole measured infrastructure, it is highly recommended to use these switches in conjunction with navigation tree filtering. These options are intended as starting points for future extensions. Here are some output examples.
Figure 9: Devices rebooted during configured time window
Figure 10: Interfaces with possible problems
3.1.6 Aggregations behind navigation tree
One of the most important side effects of the configurable navigation tree template is the possible aggregation (or merging) of multiple objects into a single navigation tree label. What does it mean? Let's imagine the following template:
system name
object type ; interface description
and the corresponding navigation tree result may look like this:
Switch 1
[System]
[IP]
[ICMP]
[SNMP]
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/1
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/2
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/3
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/4
Switch 2
[System]
[IP]
[ICMP]
[SNMP]
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/1
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/2
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/3
[Interfaces], 10/100 utp ethernet (cat 3/5), 2/4
Let's remove the interface description item from the template now. The new template will be:
system name
object type
and the updated navigation tree result will change to:
Switch 1
[System]
[IP]
[ICMP]
[SNMP]
[Interfaces]
Switch 2
[System]
[IP]
[ICMP]
[SNMP]
[Interfaces]
In the updated navigation tree the [Interfaces] labels point to all original interface instances. This aggregation determines the style of further visualization. Aggregation is not limited to the scope of a single device. In the next step, let's remove the system name item from the navigation tree template. The resulting template will be:
object type
and the updated navigation tree:
[System]
[IP]
[ICMP]
[SNMP]
[Interfaces]
All original interface instances from all devices are aggregated and merged behind the [Interfaces] label, which points to all of them.
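The effect of removing template items can be reproduced with a short grouping sketch. It assumes that every measured object is a record holding one value per template item plus a hypothetical `id`; objects that share the same values for all items of the template end up behind the same label.

```python
from collections import defaultdict


def build_labels(objects, template):
    """Group object ids by the values of the template items, in order."""
    labels = defaultdict(list)
    for obj in objects:
        key = tuple(obj[item] for item in template)
        labels[key].append(obj["id"])
    return dict(labels)


objects = [
    {"id": 1, "system name": "Switch 1", "object type": "Interfaces",
     "interface description": "2/1"},
    {"id": 2, "system name": "Switch 1", "object type": "Interfaces",
     "interface description": "2/2"},
    {"id": 3, "system name": "Switch 2", "object type": "Interfaces",
     "interface description": "2/1"},
]

full = build_labels(objects, ["system name", "object type", "interface description"])
per_device = build_labels(objects, ["system name", "object type"])
overall = build_labels(objects, ["object type"])
```

Here `full` keeps three separate labels, `per_device` merges the two interfaces of "Switch 1", and `overall` places all three interfaces behind one label, mirroring the three navigation trees above.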
This mechanism may be used effectively in conjunction with navigation tree filtering when you need to separate and visualize several interfaces which have a specific purpose. For example, you may have two or more independent lines (for redundancy) connecting your network somewhere, and you may want to get the summarized traffic to and from your network across the border.
In that case you first need to select all appropriate interfaces, either manually or with the help of navigation tree filtering. Then you have to merge the selected instances into one - removing the proper items from the navigation template will help.
Figure 11: Interface selection - filtering condition was: substring match "nix2" or "nix4"
Figure 12: Interfaces merged under the [Interfaces] label - see navigation template changes
3.2 Measured Data Visualization
3.2.1 Dynamics, peaks etc...
One of my challenges while designing the G3 system was to implement a measurement mechanism which would provide some information about dynamics in the network infrastructure. It is impossible to go beyond the limits given by the primary data retrieval methods, of course. But I think there is still unused potential - in the way SNMP is usually used, I mean. Therefore I've implemented a measurement engine which operates with a dynamically changing time step of measurements. Its configuration is device based and consists of a sequence (unlimited in size) of time steps. Each time step can be set up as an exact or semi-random time interval. Besides this, low and high water marks must be set up to define the range absolutely (again per measured device). Here is a description of how the configuration may be set up (in human-readable form):
minimal time step: 10 seconds
maximal time step: 1200 seconds
time step sequence: 12 seconds, short, random, medium, 10 seconds, random, long
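A configuration of this kind could drive a generator like the one sketched below. The report does not define what "short", "medium", "long" and "random" mean exactly, so the sub-ranges used here (thirds of the range between the water marks, and the whole range for "random") are assumptions, as is the cyclic repetition of the sequence.

```python
import itertools
import random


def time_steps(sequence, low, high, rng):
    """Yield time steps in seconds, cycling through the configured sequence."""
    span = high - low
    # Assumed sub-ranges for the semi-random keywords.
    ranges = {
        "short": (low, low + span / 3),
        "medium": (low + span / 3, low + 2 * span / 3),
        "long": (low + 2 * span / 3, high),
        "random": (low, high),
    }
    for entry in itertools.cycle(sequence):
        if isinstance(entry, (int, float)):
            step = entry                         # exact time step
        else:
            step = rng.uniform(*ranges[entry])   # semi-random time step
        yield min(max(step, low), high)          # clamp to the water marks


config = [12, "short", "random", "medium", 10, "random", "long"]
steps = list(itertools.islice(time_steps(config, 10, 1200, random.Random(1)), 14))
```

Two passes through the sequence produce the same exact steps (12 and 10 seconds) at the same positions, while the semi-random entries differ between passes.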
Besides the chance to capture relatively short peaks of values, there is another important effect of this measurement style - you can configure the whole measurement to be less aggressive than in usual systems. The "real time step" of device measurements is one of the items processed by the system itself. These values can also be accessed via the user interface. Here is an example - a long-term view of the time step of measurements.
Figure 13: Time step of device measurements - 4 months view
The example above demonstrates that although the minimal time step of measurements configured for this device is 20 seconds, the long-term average is around 14-15 minutes.
The short-term 12-hour view below shows the real course of the time step value and the influence of the semi-random time step generation - similar, but not identical, periodically repeating patterns.
Figure 14: Time step of device measurements - 12 hours view
The effects of the dynamically changing time step of measurements and its influence on measured values can be seen in the following examples - see the differences between peak and average values.
Figure 15: Simple interface view example
Figure 16: IP throughput example
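Why short time steps reveal peaks that an average hides can be shown with a small worked example. The sketch assumes counter-type samples given as (timestamp in seconds, counter value) pairs and ignores counter wrap-around; the sample values are invented for illustration.

```python
def rates(samples):
    """Turn (timestamp_s, counter) samples into (duration_s, rate) pairs."""
    return [
        (t1 - t0, (c1 - c0) / (t1 - t0))
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]


def peak_and_average(samples):
    """Return the highest interval rate and the time-weighted average rate."""
    pairs = rates(samples)
    peak = max(rate for _, rate in pairs)
    total_time = sum(duration for duration, _ in pairs)
    average = sum(duration * rate for duration, rate in pairs) / total_time
    return peak, average


# One 10-second step catches a burst; the following steps are much longer.
samples = [(0, 0), (10, 5000), (610, 35000), (630, 36000)]
peak, average = peak_and_average(samples)
```

The 10-second step shows a rate of 500 units per second, while the average over the whole 630 seconds is only about 57 units per second; with a single long time step the burst would not be visible at all.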
3.2.2 Aggregated views
The internal aggregation mechanism, which was mentioned in relation to navigation tree construction, has analogous support in the module providing the visualization. The aim is to present aggregated values (to draw aggregated graphs, for example) for all objects hidden behind a single navigation tree label, while keeping the possibility to interactively select one or more objects from the aggregated set and visualize them in the same way. The aggregated values may be sums, minimums, maximums - whatever makes sense. To solve this task I decided to add a simple and optional sub-navigation. This sub-navigation may be controlled by a menu in the visualization module - see the next example.
Figure 17: Sub-navigation in aggregated view
An aggregated view may look like the following example - output containing values from all objects is generated by default.
Figure 18: Aggregated view results example - summarized values (different sub-navigation style)
You can also sub-select some of the objects - selection can be done on the ID basis:
Figure 19: Aggregated view results example - sub-selection - instance ID
...as well as on the value basis:
Figure 20: Aggregated view results example - sub-selection by value
In general the result may look like the next example - the mixture of both methods.
Figure 21: Aggregated view results example
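The aggregation with optional sub-selection described in this section can be sketched as follows. The sketch assumes that all series behind one label are already aligned to the same sample times, which real data measured with a changing time step would not be without resampling; the `aggregate` function, its parameters and the sample data are hypothetical.

```python
def aggregate(series_by_id, how=sum, ids=None, min_peak=None):
    """Combine per-object series sample by sample after optional sub-selection."""
    selected = {
        obj_id: series
        for obj_id, series in series_by_id.items()
        if (ids is None or obj_id in ids)                   # selection by ID
        and (min_peak is None or max(series) >= min_peak)   # selection by value
    }
    return [how(values) for values in zip(*selected.values())]


traffic = {"if-1": [10, 20, 30], "if-2": [1, 2, 3], "if-3": [100, 0, 50]}

total = aggregate(traffic)                          # all objects, summed
highest = aggregate(traffic, how=max)               # per-sample maximum
by_id = aggregate(traffic, ids={"if-1", "if-2"})    # sub-selection by ID
by_value = aggregate(traffic, min_peak=30)          # sub-selection by value
```

Passing both `ids` and `min_peak` combines the two selection methods, and any function that reduces a tuple of values (`sum`, `min`, `max`) can serve as the aggregation.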
3.3 Conclusion
This report describes some of the key features of the G3 system user interface prototype which may be relatively new or unusual compared with commonly used infrastructure monitoring systems that are based on SNMP or, more generally, on a large proportion of counter-type measured items. Testing by network administrators and their comments will indicate whether I'm on the right track or not. Currently it seems that the prototype might satisfy needs in the area of long-term views of network infrastructure and of contextual trends in its use and behavior.