Monitoring of network traffic geographical characteristics
CESNET
technical report number 29/2007
Sven Ubik, Pavel Bucek
30.11.2007
1 Abstract
We designed a system that combines IP address geolocation with passive network monitoring and a Netflow exporter to produce geographical characteristics of network traffic. Such characteristics are useful for planning the capacity of international lines, planning peerings with other networks and finding optimal locations for servers and network caches.
Keywords: passive monitoring, flow records, geolocation
2 Motivation
Network traffic monitoring is now an integral part of network infrastructure. Router interface utilisation can be monitored using SNMP. We can also tap a link by passive monitoring and compute various statistics on captured packets. Netflow records generated by routers or created from captured packets can also be analysed. The above methods provide information about network load, its time evolution and possibly distribution into protocols (from Netflow or passive monitoring), but they do not give any information about location of traffic sources and destinations.
For network planning and troubleshooting, it is important to know how incoming and outgoing traffic is distributed across geographical locations. In other words, we want to know where the largest origins and destinations of traffic in our network are and how this distribution changes over time. The geographical distribution of network traffic is particularly useful for planning the capacity of international lines, planning peerings with other networks and finding optimal locations for servers and network caches.
3 IP address geolocation
Packet capture and Netflow records provide source and destination IP addresses. An IP address does not by itself carry any information about geographical location. IP address geolocation is the process of finding the geographical location of an IP address. It is increasingly in demand, particularly for business applications where a company is interested in the locations of potential customers browsing its websites. Several free and commercial systems for IP address geolocation are currently available.
The most notable IP address geolocation sources are given in the following sections.
3.1 Internet Registries (IR)
IP addresses are allocated by IANA (Internet Assigned Numbers Authority) to Regional Internet Registries (RIPE NCC for Europe, Middle East and former USSR, ARIN for North America, LACNIC for Latin America and the Caribbean, AfriNIC for Africa and APNIC for Asia and Pacific) for further assignments to customers. Databases of Internet Registries include geographical information entered by users who are assigned the blocks of IP addresses. These databases can be searched by the whois protocol or web interface. However, this geolocation method is slow because queries are resolved remotely and the rate of whois queries from individual IP addresses is intentionally limited to protect against e-mail address harvesting. The geographical information in Internet Registries is also not very reliable, because it depends on the information that users supply about their locations when requesting blocks of IP addresses.
3.2 Commercial geolocation databases
These products include, for example, GeoIP from MaxMind, IP2Location from Hexasoft Development and Quova. Most databases are available in several versions with different accuracy (ranging from country to zip code) at different prices. Less accurate versions are sometimes available free of charge for evaluation. All databases offer a local API, which allows fast resolution of many IP addresses. We did not perform a detailed comparison of these databases, but they seem to be quite similar. We found that GeoIP City offered good accuracy at a reasonable price and we selected this product for our further development.
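To illustrate why a local database resolves addresses quickly, the following minimal sketch performs a lookup by binary search over a sorted table of address ranges. It is not the GeoIP API. The table contents (documentation prefixes mapped to arbitrary country codes) and the names `build_table` and `lookup` are assumptions made for this example only.

```python
import bisect
import ipaddress

# Hypothetical sample ranges (documentation prefixes), not real geolocation data.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "CZ"),
    ("198.51.100.0", "198.51.100.255", "DE"),
    ("203.0.113.0", "203.0.113.255", "US"),
]


def build_table(ranges):
    """Convert textual ranges to sorted integer tuples for binary search."""
    table = sorted(
        (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), cc)
        for lo, hi, cc in ranges
    )
    starts = [row[0] for row in table]
    return starts, table


def lookup(addr, starts, table):
    """Return the country code for addr, or None if no range covers it."""
    ip = int(ipaddress.IPv4Address(addr))
    # Find the last range whose start is <= ip, then check its upper bound.
    i = bisect.bisect_right(starts, ip) - 1
    if i >= 0 and table[i][0] <= ip <= table[i][1]:
        return table[i][2]
    return None


STARTS, TABLE = build_table(RANGES)
```

Each lookup costs a logarithmic number of comparisons in memory and needs no network round trip, unlike a remote whois query.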
4 Geographical characteristics
The geographical characteristics that we can obtain depend on the available information about network traffic, that is, whether we use standard Netflow records, extended IPFIX records (with some additional information about flows) or full packet headers from packet capture. We decided to start with standard Netflow records and explore extended IPFIX records later. Determining geolocations for individual packet headers would not scale to high traffic volumes.
The interesting characteristics that we can get by combining IP address geolocation with standard Netflow records include the following:
- Number of bytes and packets sent to or from each country
- Number of flows to or from each country
- Average bytes and packets in a flow sent to or from each country
- Average throughput of a flow sent to or from each country
- Average geographical distance between source and destination of a flow
In addition to computing average values for flows, we can also compute distribution of values.
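The characteristics listed above can be sketched as a simple aggregation over geolocated flow records. This is an illustration under stated assumptions, not the geoflow implementation: the `Flow` fields and function names are hypothetical, average throughput is taken as the mean of per-flow throughput over flows with non-zero duration, and distance is the great-circle (haversine) distance on a sphere of radius 6371 km.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """One flow record after geolocation (hypothetical field names)."""
    src_country: str
    dst_country: str
    n_bytes: int
    n_packets: int
    duration_s: float


def per_country_stats(flows, key="src_country"):
    """Aggregate flows by the country named in `key`."""
    acc = defaultdict(lambda: {"bytes": 0, "packets": 0, "flows": 0,
                               "bps_sum": 0.0, "timed": 0})
    for f in flows:
        a = acc[getattr(f, key)]
        a["bytes"] += f.n_bytes
        a["packets"] += f.n_packets
        a["flows"] += 1
        if f.duration_s > 0:  # zero-length flows have no defined throughput
            a["bps_sum"] += 8 * f.n_bytes / f.duration_s
            a["timed"] += 1
    stats = {}
    for country, a in acc.items():
        stats[country] = {
            "bytes": a["bytes"],
            "packets": a["packets"],
            "flows": a["flows"],
            "avg_bytes": a["bytes"] / a["flows"],
            "avg_packets": a["packets"] / a["flows"],
            "avg_throughput_bps": a["bps_sum"] / a["timed"] if a["timed"] else None,
        }
    return stats


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))
```

Passing `key="dst_country"` gives the same statistics grouped by destination. Keeping the per-flow values instead of running sums would allow the distributions mentioned above to be computed as well.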
5 Geographical presentation
We can present results in a tabular form or in standard statistical graphs (such as bars, pie charts, etc.). But it can also be convenient to show geographical distribution on a map.
One way to present values on a map is to use a Geographical Information System (GIS). Many commercial and free GIS packages are available. However, most of them are rather complex and offer many functions that our application does not need. This complexity makes them difficult to install and configure.
We used a very simple approach: we took blank contour maps (showing just the borders between countries) of the world and of the continents, and we created a PHP script that uses a PHP graphics library to fill the area surrounding a specified point, up to the country borders, with a specified color. To fill a country, we therefore need to specify a point inside its area.
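The fill operation described above is a flood fill. The sketch below shows the idea in plain Python on a small grid of color values. It is not the PHP script itself, and the grid representation and the name `flood_fill` are assumptions for illustration.

```python
from collections import deque


def flood_fill(grid, x, y, new_color):
    """Recolor the 4-connected region containing (x, y) in place.

    The region consists of all pixels reachable from the seed that share
    the seed's original color, so border pixels stop the fill.
    Returns the number of pixels recolored.
    """
    height, width = len(grid), len(grid[0])
    old_color = grid[y][x]
    if old_color == new_color:
        return 0
    queue = deque([(x, y)])
    grid[y][x] = new_color
    filled = 1
    while queue:
        cx, cy = queue.popleft()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == old_color:
                grid[ny][nx] = new_color
                queue.append((nx, ny))
                filled += 1
    return filled
```

Because the fill stops at border pixels, a seed point that falls on a border line or in a neighboring country colors the wrong area, which is why the choice of the point inside each country matters.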
Surprisingly, finding freely available blank contour maps was not simple. We found suitable maps at About.com:Geography. Geographical coordinates of the capital cities of all countries in the world can be found in Wikipedia [capitals]. We then prepared a script that translates geographical coordinates to the corresponding pixel on a map, according to the map projection. Additional points were necessary for countries with non-contiguous areas.
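As an illustration of the coordinate translation, the following sketch maps latitude and longitude to pixel coordinates, assuming an equirectangular (plate carrée) projection in which longitude and latitude scale linearly to x and y. The report does not state which projection the maps use, so the projection, the function name and the default bounds are assumptions. Other projections need a different formula.

```python
def geo_to_pixel(lat, lon, width, height,
                 lon_min=-180.0, lon_max=180.0,
                 lat_min=-90.0, lat_max=90.0):
    """Map (lat, lon) in degrees to (x, y) pixel coordinates.

    Assumes an equirectangular map whose edges correspond to the given
    bounds. Pixel (0, 0) is the top-left corner, so y grows southwards.
    """
    x = (lon - lon_min) / (lon_max - lon_min) * (width - 1)
    y = (lat_max - lat) / (lat_max - lat_min) * (height - 1)
    return round(x), round(y)
```

For a map of a single continent, the bounds would be set to the longitude and latitude of the map edges instead of the whole-world defaults.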
6 System architecture
The system architecture is shown in Figure. We use a DAG card to capture packets. We chose the nProbe Netflow exporter because we found that it had the highest throughput of all the software Netflow exporters that we tested [netflow-test]. The geoflow application is the main part of the system. It processes Netflow records generated by nProbe, uses the GeoIP database to resolve the geographical locations of flow sources and destinations, computes various network traffic characteristics and stores them in a MySQL database.
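The first processing step, extracting addresses and counters from exported flow records, can be sketched as follows. The report does not say which Netflow version is exported, so this example assumes the fixed-size NetFlow version 5 datagram layout (a 24-byte header followed by 48-byte records). The function name and the returned dictionary keys are hypothetical.

```python
import ipaddress
import struct

# NetFlow v5 layout (assumed export format), network byte order.
V5_HEADER = struct.Struct("!HHIIIIBBH")              # 24 bytes
V5_RECORD = struct.Struct("!IIIHHIIIIHHBBBBHHBBH")   # 48 bytes


def parse_v5(datagram):
    """Return a list of dicts with the fields needed for geolocation."""
    version, count = struct.unpack_from("!HH", datagram, 0)
    if version != 5:
        raise ValueError("not a NetFlow v5 datagram")
    flows = []
    for i in range(count):
        fields = V5_RECORD.unpack_from(datagram, V5_HEADER.size + i * V5_RECORD.size)
        flows.append({
            "src": str(ipaddress.IPv4Address(fields[0])),   # srcaddr
            "dst": str(ipaddress.IPv4Address(fields[1])),   # dstaddr
            "packets": fields[5],                           # dPkts
            "bytes": fields[6],                             # dOctets
            "duration_ms": fields[8] - fields[7],           # Last - First (uptime ms)
        })
    return flows
```

The source and destination addresses obtained this way are what the geolocation lookup resolves to locations before the characteristics are computed and stored.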
The user interface is implemented as a set of PHP scripts. Each script outputs one type of graph generated from data in the MySQL database. The scripts take arguments that specify the start and end time, the requested network traffic characteristic and various options. The URL of a script with encoded arguments can then simply be inserted in the <img src="..."> element on a web page.
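The way a graph is embedded in a page can be sketched as follows. The script name `graph.php` and the parameter names `start`, `end`, `characteristic` and `top` are hypothetical, since the report does not list the actual arguments. The sketch only shows how arguments are URL-encoded and placed in an image element.

```python
from html import escape
from urllib.parse import urlencode


def graph_url(base, start, end, characteristic, **options):
    """Build the URL of a graph script with URL-encoded arguments."""
    params = {"start": start, "end": end, "characteristic": characteristic}
    params.update(options)
    return base + "?" + urlencode(params)


def img_tag(url):
    """Wrap a graph URL in an <img> element, escaping it for HTML."""
    return '<img src="{}">'.format(escape(url, quote=True))


# Hypothetical usage: top 10 sources of incoming bytes in November 2007.
url = graph_url("graph.php", "2007-11-01 00:00", "2007-11-30 00:00",
                "bytes_in", top=10)
tag = img_tag(url)
```

Since each graph is addressed by a URL, the same script can serve any time range or characteristic without changes to the web page other than the arguments.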
7 Example output
Several sample graphs are included here. A bar chart showing the distribution of incoming traffic among a specified number of top sources is shown in Figure. The same characteristic in a pie chart showing portions of the total traffic is illustrated in Figure. Geographical distribution in Europe is shown in Figure, with 100 % being the sum of traffic from all sources. It is also possible to create a graph with 100 % being the traffic from the most active source, which is convenient when the distribution among countries is more even, so that a broader spectrum of colors is used to differentiate them. Finally, the time evolution of traffic from a specified country is shown in Figure.
8 Conclusion
We combined IP address geolocation with passive network monitoring and a Netflow exporter to enable computation of geographical characteristics of network traffic. These characteristics are useful for planning the capacity of international lines, planning peerings with other networks and finding optimal locations for servers and network caches.
We tested the performance of our system with a hardware packet generator. On a PC with a 2.33 GHz Xeon processor, the GeoIP database alone can resolve the geographical locations of approx. 150000 IP addresses per second. The geoflow application can run in multiple threads, with each thread processing approx. 12800 flows per second. Therefore, we can process approx. 50000 flows per second on a quad-core PC and approx. 100000 flows per second on an eight-core PC, which is sufficient even for highly loaded links.
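The multi-core figures follow from the per-thread rate if we assume one thread per core and linear scaling, which is an idealization. The short sketch below makes the arithmetic explicit. The function name is hypothetical.

```python
PER_THREAD_FLOWS_PER_S = 12800  # approximate per-thread rate reported above


def estimated_capacity(threads, per_thread=PER_THREAD_FLOWS_PER_S):
    """Estimate flows per second, assuming perfectly linear scaling."""
    return threads * per_thread


quad_core = estimated_capacity(4)   # 4 * 12800 = 51200
eight_core = estimated_capacity(8)  # 8 * 12800 = 102400
```

These estimates round to the approx. 50000 and 100000 flows per second quoted above.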
In our future work we plan to extend the system to use certain extended IPFIX characteristics, such as short-term dynamics of network capacity used by individual flows.
References
| [capitals] | List of capitals by country (2008, January 16). In Wikipedia, The Free Encyclopedia. Available online. |
| [netflow-test] | Ubik S., Halák J.: Performance evaluation of Netflow exporters. CESNET Technical Report, under preparation. |