OTRS: CSIRT WorkFlow Improvements
CESNET technical report 8/2010
Pavel Kácha
Received 26.10.2010
Abstract
CSIRTs (Computer Security Incident Response Teams) are the natural response to widespread internet threats. Many of them have grown out of small but focused groups of people, by streamlining and expanding what they had already been doing as part of their IT administrative work. Formalisation of procedures and workflows brings the need for specialised tools that help with incident categorisation, sanitization and general workflow. Also, the special nature of incoming report emails introduces new issues to otherwise well-known spam and backscatter fighting methods. Besides low-level know-how, higher-level statistical analyses for pinpointing potential threats and trends are also an important part of security team practice. This report documents approaches to these problems and describes their implementation as modifications and supporting applications for the Open Ticket Request System (OTRS), as well as experience from their use in a real-world medium-sized security team.
Keywords: OTRS, CSIRT, security, incident, ticket management, issue management, metadata, Bayesian analysis, backscatter, statistics
1 Introduction
Before we get to the specifics of a CSIRT, let us first look over the life-cycle of a typical security incident report.
Once the report is received, its relevance is assessed and, where necessary, additional information is requested. Next, reports are categorized according to the networks affected and forwarded to their respective administrators, after consulting internal databases or WHOIS information. The responsible administrator then communicates directly with the original complainant (if needed) and finds a solution. If everything goes well, from this point onwards the CSIRT acts only as a spectator and a recorder. Depending on the severity of the report, the responsible administrator may be contacted and a response requested if the CSIRT has not been informed about the resolution in time. Afterwards, the report is finalised and marked with the appropriate outcome.
We have already created tools to support and semi-automate certain steps, as documented in [1].
A range of tools for issue management exists (see [2] for an overview of suitable ones); however, none of them directly supports the incident report handling workflow.
2 OTRS overview
OTRS (Open source Ticket Request System) is a GPL-licensed, Perl-based trouble ticket (or issue management) system, which we use as the basis for our applications.
2.1 Tickets
A ticket is composed of a series of articles – textual updates to its state, usually e-mails. The ticket keeps a complete history of the changes made to it, whether by human intervention or by automatic means. A ticket can be split into two, possibly independent, cases, and several tickets relating to one case can be merged.
Aside from the usual data, a ticket can bear arbitrary name/data pairs. This metadata can be given fixed names by the administrator, or left changeable for storing any information that seems fitting at the time the article is created.
2.2 Queues and states
Tickets are organized into queues, which are created by the administrator and connected to particular users with defined rights. A typical scenario in a security team could be two queues: an incoming one managed by a first line of personnel with basic training, who are able to solve or delegate by mail the basic types of incidents, and a second one, into which the remaining incidents are moved, managed by specialists and highly trained staff who can then focus only on important or unusual incidents.
During its lifetime, each ticket goes through a series of states. A state is a property completely orthogonal to the queue and can represent important turning points in the ticket's history: an external update, a timeout or a closing reason.
3 Discussed problems
Let us enumerate the problems that our team (and most likely any incident response team) faces and that the following sections elaborate on.
Incident categorisation. Classification as per the incident type (and consequently its severity) forms the basis for statistics and trend analysis.
Incoming traffic sanitization. Spam, viruses and backscatter are well-known and documented fields of expertise. However, in the specific case of incident reports, the usual statistical and heuristic methods face unexpected challenges. An incoming incident report may itself contain a sample of spam, a virus or an unsolicited bounce, and the whole report often gets classified as such. Additional measures are therefore necessary.
Statistical visualisation. Reliable identification of the authority responsible for an incident source and automatic incident classification give us an interesting data source for further statistical analysis: we can compare the incident resolution rates of our members and constituency, and review trends in the proportions of incident types.
4 Automated incident categorization
Each incident bears its characteristic features and can be categorized as a well-known type. Categorization can be done by a human; however, if we could achieve reliable machine classification beforehand, we would get a valuable clue on how to process a particular incident. Categorization is also necessary for further statistical and trend analyses.
A similar and well-studied problem is spam identification: free-form mail text is analysed to decide whether a message is allowed to reach the destination mailbox or whether it is a malicious or unsolicited commercial message. The statistical methods used for spam identification, based on naive Bayesian probability analysis, constitute a two-way decision process.
In general, these methods generate a weighted histogram of words (or of n-tuples of words, or of larger meshes as in the case of the hidden Markov model), based on previous learning history. Histogram values undergo a statistical cleaning, and the combined representative value (depending on the particular method, it can be some kind of average or median value) determines the spam rate of a message.
However, there is nothing inherently two-way in these methods. One of the first Bayes statistics based filters, Jason Rennie's ifile [3], supports n-way filtering. By means of custom code we inserted Bayes classification into the incoming queue. The analyser output is then added as an associated header, and later it is used directly as an incident category in OTRS.
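To illustrate that nothing in the method is inherently two-way, the following is a minimal sketch of an n-way multinomial naive Bayes classifier. It is not ifile and does not reproduce its algorithm or its statistical cleaning; the class name `NWayBayes`, the tokenizer and the training sentences are assumptions made for this example only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NWayBayes:
    """Multinomial naive Bayes over an arbitrary number of categories (illustrative)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> messages learned
        self.vocabulary = set()

    def learn(self, category, text):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the category with the highest log-posterior, or None if untrained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category in self.doc_counts:
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood of every word.
            score = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training messages, one per category.
classifier = NWayBayes()
classifier.learn("Probe", "ssh brute force login attempts portscan from your host")
classifier.learn("Copyright", "copyright infringement notice torrent file sharing")
classifier.learn("Phishing", "fake bank login page hosted on your server")
category = classifier.classify("repeated ssh login attempts detected")
```

The two-way spam decision is simply the special case with two categories; adding a category only adds one more histogram to compare against.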
4.1 Learning
The success of statistical methods stands and falls with the quality of learning. Our current workflow guarantees that incident metadata are reviewed and corrected by a human operator within one day at most. To allow for human slips, we use all tickets older than two days as the basis for building up ifile's database.
The database is automatically rebuilt on a daily basis from this already sanitized real-life data. Textual bodies of messages are taken from the OTRS database along with their incident category tag and fed accordingly into ifile. There is no need to keep separate heaps of learning data, and the Bayes data are thus up to date with the current forms of incident report messages.
In production use on the CESNET network address range, only 6.7 % of incident reports need a manual review of the data.
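The selection of learning data can be sketched as follows. This is an illustration only: the dictionary keys `created`, `category` and `body` are hypothetical and do not correspond to the actual OTRS schema, and skipping tickets without a category tag is an assumption of this example.

```python
from datetime import datetime, timedelta


def select_training_set(tickets, now, min_age=timedelta(days=2)):
    """Return (category, body) pairs for tickets old enough to have been reviewed."""
    cutoff = now - min_age
    return [
        (ticket["category"], ticket["body"])
        for ticket in tickets
        # Assumption: tickets without a category tag are not used for learning.
        if ticket["created"] <= cutoff and ticket["category"]
    ]


# Hypothetical records standing in for rows fetched from the ticket database.
now = datetime(2010, 10, 26, 12, 0)
tickets = [
    {"created": datetime(2010, 10, 20, 9, 0), "category": "Spam", "body": "unsolicited offer"},
    {"created": datetime(2010, 10, 26, 8, 0), "category": "Probe", "body": "ssh scan"},
    {"created": datetime(2010, 10, 23, 9, 0), "category": "", "body": "not yet tagged"},
]
training = select_training_set(tickets, now)
```

Each resulting pair would then be fed to the classifier's learning step during the daily rebuild.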
4.2 Incident taxonomy
We use a simplistic approach to incident taxonomy. As exhaustive enumeration is not necessary, only the currently most prevalent incident types are used. As the traces of several incident types overlap (for example, spam is a part of phishing), we adopted a rule of the most fitting modus operandi: the incident type that covers the incident's symptoms completely is the one that fits.
Spam – usual unsolicited commercial email.
Bounce – mail backscatter (usually caused by spam).
Phishing – spam is used as advertisement for a website which imitates some well known institution in order to gain its clients' personal information (bank account credentials, credit card information).
Pharming – similar to phishing. More sophisticated DNS attacks are used to cover the redirection of the client to a fraudulent site.
Copyright – copyright infringement, usually by means of peer-to-peer networks.
Trojan – malicious code on a server attempting to attack server clients and spread further (by a defaced web page or active probing).
Malware – malicious code on a client workstation, for example a keylogger, a rootkit or malware that is part of a botnet. The Trojan and Malware classes partially overlap; in many cases they can in fact be the same code. However, we try to distinguish the situation where the primary function is to spread and attack other machines (Trojan) from the one where the code mainly collects user data, sends spam, etc. (Malware).
Probe – probing servers and networks. Portscan, portsweep, SSH (or other service) scan or unsuccessful attempts to crack a service.
DOS – simple or distributed. Again it partially overlaps with a probe but DOS's primary aim is denying the service, not a compromise.
Crack – generally any other compromise.
Other – anything we are not able to classify into previous categories. Meant as a fallback category, which should get reviewed regularly, and the results of which should get incorporated back into this taxonomy.
Unknown – it is not possible to clearly state the incident type from the report (usually some additional clarification from the complainant is needed).
5 Incoming traffic sanitization
The world of email is nowadays widely infected with unsolicited commercial emails, backscatter bounces and various kinds of worms and viruses. Some kind of filtering of incoming mail is therefore necessary to keep the amount of messages to be handled manageable.
However, an incident handling mailbox faces a predictable problem: incident report messages can themselves contain samples of spam, bounces or viruses. The usual antispam and antivirus methods fail, and some kind of additional treatment is necessary.
5.1 Unsolicited bounces
In the case of mail bounces (mail delivery report messages) we have a significant advantage: we know we should only get bounces to messages originated by us. We are therefore able to keep track of ticket identification numbers, which are injected into the subject line of each message sent. A bounce message (identifiable by an empty Return-Path header line) is allowed to enter the system only if its subject line or body contains the identifier of an existing ticket younger than two months (the age limit keeps machine work low).
We face a problem here: the format of mail delivery messages [4] is specified very vaguely. There are strict requirements on some of the message headers, but the subject and body of the message are completely free-form. Some mail delivery agents (mainly certain qmail versions) do not attach enough of the original message to preserve the ticket identifier. However, our analysis of nearly seven thousand bounce messages shows that only 0.5 % of them are affected, which is a very acceptable loss ratio. In any case, the situation with such stubborn agents has generally been improving.
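The bounce rule described in this section can be sketched as follows. This is a simplified illustration, not the deployed filter: the identifier pattern `Ticket#<digits>` is a hypothetical format, `recent_ticket_ids` is assumed to already contain only tickets younger than two months, and the whole raw message is searched (a superset of subject and body, without decoding encoded parts).

```python
import re
from email import message_from_string

# Hypothetical ticket identifier format, e.g. "[Ticket#1001]".
TICKET_RE = re.compile(r"Ticket#(\d+)")


def is_bounce(message):
    """A bounce is recognised by a present but empty Return-Path header."""
    return_path = message.get("Return-Path")
    return return_path is not None and return_path.strip() in ("", "<>")


def accept_message(message, recent_ticket_ids):
    """Reject bounces that do not mention a known recent ticket identifier."""
    if not is_bounce(message):
        return True  # not a bounce; other filters decide its fate
    # Simplification: search the raw message text, headers included.
    found = TICKET_RE.findall(message.as_string())
    return any(ticket_id in recent_ticket_ids for ticket_id in found)


# Hypothetical messages for illustration.
bounce = message_from_string(
    "Return-Path: <>\n"
    "Subject: Undelivered Mail Returned to Sender\n"
    "\n"
    "Original subject: [Ticket#1001] Incident report\n"
)
stray = message_from_string(
    "Return-Path: <>\n"
    "Subject: Delivery failure\n"
    "\n"
    "Your message could not be delivered.\n"
)
report = message_from_string(
    "Return-Path: <reporter@example.org>\n"
    "Subject: Abuse report\n"
    "\n"
    "Please investigate.\n"
)
recent = {"1001", "1002"}
results = [accept_message(m, recent) for m in (bounce, stray, report)]
```

A bounce from an agent that drops the original subject would be rejected by this check, which corresponds to the loss ratio discussed above.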
5.2 Spam
This section is unfortunately short: we have yet to find a reliable method to distinguish spam reports from spam. The most efficient method so far is a subject-keyword whitelist, selected manually on the basis of the vast number of incident reports received so far. Messages which contain any of these words or phrases in the subject line are classified as legitimate mail by the antispam software: they bypass spam analysis and are allowed to enter the system directly.
The list is maintained in the form of a regular expression:
/abuse mail|abuse-mail|abuse of|abuse report|abuse spam|e-mail spam|multiple spam|received spam|report abuse|reported spam|reporting spam|returned spam|spam:|spam abuse|spam complaint|spamcop|spam from|spam mail|spammails|spam mails|spammer|spamming|spam-rbl|stop the spam|ube:|ube-uce|ube\/uce|uce:|uce-ube|uce\/ube|ube from|uce from|\[uce\]|\[spam\]|spam received|uce complaint|ube complaint|phish|fraud/
The list gets updated regularly.
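As an illustration, the whitelist can be applied to a subject line as sketched below. The function name `bypasses_spam_analysis` is hypothetical, and case-insensitive matching is an assumption, since the report does not state which flags the expression is used with.

```python
import re

# The whitelist from this section, split across lines for readability only.
SUBJECT_WHITELIST = re.compile(
    r"abuse mail|abuse-mail|abuse of|abuse report|abuse spam|e-mail spam|"
    r"multiple spam|received spam|report abuse|reported spam|reporting spam|"
    r"returned spam|spam:|spam abuse|spam complaint|spamcop|spam from|spam mail|"
    r"spammails|spam mails|spammer|spamming|spam-rbl|stop the spam|ube:|ube-uce|"
    r"ube\/uce|uce:|uce-ube|uce\/ube|ube from|uce from|\[uce\]|\[spam\]|"
    r"spam received|uce complaint|ube complaint|phish|fraud",
    re.IGNORECASE,  # assumption: case sensitivity is not stated in the report
)


def bypasses_spam_analysis(subject):
    """Return True if the subject matches any whitelisted word or phrase."""
    return SUBJECT_WHITELIST.search(subject) is not None


# Hypothetical subject lines.
subjects = [
    "Spam complaint regarding 192.0.2.1",
    "Cheap watches for you",
    "Fwd: [SPAM] phishing site on your network",
    "New on YouTube: funny clips",
]
decisions = [bypasses_spam_analysis(subject) for subject in subjects]
```

Because the alternatives are unanchored, a short one such as `ube:` also matches inside longer words, as the last subject shows; this is one way unwanted messages can slip through a keyword whitelist.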
The last 6 months of mail logs show that 18 % of messages were “saved” from the antispam measure by our whitelist; however, 7 % were false negatives, which had to be identified and sorted out by a human (see Figure 1).
![[Image]](fig01.png)
Figure 1. Spam whitelist results.
5.3 Viruses
All mail is handled and sanitized for viewing by OTRS. OTRS is a web-based application, so security precautions are necessary before rendering arbitrary email content in a browser. The content is completely stripped of scripts and HTML tags, so mere viewing is secure. The only remaining risk is that the operator opens mail attachments directly; this can be addressed by policy, or the necessary tools (antivirus, anti-malware) can be installed on operator workstations, should the platform in use need it.
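The idea of stripping scripts and tags can be illustrated with the short sketch below. It does not show how OTRS itself sanitizes content; the class `TextOnly` and the function `strip_html` are hypothetical names, and the approach (keeping only text nodes, then escaping them again before display) is an assumption made for this example.

```python
import html
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text content, dropping all tags and the bodies of script/style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if not self.skip_depth:
            self.chunks.append(data)


def strip_html(source):
    """Return the plain text of an HTML fragment."""
    parser = TextOnly()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


plain = strip_html("<p>Hello <b>world</b></p><script>alert(1)</script>")
# Decoded entities may contain markup characters, so escape before rendering.
safe = html.escape(strip_html("a &lt;b&gt; c"))
```

Escaping the extracted text again matters because character references in the mail decode to literal `<` and `>` characters.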
6 Statistics
Reliable identification of the authority responsible for an incident source and automatic incident classification give us an interesting data source for further statistical analysis: we can compare the incident resolution rates of our members and constituency, and review trends in the proportions of incident types.
![[Image]](stats-states.png)
Figure 2. Example of generated resolution trend chart.
OTRS has a basic statistical module, but its functionality is limited to basic time/state/queue-based counts. As the basic data model of OTRS is nicely transparent, fetching more complex data is just a matter of conveniently crafted SQL queries. Again, we used our own Python module, which processes the results and formats them into visually and factually convenient output via the pychart library. We were also able to add data from other sources (for example, annotating institutions with their full names instead of RIPE shortcuts) and to apply more visually convenient elements.
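The kind of query and post-processing described above might look like the following sketch. The table `incident` and its columns are hypothetical and are not the OTRS schema; an in-memory SQLite database stands in for the real one, and the charting step via pychart is omitted.

```python
import sqlite3

# Hypothetical, simplified table standing in for the ticket database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE incident (created TEXT, category TEXT)")
connection.executemany(
    "INSERT INTO incident VALUES (?, ?)",
    [
        ("2010-08-03", "Spam"),
        ("2010-08-17", "Probe"),
        ("2010-08-21", "Spam"),
        ("2010-09-02", "Copyright"),
        ("2010-09-09", "Spam"),
    ],
)

# Count reports per month and incident category.
rows = connection.execute(
    """
    SELECT substr(created, 1, 7) AS month, category, COUNT(*) AS reports
    FROM incident
    GROUP BY month, category
    ORDER BY month, category
    """
).fetchall()


def category_shares(rows):
    """Turn per-month counts into each category's share of that month's reports."""
    totals = {}
    for month, _category, reports in rows:
        totals[month] = totals.get(month, 0) + reports
    return {
        (month, category): reports / totals[month]
        for month, category, reports in rows
    }


shares = category_shares(rows)
```

The resulting shares per month are the kind of series that a type-proportion trend or pie chart would be drawn from.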
![[Image]](stats-types-pie.png)
Figure 3. Example of generated incident type volume pie chart.
7 Conclusions
Finding a tool which would be an added value to the incident response team and would not have any significant drawbacks is by no means an easy task. As it turns out, no ticket management tool is readily usable for small or mid-sized teams. Even the most advanced projects include nontrivial management or programming requirements.
Our OTRS ticketing system installation currently holds around 3800 tickets, not counting spam and unsolicited bounces. The OTRS interface is used by nine core team members as well as nine Monitoring centre operators to manage incident reports for several hundred assigned network ranges.
Automated statistical incident type deduction works better than expected: only a small fraction of reports (6.7 %) needs human correction.
Detection of unsolicited bounces works flawlessly. We are not aware of any loss of a valid delivery message on our side.
Our handmade whitelist worsens the efficiency of the antispam filter; however, this is the price to pay for lowering the false positive rate to nearly zero.
Statistical tools have proven to be an interesting source of information and a way to visualize trends and the efficiency of combating electronic crime.
All the tools are separate from the main codebase and are plugged into the workflow by means of standard OTRS configuration options, so porting to new versions should be straightforward and in most cases effortless.
According to the configuration and development experience as well as users' observations, the work invested into the customizations and the code is paying off, and the course set has worked well so far.
8 Acknowledgment
This work is supported by the research intent MSM6383917201.
References
| [1] | KÁCHA, P. OTRS: Tool for Security Incident Reports Management. Technical report 12/2007. Praha: CESNET, 2007. |
| [2] | KÁCHA, P. OTRS: Issue Management System Meets Workflow of Security Team. Technical report 4/2006. Praha: CESNET, 2006. |
| [3] | RENNIE, J. D. M. Improving Multi-class Text Classification with Naive Bayes, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, September 2001. |
| [4] | MOORE, K.; VAUDREUIL, G. An Extensible Message Format for Delivery Status Notifications. RFC 3464, IETF, January 2003. |