Friday, May 30, 2008

Code: Library and Plugins

I just realized that the name of the blog has technology in it, and I have hardly mentioned code. The Realeyes project was originally started as a network Intrusion Detection System project. I have worked on several systems that were meant to be modular, but gradually, functions that were supposed to be generic incorporated application-specific data and code. This increases the chance of introducing errors when such a function is called by many other functions.

So I decided to create a library and then build the application on it. The library is called the Realeyes Analysis Engine. Applications are built on the library by creating plugin programs that call library functions. The first application had nothing to do with networks--it was just a series of random numbers that were organized according to the high order digits and then the low order digits were analyzed for patterns.

When I started writing the network IDS code, I found that I needed more control over some of the library functions, so I added hooks for the application. For any of you who have read about these so-called hooks but aren't sure what they are, they also go by the name of 'callbacks'. And what that means is that the library function calls an external function with predefined parameters. The name of the function may be specified or a pointer to the function may be initialized, and I use both.

For example, the library's main function calls three functions that every plugin must include, even if all they do is immediately return:
  • local_plugin_init(): This handles anything that needs to be done before the parser runs

  • plugin_parser(xml_main_structure, xml_dom_tree): The analysis engine parses the XML file for syntax, then passes a Document Object Model (DOM) tree to the plugin, which parses the values

  • plugin_process(): This is where the plugin does its main job
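
The calling sequence above can be sketched as follows. This is only an illustration: the three function names come from the list, but the return types, the stand-in structures demo_xml_main and demo_xml_tree, and the driver demo_engine_main() are assumptions, not the actual Realeyes declarations.

```c
#include <stddef.h>

/* Stand-in types (assumed). The real engine presumably passes libxml2 data. */
typedef struct { const char *file_name; } demo_xml_main;
typedef struct { const char *root_name; } demo_xml_tree;

/* Records how far the sequence got, so the call order can be checked. */
static int demo_step = 0;

/* --- Plugin side: the three functions every plugin must provide --- */

int local_plugin_init(void)
{
    demo_step = 1;              /* nothing to set up in this sketch */
    return 0;
}

int plugin_parser(demo_xml_main *xml_main_structure, demo_xml_tree *xml_dom_tree)
{
    if (demo_step != 1 || xml_main_structure == NULL || xml_dom_tree == NULL)
        return -1;
    demo_step = 2;              /* a real plugin would read its values here */
    return 0;
}

int plugin_process(void)
{
    if (demo_step != 2)
        return -1;
    demo_step = 3;              /* a real plugin would do its main job here */
    return 0;
}

/* --- Library side: hypothetical driver that calls the hooks by name --- */

int demo_engine_main(const char *config_file)
{
    demo_xml_main xml_main = { config_file };
    demo_xml_tree xml_tree = { "config" };

    if (local_plugin_init() != 0)
        return 1;
    /* The real engine would parse config_file into a DOM tree at this point. */
    if (plugin_parser(&xml_main, &xml_tree) != 0)
        return 2;
    if (plugin_process() != 0)
        return 3;
    return 0;
}
```

Because the hooks are resolved by name at link time, a plugin that leaves one out fails to build, which is why even an empty implementation has to be present.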

The plugin parser is particularly interesting. The analysis engine uses libxml2 to parse an XML configuration file and build a tree of the values in the configuration file. (And yes, I have used the Expat library, which provides an event-driven interface in the style of the Simple API for XML (SAX). But SAX is simple only for implementing the library, not the application, and I would only use it if I were under serious memory constraints--which is not the case for most configuration files.) The DOM tree is read by the plugin parser. But the code to read the tree is a bit hard on the eyes, not to mention being a typo magnet, as this simplified sample of getting a data value shows:

/* Copy the node's text content, if the node has a child */
if (raeXML_NODE->xmlChildrenNode != NULL) {
    value = xmlStrdup(XML_GET_CONTENT(raeXML_NODE->xmlChildrenNode));
}

So, the analysis engine library includes several macros that make writing the plugin parser look a little like a Basic program. It also includes macros to name the function and parameters. Et voila, writing a plugin parser is as easy as this:

Process attr_value ...
Process data_value ...
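
To give a feel for how macros can hide the tree-walking boilerplate, here is a minimal sketch. It is not the Realeyes macro set: every macro name, the demo_node structure (a stand-in for a libxml2 node so that the example compiles on its own), and the max_sessions and log_dir tags are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a DOM node; libxml2's xmlNode carries similar links. */
typedef struct demo_node {
    const char *name;               /* element name */
    const char *content;            /* text content, or NULL */
    struct demo_node *children;     /* first child */
    struct demo_node *next;         /* next sibling */
} demo_node;

/* Values this hypothetical plugin reads from its configuration. */
typedef struct {
    int max_sessions;
    const char *log_dir;
} demo_config;

/* Hypothetical macros that name the function and hide the pointer chasing. */
#define PARSER_FUNCTION(fn) \
    int fn(demo_node *cfg_root, demo_config *cfg)
#define FOR_EACH_NODE(node, parent) \
    for (demo_node *node = (parent)->children; node != NULL; node = node->next)
#define IF_NODE_IS(node, tag) \
    if (strcmp((node)->name, (tag)) == 0)
#define GET_INT(node, dest) \
    do { if ((node)->content != NULL) (dest) = atoi((node)->content); } while (0)
#define GET_STR(node, dest) \
    do { if ((node)->content != NULL) (dest) = (node)->content; } while (0)

/* With the macros in place, the parser reads almost like a script. */
PARSER_FUNCTION(demo_plugin_parser)
{
    FOR_EACH_NODE(node, cfg_root) {
        IF_NODE_IS(node, "max_sessions") GET_INT(node, cfg->max_sessions);
        IF_NODE_IS(node, "log_dir")      GET_STR(node, cfg->log_dir);
    }
    return 0;
}
```

The trade-off is the usual one with macros: the parser gets shorter and harder to mistype, but a mistake inside a macro is harder to track down than one in plain code.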

If you are thinking this should be contributed back to the libxml2 project, I don't think it would work. The Realeyes project only uses XML for configuration files with very simple syntax. Meanwhile, libxml2 handles the full range of XML capabilities. However, if you know someone who is working on an application and is annoyed (or annoying) about having to parse XML configuration files, point them to the Realeyes Analysis Engine Subversion repository, where they can look at the XML parser source and include files.

The other type of hook/callback uses a function pointer. The reason for this is to make it optional. If the pointer is not initialized, then the callback function is not called. An example of this is the special handler for after an Analysis Record is built:
    This pointer must be initialized to point to a function:
      raeRecordHandler raeEventRecordHandler

    The function may have any name, but must accept the specified parameter:
      erh_function (raeAnalysisRecord *rh_record)
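
A minimal sketch of how such an optional hook might be wired up. The names raeRecordHandler, raeEventRecordHandler, raeAnalysisRecord, and erh_function come from the description above; the void return type, the contents of the record structure, and the demo_ functions are assumptions made so the example stands alone.

```c
#include <stddef.h>

/* Stand-in record (assumed); the real raeAnalysisRecord layout is not shown here. */
typedef struct {
    int rule_id;
    int handled;
} raeAnalysisRecord;

/* Assumed signature: the parameter is specified above, the return type is a guess. */
typedef void (*raeRecordHandler)(raeAnalysisRecord *rh_record);

/* Library side: the pointer starts out NULL, which means "no handler installed". */
raeRecordHandler raeEventRecordHandler = NULL;

/* Hypothetical library routine that runs after an Analysis Record is built. */
void demo_record_built(raeAnalysisRecord *record)
{
    if (raeEventRecordHandler != NULL)
        raeEventRecordHandler(record);   /* called only when the plugin opted in */
}

/* Plugin side: any name will do, as long as the parameter matches. */
static void erh_function(raeAnalysisRecord *rh_record)
{
    rh_record->handled = 1;
}

/* Hypothetical plugin setup step that opts in to the callback. */
void demo_plugin_install_handler(void)
{
    raeEventRecordHandler = erh_function;
}
```

A plugin that never assigns the pointer pays only for the NULL check, which is what makes this kind of hook optional.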

The analysis engine library does all of the heavy lifting. Once the parsing is complete, plugins do not allocate any memory, unless there are specialized functions coded (I am particularly happy with the memory management, but that is a discussion for another post). There are library functions for managing multiple streams of data, matching values at specific locations in headers or strings in data, and building records for information that has been matched with a rule, just to name a few.
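
As a rough illustration of the two kinds of matching mentioned above, here is a sketch of a masked comparison at a fixed header offset and a plain string search in data. Both functions are hypothetical and only show the idea; they are not the library's routines, which are not shown in this post.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the header byte at 'offset', after masking, equals 'value'. */
int demo_match_at(const uint8_t *hdr, size_t len, size_t offset,
                  uint8_t mask, uint8_t value)
{
    if (offset >= len)
        return 0;                        /* never read past the header */
    return (hdr[offset] & mask) == value;
}

/* Return the offset of the first occurrence of 'pattern' in 'data', or -1. */
long demo_find_string(const uint8_t *data, size_t len, const char *pattern)
{
    size_t plen = strlen(pattern);

    if (plen == 0 || plen > len)
        return -1;
    for (size_t i = 0; i + plen <= len; i++) {
        if (memcmp(data + i, pattern, plen) == 0)
            return (long)i;
    }
    return -1;
}
```

For example, in a TCP header without regard to options, the flags are in the byte at offset 13, so demo_match_at(tcp, len, 13, 0x02, 0x02) tests whether the SYN flag is set.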

This makes the plugins fairly lightweight--the largest, the Action Analyzer, is just over 1,000 lines of code, most of which is parsing the options for collecting statistics. In fact, the statistics collection code, in a separate source file, is more than twice as large at over 2,200 lines of code, which gives a sense of how little the plugins have to do.

I gave a presentation to my local Linux User Group, and afterward one of the attendees talked to me about using it for some mathematical analysis he is involved in. I don't know if it will work for him, but I would be very happy if the library is found to be useful for other projects. The library is capable of handling multiple TCP sessions (35,000 simultaneously is the current peak), which are about as random as streams of data get, so it will certainly handle streams that are controlled. The output is created by a relatively simple plugin, which means it can be customized as much as necessary.

Later . . . Jim

Tuesday, May 27, 2008

Testing, testing

I have been testing the Realeyes IDS at a local college for about 8 months now. However, it took almost 10 months of planning before the testing began. I contacted a half dozen sites, and only one felt able to let me set up the system in their environment. Looking at this from the point of view of the sites contacted, I consider myself quite lucky to have had this response. My advice to anyone looking for a similar situation is:
  • Be clear: Write a letter or email that explains what you would like to do concisely and give some background on yourself to build credibility.

  • Be professional: In your first face to face meeting, have a presentation that covers the main points of your project, explain how your project might provide the site something in return, and dress conservatively (I wore my best suit). We have even signed an agreement which spelled out what was to be provided by both parties, including hardware, software, and time. Now that we have been working together for a while, the relationship has become less formal, but initially I believe the formality gave them confidence that they would not be creating problems for themselves.

  • Don't get frustrated: The site I am working with handled several major tasks while I was waiting for them to provide me with a single host and a single connection to the monitoring port of a switch. To me, it wasn't asking for much, but now that I have been there, I can see that for them it was quite a bit of time and planning.

  • Be gracious: I have found opportunities to thank the people I am working with, including management and sysadmins, at least twice a month. I have also pointed out how well run their operation is (and it is, so I'm not just brown-nosing).

What they provided for me was a host with a 733 MHz CPU, 2 GB of RAM, a 100 Mbps network interface, and a 16 GB hard disk. The most important issue for me was the memory, so 2 GB of RAM is fantastic. As far as everything else goes, I would rather test on moderate equipment and make my code more efficient to get adequate performance than have the platform hide problems.

Of course, the least of my worries was about problems being hidden. In the first couple of weeks, the IDS failed in less than an hour. First it was buffer space issues, next it was bugs, then it was buffer space issues again. But after a month, I had it running long enough to actually detect a few incidents. Then, over the next several months I cleaned up formatting issues, improved the user interface, and fixed more IDS bugs.

In the meantime, I have been able to give the site some feedback on their environment. I have not created a lot of rules, but there is a fair amount of variety to those that are in use, and some were in response to information that they wanted to collect. The most interesting ones, for all of us, have been:
  • Non-http traffic on port 80: This reported very few hits, but the ones it did report gave them enough info to correct the use of a couple of applications

  • Brute force FTP logins: This just gave them more detail than what they were already seeing in logs, but at least it showed that none of the attempts were successful

  • Activity at unusual times: By monitoring email servers between midnight and 5:00 am, we have seen a few cases of spam from site hosts, and some other activity that led them to discuss policy

  • Invalid TCP options: We are working on this one, stay tuned

  • Rules for specific exploits: Between the low number of these and the sysadmins' efforts to harden their site, there have not been any serious (or from my perspective, spectacular) hits on these, which from their perspective is good news
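
To make two of these rules concrete, here is a rough sketch of the underlying checks. It is not how Realeyes rules are written (the rule syntax is not covered in this post); the function names and the list of request methods are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the payload begins with a common HTTP request method. */
int demo_looks_like_http_request(const char *payload, size_t len)
{
    static const char *const methods[] = {
        "GET ", "POST ", "HEAD ", "PUT ", "DELETE ",
        "OPTIONS ", "TRACE ", "CONNECT "
    };

    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        size_t mlen = strlen(methods[i]);
        if (len >= mlen && memcmp(payload, methods[i], mlen) == 0)
            return 1;
    }
    return 0;
}

/* "Non-http traffic on port 80": client data sent to port 80 that does not
   start like an HTTP request. Empty payloads (bare ACKs) are ignored. */
int demo_non_http_on_port_80(unsigned dst_port, const char *payload, size_t len)
{
    return dst_port == 80 && len > 0 &&
           !demo_looks_like_http_request(payload, len);
}

/* "Activity at unusual times": local hour falls between midnight and 5:00 am. */
int demo_is_unusual_hour(int hour)
{
    return hour >= 0 && hour < 5;
}
```

A check this simple only looks at the first bytes of the first client data in a session; it would flag, for instance, an SSH banner sent to port 80, and it would miss anything tunneled inside well-formed HTTP.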

Overall, I count this experience as a huge success. And the best news is that a couple of months ago, I finally fixed the main buffering issue. So while there are still some bugs, the system has been stable enough to run for days in a row (as opposed to hours). And now it is reassembling and analyzing as many as 35,000 simultaneous TCP sessions.

I never would have reached this point without the help of the people at the test site. So once again, I want to say, "Thanks."

Later . . . Jim

Monday, May 19, 2008

Oh bother!

I have spent the past couple of weeks preparing and presenting a demo for my local Linux user group and working on a user manual, not to mention the routine things like checking on the pilot project and mowing the grass. During this time, I have found a few bugs in some of the download files and replaced the ones with errors. None of them are catastrophic, but they do affect some functionality.

This makes me wish for a full-time QA team. When I was working for a company that sold a TCP/IP stack for IBM mainframes, we had one and I came to appreciate what they did, even though they made me rework several fixes. The time and effort it takes to create automated tests, or worse, to run them manually, is huge.

Realeyes is fairly complex, in that it contains:
  • Source packages

  • Debian packages

  • Configuration scripts for both packages

  • C programs

  • A database with SQL scripts for building the schema

  • A database interface written in Java

  • A user interface written in Java

  • Configuration file definition forms in the user interface

Thanks to the pilot project, I get to see most of this in use regularly. And the changes I am making these days are mostly in response to issues that come up there. But it would be a fine thing indeed to be able to have an actual QA team.

Later . . . Jim

Friday, May 9, 2008

Starting to Realeyes

I just put up the new website and the latest downloads for the Realeyes project at SourceForge. The downloads can be found from the Downloads page, which also explains system requirements, etc.

Realeyes is a project to analyze large streams of data, and specifically, to build a network Intrusion Detection System. I have worked with computer networks for over 20 years, including 4 years of maintaining a TCP/IP stack for IBM mainframes, and 5 years of network security analysis and tool development. The security analysis team I worked with has a great reputation in certain government circles, but finds it ever more challenging to keep up with the exponential growth of nefarious activity.

The work I was doing was to integrate the security tools to improve analyst efficiency, but I came to believe that we really needed to start from scratch. Unfortunately, there was no money in the budget for that, so yet another FOSS project was born. The original project is named RenaissanceCore, and it was uploaded to SourceForge in Sept., 2005. We finally released downloads that sort of worked in July, 2007, and again in August.

In Sept., I started a pilot project at a local college, and over the past several months focused exclusively on that. This has resulted in tremendous improvements in reliability and performance, which justify new downloads. The original name was a compromise of a compromise that was hastily chosen. And since I have been flying solo on the development since Sept., I decided that this would be the right time to change the name. Which I did. And I created a website with (IMHO) interesting and useful information about the project.

The latest downloads have been tested as well as I can test them with my limited resources. But they should install cleanly and the system should run reliably, with very acceptable performance even though it is still in Beta.

I will be discussing the project, interesting security related discoveries, coding, and probably several other things in this blog. But for now, you can go to the Realeyes project at SourceForge.

Later . . . Jim