Friday, May 30, 2008

Code: Library and Plugins

I just realized that the name of the blog has technology in it, and I have hardly mentioned code. The Realeyes project was originally started as a network Intrusion Detection System project. I have worked on several systems in which an attempt was made to design them modularly, but gradually, functions that were supposed to be generic incorporated application specific data and code. This increases the chance of creating errors when such a function is called by many other functions.

So I decided to create a library and then build the application on it. The library is called the Realeyes Analysis Engine. Applications are built on the library by creating plugin programs that call library functions. The first application had nothing to do with networks--it processed a series of random numbers, grouping them by their high order digits and then analyzing the low order digits for patterns.

When I started writing the network IDS code, I found that I needed more control over some of the library functions, so I added hooks for the application. If you have read about these so-called hooks but aren't sure what they are, they also go by the name of 'callbacks'. What that means is that the library function calls an external function with predefined parameters. The library can either call the external function by a fixed name or call it through a function pointer that the application initializes, and I use both.

For example, the library's main function calls three functions that every plugin must include, even if all they do is immediately return:
  • local_plugin_init(): This handles anything that needs to be done before the parser runs

  • plugin_parser(xml_main_structure, xml_dom_tree): The analysis engine parses the XML file for syntax, then passes a Document Object Model (DOM) tree to the plugin, which parses the values

  • plugin_process(): This is where the plugin does its main job
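As a rough sketch of this arrangement (not the actual Realeyes source), the library side can simply call the three names and let the linker resolve them against whatever plugin it is built with. The `int` return types, the `xml_main_t` and `xml_dom_t` structures, the `engine_main()` name, and the call log are all assumptions made for illustration; only the three hook names come from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine's XML structures. */
typedef struct { const char *config_path; } xml_main_t;
typedef struct { const char *root_name; } xml_dom_t;

/* Demo-only log of the order in which the hooks were called. */
static char call_log[8];
static int call_count = 0;

static void log_call(char tag) {
    if (call_count < (int)sizeof call_log) {
        call_log[call_count++] = tag;
    }
}

/* ---- Plugin side: every plugin defines these three names. ---- */

/* Runs before the parser; may do nothing but return. */
int local_plugin_init(void) { log_call('I'); return 0; }

/* Receives the tree the engine built and reads the values from it. */
int plugin_parser(xml_main_t *xml_main, xml_dom_t *dom) {
    (void)xml_main;
    (void)dom;
    log_call('P');
    return 0;
}

/* The plugin's main job. */
int plugin_process(void) { log_call('R'); return 0; }

/* ---- Library side: stands in for the engine's main function. ---- */
int engine_main(const char *config_path) {
    assert(config_path != NULL);
    xml_main_t xml_main = { config_path };
    xml_dom_t dom = { "config" };   /* pretend the file is already parsed */

    if (local_plugin_init() != 0) return 1;
    if (plugin_parser(&xml_main, &dom) != 0) return 2;
    if (plugin_process() != 0) return 3;
    return 0;
}
```

Because the hooks are found by name, a plugin that omits one of them fails at link time rather than at run time, which is why even a do-nothing version has to be present.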
The plugin parser is particularly interesting. The analysis engine uses libxml2 to parse an XML configuration file and build a tree of the values in the configuration file. (And yes, I have used the Expat library, which provides an event-driven interface in the style of the Simple API for XML (SAX). But SAX is simple only for whoever implements the library, not for whoever writes the application, and I would only use it if I were under serious memory constraints--which is not the case for most configuration files.) The DOM tree is read by the plugin parser. But the code to read the tree is a bit hard on the eyes, not to mention being a typo magnet, as this simplistic sample of getting a data value shows:

    if (raeXML_NODE->xmlChildrenNode != NULL) {
        value = xmlStrdup(XML_GET_CONTENT(raeXML_NODE->xmlChildrenNode));
    }

So, the analysis engine library includes several macros that make writing the plugin parser look a little like a Basic program. It also includes macros to name the function and parameters. Et voila, writing a plugin parser is as easy as this:

Process attr_value ...
Process data_value ...
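To give a feel for how macros can hide that kind of boilerplate, here is a minimal sketch. It does not use libxml2 and it is not the Realeyes macro set: `demo_node`, `dup_text()`, `IF_HAS_DATA`, `GET_DATA_VALUE`, and `read_data_value()` are all hypothetical names, with a tiny stand-in node type so the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a DOM node; this is NOT libxml2's xmlNode. */
typedef struct demo_node {
    const char *name;            /* element name */
    const char *content;         /* text, for text nodes only */
    struct demo_node *children;  /* first child, or NULL */
} demo_node;

/* Heap copy of a string, playing the role xmlStrdup plays above. */
static char *dup_text(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Hypothetical macros: the NULL checks and the copy are written once
 * here, so the parser body reads as a short list of steps. */
#define IF_HAS_DATA(node) \
    if ((node) != NULL && (node)->children != NULL && \
        (node)->children->content != NULL)

#define GET_DATA_VALUE(dest, node) \
    ((dest) = dup_text((node)->children->content))

/* Returns a heap copy of the node's text, or NULL if there is none.
 * The caller frees the result. */
char *read_data_value(const demo_node *node) {
    char *value = NULL;
    IF_HAS_DATA(node) {
        GET_DATA_VALUE(value, node);
    }
    return value;
}
```

The trade-off is the usual one with macros: the parser is shorter and harder to mistype, but anyone debugging it has to read the macro definitions to see what the compiler actually gets.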
If you are thinking this should be contributed back to the libxml2 project, I don't think it would work. The Realeyes project only uses XML for configuration files with very simple syntax. Meanwhile, libxml2 handles the full range of XML capabilities. However, if you know someone who is working on an application and is annoyed (or annoying) about having to parse XML configuration files, point them to the Realeyes Analysis Engine Subversion repository, where they can look at the XML parser source and include files.

The other type of hook/callback uses a function pointer. The reason for this is to make it optional. If the pointer is not initialized, then the callback function is not called. An example of this is the special handler that is called after an Analysis Record is built:
    This pointer must be initialized to point to a function:
      raeRecordHandler raeEventRecordHandler

    The function may have any name, but must accept the specified parameter:
      erh_function (raeAnalysisRecord *rh_record)
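A minimal sketch of how such an optional hook can work is below. The names `raeRecordHandler`, `raeEventRecordHandler`, `raeAnalysisRecord`, and `erh_function` come from the description above; the `void` return type, the single `rule_id` field, and the `finish_record()` function are assumptions made so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the engine's record structure. */
typedef struct { int rule_id; } raeAnalysisRecord;

/* Handler type; the void return type is an assumption. */
typedef void (*raeRecordHandler)(raeAnalysisRecord *rh_record);

/* Library side: NULL means the plugin installed no handler. */
raeRecordHandler raeEventRecordHandler = NULL;

/* Hypothetical library function that runs once a record is built.
 * Returns 1 if a handler was called, 0 if none was installed. */
int finish_record(raeAnalysisRecord *record) {
    assert(record != NULL);
    if (raeEventRecordHandler != NULL) {
        raeEventRecordHandler(record);
        return 1;
    }
    return 0;
}

/* Plugin side: any name will do, as long as the parameter matches. */
static int last_rule_seen = -1;

void erh_function(raeAnalysisRecord *rh_record) {
    last_rule_seen = rh_record->rule_id;
}
```

A plugin that wants the callback assigns `raeEventRecordHandler = erh_function;` during its init step; a plugin that does not care leaves the pointer alone and pays only for the NULL check.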

The analysis engine library does all of the heavy lifting. Once the parsing is complete, plugins do not allocate any memory unless they include specialized functions of their own (I am particularly happy with the memory management, but that is a discussion for another post). There are library functions for managing multiple streams of data, matching values at specific locations in headers or strings in data, and building records for information that has been matched with a rule, just to name a few.

This makes the plugins fairly lightweight--the largest, the Action Analyzer, is just over 1,000 lines of code, most of which is parsing the options for collecting statistics. In fact, the statistics collection code, in a separate source file, is more than twice as large at over 2,200 lines of code, which gives a sense of how little the plugins have to do.

I gave a presentation to my local Linux User Group, and afterward one of the attendees talked to me about using it for some mathematical analysis he is involved in. I don't know if it will work for him, but I would be very happy if the library is found to be useful for other projects. The library is capable of handling multiple TCP sessions (35,000 simultaneously is the current peak), which are about as random as streams of data get, so it will certainly handle streams that are controlled. The output is created by a relatively simple plugin, which means it can be customized as much as necessary.

Later . . . Jim
