Saturday, July 26, 2008

Modularity

Realeyes was planned with the intention of supporting IPv6, and now that the basic functionality is in place and (mostly) working, I am adding full support for it. This means several things, including:

  • Deploying a Realeyes IDS sensor on an IPv6 network

  • Analysis of IPv6 packets by the IDS application

  • Inserting IPv6 addresses and data in the Realeyes database

  • Defining rules for IPv6 addresses

  • Displaying IPv6 addresses and headers from the user interface


I will describe this in more detail in a later post, but for the moment, I need a motivational boost, so I decided to give myself a pat on the back.

The way that IDS functionality is added is through plugins that perform specific functions. At this point, data collection and high level analysis are essentially complete, so the new features in the IDS are confined to the session handler and the low level analysis plugins. In Realeyes terminology, these are the Stream Handler and the Stream Analyzers.

The Stream Handler parses the IP header to find the session ID and sets the location of the payload, such as TCP or UDP headers and their data. I set up two hosts on my local network for IPv6 connections. On a Linux system, this is as simple as issuing an ifconfig command on both systems and, for ease of use, adding the remote host to the /etc/hosts file:

    host100> ifconfig eth0 inet6 add fec0:0:0:1::100/64
    host100>/etc/hosts:
      fec0:0:0:1::200 host200


    host200> ifconfig eth0 inet6 add fec0:0:0:1::200/64
    host200>/etc/hosts:
      fec0:0:0:1::100 host100



Next, I established SSH and FTP sessions between them. I had the code to find the payload written, but this was the first time I had tested it. It took a couple of tries to get it right because the way IPv6 extension headers work is a bit tricky. But when I actually captured some sessions, they were displayed correctly in the user interface.

I then added the code to display the IPv6 headers in the user interface. This formats the main header and each extension header using human readable field names followed by the actual values. Because the header type of each extension header is in the previous header, this was also a little tricky to get working.
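The chain-walking logic that makes both of these steps tricky can be sketched in C along these lines. This is a simplified illustration under my own naming, not the Realeyes code: it handles only the common TLV-style extension headers (Hop-by-Hop, Routing, Destination Options), where each header's type is carried in the *previous* header's Next Header field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A subset of IPv6 extension header types; the full list is
   maintained by IANA. */
static int is_ext_header(uint8_t type)
{
    switch (type) {
    case 0:   /* Hop-by-Hop Options */
    case 43:  /* Routing */
    case 60:  /* Destination Options */
        return 1;
    default:
        return 0;
    }
}

/* Walk the extension header chain starting just past the fixed
   40-byte IPv6 header.  'next' is the Next Header field from the
   main header.  Each extension header begins with its own Next
   Header byte, followed by a length byte counting 8-octet units
   beyond the first 8 octets (RFC 2460).  Returns the upper-layer
   protocol (e.g. 6 for TCP) and sets *payload_off to its offset. */
static uint8_t skip_ext_headers(const uint8_t *pkt, size_t len,
                                uint8_t next, size_t *payload_off)
{
    size_t off = 40;                     /* fixed IPv6 header length */

    while (is_ext_header(next) && off + 8 <= len) {
        size_t ext_len = (size_t)(pkt[off + 1] + 1) * 8;
        next = pkt[off];                 /* type of the FOLLOWING header */
        off += ext_len;
    }
    *payload_off = off;
    return next;
}
```

The subtlety the post alludes to is visible here: the Next Header value you act on in one iteration was read from the header *before* the one you are skipping.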

The IDS Stream Handler is also where IP fragments must be reassembled. I was really happy that after copying the IPv4 reassembly code and changing all instances of "v4" to "v6" and handling a couple of variables differently, the IPv6 reassembly worked. This is an example of the value of modularity and the use of variables in code.

As an aside, I learned this lesson in my first year of Computer Science. The grad student instructor had us program an assembler that could handle about 8 operations, each with one operand of 6 characters. The next assignment was to modify the assembler to add a couple of operations, some of which took two operands, and the length of operands increased to 8 characters. Those who hard coded the original assignment had a lot of work to do (I had hard coded some things, but not all). And yes, the third assignment was more of the same.

I hated that guy (as did most of my classmates), because while the lesson was legitimate, laboratory exercises are not applications that will grow in the real world. As first year students, most of us did not have enough experience to develop programs with even that level of sophistication, and he did not recommend that we incorporate it in the assignment. And with other classes to deal with, getting a single program to work at all took time that was in limited supply. His excuse was that, in the real world, we would constantly be faced with changing requirements at a moment's notice, and thus he was doing us a favor.

Having been out here for over 20 years, I have yet to run into anything remotely like what he described, although I do make it a point to be anal about getting adequate requirements descriptions up front. The people I have worked for wanted me to succeed, because it reflected on them. Some were more helpful than others, but I have never worked on a task where the requirements changed wildly, making it impossible to complete. Maybe I'm just lucky.

Incidentally, the way I tested reassembly was to use the latest version of netcat, which supports IPv6 sessions. I created a file with over 8K of test data and sent it over a UDP session; netcat tried to send the entire file in a single datagram, which forced the TCP/IP stack to fragment it:
    server> nc -6 -u -l -p 2000 fec0:0:0:1::100

    client> nc -6 -u fec0:0:0:1::200 2000 < test.data

Anyhow, I am now working on analyzing the IPv6 extension headers and expect that to be done within a week. After some tidying up, I will be building a new package for download with IPv6 and several other new features. So, back to work.

Later . . . Jim

Wednesday, July 9, 2008

Introducing, Your Network

For the first three months after I started working on a government network security team I had nightmares. Then I decided that the grunts on the other side were probably in the same boat as we were--undermanned, underfunded, and misunderstood. Whether it's true or not, it let me sleep at night.

But the reasons I was so unsettled didn't go away. I tell people that most security analysts (and I include system and network admins who take responsibility for security in this group) are very good at the easy attacks, probably catching close to 100% of them quickly. They are pretty good at the moderately sophisticated ones--I would guess that well over 50% are eventually caught, although there is a fair amount of luck involved in a lot of those.

But what has always bothered me is that we don't have any idea of how much we don't know. Honeynets are easily avoided by the most sophisticated attackers, especially since they are focused on specific targets in the first place. What makes a target specific? Look at brick-and-mortar crime to get a sense of the possibilities. Based on this, I am guessing that because of the skills required, the problem is not rampant but is still significant.

So one of my goals in developing Realeyes was to provide a tool for security analysts to dig into the goings on in their networks. Those who are familiar with a network can tell when conditions just don't feel right, but what can they do with limited time and resources? After running Realeyes in a pilot program for over six months, it is beginning to deliver on its potential to be that tool.

The site's network admin had periodically seen large bursts of outbound traffic from several EMail servers in the early morning and suspected that there were some hosts being used to send spam. I defined rules to report all of the EMail server traffic between midnight and 6:00 am. Unfortunately, the school year ended shortly after the monitoring was set up and there has not been any of the traffic that it was defined to capture. However, there have been some interesting results.

Initially, there were a lot of reports of automatic software updates, so I used the exclude rule to prevent certain host or network addresses from being reported. It turned out that a couple of these servers were also listening on port 80, so there were a lot of reports of search engine web crawlers. These were eliminated by defining rules with keywords found in those sessions with the NOT flag set to cause the sessions to be ignored.

There were several 'Invalid Relay' errors issued by EMail servers, and some of the emails causing them were sent from and to the same address. At first I created a rule to monitor for the invalid relay message from the server. This captured a lot of simple mistakes, so I have started defining rules to capture email addresses that are used more than once. What I am trying to do is refine the definition of 'probes' which can then be used for further monitoring.

The further monitoring is accomplished using the 'Hot IP' rule. When a Hot IP is defined for an event, all sessions of the IP address (source, destination, or both) specified are captured for a defined time period after the event is detected, 24 hours for example. Using this technique, I have recently seen one of the probing hosts send an HTTP GET request to port 25, as well as some other apparent exploits.

This process is more interactive than the method used by most security tools. But by giving more control over what is monitored to those who know the environment best, I am trying to help build a better understanding of how much we don't know. And I hope that lets more good grunts sleep better at night.

Later . . . Jim

Monday, June 23, 2008

Loose Threads

Realeyes is a somewhat complex application, both in terms of the number of components that interact with each other (4), and the complexity of those components, in particular the analysis engine/IDS, but also the database and database interface. The analysis engine and IDS are written in C, while the database interface and user interface are written in Java.

When I was planning the design of the analysis engine, I knew from the start that there would be multiple processes running. That left me with the decision of whether to use threads or interprocess communication. I know from painful experience that writing thread-safe code is hard (the TCP/IP stack written in System/390 assembler that I helped maintain). Therefore I chose to use interprocess communication. I actually had several reasons for choosing this over threads:

  • Writing thread-safe code is really hard.

  • Threads share the same address space and, while all analysis engine processes share some memory, some of them also use significant amounts of unshared memory. I was concerned that this might lead to the application running out of virtual memory.

  • For security reasons, the external interface runs under an ID that has lowered access and in a chroot jail. This means that interprocess communication would have to be used for at least this function.

  • The pcap library for capturing network traffic from the interface was going to be used, and I was pretty sure it could not be used in a threaded process.

  • I wanted to be able to control the priority of processes dynamically, and while the pthread_setschedparam man page says, "See sched_setpolicy(2) for more information on scheduling policies," there is no man page for sched_setpolicy (I have searched the web for it).

  • Writing thread-safe code is really hard.


Long after going through this thought process, I discovered this paper by Dr. Edward A. Lee at UC Berkeley that supports my reasoning. After performing a formal analysis to prove that writing thread-safe code is really hard, Dr. Lee recommends that code be written single-threaded and then elements of threading (or interprocess communication) be added only as needed. Thank you, Dr. Lee.

This left me with the decision of which IPC techniques to use. There are essentially three:

  • Pipes

  • Message queues

  • Shared memory


I read an article about a test that compared the three (which I cannot find now) and shared memory won hands down (an order of magnitude faster, as I recall). Therefore, while pipes are used in the analysis engine to transfer small pieces of data or low priority information, shared memory is the primary mechanism.

Of course, shared memory is the most difficult to program because it requires a way of guaranteeing that the data stored in every memory location is correct at all times. This is handled in the analysis engine by all of the following methods:

  • Assigning memory locations to a single process that others cannot access

  • Using locks (or semaphores in glibc-speak) to serialize access, which means the operating system allows only the process holding the lock to access the locked memory location

  • Using a mechanism similar to locks (but without the overhead) to serialize access


The center piece of this is the memory manager. When the application starts, a single large block is allocated and made non-swappable. This means that the application never has to wait for a block to be swapped in from disk, which is not done for memory allocated by a process for its own use. This block is chopped up into pools which are in turn chopped up into buffers. (Note: This is an oversimplification, see the analysis engine slide show on the Realeyes technology page for more detail.)

The memory manager sets an "in use" flag to indicate that a buffer is being used, and clears it when the buffer is released. Each level of the analysis engine uses specific structures, and when the "in use" flag is set for a buffer, other processes are not allowed to access it unless the structure is explicitly passed to them. This is the way the first access method is implemented.

The second access method is actually used by the memory manager to obtain or release a buffer. But it is also used by processes to modify structures in memory that could be potentially modified by two processes simultaneously. Most books on programming with semaphores usually start by saying that POSIX semaphores are overly complicated. I don't disagree, but after a little experimentation, I simply wrote a set of functions to initialize, get, free, and release a single lock. As it turned out, my first attempt did not work well across all platforms where the application was tested. But the correction was basically confined to the functions, with only the addition of an index parameter to one of them that meant changing about a dozen calls in the analysis engine code.
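A wrapper set like the one described (initialize, get, release a single lock, indexed by a parameter) might look like this. These are hypothetical functions of my own naming, built on POSIX unnamed semaphores; pshared=1 is what lets the semaphore serialize access across processes rather than just threads, provided it lives in shared memory:

```c
#include <semaphore.h>

/* Hypothetical lock table -- not the actual Realeyes functions.
   The 'ix' parameter mirrors the index mentioned in the post. */
#define MAX_LOCKS 16
static sem_t locks[MAX_LOCKS];

/* Binary semaphore: initial value 1 means "unlocked".
   pshared=1 makes it usable across processes when the sem_t
   itself is placed in shared memory. */
static int lock_init(int ix)    { return sem_init(&locks[ix], 1, 1); }

/* Block until the lock is held; only the holder may touch the
   structure the lock protects. */
static int lock_get(int ix)     { return sem_wait(&locks[ix]); }

/* Release the lock so another process can proceed. */
static int lock_release(int ix) { return sem_post(&locks[ix]); }

static int lock_destroy(int ix) { return sem_destroy(&locks[ix]); }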

The third access method is very much like message queues, but with the performance of shared memory. When a process has information (in a structure) to pass to another, it puts the structure on a queue that only it may add to and only one other process may remove from. The rule governing most of these queues is that the first structure in the queue may only be removed if there is another one following it. In programming terms, there must be a non-NULL next pointer. So the first process modifies the structure to be added, and the very last step is to set the pointer of the last item in the queue to the new structure's address.
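The "non-NULL next pointer" rule can be made concrete with a small single-producer, single-consumer queue. The element and queue names here are illustrative, not the Realeyes structures; the point is the ordering: the producer fills in the element completely and publishes it only as the very last step, and the consumer never removes the last element, so the producer's tail pointer never references memory the consumer has given back:

```c
#include <stddef.h>

/* Hypothetical work-element queue -- names are illustrative. */
typedef struct elem {
    struct elem *next;
    int payload;
} elem_t;

typedef struct {
    elem_t *head;   /* only the consumer removes from here */
    elem_t *tail;   /* only the producer appends here */
} queue_t;

/* Producer: setting the previous tail's next pointer is the very
   last step -- that store is what makes the element visible. */
static void enqueue(queue_t *q, elem_t *e)
{
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;   /* publish to the consumer */
    else
        q->head = e;         /* first element ever queued */
    q->tail = e;
}

/* Consumer: only remove the head if another element follows it. */
static elem_t *dequeue(queue_t *q)
{
    elem_t *e = q->head;
    if (e == NULL || e->next == NULL)
        return NULL;         /* empty, or lone element: leave it */
    q->head = e->next;
    return e;
}
```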

Special handling is necessary for some queues. For example, if there is very little activity, a single structure could sit on a queue by itself for a long time (in computer cycles). This is handled in some queues by waiting briefly (maybe a hundredth of a second) and then adding a dummy structure behind the one waiting to be processed.

A side effect of the choice of processes over threads is that it is much easier to monitor a process than a thread. It is also quite a bit more straightforward to use a debugger on a process. So, all things considered, I recommend this over threads unless there are strong reasons against it.

Finally, I have to say that the Java code does use threads. However, they are treated like separate processes in that they don't share memory. All data is passed in arguments to the methods being called or in the method return value. This eliminates the most problematic aspects of making code thread-safe, but (I have discovered) not all of them. The other issues are memory-related, but it is memory that the application does not control, such as the window in which the application is displayed, or network connections.

Overall, I agree with Dr. Lee when he says that threads "are wildly nondeterministic. The job of the programmer is to prune away that nondeterminism." And I don't find it to be too much of a stretch when he continues that, "a folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs."

Later . . . Jim

Wednesday, June 11, 2008

Elitism Improves Productivity

The Realeyes IDS application includes multiple plugins that interact with each other. The basic means of communication is a structure with information about the status of a network session, put on a queue by one plugin and taken off by the next one to process the session.

At the lowest level, this is a Data structure, which defines the packet captured by the Collector. The Data structure is then taken by the Stream Handler which determines which session it belongs to and sets some information, such as the start time, and then puts a Stream Analysis Work Element (SAWE) on another queue. The Stream Analyzers perform matching operations on the packets based on the rules defined for each one. Then the Action Analyzer and Event Analyzer perform correlation on the results of the Stream Analyzers.

This works very smoothly, except for the fact that there are multiple Stream Analyzers and one Action Analyzer. The Action Analyzer can free Data structures, and it must not free any that are still being processed. Because all of this analysis is happening asynchronously, the fields that indicate the state can change while being tested.

To handle this, I created a separate field that is set once when the session is ready for the Action Analyzer. Initially, I tried to wait briefly for the Stream Analyzers to update these fields. Of course, briefly is in the eye of the beholder. I set the wait value to 1 microsecond, which is 0.000001 second.

But the standard clock in most Intel computers is actually ticking once per 0.1 millisecond, or 0.0001 second. This is like saying, "Give me a second," and then taking over a minute and a half. The result was that work piled up waiting on the Action Analyzer. Buffers could not be freed and the application could not run for more than a couple of hours in the pilot environment.

I finally realized that instead of waiting for the first SAWE on the queue, the Action Analyzer should try to find one that was ready. In other words, it should ignore the structures that didn't meet its standards, and only choose that of the highest quality. In still other words, it should be an elitist.
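The elitist scan can be sketched in a few lines. The structure and field names are hypothetical; the essential change is that instead of blocking on the head of the queue, the Action Analyzer walks past not-ready elements and takes the first one whose ready flag, set exactly once by the Stream Analyzers, is already up:

```c
#include <stddef.h>

/* Hypothetical SAWE list -- illustrative names only. */
typedef struct sawe {
    struct sawe *next;
    int ready;        /* set once when low level analysis completes */
} sawe_t;

/* Elitist approach: skip elements that don't meet the standard and
   return the first ready one; NULL means nothing is ready yet, so
   the caller can do other work instead of waiting. */
static sawe_t *take_first_ready(sawe_t *head)
{
    for (sawe_t *s = head; s != NULL; s = s->next)
        if (s->ready)
            return s;
    return NULL;
}
```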

And lo and behold, buffer usage became almost a non-issue. The application now runs for days without running out of buffers. (In fact, it usually crashes from a bug before it runs out of buffers, but I'm working on fixing those.)

This demonstrates that being described as an elitist can be a compliment.

Later . . . Jim

Friday, May 30, 2008

Code: Library and Plugins

I just realized that the name of the blog has technology in it, and I have hardly mentioned code. The Realeyes project was originally started as a network Intrusion Detection System project. I have worked on several systems in which an attempt was made to design them modularly, but gradually, functions that were supposed to be generic incorporated application specific data and code. This increases the chance of creating errors when such a function is called by many other functions.

So I decided to create a library and then build the application on it. The library is called the Realeyes Analysis Engine. Applications are built on the library by creating plugin programs that call library functions. The first application had nothing to do with networks--it was just a series of random numbers that were organized according to the high order digits and then the low order digits were analyzed for patterns.

When I started writing the network IDS code, I found that I needed more control over some of the library functions, so I added hooks for the application. For any of you who have read about these so-called hooks but aren't sure what they are, they also go by the name of 'callbacks'. And what that means is that the library function calls an external function with predefined parameters. The name of the function may be specified or a pointer to the function may be initialized, and I use both.

For example, the library's main function calls three functions that every plugin must include, even if all they do is immediately return:
  • local_plugin_init(): This allows anything that needs to be done before the parser runs to be handled

  • plugin_parser(xml_main_structure, xml_dom_tree): The XML file gets parsed for syntax by the analysis engine, which then passes a Document Object Model (DOM) tree to the plugin, which parses the values

  • plugin_process(): This is where the plugin does its main job

The plugin parser is particularly interesting. The analysis engine uses libxml2 to parse an XML configuration file and build a tree of the values in the configuration file. (And yes, I have used the Expat library, which implements the Simple API for XML (SAX). But SAX is simple only for implementing the library, not the application, and I would only use it if I was under serious memory constraints--which is not the case for most configuration files.) The DOM tree is read by the plugin parser. But the code to read the tree is a bit hard on the eyes, not to mention being a typo magnet, as this simplistic sample of getting a data value shows:

    if (raeXML_NODE->xmlChildrenNode != NULL) {
        value = xmlStrdup(XML_GET_CONTENT(raeXML_NODE->xmlChildrenNode));
    }

So, the analysis engine library includes several macros that make writing the plugin parser look a little like a Basic program. It also includes macros to name the function and parameters. Et voila, writing a plugin parser is as easy as this:

    int raePLUG_PARSER(raeXML_PARM)
    {
        GET_NEXT_ELEMENT;
        IF_ELEMENT("Element_name")
        {
            WHILE_ATTR_LIST
            {
                IF_ATTR("Attribute")
                {
                    GET_ATTR(attr_value);
                    {
                        Process attr_value ...
                    }
                }
            }
            GET_DATA(data_value);
            Process data_value ...
        }
        GET_NEXT_NODE(status);
    }

If you are thinking this should be contributed back to the libxml2 project, I don't think it would work. The Realeyes project only uses XML for configuration files with very simple syntax. Meanwhile, libxml2 handles the full range of XML capabilities. However, if you know someone who is working on an application and is annoyed (or annoying) about having to parse XML configuration files, point them to the Realeyes Analysis Engine subversion repository, where they can look at the XML parser source and include files.

The other type of hook/callback uses a function pointer. The reason for this is to make it optional. If the pointer is not initialized, then the callback function is not called. An example of this is the special handler for after an Analysis Record is built:
    This pointer must be initialized to point to a function:
      raeRecordHandler raeEventRecordHandler

    The function may have any name, but must accept the specified parameter:
      erh_function (raeAnalysisRecord *rh_record)
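The optional hook pattern reduces to a NULL check before the call. A minimal sketch, using the pointer and parameter names from the post but with the surrounding types and the finish_record caller invented for illustration:

```c
#include <stddef.h>

/* Illustrative stand-ins for the library's types. */
typedef struct { int event_id; } raeAnalysisRecord;
typedef void (*raeRecordHandler)(raeAnalysisRecord *rh_record);

/* If a plugin never initializes this pointer, the hook is skipped. */
static raeRecordHandler raeEventRecordHandler = NULL;

static int handled_id = 0;

/* A plugin-supplied handler: any name, but the fixed parameter. */
static void erh_function(raeAnalysisRecord *rh_record)
{
    handled_id = rh_record->event_id;
}

/* Hypothetical library-side call site, run after an Analysis
   Record is built: the callback fires only if it was set. */
static void finish_record(raeAnalysisRecord *rec)
{
    if (raeEventRecordHandler != NULL)
        raeEventRecordHandler(rec);
}
```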

The analysis engine library does all of the heavy lifting. Once the parsing is complete, plugins do not allocate any memory, unless there are specialized functions coded (I am particularly happy with the memory management, but that is a discussion for another post). There are library functions for managing multiple streams of data, matching values at specific locations in headers or strings in data, and building records for information that has been matched with a rule, just to name a few.

This makes the plugins fairly lightweight--the largest, the Action Analyzer, is just over 1,000 lines of code, most of which is parsing the options for collecting statistics. In fact, the statistics collection code, in a separate source file, is more than twice as large at over 2,200 lines of code, which gives a sense of how little the plugins have to do.

I gave a presentation to my local Linux User Group, and afterward one of the attendees talked to me about using it for some mathematical analysis he is involved in. I don't know if it will work for him, but I would be very happy if the library is found to be useful for other projects. The library is capable of handling multiple TCP sessions (35,000 simultaneously is the current peak), which are about as random as streams of data get, so it will certainly handle streams that are controlled. The output is created by a relatively simple plugin, which means it can be customized as much as necessary.

Later . . . Jim