Wednesday, September 24, 2008

Realeyes in the Real World

For about a year, I have been running Realeyes in a pilot project at a local college. The first six months were spent making it scale to a real world environment. But for the past six months, I have been able to experiment with rules to achieve the objectives of the system:
  • Capture exploits: This one is obvious, but that doesn't make it easy.

  • Reduce false positives: This was an original design goal, and my testing shows a lot of promise.

  • Determine if exploits are successful: This was also an original design goal, and it too is showing promise.

  • Capture zero-day exploits: While this capability hasn't been designed into the system, the flexibility of the rules and the statistics collection are showing promise in accomplishing this. We have also found some incidents of ongoing invalid activity that were apparently flying under the radar.
There is not much I can say about writing rules to detect exploits. If you have never done it, you just can't imagine how hard it is. First, finding enough information to even start is very difficult. But even with decent information, picking out the critical pieces that will consistently reveal the exploit is much more of an art than a science. I have written a few dozen rules at this point, but certainly don't consider myself to be an expert. With that said, rules do get written and, after some trial and error, are fairly effective.

The holy grail of creating IDS rules is to increase the capture rate while simultaneously reducing the false positive rate. When I was on the front lines, monitoring the IDSes, there was always the fear that a major attack would be buried in the false positives. Therefore, when I started this project, my personal goal was to completely eliminate false positives. I may not succeed, but I figure that if I am shooting for 100%, I will get closer than if my target is a lot lower.

I had spent quite a bit of time thinking about and discussing this with others. The team I worked with used multiple IDSes and correlated the information from them. Our success rate was very good, compared to our counterparts in other agencies. So a part of the puzzle seemed to be collecting enough data to see the incident for what it really is.

As an analogy, imagine taking a sheet of paper and punching a hole in it with a pencil. Then try to read a magazine article through that hole. At the very least, I would want to make the hole bigger. But this is as far as the analogy takes me, because what I really need is context, and a single word or phrase doesn't do that for me.

But even using multiple IDSes wasn't the complete solution. Each IDS had limitations that reduced its contributions to the total picture. If a rule was producing too many false positives, then the rule had to be modified, often in a way that reduced its effectiveness. This meant that there were important pieces missing.

So the solution appeared to be: put the capabilities of all the IDSes in a single IDS and do the correlation at the point of data capture. And that is how the Realeyes Analysis Engine is designed. Three levels of data are analyzed: Triggers, Actions, and Events. Each one feeds the next, and the correlation of multiple inputs gives a broader view of the activity.

Triggers are the smallest unit of data detected. They can be header data such as a source or destination port, a string such as "admin.php" or "\xeb\x03\x59\xeb\x05\xe8\xf8", or a special condition such as an invalid TCP option, handled by special code. These are correlated by Action definitions, where an Action definition may be the Triggers: 'Dest Port 80' AND 'Admin PHP'.

Actions are the equivalent of rules in most IDSes available today. It is the next level of correlation that is showing the promise of reaching the goals listed above. Events are the correlation of Actions, possibly in both halves of a TCP session. (Actually, Realeyes assumes that UDP has sessions defined by the same 4-tuple as TCP.) And this is where it gets interesting.
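
To make the three levels concrete, here is a minimal sketch of the correlation idea in C. This is my own illustration, not the Analysis Engine's actual code or data structures: Triggers that fire set bits, an Action fires when all of its required Triggers have fired, and an Event fires when all of its required Actions have fired.

  /* Illustration only: Trigger -> Action -> Event correlation by bitmask.
   * E.g. an Action could require Trigger 'Dest Port 80' AND Trigger
   * 'Admin PHP'; an Event could require that Action in one half of the
   * session and another Action in the other half. */
  #include <stdint.h>

  typedef struct {
      uint32_t required_triggers;   /* bitmask of Trigger IDs */
  } action_def;

  typedef struct {
      uint32_t required_actions;    /* bitmask of Action IDs */
  } event_def;

  static int action_fires(const action_def *a, uint32_t fired_triggers)
  {
      return (fired_triggers & a->required_triggers) == a->required_triggers;
  }

  static int event_fires(const event_def *e, uint32_t fired_actions)
  {
      return (fired_actions & e->required_actions) == e->required_actions;
  }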

One of the first rules defined was to search for FTP brute force login attempts. This Event is defined by two Actions, the first being 'Dest Port 21' and 'FTP Password', the second being 'Source Port 21' and 'FTP login errors' (which matches any of the reply codes 332, 530, or 532). The rule was defined to require more than three login attempts to allow for typos. It has reported a few dozen incidents and zero false positives. Considering that the string for the FTP password is "PASS ", I am quite pleased with this result.
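
The logic behind that Event boils down to counting the two Actions on each half of the session and applying the threshold. Here is a rough sketch of how that could be implemented; the names, the structure, and the way the threshold is applied are my guesses, not the actual rule engine:

  /* Hypothetical sketch of the FTP brute-force Event logic: count
   * "PASS " commands toward port 21 and login error replies coming
   * back from port 21; report once both counts pass the threshold. */
  #include <string.h>

  #define LOGIN_THRESHOLD 3   /* more than three attempts, to allow for typos */

  struct ftp_session {
      int pass_count;      /* 'Dest Port 21' + 'FTP Password' Action hits */
      int error_count;     /* 'Source Port 21' + 'FTP login errors' Action hits */
  };

  static int is_login_error(const char *reply)
  {
      return strncmp(reply, "332", 3) == 0 ||
             strncmp(reply, "530", 3) == 0 ||
             strncmp(reply, "532", 3) == 0;
  }

  /* Returns nonzero when the Event should be reported. */
  static int ftp_brute_force(struct ftp_session *s,
                             const char *line, int to_server)
  {
      if (to_server && strncmp(line, "PASS ", 5) == 0)
          s->pass_count++;
      else if (!to_server && is_login_error(line))
          s->error_count++;

      return s->pass_count > LOGIN_THRESHOLD &&
             s->error_count > LOGIN_THRESHOLD;
  }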

A more recent rule was defined to detect an SQL Injection exploit. The exploit uses the 'cast' function to convert hexadecimal values to ASCII text. The rule's Triggers search for four instances of testing for specific values in a table column. The Action is defined to fire if any two of them are found in any order. Although the exploit is sent by a browser to a server, no port is defined in the rule. This rule is reporting a couple hundred attempts per day, and none of them are false positives.
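
The interesting part of that Action is the 'any two of the four Triggers, in any order' condition. A hedged sketch of that counting logic, using placeholder strings rather than the real signatures:

  /* Hypothetical sketch of the 'any two of four Triggers, any order'
   * condition.  Each pattern that is found sets a bit; the Action fires
   * when at least two distinct bits are set.  The patterns below are
   * placeholders, not the actual Trigger strings. */
  #include <stdint.h>
  #include <string.h>

  static const char *sqli_patterns[4] = {
      "placeholder-1", "placeholder-2", "placeholder-3", "placeholder-4"
  };

  static int sqli_action_fires(const char *payload)
  {
      uint32_t found = 0;
      int i, count = 0;

      for (i = 0; i < 4; i++)
          if (strstr(payload, sqli_patterns[i]) != NULL)
              found |= 1u << i;

      for (i = 0; i < 4; i++)
          if (found & (1u << i))
              count++;

      return count >= 2;   /* any two, in any order */
  }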

It really got exciting when the rule captured a server sending the exploit. It turned out that the server was simply echoing what it received from the browser in certain instances. However, the web admins wanted to know what web pages this was happening for, and this is where the third design goal is demonstrated. Both halves of these sessions were reported and displayed in the playback window. This gave us the full URL data being sent to the server, and the web admins were able to address the problem quickly.

To address the issue of detecting new exploits, Realeyes has a couple of somewhat unique features. The first is statistics collection. The information maintained for each session includes the number of bytes sent and received. When a session ends, these are added to the totals for the server port. Then, three times a day, the totals are reported to the database. This allows for the busiest ports to be displayed. Or the least busy, which might point to a back door.
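
As a rough illustration of the statistics bookkeeping (the structure and field names are mine, not the engine's):

  /* Hypothetical sketch of the per-port statistics collection: when a
   * session ends, its byte counts are added to the totals for the
   * server port, and the totals are flushed to the database on a timer
   * (three times a day in the pilot). */
  #include <stdint.h>

  struct port_stats {
      uint64_t bytes_sent;        /* server -> client */
      uint64_t bytes_received;    /* client -> server */
      uint64_t sessions;
  };

  static struct port_stats tcp_stats[65536];

  static void session_end(uint16_t server_port,
                          uint64_t sent, uint64_t received)
  {
      tcp_stats[server_port].bytes_sent     += sent;
      tcp_stats[server_port].bytes_received += received;
      tcp_stats[server_port].sessions++;
  }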

But there are other statistics that can be collected, as well. It is possible to monitor a specific host or port and collect the total data for each session.

For example, I monitored UDP port 161, which is used for SNMP, and saw two host addresses using it that belonged to the site, but 3 or 4 a day that did not. Since there was not much traffic for the port, I simply created a rule to capture all data for it. This showed me that there were unauthorized attempts to read management data from devices in the network, but that none were successful.

Using the same technique, I monitored TCP port 22, used for SSH, and found several sessions that appeared to be attempts to steal private keys. I reported this to the admin of the target hosts, and while he had applied the most recent patches, I also suggested that he regenerate his private keys, to be on the safe side.

Another feature for discovering new exploits is time monitoring. This is setting a time range for all activity to a host or subnet to be captured. I defined rules to monitor the EMail servers between midnight and 5:00 am. The original intent was to watch for large bursts of outbound EMail to detect any site hosts being used for spam. We have found one of these.
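
The time check itself is simple to picture. A minimal sketch, assuming local time and the midnight-to-5:00 am window I used:

  /* Hypothetical sketch of a time-monitoring check: capture everything
   * to the monitored hosts when the local hour is between 00:00 and 04:59. */
  #include <time.h>

  static int in_monitor_window(time_t when)
  {
      struct tm *local = localtime(&when);

      return local != NULL && local->tm_hour < 5;
  }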

But we have discovered several other things from this. First was a large amount of traffic using a port that the network admin thought had been discontinued. Second, there have been several attempts to have EMail forwarded through the site's servers. This may be a misconfiguration at the sender, or it may be an attempt to send spam through several hops to avoid detection. From this I created rules to monitor for the most common domains.

So far, there have not been any earth-shattering discoveries (to my disappointment, but to the admins' relief). But as I said, the signs are promising that the system is capable of meeting the design goals. Until a couple of weeks ago, I had spent the majority of my time working on code. But I am starting to spend more time on rules. So I am looking forward to making some new discoveries. Stay tuned.

Later . . . Jim

Tuesday, September 23, 2008

My App Fails the LSB

My position on the Linux Standard Base (LSB) has evolved. When I first heard about it, I was all for it. The LSB as a standard could still be useful to some, but I now disagree with the goals of the LSB working group. To be sure, this post is not about dissing the Linux Foundation. They have many worthwhile projects.

What follows is my experience with the LSB Application Checker, my take on the purpose of the LSB, and my own suggested solution for installing applications on GNU/Linux distributions. The Realeyes application failed to certify using the checker v2.0.3, which certifies against the LSB v3.2. Everything that it called out could be changed to pass the tests, but I will only consider correcting a few of the 'errors'.

After building the Realeyes v0.9.3 release, I collected all executable files in a common directory tree, downloaded the LSB application checker, and untarred it. The instructions say to run the Perl script, app-checker-start.pl, and a browser window should open. The browser window did not open, but a message was issued saying that I should connect to http://myhost:8889. This did work, and I was presented with the Application Check screen.

There was a text box to enter my application name for messages and one to enter the files to be tested. Fortunately, there was a button to select the files, and when I clicked on it a window opened that let me browse my file system to find the directories where the files were located. For each file to be tested, I clicked on the checkbox next to it, and was able to select all of the files, even though they were not all in the same directory. Then I clicked on the Finish button and all 87 of the selected files were displayed in the file list window.

When I clicked on the Run Test button, a list of about a dozen tasks was displayed. Each was highlighted as the test progressed. This took less than a minute. Then the results were displayed.

There were four tabs on the results page:
  • Distribution Compatibility: There were 27 GNU/Linux distributions checked, including 2 versions of Debian, 4 of Ubuntu, 3 of openSUSE, 3 of Fedora, etc. Realeyes passed with warnings on 14 and failed on the rest.

  • Required Libraries: These are the external libraries required by the programs written in C. There were nine for Realeyes, and three (libcrypto, libssl, and libpcap) are not allowed by the LSB. This means that distros are not required to include the libs in a basic install, so they are not guaranteed to be available.

  • Required interfaces: These are the calls to functions in the libraries. There were almost a thousand in all, and the interfaces in the libraries not allowed by the LSB were called out.

  • LSB Certification: This is the meat of the report and is described in some detail below.

The test summary gives an overview of the issues:
  • Incorrect program loader: Failures = 11

  • Non-LSB library used: Failures = 4

  • Non-LSB interface used: Failures = 60

  • Bashism used in shell script: Failures = 21

  • Non-LSB command used: Failures = 53

  • Parse error: Failures = 5

  • Other: Failures = 53

The C executables were built on a Debian Etch system and use /lib/ld-linux.so.2 instead of /lib/ld-lsb.so.3. The Non-LSB libraries and interfaces were described above, but there was an additional one. The Bashisms were all a case of either using the 'source' built-in command or using a test like:
  while (( $PORT < 0 )) || (( 65535 < $PORT )); do

which, in other Bourne shells, requires a '$(( ... ))'. The parse errors were from using the OR ("||") symbol.

The fixes for these are:
  • Use the recommended loader

  • Statically link the Non-LSB libraries

  • Use '.' instead of 'source'

  • Rework the numeric test and OR condition

So far, all of this is doable, sort of. But every time a statically linked library is updated, the app must be rebuilt and updates sent out. Also, the additional non-LSB library is used by another library, so I would have to build that one myself and statically link the non-LSB library (which happens to be part of Xorg) into it (and the library that uses it is part of GTK). The reason I am using it is because the user interface is built on the Eclipse SWT classes, which use the local graphics system calls to build widgets.

The non-LSB commands include several Debian specific commands (such as adduser), and for my source packages I had to rework the scripts to allow for alternatives (such as useradd). But the other disallowed commands are:
  • free: To display the amount of free memory

  • sysctl: To set system values based on the available memory

  • scp: Apparently the whole SSL issue is a can of worms

  • java

  • psql

Finally, all of the other 'errors' were a failure to determine the file type, for these file types:
  • JAR

  • AWK

  • SQL

  • DTD and XML

Part of the problem with the LSB is that it has bitten off more than it can chew. Apparently Java apps are non-LSB compliant. So are apps written in PHP, Ruby, Erlang, Lisp, BASIC, Smalltalk, Tcl, Forth, REXX, S-Lang, Prolog, Awk, ... From my reading, Perl and Python are the only non-compiled languages that are supported by the LSB, but I don't know what that really means (although I heartily recommend to the Application Checker developers that they test all of the executables in their own app ;-). And I suspect that apps written in certain compiled languages, such as Pascal or Haskell, will run into many non-LSB library issues.

Then there are databases. Realeyes uses PostgreSQL, and provides scripts to build and maintain the schema. Because of changes at version 8, some of these scripts (roles defining table authorizations) only work for version 8.0+. The LSB Application Checker cannot give me a guarantee that these will work on all supported distros because it didn't test them. I have heard that there is some consideration being given to MySQL, but from what I can tell, it is only to certifying MySQL, not scripts to build a schema in MySQL.

After all this kvetching, I have to say that the Application Checker application is very well written. It works pretty much as advertised, it is fairly intuitive, and it provides enough information to resolve the issues reported by tests. My question is, "Why is so much effort being put into this when almost no one is using it?"

An argument can be made that the LSB helps keep the distros from becoming too different from each other, and that without the promise of certified apps, the distros would not be motivated to become compliant. But I only see about a dozen distros on the list, with Debian being noticeably absent. And yet, there is no more sign of fragmentation in the GNU/Linux world than there ever was.

My theory on why UNIX fragmented is that proprietary licenses prevented the sharing of information which led to major differences in the libraries, in spite of POSIX and other efforts to provide a common framework. In the GNU/Linux world, what reduces fragmentation is the GPL and other FOSS licenses, not the LSB. All distros are using most of the same libraries, and the differences in versions are not nearly as significant as every UNIX having libraries written from scratch.

I have to confess, I couldn't care less whether Realeyes is LSB compliant, because it is licensed under the GPL. Any distro that would like to package it is welcome. In fact, I will help them. That resolves all of the dependency issues.

While I am not a conspiracy theorist, I do believe in the law of unintended consequences. And I have a nagging feeling that the LSB could actually be detrimental to GNU/Linux. The only apps that benefit from LSB compliance are proprietary apps. The theory behind being LSB compliant is that proprietary apps can be guaranteed a successful installation on any LSB compliant GNU/Linux distro. I'm not arguing against proprietary apps. If a company can successfully sell them for GNU/Linux distros, more power to them. However, what if proprietary libraries manage to sneak in? This is where the biggest threat of fragmentation comes from.

But even more importantly, one of the most wonderful features of GNU/Linux distros is updates, especially security updates. They are all available from the same source, using the same package manager, with automatic notifications. If the LSB is successful, the result is an end run around package managers, and users get to deal with updates in the Balkanized way of other operating systems. That is a step in the wrong direction.

The right direction is to embrace and support the existing distro ecosystems. There should be a way for application teams to package their own apps for multiple distros, with repositories for all participating distros. The packages would be supported by the application development team, but would be as straightforward to install and update as distro supported packages.

There is such a utility, developed by the folks who created CUPS. It is called the ESP Package Manager. It claims to create packages for AIX, Debian GNU/Linux, FreeBSD, HP-UX, IRIX, Mac OS X, NetBSD, OpenBSD, Red Hat Linux, Slackware Linux, Solaris, and Tru64 UNIX. If the effort that has gone into LSB certification were put into this project or one like it, applications could be packaged for dozens of distros.

And these would not just be proprietary apps. There are many FOSS apps that don't get packaged by distros for various reasons, and they could be more widely distributed. Since the distros would get more apps without having to devote resources to building packages, they should be motivated to at least cooperate with the project. And don't forget the notification and availability of updates.

As a developer and longtime user of GNU/Linux (since '95), I believe that all of the attempts to create a universal installer for GNU/Linux distros are misguided and should be discouraged. I say to developers, users, and the LSB working group, "Please use the package managers. A lot of effort has been put into making them the best at what they do."

Later . . . Jim

Monday, September 22, 2008

What is happening to IPv6?

I have read several articles lately on the state of IPv6. Most of them take the position that implementation is moving slowly, and that is just fine. I went to a conference a few weeks ago that gave me a fairly different impression. It brought together people from the US federal government, business, and academia to describe the state of IPv6 in the government, and start a discussion of where to go next. My main interest in the conference was the security issues surrounding IPv6, and of course, there is no serious discussion about the Internet that doesn't include security. However, I did learn some interesting things about the status of IPv6.


At the time, the government was on the verge of announcing that agencies could demonstrate IPv6 capability. This basically amounts to the fact that they have a plan in place to only purchase equipment that supports IPv6, although some have pilot programs in place. The DoD is apparently furthest along, but is nowhere near ready to "flip the switch". However, there has been quite a bit of planning, and NIST is working on setting standards for testing equipment that will eventually lead to purchase specifications.

The government, and in particular the DoD, is not very concerned about the increased address space or encryption/authentication features of IPv6. Instead, their interest is in mobile networks, especially the auto-configuration features, and multicast. This is for using dynamic networks on the battlefield, and while they didn't give any details on the subject, it isn't hard to imagine how both features could be used. There is a pretty good article from Linux Journal on IPv6 mobility in Linux, which starts with a really good overview of the principles of using it.

Near the end of the conference, there was a panel discussion which included a couple of government representatives, a guy from Cisco and one from AT&T, and a member of the IETF IPv6 working group. The guys from industry said that there is a steady and fairly significant increase in the IPv6 network infrastructure. If you live in the US, you are probably used to reading articles that claim deployments are almost non-existent and are surprised to hear this. However, the deployments are happening primarily in Asia, and to a lesser extent in Europe.

I don't know how true it is, but someone at the conference said that the University of California system has more IPv4 addresses than the entire country of China. Even if it isn't the case, the point is well taken. Latecomers to the Internet got the crumbs of the IPv4 address space, and now they are desperate to use IPv6.

This has many implications. The entire set of TCP/IP and related protocols is very loose. In some people's minds, the rollout of IPv6 is an opportunity to get several things right that IPv4 didn't. However, implementations will not all support the same features. The reason for this is that IPv6 has a common header with very little in it other than the source and destination addresses. The new features are in extension headers, which are optional. And the protocols to be used for encryption and authentication are continuing to be developed:
  • Security Architecture for IP: RFC1825 obsoleted by RFC2401 obsoleted by RFC4301 (IPSec) and updated by RFC3168

  • IP Authentication Header: RFC1826 obsoleted by RFC2402 obsoleted by RFC4302 and RFC4305 obsoleted by RFC4835

The newest of these is dated April, 2007. And none of them specify algorithms or methods, just the requirements for using them. Meanwhile, thousands of routers for IPv6 networks have already been deployed and the number is growing steadily.

The question was raised during the panel discussion of how much assurance agencies will have that the products they purchase will support the requirements that are mandated. The answer was a resounding, "Well ..." The member of the IETF pointed out that they are not a standards committee, but more like the maintainer of the documentation. RFCs have their status upgraded to Standard by consensus. And the Asian and other countries using IPv6 as 'IPv4 with big addresses' are not interested in spending time haggling over features that interfere with their deployments.

But as I said before, my interest was in the security concerns for IPv6. The main ones are the same as in IPv4. To begin with, a very small percentage of exploits go against the IP layer. And security issues for the upper layers of the stack will not change.

But once mobile IPv6 networks with auto-configuration begin to be deployed, there will be vulnerabilities to be dealt with. Of course, the first vulnerability will be admins on the learning curve. Hopefully, these folks will be motivated and supported to get good training early on. To get a taste of actual protocol issues, read the RFC for Secure Neighbor Discovery (SEND) mechanisms.

I certainly hope that the admins don't get stuck with the bulk of the responsibility for security. But the current state of networking products, according to someone who is testing them for the NSA, is that the low level infrastructure (routers, host OSes) is fairly good, but not 100%. Even a little way up the stack, that falls off to little, if any, support for IPv6. This includes firewalls and IDSes, and some server applications.

So what has to be handled? Well, the first step is to find the upper level data, such as a TCP header. This means parsing the extension headers, which is a little weird. First, the header type is not in each header, but in the preceding header. So the first two octets of each extension header are the type of the next header and the length of the current header.
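
Here is a minimal sketch of that walk, assuming raw access to the packet bytes starting at the IPv6 header. The function name and constants are mine, and real code needs full bounds checking plus handling of ESP, where the chain cannot be followed past the encrypted payload:

  /* Minimal sketch of walking the IPv6 extension header chain to find
   * the upper layer header (e.g. TCP).  'pkt' points at the start of
   * the IPv6 header, 'len' is the number of bytes available. */
  #include <stddef.h>
  #include <stdint.h>

  #define EXT_HOPBYHOP   0
  #define EXT_ROUTING   43
  #define EXT_FRAGMENT  44
  #define EXT_AH        51
  #define EXT_DSTOPTS   60

  /* Returns the upper layer protocol number and sets *upper_off to its
   * offset, or returns -1 if the packet runs out before the chain ends. */
  static int find_upper_header(const uint8_t *pkt, size_t len,
                               size_t *upper_off)
  {
      uint8_t next;
      size_t off = 40;              /* fixed IPv6 header is 40 octets */

      if (len < 40)
          return -1;
      next = pkt[6];                /* Next Header field of the fixed header */

      while (off + 2 <= len) {
          size_t ext_len;

          switch (next) {
          case EXT_HOPBYHOP:
          case EXT_ROUTING:
          case EXT_DSTOPTS:
              ext_len = ((size_t)pkt[off + 1] + 1) * 8;
              break;
          case EXT_FRAGMENT:
              ext_len = 8;          /* Fragment header has a fixed size */
              break;
          case EXT_AH:
              ext_len = ((size_t)pkt[off + 1] + 2) * 4;  /* AH counts 4-octet units */
              break;
          default:
              *upper_off = off;     /* next is TCP, UDP, ICMPv6, ... */
              return next;
          }
          next = pkt[off];          /* first octet: type of the *next* header */
          off += ext_len;
      }
      return -1;                    /* ran out of packet */
  }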

Then there is tunneling. It is possible to tunnel IPv6 in IPv4 or vice versa. What this means for finding the upper level header is that first the IPvX header must be parsed, then the IPvY header. And this could be done more than once in crafted packets. I sure hope that normal traffic at the point of a firewall would not have multiple IPv4 and IPv6 tunnels.
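
Peeling one layer of tunneling before the walk above is a matter of checking the protocol numbers: IPv6-in-IPv4 is IPv4 protocol 41, and IPv4-in-IPv6 is next header 4. A simplified sketch (my own names; it ignores the case where the next header value is buried in an extension header, and real code would loop with a sanity limit for nested tunnels):

  /* Hypothetical sketch of peeling one layer of tunneling. */
  #include <stddef.h>
  #include <stdint.h>

  #define PROTO_IPV4_IN_IPV6   4    /* IPv4 encapsulated in IPv6 */
  #define PROTO_IPV6_IN_IPV4  41    /* IPv6 encapsulated in IPv4 */

  /* Returns the offset of the inner IP header, or 0 if not tunneled. */
  static size_t peel_tunnel(const uint8_t *pkt, size_t len)
  {
      if (len >= 20 && (pkt[0] >> 4) == 4) {
          size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;   /* IPv4 header length */
          if (pkt[9] == PROTO_IPV6_IN_IPV4 && ihl >= 20 && ihl <= len)
              return ihl;               /* inner IPv6 starts after the IPv4 header */
      } else if (len >= 40 && (pkt[0] >> 4) == 6) {
          if (pkt[6] == PROTO_IPV4_IN_IPV6)
              return 40;                /* inner IPv4 starts after the fixed IPv6 header */
      }
      return 0;                         /* not tunneled (or not recognizably so) */
  }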

The other issue is the order and number of IPv6 extension headers. Most packets are only allowed to have one of each type of extension header. The exception is the Destination Options header, and there may be two of these. Oh, and the order is not mandatory, just recommended.
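
A rough sketch of how count and order checks might look, if the recommended order is treated as mandatory (which, as I describe below, is what Realeyes does for now). The helper name and bookkeeping are my own; the caller zeroes 'slot' and 'counts' for each packet and calls this once per extension header found:

  #include <stdint.h>

  /* Recommended extension header order from the IPv6 RFC:
   * Hop-by-Hop (0), Destination Options (60), Routing (43),
   * Fragment (44), AH (51), ESP (50), Destination Options (60). */
  static const uint8_t recommended_order[] = { 0, 60, 43, 44, 51, 50, 60 };

  /* Returns 0 if the header is acceptable, -1 for too many of one type,
   * -2 for out of recommended order. */
  static int check_ext_header(uint8_t type, int *slot, int counts[256])
  {
      int i;

      counts[type]++;
      if (counts[type] > (type == 60 ? 2 : 1))
          return -1;

      for (i = *slot; i < (int)sizeof recommended_order; i++) {
          if (recommended_order[i] == type) {
              *slot = i + 1;
              return 0;
          }
      }
      return -2;
  }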

And what is the status of Realeyes IPv6 support? I have added the capability to find upper level headers in untunneled packets. And there is some capability to parse IPv6 extension headers. In particular, the number of each type of header is validated, and for now the recommended order is treated as mandatory, so a report is issued for packets that don't follow the recommendation.

The common IPv6 header is formatted in playbacks, but there is a hex dump of the extension header data. I will be gradually adding support for tunneling and formatting of the extension header data. It is similar to TCP options, so I don't see the code being difficult. The biggest problem is testing. I configured two hosts on my local network to talk to each other with IPv6, but that is a proof of concept, at best.

So that is my perspective of what is happening with IPv6. I suggest that if you work on networks, you show more than a passing interest in it, because it will probably be a case of hurry up and wait, until all of a sudden we need it yesterday.

Later . . . Jim

Sunday, September 21, 2008

Realeyes IDS v0.9.3 Ready for Download

The packages for Realeyes IDS release 0.9.3 are ready for download. This version improves stability over the previous one and includes several new features.

If there are any problems, please notify the team. Thanks.

You may have noticed that I finally finished the First Edition of the Realeyes IDS Manual. I made it as useful as I could. But any comments are welcome.

Later . . . Jim