Friday, December 5, 2008

Punishment vs. Prevention

Punishment

Recently, F-Secure released a report titled, "Growth in Internet crime calls for growth in punishment". The article and the associated report cite F-Secure's research and several specific incidents to make the case for creating an 'Internetpol' to fight cybercrime. It is their conclusion that "against a background of steeply increasing Internet crime, the obvious inefficiency of the international and national authorities in catching, prosecuting and sentencing Internet criminals is a problem that needs to be solved."

The data used to reach this conclusion is tenuous at best. The primary fact cited is that the number of signatures in F-Secure's detection database has tripled compared to a year ago. This could be explained in many ways, one of the main ones being that exploit creators have adapted to signature-based detection by automatically generating variations of the original exploit, which requires many more signatures to detect a single basic exploit. Numbers alone do not tell the story.

As a side note, this is one of the strengths of the Realeyes IDS. While a rule includes specific strings to be matched, those strings can be detected in any order and then correlated with an Action. At the next level, multiple Actions can be correlated with an Event. This allows many variations of an exploit to be defined by a single Event rule.

F-Secure's anecdotal evidence of outbreaks is even less convincing. It is just as easy to conclude that attacks are more targeted than a few years ago, when a single worm could infect millions of systems, and infer from this that software development has become at least good enough to deter the easy attacks. Yet neither scenario is conclusively supported by the evidence.

But even if the problem were defined correctly, the solution presented is not. First and foremost, what is a cybercrime in international terms? Most countries have not updated their own laws to meet the conditions presented by the Internet. The thought that Brazil, Russia, India, China, the UK, the US, and all the other countries with Internet access could agree on a common set of laws to govern Internet usage is a stretch, to say the least.

Then there is the issue of prosecution. A perpetrator located in a different country from the computers attacked would probably be handled no differently than such cases are today. And it is all too common for bureaucratic agencies to use quantity instead of quality to prove success. This initiative would very likely result in many low-level 'criminals', and even some innocent people, being swept up in the new dragnet.

Finally, I find it extremely simplistic to suggest dumping society's problems on law enforcement. A huge question is how this 'Internetpol' organization would be staffed, especially considering that existing law enforcement agencies already find it challenging to enforce existing laws in the Internet environment. Between jurisdictional issues and competition for qualified candidates, the new agency would certainly create inefficiencies. And where is the funding to come from?

Prevention

I believe that legislatures need to update laws to define what constitutes cybercrime. The recent case on cyberbullying has produced potentially bad precedents that need to be addressed, and soon. But most of this effort should focus on adapting current law to the Internet, and only creating new laws where a unique situation justifies them.

The truth is, much of the problem is technological. SQL injection attacks are an example. Currently, every application programmer is expected to validate input against them. But many application programmers hardly know what a database is, much less how to protect against all the possible variations of SQL injection. The ones who do know that are the database developers. Therefore, the security community should be calling for all xDBC libraries to include methods to validate input for applications.

The F-Secure report cited botnets as one of the primary security concerns. The root cause of botnets is spam Email. If this were not such a lucrative business, it would not be such a problem. One of the solutions is to force strong authentication in Email protocols. And this is just one example. The security community should support an organization that could act as consultants to protocol committees to define strong security solutions for Internet protocols. That organization could also focus on convincing vendors and users to implement those solutions.

There are many guides on secure programming, but how many application developers have studied them? This should be mandatory, because if exploiting vulnerabilities were hard, there would not be nearly as many attacks. The security community could help produce more secure applications by establishing a certification program for secure programming.

Realistically however, the biggest part of the problem is unaware users. We in the industry talk about best practices, but that is meaningless to most users. We need to convince management to ensure that users get adequate training about good security practices and we need to be specific about what that training includes.

Finally, I feel compelled to issue the warning, "Be careful what you wish for, because you might just get it." If the government takes over Internet security, there is sure to be a large amount of new regulation imposed. And this could mean security companies like F-Secure would have to devote a lot of resources towards compliance. I think it would be much better for us to take responsibility for finding solutions ourselves.

Later . . . Jim

Tuesday, November 18, 2008

Take Five

Jazz fans will recognize the title of this post as one of the most famous jazz pieces ever written. It was composed and performed by the Dave Brubeck Quartet and was part of the album Time Out, which contained several pieces in unusual time signatures.

This is one piece in the wide range of music that I enjoy, which taken as a whole is called progressive music, although it used to be called simply having eclectic taste. Regardless, I enjoy hearing boundaries being stretched. And when I need a break from programming and network data analysis, I noodle around on the piano trying to stretch my own boundaries.

As a side note, I dropped out of college to pursue the dream of becoming a professional musician. When that bubble burst, I discovered a love of computers on a friend's Apple II, and went back to get a Bachelor's in Computer Science. When I was interviewing for my first job after college, I had to explain the gap, with some embarrassment. But the interviewer told me that some of their best developers were musicians, which I have found to be true over the years. From then on, I have been proud to tell people that I play piano and guitar.

Sometimes my noodling turns into a song. In the past, many such songs have disappeared into the ether when I stopped playing them. But because I do my development on Kubuntu Linux, I have access to the many Free/Open Source multimedia applications that have been developed. So after the latest Realeyes packages were up on SourceForge, I decided to save my current repertoire for posterity -- a personal 'Kilroy Was Here' carved on the wall of time.

First I transcribed the music. To do this, I used two applications in conjunction with each other, NoteEdit and Lilypond. NoteEdit has a graphical interface that makes it possible to enter the music visually. Lilypond uses text files to engrave musical scores. NoteEdit exports Lilypond definition files, as well as MusiXTex, ABC, PMX, MusicXML, and Midi files.

In the past, I tried unsuccessfully to play Midi files through my SoundBlaster Live. I suspected that it was a sound font issue, but had not taken the time to research it. This project motivated me to deal with it. And it is amazingly easy. There is a program called sfxload (or in some cases, asfxload), which, in Kubuntu, is part of the awesfx package. It has a good man page, but the command I use is:
    sfxload -v -D -1 sound_font_file
and the sound font files are on the SoundBlaster installation disk with a 'SF2' extension.

With Midi working, I could transcribe my song and listen to it immediately, which made finding errors a lot easier than trying to edit the music itself. Also, it turns out that my music is a bit more rhythmically complex than I realized. There are several switches between 3/4 and 4/4, and even a few measures of 5/4 and 13/16 (seriously!).

As another side note, I remember the laments of musicians when electronic music equipment reached a point where it could be used to produce the sounds of acoustic instruments. However, my experience is that it takes a whole lot of work to create the tempo and volume dynamics of a live performer. I only put enough time into it to be able to verify that the notes and tempos were correct, and that took a couple of weeks just for a dozen piano pieces. So, between the amount of effort required to produce electronic music and the many benefits of live performances, I expect that we will be listening to performers for as long as we enjoy music.

When I exported to Lilypond, there were several errors, so I had to learn the syntax of the Lilypond definition files. NoteEdit puts the measure number in a comment, which makes cross-referencing much easier and keeps the learning curve shallow. Also, the Lilypond forums are excellent: one of my questions received two good answers in less than 24 hours. Since I consider myself a member of the Free/Open Source community, I created a small NoteEdit file with the errors I found, exported it to Lilypond and corrected the syntax, and opened a bug report that included both files.

Lilypond does a beautiful job with the music, exporting it to PostScript and PDF files. The only thing I was unable to do was to create a book of multiple pieces where each piece started on a new page. The examples I followed had a piece starting on the same page as the end of the previous piece if there was enough room.
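Engraving a piece is a single command (the file name here is just a placeholder), which produces the PostScript and PDF files mentioned above:
    lilypond mypiece.ly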

Doing what I wanted might be possible, but I simply converted each piece from PDF to PNG using the ImageMagick convert command. I then loaded the PNG files in the Gimp. I have found the Gimp to be great for quick conversions and editing. For example, I used it to convert color pictures to grayscale for the Realeyes IDS System Manual.
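The conversion itself is one command per piece, something like this (the resolution and file names are only examples):
    convert -density 150 mypiece.pdf mypiece.png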

With the pictures of the music cropped, I imported the images into an ODF file created in Open Office. This made it easy to lay out the book, including creating a title page and table of contents. And, of course, the entire manuscript could be exported to PDF format. I am self-publishing the collection at lulu.com and they prefer PDF files.

But I wanted people to hear the music as much as having it transcribed. So I started learning how to use Audacity. This was a very shallow learning curve. The most difficult part was getting my microphone to provide decent input. I had to buy a low-end pre-amplifier (about $70 US), and then after an hour of experimentation, it was working fine. The Audacity interface is extremely intuitive, so I was able to start recording right away. The only times I referred to the documentation were to find out what some of the more esoteric effects were supposed to do.

I used the following effects, in the order listed, for every piece:
  • Noise Removal

  • Amplify

  • Normalize
For some pieces, I pasted the best parts of multiple takes together. To make this work, I used additional effects:
  • Bass Boost

  • Change Tempo
The one piece that I attempted to sing required a lot of editing. To make it less painful than it would have been, I also used Change Pitch and Echo.

I have built a personal web site to showcase the SO Suite (word play on the titles). The recordings are in MP3 format. Unfortunately, the hosting service does not offer shell accounts, and it only supports a limited number of audio file formats, Ogg not being one of them. If anyone knows of a site that would host my amateur performances, I would be happy to upload the 23 MB of Ogg files.

Now, even if I don't sell any manuscripts, at least I can share my music with family and friends. And it won't be totally lost if I stop playing it. Break time is over, got to get back to work.

Later . . . Jim

Thursday, October 30, 2008

More Results from Realeyes

For the past few weeks, I have been learning a lot about the site where the Realeyes pilot project is being run. After seeing several reports of incidents from Europe and Asia, it occurred to me that I could create a rule to monitor non-US IP addresses.

To do this, I got the IANA IPv4 Address Assignments and created a list of the high-order octets assigned to Europe, Asia, Africa, and Latin America. The rule was simply a SYN packet and a match of the first octet of the source address against any value in the list.
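The list itself can be pulled out of the registry file with a little text mangling. This is only a sketch -- the file name and column layout of the registry's plain-text form are assumptions, so check the output against the file:

    grep -Ei 'APNIC|RIPE|LACNIC|AfriNIC' ipv4-address-space.txt |
        awk '{ print int($1) }' | sort -n | uniq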

At first, I simply turned it loose, which generated over 20,000 reports. I was able to reduce that quickly by filtering on the "Referer:" field. First were the requests being referred by one of the site's own web servers. Then I found other sites, such as Google, that were referring browsers to the monitored network. These were all defined in a single Action, which was then defined in the Event with the 'NOT' flag set. This resulted in about 5,000 reports, which have been further reduced by adding some of the site's commonly requested web pages to the filter.

The rule is now: Any connection requested by an IP address in Europe, Asia, Africa, or Latin America that is not referred by a site server or one in a list of other servers, and is not requesting one of a list of web pages. If a match is found, the first 64k of both halves of the session are reported. I was thinking of adding a filter where the web server responds with a 200 message, but that could miss a successful exploit.

Of the reported sessions, many are web crawlers for various international search engines. A large number are being referred by other servers. And a fair number appear to be overseas students. Many of the web crawler and overseas student connections consisted of over 100 sessions. Using the 'Ignore Source Address' option, I could close the incidents for a single source IP address in a single click without creating a report. This allowed me to reduce the reports by 2,000 - 3,000 fairly quickly.

And that left me with about 1,000 connections of 1 - 5 sessions each. It was easy to display the playback window and see the client request and the server response and make a decision on whether it was valid or not. Usually, the server responded with a 200 code and sent the requested page. I was able to check about 10 of these per minute, so it only took a couple of hours to run through the entire list.

As far as invalid activity, there have been several targeted scans. By this I mean that the requests are only sent to web servers, and they actually make HTTP requests. These were easy to see by sorting the reports on the source IP address and looking for connections to multiple servers.

The most interesting one was 'GET /manager/html'. This appears to be a Tomcat exploit which tries to gain access to the administrator account. Of the dozen web servers that received this request, all but one replied with 404 "Not Found". The other one replied with 401 "Unauthorized" and the source host then sent over 150 variations of the authorization code field. The codes were mixtures of numbers and mixed case letters that looked like they were taken from a table. Some were as long as 25 characters, while others were only 5 or 6 characters. Fortunately, none were successful.

Another interesting discovery was that one of the monitored site's web servers was being used to store data. An application that allows students to participate in workgroup activities had been broken into, and data had been stored there for some of the sites that are linked in spam. It was the response from the server that alerted me to this: I saw a list of keywords meant to generate a lot of hits in search engines. I was then able to report the full path of the request to the web administrator, and the server was cleaned of this and a few other pages.

The lesson I take from this is that Realeyes is capable of collecting a broad range of data, filtering it effectively, and providing enough information to analysts to very quickly determine the severity level of incidents. The rules for monitoring can be customized and tuned for the site's requirements, giving analysts and administrators a deep view of their environment. And since that is what I set out to do with this project, I am quite pleased.

Later . . . Jim

Thursday, October 9, 2008

It's a Big Cloud

I have recently read several articles that comment on the issues surrounding 'cloud computing'. However, they all seem to be the proverbial blind men describing an elephant. Doc Searls covers more ground than most and promises a follow up discussion, but all of them tend to limit the issues to their own perspective.

I don't have a problem with their facts, just the level of incompleteness. First, I'd like a really good definition of 'cloud computing'. Since I have not seen one, I will take a shot at it. As a network guy, I have seen clouds in network diagrams for a long time, so I tend to build on that understanding to relate to the current state of the technology.

The essence of 'cloud computing' is that it extends the personal computer to utilize Internet resources. These days, a personal computer may be a desktop, a kiosk station, a laptop, a mini-laptop, a smart phone, or an appliance. The resources that can be used include services, such as weather or stock information; interactive applications, such as Email or social network sites; storage sites, such as Flickr; or computer-to-computer applications, such as tracking packages or vehicles.

Using this definition, it is obvious that 'cloud computing' is simply a buzzword. Anyone who makes online purchases, has an online Email account, or has joined a social networking site is participating in 'cloud computing'. Even if we add the requirement that the data involved would otherwise be stored on a local disk, all three of these examples meet the definition.

So what's the big deal? Well, as usual, social mores are behind the technology, and some of the discussion is about trying to catch up. Also, if the definition can be controlled, it can be sold as a new product, with the inescapable "caveat emptor" warnings from consumer advocates. With that in mind, I see the following issues involved in using this technology. Not surprisingly, many of them are the same as the issues of using computer technology in general, but the addition of the Internet puts a new spin on them:
  • Cost: The proponents of 'cloud computing' tend to tout this as a big plus. That sounds to me like they are trying to sell a web version of out-sourcing. The easiest argument against it would be that traditional out-sourcing has not proven to be a huge cost saver across the board.

    But the real issue is the question, "Who is the target market?" I cannot imagine a retail company putting its inventory on storage managed by Google. So realistically, the target market is individual consumers. The type of data being handled is mostly Email, audio/video files, and blogs. For this, the online storage -- and backups -- are very cost effective.

    Will we ever see companies out-sourcing to web services/storage? I never say never, but I think it is a really hard sell. So I predict that it will eventually take hold, but in a limited way. Applications that help companies interact with their customers could be beneficial to both parties. And then there are the ones that no one has thought of yet.

  • Reliability: Adding the Internet to the equation makes reliability a huge issue. The components that must all be working are:

    • The personal computer
    • The local ISP
    • The cloud
    • The remote ISP
    • The remote services

    The further out you go, the higher the possibility of failure. But so what? How many times in your life have you missed a critical phone call or Email, where minutes or even hours made a difference? In my mind, the backups at the storage site are far superior to the procedures done by the majority of consumers, myself included. And this outweighs the few times that the site is inaccessible, and is more cost effective at the same time. Even downtime for businesses would not lead to their demise.

  • Access to data: The remaining items are where much of the current discussion is centered. There are some online Email services where it is difficult to retrieve Emails to the personal computer. But this is an issue that can be managed. It essentially boils down to: read before you sign, and if you're the type who doesn't do this, well, shame on you. Also, it would be pretty silly not to keep a local copy of at least the most important data, such as photos, which adds to the cost. But the cost of losing it forever is even higher.

  • Privacy: If there is anyone who hasn't figured it out yet, let me put it as plainly as possible. Nothing on the Internet is private. All the privacy policies and laws in the world cannot stop someone who has a real desire to take whatever data they want and do whatever they want with it. Data that is more important, such as financial information, can be more carefully protected (this is where I get to plug Realeyes), but computer security is a matter of probabilities, not guarantees. As far as what the Internet companies do with your data, again, read before you sign. But if there is information that should never be exposed, it should never be accessible from the Internet.

  • Ownership: I believe that this is the reason that Richard Stallman said using 'cloud computing' is stupid. He has long championed freedom of information, not just program source code. The ownership policies of most sites promoting 'cloud computing' go against this, in that they consider your information to be free, but not their own. Therefore, his position is consistent and reasonable. However, if you consider losing $1,000 in Las Vegas to be part of the fun, then I guess you can be forgiven for thinking he is a spoilsport.

    It is true that most of the sites that handle consumer data reserve the right to use that data as they please. Their literature basically says this is for promotional purposes. And Google finds keywords in Email to display ads. I have known people who take the dealer logo off their cars because they object to being a billboard. If that is you, you are probably going to be uncomfortable using these sites. But having a social networking account is not a right. You have to pay to play. Just remember the rule about Internet privacy, which is that there is none.

  • Security: This is not a rehash of the storage site's security policies. It is a serving of food for thought. The Internet is a virtual wild west. The quality of security from one site to the next varies widely, and there are many who are happy to take advantage of that. The more you use the Internet, the higher your chances of having a vulnerability exploited. So if you use the same personal computer for social networking that you use for banking, you may become a victim of identity theft. I can think of a couple of analogies here, and I expect that you can too. I would say that the same types of rules apply. At the very least, please keep your security patches up to date.
To 'cloud compute' or not, that is the question. Of course, if you are reading this, you have already answered it. Now it is simply a matter of degree. Are you going to participate fully, or limit your involvement to a few interests? The most important thing is to be aware of the issues. I hope that I have contributed some useful thoughts to that end.

Later . . . Jim

Wednesday, September 24, 2008

Realeyes in the Real World

For about a year, I have been running Realeyes in a pilot project at a local college. The first six months were spent making it scale to a real world environment. But for the past six months, I have been able to experiment with rules to achieve the objectives of the system:
  • Capture exploits: This one is obvious, but that doesn't make it easy.

  • Reduce false positives: This was an original design goal, and my testing shows a lot of promise.

  • Determine if exploits are successful: This was also an original design goal, and it too is showing promise.

  • Capture zero-day exploits: While this capability hasn't been designed into the system, the flexibility of the rules and the statistics collection are showing promise in accomplishing this. We have also found some incidents of ongoing invalid activity that was apparently flying under the radar.
There is not much I can say about writing rules to detect exploits. If you have never done it, you just can't imagine how hard it is. First, finding enough information to even start is very difficult. But even with decent information, picking out the critical pieces that will consistently reveal the exploit is much more of an art than a science. I have written a few dozen rules at this point, but certainly don't consider myself to be an expert. With that said, rules do get written and, after some trial and error, are fairly effective.

The holy grail of creating IDS rules is to increase the capture rate while simultaneously reducing the false positive rate. When I was on the front lines, monitoring the IDSes, there was always the fear that a major attack would be buried in the false positives. Therefore, when I started this project, my personal goal was to completely eliminate false positives. I may not succeed, but I figure that if I am shooting for 100%, I will get closer than if my target is a lot lower.

I had spent quite a bit of time thinking about and discussing this with others. The team I worked with used multiple IDSes and correlated the information from them. Our success rate was very good, compared to our counterparts in other agencies. So a part of the puzzle seemed to be collecting enough data to see the incident for what it really is.

As an analogy, imagine taking a sheet of paper and punching a hole in it with a pencil. Then try to read a magazine article through that hole. At the very least, I would want to make the hole bigger. But this is as far as the analogy takes me, because what I really need is context, and a single word or phrase doesn't do that for me.

But even using multiple IDSes wasn't the complete solution. Each IDS had limitations that reduced its contributions to the total picture. If a rule was producing too many false positives, then the rule had to be modified, often in a way that reduced its effectiveness. This meant that there were important pieces missing.

So the solution appeared to be: put the capabilities of all the IDSes in a single IDS and do the correlation at the point of data capture. And that is how the Realeyes Analysis Engine is designed. There are three levels of data analysis: Triggers, Actions, and Events. Each one feeds the next, and the correlation of multiple inputs gives a broader view of the activity.

Triggers are the smallest unit of data detected. They can be header data such as a source or destination port, a string such as "admin.php" or "\xeb\x03\x59\xeb\x05\xe8\xf8", or a special condition such as an invalid TCP option, handled by special code. These are correlated by Action definitions, where an Action definition may be the Triggers: 'Dest Port 80' AND 'Admin PHP'.

Actions are the equivalent of rules in most IDSes available today. It is the next level of correlation that is showing the promise of reaching the goals listed above: Events are the correlation of Actions, possibly in both halves of a TCP session. (Actually, Realeyes assumes that UDP has sessions defined by the same 4-tuple as TCP.) And this is where it gets interesting.

One of the first rules defined was to search for FTP brute force login attempts. This Event is defined by two Actions, the first being 'Dest Port 21' and 'FTP Password', the second being 'Source Port 21' and 'FTP login errors' (which is any of message 332, 530, or 532). The rule was defined to require more than three login attempts to allow for typos. It has reported a few dozen incidents and zero false positives. Considering that the string for the FTP password is "PASS ", I am quite pleased with this result.

A more recent rule was defined to detect an SQL injection exploit. The exploit uses the 'cast' function to convert hexadecimal values to ASCII text. The rule's Triggers search for four instances of testing for specific values in a table column. The Action is defined to fire if any two of them are found in any order. Although the exploit is sent by a browser to a server, there is no port defined. This rule is reporting a couple hundred attempts per day, and none of them are false positives.

It really got exciting when the rule captured a server sending the exploit. It turned out that the server was simply echoing what it received from the browser in certain instances. However, the web admins wanted to know what web pages this was happening for, and this is where the third design goal is demonstrated. Both halves of these sessions were reported and displayed in the playback window. This gave us the full URL data being sent to the server, and the web admins were able to address the problem quickly.

To address the issue of detecting new exploits, Realeyes has a couple of somewhat unique features. The first is statistics collection. The information maintained for each session includes the number of bytes sent and received. When a session ends, these are added to the totals for the server port. Then, three times a day, the totals are reported to the database. This allows for the busiest ports to be displayed. Or the least busy, which might point to a back door.

But there are other statistics that can be collected, as well. It is possible to monitor a specific host or port and collect the total data for each session.

For example, I monitored UDP port 161, which is used for SNMP, and saw two host addresses using it that belonged to the site, but 3 or 4 a day that did not. Since there was not much traffic for the port, I simply created a rule to capture all data for it. This showed me that there were unauthorized attempts to read management data from devices in the network, but that none were successful.
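As a sanity check outside of Realeyes, a plain packet capture on the monitoring interface shows the same thing; something like this, assuming the interface is eth0:
    tcpdump -n -i eth0 udp port 161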

Using the same technique, I monitored TCP port 22, used for SSH, and found several sessions that appeared to be attempts to steal private keys. I reported this to the admin of the target hosts, and while he had applied the most recent patches, I also suggested that he regenerate his private keys, to be on the safe side.
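For anyone in the same situation, if it is the host keys that need to be replaced, regenerating them on a Debian-style system looks roughly like this (the paths are the usual OpenSSH defaults, and ssh-keygen asks before overwriting existing keys):
    ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ''
    ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -N ''
    /etc/init.d/ssh restart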

Another feature for discovering new exploits is time monitoring. This is setting a time range for all activity to a host or subnet to be captured. I defined rules to monitor the EMail servers between midnight and 5:00 am. The original intent was to watch for large bursts of outbound EMail to detect any site hosts being used for spam. We have found one of these.

But we have discovered several other things from this. First was a large amount of traffic using a port that the network admin thought had been discontinued. Second, there have been several attempts to have EMail forwarded through the site's servers. This may be a misconfiguration at the sender, or it may be an attempt to send spam through several hops to avoid detection. From this I created rules to monitor for the most common domains.

So far, there have not been any earth-shattering discoveries (to my disappointment, but to the admins' relief). But as I said, the signs are promising that the system is capable of meeting the design goals. Until a couple of weeks ago, I spent the majority of my time working on code. But I am starting to spend more time on rules. So I am looking forward to making some new discoveries. Stay tuned.

Later . . . Jim

Tuesday, September 23, 2008

My App Fails the LSB

My position on the Linux Standard Base has evolved. When I first heard about it, I was all for it. The LSB as a standard could be useful to some, but now I disagree with the goals of the LSB working group. To be sure, this post is not about dissing the Linux Foundation. They have many worthwhile projects.

What follows is my experience with the LSB Application Checker, my take on the purpose of the LSB, and my own suggested solution for installing applications on GNU/Linux distributions. The Realeyes application failed to certify using the checker v2.0.3, which certifies against the LSB v3.2. Everything that it called out could be changed to pass the tests, but I will only consider correcting a few of the 'errors'.

After building the Realeyes v0.9.3 release, I collected all executable files in a common directory tree, downloaded the LSB application checker, and untarred it. The instructions say to run the Perl script, app-checker-start.pl, and a browser window should open. The browser window did not open, but a message was issued saying that I should connect to http://myhost:8889. This did work, and I was presented with the Application Check screen.

There was a text box to enter my application name for messages and one to enter the files to be tested. Fortunately, there was a button to select the files, and when I clicked on it a window opened that let me browse my file system to find the directories where the files were located. For each file to be tested, I clicked on the checkbox next to it, and was able to select all of the files, even though they were not all in the same directory. Then I clicked on the Finish button and all 87 of the selected files were displayed in the file list window.

When I clicked on the Run Test button, a list of about a dozen tasks was displayed. Each was highlighted as the test progressed. This took less than a minute. Then the results were displayed.

There were four tabs on the results page:
  • Distribution Compatibility: There were 27 GNU/Linux distributions checked, including 2 versions of Debian, 4 of Ubuntu, 3 of openSUSE, 3 of Fedora, etc. Realeyes passed with warnings on 14 and failed on the rest.

  • Required Libraries: These are the external libraries required by the programs written in C. There were nine for Realeyes, and three (libcrypto, libssl, and libpcap) are not allowed by the LSB. This means that distros are not required to include the libs in a basic install, so they are not guaranteed to be available.

  • Required interfaces: These are the calls to functions in the libraries. There were almost a thousand in all, and the interfaces in the libraries not allowed by the LSB were called out.

  • LSB Certification: This is the meat of the report and is described in some detail below.

The test summary gives an overview of the issues:
  • Incorrect program loader: Failures = 11

  • Non-LSB library used: Failures = 4

  • Non-LSB interface used: Failures = 60

  • Bashism used in shell script: Failures = 21

  • Non-LSB command used: Failures = 53

  • Parse error: Failures = 5

  • Other: Failures = 53

The C executables were built on a Debian Etch system and use /lib/ld-linux.so.2 instead of /lib/ld-lsb.so.3. The Non-LSB libraries and interfaces were described above, but there was an additional one. The Bashisms were all a case of either using the 'source' built-in command or using a test like:
  while (( $PORT < 0 )) || (( 65535 < $PORT )); do

which, in other Bourne shells, requires a '$(( ... ))'. The parse errors were from using the OR ("||") symbol.

The fixes for these are:
  • Use the recommended loader

  • Statically link the Non-LSB libraries

  • Use '.' instead of 'source'

  • Rework the numeric test and OR condition (see the sketch below)
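One portable rewrite of the test shown above is the following sketch; other forms work just as well:
  while [ "$PORT" -lt 0 ] || [ "$PORT" -gt 65535 ]; do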

So far, all of this is doable, sort of. But every time a statically linked library is updated, the app must be rebuilt and updates sent out. Also, the additional non-LSB library is used by another library. So I would have to build that other library myself and statically link the non-LSB one (which happens to be part of Xorg) into it (and the library that uses it is part of GTK). The reason I am using it at all is that the user interface is built on the Eclipse SWT classes, which use the local graphics system calls to build widgets.

The non-LSB commands include several Debian specific commands (such as adduser), and for my source packages I had to rework the scripts to allow for alternatives (such as useradd). But the other disallowed commands are:
  • free: To display the amount of free memory

  • sysctl: To set system values based on the available memory

  • scp: Apparently the whole SSL issue is a can of worms

  • java

  • psql

Finally, all of the other 'errors' are of the form "Failed to determine the file type", for the following file types:
  • JAR

  • AWK

  • SQL

  • DTD and XML

Part of the problem with the LSB is that it has bitten off more than it can chew. Apparently Java apps are non-LSB compliant. So are apps written in PHP, Ruby, Erlang, Lisp, BASIC, Smalltalk, Tcl, Forth, REXX, S-Lang, Prolog, Awk, ... From my reading, Perl and Python are the only non-compiled languages that are supported by the LSB, but I don't know what that really means (although I heartily recommend to the Application Checker developers that they test all of the executables in their own app ;-). And I suspect that apps written in certain compiled languages, such as Pascal or Haskell, will run into many non-LSB library issues.

Then there are databases. Realeyes uses PostgreSQL, and provides scripts to build and maintain the schema. Because of changes at version 8, some of these scripts (roles defining table authorizations) only work for version 8.0+. The LSB Application Checker cannot give me a guarantee that these will work on all supported distros because it didn't test them. I have heard that there is some consideration being given to MySQL, but from what I can tell, it is only to certifying MySQL, not scripts to build a schema in MySQL.

After all this kvetching, I have to say that the Application Checker application is very well written. It works pretty much as advertised, it is fairly intuitive, and it provides enough information to resolve the issues reported by tests. My question is, "Why is so much effort being put into this when almost no one is using it?"

An argument can be made that the LSB helps keep the distros from becoming too different from each other and without the promise of certified apps, the distros would not be motivated to become compliant. But I only see about a dozen distros on the list, with Debian being noticeably absent. And yet, there is no more sign of fragmentation in the GNU/Linux world than there ever was.

My theory on why UNIX fragmented is that proprietary licenses prevented the sharing of information which led to major differences in the libraries, in spite of POSIX and other efforts to provide a common framework. In the GNU/Linux world, what reduces fragmentation is the GPL and other FOSS licenses, not the LSB. All distros are using most of the same libraries, and the differences in versions are not nearly as significant as every UNIX having libraries written from scratch.

I have to confess, I couldn't care less whether Realeyes is LSB compliant, because it is licensed under the GPL. Any distro that would like to package it is welcome. In fact, I will help them. That resolves all of the dependency issues.

While I am not a conspiracy theorist, I do believe in the law of unintended consequences. And I have a nagging feeling that the LSB could actually be detrimental to GNU/Linux. The only apps that benefit from LSB compliance are proprietary apps. The theory behind being LSB compliant is that proprietary apps can be guaranteed a successful installation on any LSB compliant GNU/Linux distro. I'm not arguing against proprietary apps. If a company can successfully sell them for GNU/Linux distros, more power to them. However, what if proprietary libraries manage to sneak in? This is where the biggest threat of fragmentation comes from.

But even more importantly, one of the most wonderful features of GNU/Linux distros is updates, especially security updates. They are all available from the same source, using the same package manager, with automatic notifications. If the LSB is successful, the result is an end run around package managers, and users get to deal with updates in the Balkanized way of other operating systems. That is a step in the wrong direction.

The right direction is to embrace and support the existing distro ecosystems. There should be a way for application teams to package their own apps for multiple distros, with repositories for all participating distros. The packages would be supported by the application development team, but would be as straightforward to install and update as distro supported packages.

There is such a utility, developed by the folks who created CUPS. It is called the ESP Package Manager. It claims to create packages for AIX, Debian GNU/Linux, FreeBSD, HP-UX, IRIX, Mac OS X, NetBSD, OpenBSD, Red Hat Linux, Slackware Linux, Solaris, and Tru64 UNIX. If the effort that has gone into LSB certification were put into this project or one like it, applications could be packaged for dozens of distros.

And these would not just be proprietary apps. There are many FOSS apps that don't get packaged by distros for various reasons, and they could be more widely distributed. Since the distros would get more apps without having to devote resources to building packages, they should be motivated to at least cooperate with the project. And don't forget the notification and availability of updates.

As a developer and longtime user of GNU/Linux (since '95), I believe that all of the attempts to create a universal installer for GNU/Linux distros are misguided and should be discouraged. I say to developers, users, and the LSB working group, "Please use the package managers. A lot of effort has been put into making them the best at what they do."

Later . . . Jim

Monday, September 22, 2008

What is happening to IPv6?

I have read several articles lately on the state of IPv6. Most of them take the position that implementation is moving slowly, and that is just fine. I went to a conference a few weeks ago that gave me a fairly different impression. It brought together people from the US federal government, business, and academia to describe the state of IPv6 in the government, and start a discussion of where to go next. My main interest in the conference was the security issues surrounding IPv6, and of course, there is no serious discussion about the Internet that doesn't include security. However, I did learn some interesting things about the status of IPv6.


At the time, the government was on the verge of announcing that agencies could demonstrate IPv6 capability. This basically amounts to the fact that they have a plan in place to only purchase equipment that supports IPv6, although some have pilot programs in place. The DoD is apparently furthest along, but is nowhere near ready to "flip the switch". However, there has been quite a bit of planning, and NIST is working on setting standards for testing equipment that will eventually lead to purchase specifications.

The government, and in particular the DoD, is not very concerned about the increased address space or encryption/authentication features of IPv6. Instead, their interest is in mobile networks, especially the auto-configuration features, and multicast. This is for using dynamic networks on the battlefield, and while they didn't give any details on the subject, it isn't hard to imagine how both features could be used. There is a pretty good article from Linux Journal on IPv6 mobility in Linux, which starts with a really good overview of the principles of using it.

Near the end of the conference, there was a panel discussion which included a couple of government representatives, a guy from Cisco and one from AT&T, and a member of the IETF IPv6 working group. The guys from industry said that there is a steady and fairly significant increase in the IPv6 network infrastructure. If you live in the US, you are probably used to reading articles that claim deployments are almost non-existent and are surprised to hear this. However, the deployments are happening primarily in Asia, and to a lesser extent in Europe.

I don't know how true it is, but someone at the conference said that the University of California system has more IPv4 addresses than the entire country of China. Even if that isn't the case, the point is well taken. Latecomers to the Internet got the crumbs of the IPv4 address space, and now they are desperate to use IPv6.

This has many implications. The entire set of TCP/IP and related protocols is very loosely specified. In some people's minds, the rollout of IPv6 is an opportunity to get several things right that IPv4 didn't. However, implementations will not all support the same features. The reason for this is that IPv6 has a common header with very little in it other than the source and destination addresses. The new features are in extension headers, which are optional. And the protocols to be used for encryption and authentication are continuing to be developed:
  • Security Architecture for IP: RFC1825 obsoleted by RFC2401 obsoleted by RFC4301 (IPSec) and updated by RFC3168

  • IP Authentication Header: RFC1826 obsoleted by RFC2402 obsoleted by RFC4302 and RFC4305 obsoleted by RFC4835

The newest of these is dated April, 2007. And none of them specify algorithms or methods, just the requirements for using them. Meanwhile, thousands of routers for IPv6 networks have already been deployed and the number is growing steadily.

The question was raised during the panel discussion of how much assurance agencies will have that the products they purchase will support the requirements that are mandated. The answer was a resounding, "Well ..." The member of the IETF pointed out that they are not a standards committee, but more like the maintainer of the documentation. RFCs have their status upgraded to Standard by consensus. And the Asian and other countries using IPv6 as 'IPv4 with big addresses' are not interested in spending time haggling over features that interfere with their deployments.

But as I said before, my interest was in the security concerns for IPv6. The main ones are the same as in IPv4. To begin with, a very small percentage of exploits go against the IP layer. And security issues for the upper layers of the stack will not change.

But once mobile IPv6 networks with auto-configuration begin to be deployed, there will be vulnerabilities to be dealt with. Of course, the first vulnerability will be admins on the learning curve. Hopefully, these folks will be motivated and supported to get good training early on. To get a taste of actual protocol issues, read the RFC for Secure Neighbor Discovery (SEND) mechanisms.

I certainly hope that the admins don't get stuck with the bulk of the responsibility for security. But the current state of networking products, according to someone who is testing them for the NSA, is that the low-level infrastructure (routers, host OSes) is fairly good, but not 100%. Even a little way up the stack, that falls off to little, if any, support for IPv6. This includes firewalls and IDSes, and some server applications.

So what has to be handled? Well, the first step is to find the upper level data, such as a TCP header. This means parsing the extension headers, which is a little weird. First, the header type is not in each header, but in the preceding header. So the first two octets of each extension header are the type of the next header and the length of the current header.

Then there is tunneling. It is possible to tunnel IPv6 in IPv4 or vice versa. What this means for finding the upper level header is that first the IPvX header must be parsed, then the IPvY header. And this could be done more than once in crafted packets. I sure hope that normal traffic at the point of a firewall would not have multiple IPv4 and IPv6 tunnels.

The other issue is the order and number of IPv6 extension headers. Most packets are only allowed to have one of each type of extension header. The exception is the Destinations header, and there may be two of these. Oh, and the order is not mandatory, just recommended.

And what is the status of Realeyes IPv6 support? I have added the capability to find upper level headers in untunneled packets. And there is some capability to parse IPv6 extension headers. In particular, the number of each type of header is validated, and for now the recommended order is treated as mandatory, so a report is issued for packets that don't follow the recommendation.

The common IPv6 header is formatted in playbacks, but there is a hex dump of the extension header data. I will be gradually adding support for tunneling and formatting of the extension header data. It is similar to TCP options, so I don't see the code being difficult. The biggest problem is testing. I configured two hosts on my local network to talk to each other with IPv6, but that is a proof of concept, at best.
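For anyone who wants to set up the same minimal test, static addresses on two hosts are enough. This is just a sketch, with made-up addresses and interface names:
    # on host A
    ip -6 addr add fd00:1::1/64 dev eth0
    # on host B
    ip -6 addr add fd00:1::2/64 dev eth0
    # then, from host A
    ping6 fd00:1::2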

So that is my perspective of what is happening with IPv6. I suggest that if you work on networks, you show more than a passing interest in it, because it will probably be a case of hurry up and wait, until all of a sudden we need it yesterday.

Later . . . Jim

Sunday, September 21, 2008

Realeyes IDS v0.9.3 Ready for Download

The packages for Realeyes IDS release 0.9.3 are ready for download. This version improves stability over the previous one and includes several new features.

If there are any problems, please notify the team. Thanks.

You may have noticed that I finally finished the First Edition of the Realeyes IDS Manual. I made it as useful as I could. But any comments are welcome.

Later . . . Jim

Saturday, August 30, 2008

Building a Debian GNU/Linux package

I recently built packages for the 0.9.3 release of Realeyes. These include both source and Debian GNU/Linux packages. For reasons that I have forgotten, I built the Debian packages before I started working on the source packages, but I'm glad that I did. Going through that process made me add several things that I would have overlooked, especially man pages.

I actually built an entire distribution as part of the process of creating my packages. It does not meet the requirements for re-distribution, but is very handy for laying down a fresh install with exactly what is needed to run Realeyes. I will provide the steps for that in another post.

The source is comparatively easy: once all of the files are collected, a tar file is created of the directory. The trick here is to verify that the C code will compile. I have read a lot of comments about how straightforward the standard configure/make installation procedure is. But from a developer's perspective, there are several issues, mainly having to do with autoconf and automake. I don't have enough space here to discuss these, and I really don't have any wisdom to impart if I did, because I have only learned as much as I needed for my own packages.

Debian packaging is somewhat more complicated. A package that is included in a Debian distribution must go through fairly rigorous tests. There are several scripts for checking correct packaging procedures, including lintian and linda. These verify such conditions as:
  • executables are built correctly

  • man pages exist for every executable file

  • naming conventions are followed

  • Debian documentation is formatted correctly
I have seen a few HOWTOs on building a Debian package, and there is a huge amount of information on the Debian site. But I still had to piece together a working plan for myself, so I thought it would be worth sharing it. There may be better ways to do some of it, but I have written scripts that get me through the process with only a little manual effort. And even when there is an automated process, it is still good to know what is happening under the hood. So without further ado, I offer my experience.

Building a Debian Package


I. Read enough of the manuals to get a sense of how Debian packages are built, and then keep links to them for reference.

II. Create a working directory for the package

  • Get a package to use as a model, using the following commands to extract the package files and the Debian metadata files:

    apt-get -d -y --reinstall install package_name
    dpkg -x package_name.deb package_dir
    cd package_dir
    dpkg -e ../package_name.deb

  • Create a working directory, and under it make the directories to be installed for the package, even if there are no files to be saved in them. These may include:

    working_dir/etc/package_name
    working_dir/etc/init.d
    working_dir/usr/sbin
    working_dir/usr/share/package_name
    working_dir/usr/share/doc/package_name
    working_dir/usr/share/man
    working_dir/var/log/package_name

  • Make the Debian control directory with its files (as needed)

    working_dir/DEBIAN

    • control: This contains the description of the package, including dependencies, architecture, and the description used by aptitude or synaptic -- use the model to create this for the first time

    • conffiles: This contains any configuration files installed with the package -- I put mine in /etc/realeyes

    • preinst: This is a shell script that runs before the package is installed, if it exists

    • postinst: This is a shell script that runs after the package is installed, if it exists -- I use it to create user IDs

    • prerm: This is a shell script that runs before the package is de-installed, if it exists

    • postrm: This is a shell script that runs after the package is de-installed, if it exists

  • Populate the directories with the application files: Use the model to help understand what goes where

III. Use the maintainer tools to verify package acceptability
  • Build the package:
    cd working_dir
    dpkg-deb --build package_dir package_name
  • lintian/linda: Check for package discrepancies. Note that lintian and linda are not installed by default and must be installed separately. Lintian uses Perl and linda uses Python, so there may be several dependencies installed with them.
    lintian -i package_name > package_name.lintian
    linda package_name > package_name.linda
  • Fix all the problems; here are some helpful hints from my experience:

    • Man pages: txt2man is a program that takes ASCII text and converts it to a man page. It works for simple pages, but the resulting groff file may have to be edited manually in some cases. Use 'gzip -9' to compress man pages.
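      For example (the file name and man section here are placeholders):
      txt2man -t realeyesIDS -s 8 realeyesIDS.txt > realeyesIDS.8
      gzip -9 realeyesIDS.8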

    • Compiled programs: Compiled programs must be stripped. Use the command:
      install -s
    • Identify all non-executable files in system directories (i.e., /etc/package_name) in the package's DEBIAN/conffiles.

    • Lintian provides the section in the Debian Policy manual that describes the requirement that was flagged.

  • Sign the package: Create a GPG key for the package and sign each package with the key
    gpg --gen-key
    dpkg-sig -s builder pkg.deb
    NOTES:

    • The public keyring is in $HOME/.gnupg/pubring.gpg

    • There should be a lot of entropy on the system to help the random number generator ('grep -R abc /usr/*' seems to work well)

    • Issue the command 'cat /proc/sys/kernel/random/entropy_avail' to find out how much entropy is currently available, it should be at least above 1,500
At this point, the package can be installed using the dpkg command. However, if there are dependencies that must be installed, dpkg will issue a warning about them, but does not handle their installation. So if you want to go to the next level, here is what you have to do.
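For reference, a direct install looks something like this (the package file name is hypothetical); if dependencies are missing, 'apt-get -f install' will usually pull in the ones that exist in the standard repositories:
    dpkg -i realeyesIDS_0.9.3-1_i386.deb
    apt-get -f install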

IV. Repository directories

A Debian repository has a relatively simple directory structure to maintain the packages and metadata about them. An installation ISO is basically a repository tree with just the stable packages. A custom repository tree can be added to the apt sources.list to be accessed just like officially maintained packages, with aptitude or synaptic.

In the top repository directory, the following are mandatory:
  • md5sum.txt: The list of all files in the tree with their md5 checksums

    To create the md5sum.txt file for my mini-distro, I wrote a script that ran in the top ISO directory. It did a recursive ls, ran md5sum on all regular files, and wrote the output to the md5sum.txt file. I keep that as a template and only update the files that change.
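    A rough equivalent of that script, using find instead of a recursive ls, would be:
      find . -type f ! -name md5sum.txt -print0 | xargs -0 md5sum > md5sum.txt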

  • pool: The subdirectory where packages are kept. Under the pool directory, there are a few pre-defined directories where the different categories of packages are kept. Anyone who has edited a sources.list file has seen most of these:

    • main: Technically, these are packages that meet the Debian Free Software Guidelines (DFSG), but I think of them as the officially maintained packages

    • contrib: Contributed packages meet the DFSG but depend on packages that do not -- I use this for my own packages, even though I don't have any non-DFSG dependencies

    • non-free: These are non-DFSG packages

    The structure of each of these is the same. Packages are kept in subdirectories named with the first letter of the package name. The exception is libraries, which go under directories named 'lib' plus the next letter of the package name (libxml2 would go under libx, for example). Below these subdirectories are the directories with the actual package files.

  • dists: The dists directory contains the metadata about packages. There is a directory for the distribution, in this case etch. On an ISO, there are also frozen, stable, testing, and unstable directories, which are links to the distribution directory. In a repository, these may have their own files, but for my purposes I only include the distribution subdirectory.

    Under the distribution subdirectory are the following:

    • Release: This file describes the distribution, including the architectures and components, and contains md5sums for the package metadata files. Each entry also includes the file size, and since there are only a few of them, I created the original by hand; a sketch of what the file ends up looking like follows this list.

    • main: This directory contains the metadata about the main packages

    • contrib: This directory contains the metadata about the contrib packages

    The structure of main and contrib is the same, and again, I only use contrib. The architecture directories are below contrib, and in my case, there is only binary-i386. In the architecture directory there are three files:

    • Packages: This uses information from the DEBIAN/control file and adds such things as the full path of the package file

    • Packages.gz: A gzip-compressed copy of the Packages file, which is what apt normally fetches

    • Release: This contains metadata about the contrib directory

  • I also put a few optional files in the top level directory. These include a copy of the GPL (I use version 3), installation instructions, and the public key for the signed packages. The installation instructions explain how to add the public key so that aptitude and synaptic can validate the packages.
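
For reference, a hand-made dists/etch/Release file ends up looking something like the sketch below. The field values and checksums here are only illustrative, not the real ones:

    Origin: Realeyes
    Label: Realeyes
    Suite: etch
    Codename: etch
    Architectures: i386
    Components: contrib
    Description: Realeyes package repository
    MD5Sum:
     1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d     1234 contrib/binary-i386/Packages
     9f8e7d6c5b4a3f2e1d0c9b8a7f6e5d4c      567 contrib/binary-i386/Packages.gz
     0123456789abcdef0123456789abcdef      321 contrib/binary-i386/Release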

V. Add the packages
  • Copy the packages to the appropriate pool directory. In my case, this means copying them to:

      iso_dir/pool/contrib/r/realeyescomponent

  • Create an override file for the packages. This consists of a line for each package with the following information:

      package priority section

    In my case it looks like this:
    realeyesDB   optional  net
    realeyesDBD optional net
    realeyesGUI optional net
    realeyesIDS optional net
    The man page for dpkg-scanpackages (the next command) has a description of each field and says that the override file for official packages is in the indices directory on Debian mirrors.

  • Build the metadata:
    dpkg-scanpackages \
    pool/contrib/ override.etch.contrib > \
    dists/etch/contrib/binary-i386/Packages
    cd dists/etch/contrib/binary-i386
    gzip -c -9 Packages > Packages.gz
    cd ../..
    md5sum contrib/binary-i386/Packages* > md5.tmp
    ls -l contrib/binary-i386/Packages* >> md5.tmp
  • The file md5.tmp is then edited to put the file size after the md5 checksum and before the file name, and the file listing lines from 'ls -l' are deleted. Then the file Release is edited to read in md5.tmp at the end, and the duplicate lines are deleted. (A small loop that produces these lines directly is sketched after the commands below.)
    gpg --sign -ba -o Release.gpg Release
    cd ../..
    cp -a ~/.gnupg/pubring.gpg RE_pubring.gpg
    md5sum ./INSTALL* > md5sum.txt
    md5sum ./GPL* >> md5sum.txt
    md5sum ./dists/etch/Release >> md5sum.txt
    md5sum ./dists/etch/contrib/binary-i386/* >> md5sum.txt
    md5sum ./pool/contrib/r/*/* >> md5sum.txt
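
    Going back to the md5.tmp step: instead of hand-editing it, a small loop run from the dists/etch directory can print the checksum, size, and name in the format Release expects (this assumes GNU stat for the file size):
      for f in contrib/binary-i386/Packages*; do
        echo " $(md5sum < $f | cut -d' ' -f1) $(stat -c%s $f) $f"
      done > md5.tmp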

VI. Installing the package

The instructions for installing this package are:
  • Copy the Debian packages to a directory that will be used for the initial installation and future updates, such as /var/tmp/realeyes, and untar them:
    tar xvzf realeyes_debian.tar.gz
  • Change to the top level packages directory and add the public key file, RE_pubring.gpg, to the Debian trusted sources with the command:
    apt-key add RE_pubring.gpg
  • Edit the file /etc/apt/sources.list to add the line:
    deb file://install_dir/realeyes/ etch contrib
  • Update the package lists with one of the following methods:

    • On the command line enter:
      apt-get update
    • In aptitude, select Actions -> Update package list

  • Install the package using aptitude or synaptic
So there you have it. I hope it shortcuts the learning curve a little.

Later . . . Jim

Saturday, August 23, 2008

Security Unobscured

I just read this post about the state of information security. There are two main points that seem to be related, and a third that is implied.

First, Jason feels that security professionals are not being taken seriously by other IT professionals. Maybe it's because I live in the Washington, DC area, but I don't have that impression. However, if it is true at all, I can't help but think that it is because of some people in the field shouting, "The sky is falling," combined with an ever-increasing number of real security breaches that are poorly addressed, ranging from web-based exploits and phishing scams to stolen laptops containing the unencrypted personal information of thousands of people. It makes the general public wonder, "What do those security people do?"

I had a discussion with an IT professional the other day about a recent SQL injection exploit. Over time, the lower levels of the stack, including the operating system and server daemons, have become fairly hard targets, while applications are still pretty soft. And predictably, the attacks are moving up the stack. A large part of the problem is that many (if not most) programmers have little understanding of the infrastructure on which their application is built, in this case, databases. But it is left to the programmers to test the data for SQL injection attempts. Why isn't the security community clamoring for database APIs to include a higher degree of data assurance, as well as more security in other APIs?

And that segues into Jason's second point, "the market is flooded with these so called CISSP certified IS professionals". I get the impression that he considers them to be charlatans. And in the era of terrorist attacks and Sarbanes-Oxley, there are IT professionals and managers who are wary of buying IS snake oil. Referring back to my comment about database APIs, the IT security field must be seen to be solving security problems. Instead, vendors and consultants often seem to be peddling their own wares, while the security of people's information is a secondary concern. In fact for some, the increase of real threats almost seems to be a marketing tool.

And that brings me to the third point that Jason implied when he said, "I remember a day when security people were feared." Having been on both sides of this fence, I see a tension between IT security and performance, with performance almost always being the top priority. In my experience, the security people don't get their own budget; they get the leftovers from the system and network budget -- the exception being financial institutions and some government agencies. After all, security is a lot like insurance, and everyone prefers the cheapest insurance, especially when the economy is down.

While I agree with Jason that IT professionals should take the security professional's concerns seriously, I think it is even more important for security people to focus on solutions. Most IT people look at security breaches the same way they look at hardware failures, a problem to be solved as quickly as possible, so that the real work of their environment can continue. The security people will get more respect by helping to keep these to a minimum than by coming on as tyrannical hall monitors.

We all know that there is no security silver bullet; it requires that the people using and maintaining the systems do the right thing. But if they don't know what the right thing is, because of poor training, unawareness of potential hazards, or a lack of information about their environment, the security professionals have to take some responsibility for it. Computer technology is still a relatively young field, and both innovation and expansion are changing the landscape every year. The job of security professionals will continue to be providing security solutions for current conditions by doing research, explaining the findings, and developing tools to help the people in the trenches.

My own approach is to give the IT people as much useful information as possible. The reason I started Realeyes was to reduce false positives and provide the admins with enough data to make quick decisions and increase awareness of what is happening in their networks. They know the systems and networks best, so it is my job to give them the best tools to do their job right.

If security people want attaboys, we have to provide more light and fewer shadows.


Later . . . Jim

Friday, August 22, 2008

Messages

I recently saw a question on Slashdot asking how to handle application messages. Most of the responses were along the lines of "only output what is important". Of course that implies that the programmer knows what is important to every user, which isn't very likely.

In Realeyes, the approach is different. First, there is a message for just about everything except normal data collection and analysis. This includes parsing configuration files, network connection activity, administrative commands, and, of course, errors. Each message is assigned a type code, such as critical, error, warning, or informational. Then, in the main configuration file, warnings and informational messages must be explicitly requested in order to be logged.

This way, a newcomer to the application can see all messages to get a sense of whether and how the program works. When the repetitious messages are no longer useful, they can be turned off. Although the warnings may be helpful in troubleshooting, they generally report configuration issues that are already known, and could become a bit irritating, so the user is given the choice of recording them or not.

Of course, error messages are not optional. And there is a NOTE message type, that is not optional, used for things like the startup and shutdown messages.

If you are interested in seeing how this is coded, check out the files, RealeyesAE/src/rae_control.c and RealeyesAE/include/rae_messages.h, in the Realeyes subversion repository. In rae_control.c, the function, control_messages, handles writing the log file. And down about line 380, there is a message.

This brings up a couple of points. First, the application is actually a collection of processes. The child processes do not write messages to disk. Instead, they create the message in shared memory and put it on a queue using the macro, raeMESSAGE, which is defined in rae_messages.h. The parent process (called the manager) periodically checks the message queues and writes messages to the log files. It also has a shutdown function that is called even after a system interrupt, which prints messages one last time.
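
To make that concrete, here is a minimal sketch of the pattern. This is not the actual raeMESSAGE macro or the Realeyes structures -- the real code uses linked queues in a managed shared memory pool and the manager polls on a timer -- but the idea is the same: a child formats a message into shared memory and marks it ready as its last step, and the manager later drains whatever is ready.

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define MSG_SLOTS 16
    #define MSG_TEXT  128

    struct message {
        volatile int ready;          /* set last, after the text is complete */
        int          type;           /* critical, error, warning, info, ...  */
        char         text[MSG_TEXT];
    };

    /* Hypothetical stand-in for raeMESSAGE: format the text into a slot in
     * shared memory, then mark the slot ready as the very last step. */
    #define QUEUE_MESSAGE(q, slot, msgtype, fmt, ...)                  \
        do {                                                           \
            snprintf((q)[slot].text, MSG_TEXT, fmt, __VA_ARGS__);      \
            (q)[slot].type  = (msgtype);                               \
            (q)[slot].ready = 1;                                       \
        } while (0)

    int main(void)
    {
        struct message *queue = mmap(NULL, sizeof(struct message) * MSG_SLOTS,
                                     PROT_READ | PROT_WRITE,
                                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (queue == MAP_FAILED)
            return 1;

        if (fork() == 0) {                   /* child: an analysis process */
            for (int i = 0; i < 3; i++)
                QUEUE_MESSAGE(queue, i, 3, "informational message %d", i);
            _exit(0);
        }

        /* manager: the real code polls periodically; this sketch just waits
         * for the child to finish and then drains whatever is ready */
        wait(NULL);
        for (int i = 0; i < MSG_SLOTS; i++)
            if (queue[i].ready)
                printf("type %d: %s\n", queue[i].type, queue[i].text);

        return 0;
    }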

Second, the message documentation is in the source code. I use doxygen to create program documentation. I found a way to create separate files for inline documentation and that is how the message logs are created. This makes it a lot easier to keep messages up to date, and to be sure that all messages are documented. Unfortunately, Dmitri did something after version 1.3.6 of doxygen, so that this no longer works. For now, I keep an older version of doxygen around, but eventually, I plan to create a script to handle it. In that, I would like to output messages in both HTML and ODF.

If this has been of interest and you have any tips for making maintenance of applications more efficient, please share.

Later . . . Jim

Monday, August 4, 2008

Program Security

Good security practices are multi-layered. The levels that are addressed in the Realeyes application are:
  • code vulnerabilities

  • program interaction

  • privileges and access

Code vulnerabilities are bugs that can be exploited to gain control of a program simply by interacting with it. Therefore secure programming starts with good coding practices. I saw David A. Wheeler present his HOWTO on secure programming practices, and highly recommend it. He covers issues in reading and writing files, how to prevent buffer overflows, user privileges, and much more. I was glad to have seen his presentation early in the design phase of the Realeyes project.

The only problem with his HOWTO, and almost every other tutorial or reference I have ever read, is that it covers so much ground that each individual topic is a little short on detail. Even my favorite reference books by W. Richard Stevens leave a lot of code as an exercise for the reader. I am not going to write a tutorial, but the way issues specific to the Realeyes application have been handled serves as a set of detailed implementation examples. Where files are referenced, each file path is relative to the subversion repository for Realeyes. And BTW, I have a pretty good background in this, but I am sure that there are others who have more than I do, so I would be happy to hear any suggestions for improvements.

There are four components in the Realeyes application: the IDS, the database, the user interface, and the IDS-database interface (database daemon or DBD). Each has unique security issues, including interacting with other components.

The database is where user IDs are maintained. In the PostgreSQL database, it is possible to create groups that are granted specific access rights to each table. Then each user is assigned to a group and inherits those rights. The groups in the Realeyes database schema are defined in RealeyesDB/sql_roles/create_roles.sql and include:

  • realeyesdb_dbd: Only used by the DBD program to insert data from the IDS sensors and retrieve commands and new rules to be sent to the sensors

  • realeyesdb_analyst_ro: An analyst-read-only can view data and rules, and produce reports

  • realeyesdb_analyst: An analyst can view data and rules, create incident reports, and produce reports

  • realeyesdb_analyst_dr: An analyst-define-rules can do everything an analyst can, plus define new rules

  • realeyesdb_admin: An application administrator can do everything an analyst-define-rules can, plus create and modify new users and other application information

The user interface and DBD are written in Java, and connect to the database directly. The connection is always encrypted. So this layer of security is an administrative issue, to make the database host as secure as possible. The only additional feature offered here is the selection of a different listening port for the database than the default. The classes that interface with the database are in the files RealeyesDBD/DBD_Database.java and RealeyesGUI/Database.java.

One issue that was raised early in the pilot project I am running at a local college was how secure the captured data is. The data is stored in the database but not in raw form. The user interface reads this and reformats it, but does not print that to a file. An analyst could cut and paste the data, so that becomes a personnel issue. I have debated whether to provide the capability to write to file, but for the time being am leaning away from it. The user interface does generate summary reports and writes those to file, but that does not include any of the captured data.

The DBD connects to both the database and the IDS sensors. The IDS connection is optionally encrypted so that if it and the DBD are on the same host, the encryption overhead can be eliminated. Also, the address of the DBD host must be defined to the IDS sensor for the connection to be accepted, and the ports that are used are defined in the configuration. A sample configuration is in the file RealeyesDBD/sample_config/realeyesDBD.xml, and the code to parse it is in RealeyesDBD/RealeyesDBD.java.

One big issue I discovered regarding Java and encrypted connections is that, in JRE 1.4, it is possible to maintain multiple connections using the SocketChannel class, and it is possible to encrypt them using the SSLSocket class. However, it is not possible to do both at the same time. In JRE 1.5, the flaw is addressed, but the solution is very ugly. It essentially requires an application programmer to write a TCP/IP API. The argument for this is that Java might be used on networks other than TCP/IP, so the solution must be broad. Hopefully, JRE 1.6 will provide a solution for 99.99% of Java application programmers, and then the remaining 0.01% will still have a partial solution for their needs.

While I have considered porting the DBD to C++, the current code handles this in two ways. To begin with, there are two connections between the DBD and each IDS sensor. One is for data and the other is for control information. The data connection is handled by starting a thread that is dedicated to that connection, and that code is in the file RealeyesDBD/DBD_Handler.java. The control information is more sporadic, which is why I would have liked to use the SocketChannel selector. The workaround is to have the DBD poll each IDS sensor every 8 seconds if there has been no activity on the connection. The code for this is in the file RealeyesDBD/DBD_Control.java.

The IDS is a collection of C programs. These are started by running 'realeyesIDS', which spawns child processes. I discussed the reasons for choosing interprocess communication over threads in "Loose Threads". The main process, called the Manager, and all but one of the child processes run under the superuser ID. This is because they use a large shared memory buffer, and the way it is built, it can only be accessed by the superuser. The files that handle managing this buffer are:
  • RealeyesAE/src/rae_mem_ctl.c: Contains the code that the Manager uses to allocate, initialize, and do garbage collection for the buffer

  • RealeyesAE/src/rae_mem_mgmt.c: Contains the code that all processes use to allocate and free individual buffers

  • RealeyesAE/src/rae_lock_mgmt.c: Contains the code that all processes use to prevent memory locations from being changed incorrectly

The process that communicates with the DBD, called the Spooler, is designed with several security features. First, as the Spooler is started, it changes the user ID to one which has very limited access. It then changes the current directory to one that contains only the files it uses, and sets that to its root. The only communication from the Spooler to any of the other processes is through pipes, which means that it is serialized and straightforward to validate. Finally, the configuration file specifies the DBD host, and only connections from it are accepted.
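
For anyone who has not written that kind of start-up code before, the general shape looks something like the following. This is a generic sketch, not the code from rids_spooler.c; the user name and jail directory are made up, and note that chroot(2) has to be called while the process still has root privileges, so the user ID is dropped last.

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pwd.h>
    #include <grp.h>

    /* Confine the process to a directory and drop root privileges. */
    static void confine(const char *user, const char *jail)
    {
        struct passwd *pw = getpwnam(user);
        if (pw == NULL) {
            fprintf(stderr, "unknown user %s\n", user);
            exit(1);
        }

        /* chroot(2) requires root, so set the new root first */
        if (chdir(jail) != 0 || chroot(jail) != 0 || chdir("/") != 0) {
            perror("chroot");
            exit(1);
        }

        /* then give up root for good: supplementary groups, group, user */
        if (setgroups(0, NULL) != 0 || setgid(pw->pw_gid) != 0 ||
            setuid(pw->pw_uid) != 0) {
            perror("setuid");
            exit(1);
        }
    }

    int main(void)
    {
        confine("nobody", "/var/tmp/spool_jail");   /* hypothetical values */

        /* from here on, the process sees only the jail directory and runs
         * with the unprivileged user's access rights */
        return 0;
    }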

The files that handle the Spooler communications are:
  • RealeyesIDS/data/rae_analysis.xml: The configuration file where the DBD host is defined (it is the manager's configuration file and contains a lot more, but the Spooler definitions are in their own section)

  • RealeyesIDS/src/rids_spooler.c: The Spooler initialization function where the user ID and directory are set is near the end of the spooler_init function

  • RealeyesIDS/src/rids_net_mgmt.c: The Spooler is the only process to use the network management functions, including the SSL setup, the listener which validates the connection request, and the exchange of data

Ultimately, the best way to program security is to think about how vulnerabilities in the code could be exploited. And since the purpose of the Realeyes IDS is to detect exploits, I spend a lot of time thinking about it in general. So I have a fair amount of confidence that it is a good example of a securely coded network application.

Later . . . Jim

Saturday, July 26, 2008

Modularity

Realeyes was planned with the intention of supporting IPv6, and now that the basic functionality is in place and (mostly) working, I am adding full support for it. This means several things, including:

  • Deploying a Realeyes IDS sensor on an IPv6 network

  • Analysis of IPv6 packets by the IDS application

  • Inserting IPv6 addresses and data in the Realeyes database

  • Defining rules for IPv6 addresses

  • Displaying IPv6 addresses and headers from the user interface


I will describe this in more detail in a later post, but for the moment, I need a motivational boost, so I decided to give myself a pat on the back.

The way that IDS functionality is added is through plugins that perform specific functions. At this point, data collection and high level analysis are essentially complete. I am adding a few new features in the IDS only to the session handler and low level analysis plugins. In Realeyes terminology, this is the Stream Handler and the Stream Analyzers.

The Stream Handler parses the IP header to find the session ID and sets the location of the payload, such as the TCP or UDP headers and their data. I set up two hosts on my local network for IPv6 connections. On a Linux system, this is as simple as issuing an ifconfig command on both systems and, for ease of use, adding the remote host to the /etc/hosts file:

    host100> ifconfig eth0 inet6 add fec0:0:0:1::100/64
    host100>/etc/hosts:
      fec0:0:0:1::200 host200


    host200> ifconfig eth0 inet6 add fec0:0:0:1::200/64
    host200>/etc/hosts:
      fec0:0:0:1::100 host100
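
Before going further, a quick ping6 from one host to the other (using the names from the /etc/hosts entries above) confirms that the addresses are reachable:

    host100> ping6 -c 3 host200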



Next, I established SSH and FTP sessions between them. I had the code to find the payload written, but this was the first time I had tested it. It took a couple of tries to get it right because the way IPv6 extension headers work is a bit tricky. But when I actually captured some sessions, they were displayed correctly in the user interface.

I then added the code to display the IPv6 headers in the user interface. This formats the main header and each extension header using human readable field names followed by the actual values. Because the header type of each extension header is in the previous header, this was also a little tricky to get working.

The IDS Stream Handler is also where IP fragments must be reassembled. I was really happy that after copying the IPv4 reassembly code and changing all instances of "v4" to "v6" and handling a couple of variables differently, the IPv6 reassembly worked. This is an example of the value of modularity and the use of variables in code.

As an aside, I learned this lesson in my first year of Computer Science. The grad student instructor had us program an assembler that could handle about 8 operations, each with one operand of 6 characters. The next assignment was to modify the assembler to add a couple of operations, some of which took two operands, and the length of operands increased to 8 characters. Those who hard coded the original assignment had a lot of work to do (I had hard coded some things, but not all). And yes, the third assignment was more of the same.

I hated that guy (as did most of my classmates), because while the lesson was legitimate, laboratory exercises are not applications that will grow in the real world. As first-year students, most of us did not have enough experience to develop programs with even that level of sophistication, and he did not recommend that we incorporate it in the assignment. And with other classes to deal with, getting a single program to work at all took time that was in limited supply. His excuse was that, in the real world, we would constantly be faced with changing requirements at a moment's notice, and thus he was doing us a favor.

Having been out here for over 20 years, I have yet to run into anything remotely like what he described, although I do make it a point to be anal about getting adequate requirements descriptions up front. The people I have worked for wanted me to succeed, because it reflected on them. Some were more helpful than others, but I have never worked on a task where the requirements changed wildly, making it impossible to complete. Maybe I'm just lucky.

Incidentally, the way I tested reassembly was to use the latest version of netcat, which supports IPv6 sessions. I created a file with over 8K of test data and sent it over a UDP session; netcat tried to send the entire file in a single datagram, which forced the TCP/IP stack to fragment it:
    server> nc -6 -u -l -p 2000 fec0:0:0:1::100

    client> nc -6 -u fec0:0:0:1::200 2000 < test.data
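
To confirm that the datagram really was fragmented on the wire, watching the interface with tcpdump while the file is sent works well; the fragments show up with a "frag" tag in the output:

    tcpdump -n -i eth0 ip6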

Anyhow, I am now working on analyzing the IPv6 extension headers and expect that to be done within a week. After some tidying up, I will be building a new package for download with IPv6 and several other new features. So, back to work.

Later . . . Jim

Wednesday, July 9, 2008

Introducing, Your Network

For the first three months after I started working on a government network security team I had nightmares. Then I decided that the grunts on the other side were probably in the same boat as we were--undermanned, underfunded, and misunderstood. Whether it's true or not, it let me sleep at night.

But the reasons I was so unsettled didn't go away. I tell people that most security analysts (and I include system and network admins who take responsibility for security in this group) are very good at the easy attacks, probably catching close to 100% of them quickly. They are pretty good at the moderately sophisticated ones--I would guess that well over 50% are eventually caught, although there is a fair amount of luck involved in a lot of those.

But what has always bothered me is that we don't have any idea of how much we don't know. Honeynets are easily avoided by the most sophisticated attackers, especially since they are focused on specific targets in the first place. What makes a target specific? Look at brick-and-mortar crime to get a sense of the possibilities. Based on this, I am guessing that because of the skills required, the problem is not rampant but is still significant.

So one of my goals in developing Realeyes was to provide a tool for security analysts to dig into the goings on in their networks. Those who are familiar with a network can tell when conditions just don't feel right, but what can they do with limited time and resources? After running Realeyes in a pilot program for over six months, it is beginning to deliver on its potential to be that tool.

The site's network admin had periodically seen large bursts of outbound traffic from several EMail servers in the early morning and suspected that there were some hosts being used to send spam. I defined rules to report all of the EMail server traffic between midnight and 6:00 am. Unfortunately, the school year ended shortly after the monitoring was set up and there has not been any of the traffic that it was defined to capture. However, there have been some interesting results.

Initially, there were a lot of reports of automatic software updates, so I used the exclude rule to prevent certain host or network addresses from being reported. It turned out that a couple of these servers were also listening on port 80, so there were a lot of reports of search engine web crawlers. These were eliminated by defining rules with keywords found in those sessions with the NOT flag set to cause the sessions to be ignored.

There were several 'Invalid Relay' errors issued by EMail servers, and some of the emails causing them were sent from and to the same address. At first I created a rule to monitor for the invalid relay message from the server. This captured a lot of simple mistakes, so I have started defining rules to capture email addresses that are used more than once. What I am trying to do is refine the definition of 'probes' which can then be used for further monitoring.

The further monitoring is accomplished using the 'Hot IP' rule. When a Hot IP is defined for an event, all sessions of the IP address (source, destination, or both) specified are captured for a defined time period after the event is detected, 24 hours for example. Using this technique, I have recently seen one of the probing hosts send an HTTP GET request to port 25, as well as some other apparent exploits.

This process is more interactive than the method used by most security tools. But by giving more control over what is monitored to those who know the environment best, I am trying to help build a better understanding of how much we don't know. And I hope that lets more good grunts sleep better at night.

Later . . . Jim

Monday, June 23, 2008

Loose Threads

Realeyes is a somewhat complex application, both in terms of the number of components that interact with each other (4), and the complexity of those components, in particular the analysis engine/IDS, but also the database and database interface. The analysis engine and IDS are written in C, while the database interface and user interface are written in Java.

When I was planning the design of the analysis engine, I knew from the start that there would be multiple processes running. That left me with the decision of whether to use threads or interprocess communication. I know from painful experience that writing thread-safe code is hard (I helped maintain a TCP/IP stack written in System/390 assembler). Therefore I chose to use interprocess communication. I actually had several reasons for choosing this over threads:

  • Writing thread-safe code is really hard.

  • Threads share the same address space and, while all analysis engine processes share some memory, some of them also use significant amounts of unshared memory. I was concerned that this might lead to the application running out of virtual memory.

  • For security reasons, the external interface runs under an ID that has lowered access and in a chroot jail. This means that interprocess communication would have to be used for at least this function.

  • The pcap library for capturing network traffic from the interface was going to be used, and I was pretty sure it could not be used in a threaded process.

  • I wanted to be able to control the priority of processes dynamically, and while the pthread_setschedparam man page says, "See sched_setpolicy(2) for more information on scheduling policies,", there is no man page for sched_setpolicy (I have searched the web for it).

  • Writing thread-safe code is really hard.


Long after going through this thought process, I discovered this paper by Dr. Edward A. Lee at UC Berkeley that supports my reasoning. After performing a formal analysis to prove that writing thread-safe code is really hard, Dr. Lee recommends that code be written single-threaded and then elements of threading (or interprocess communication) be added only as needed. Thank you, Dr. Lee.

This left me with the decision of which IPC techniques to use. There are essentially three:

  • Pipes

  • Message queues

  • Shared memory


I read an article about a test that compared the three (which I cannot find now) and shared memory won hands down (an order of magnitude faster, as I recall). Therefore, while pipes are used in the analysis engine to transfer small pieces of data or low priority information, shared memory is the primary mechanism.

Of course, shared memory is the most difficult to program because it requires a way of guaranteeing that the data stored in every memory location is correct at all times. This is handled in the analysis engine by all of the following methods:

  • Assigning memory locations to a single process that others cannot access

  • Using locks (or semaphores in glibc-speak) to serialize access, which means the operating system allows only the process holding the lock to access the locked memory location

  • Using a mechanism similar to locks (but without the overhead) to serialize access


The centerpiece of this is the memory manager. When the application starts, a single large block is allocated and made non-swappable. This means the application never has to wait for a page to be swapped in from disk, which can happen with ordinary memory that a process allocates for its own use. This block is chopped up into pools which are in turn chopped up into buffers. (Note: This is an oversimplification, see the analysis engine slide show on the Realeyes technology page for more detail.)

The memory manager sets an "in use" flag to indicate that a buffer is being used, and clears it when the buffer is released. Each level of the analysis engine uses specific structures, and when the "in use" flag is set for a buffer, other processes are not allowed to access it unless the structure is explicitly passed to them. This is the way the first access method is implemented.

The second access method is actually used by the memory manager to obtain or release a buffer. But it is also used by processes to modify structures in memory that could be modified by two processes simultaneously. Most books on programming with semaphores start by saying that POSIX semaphores are overly complicated. I don't disagree, but after a little experimentation, I simply wrote a set of functions to initialize, get, free, and release a single lock. As it turned out, my first attempt did not work well across all of the platforms where the application was tested. But the correction was basically confined to those functions, with only the addition of an index parameter to one of them, which meant changing about a dozen calls in the analysis engine code.

The third access method is very much like message queues, but with the performance of shared memory. When a process has information (in a structure) to pass to another, it puts the structure on a queue that only it may add to and only one other process may remove from. The rule governing most of these queues is that the first structure in the queue may only be removed if there is another one following it. In programming terms, there must be a non-NULL next pointer. So the first process modifies the structure to be added, and the very last step is to set the pointer of the last item in the queue to the new structure's address.
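
As a concrete illustration of that rule, here is a sketch in C. It is not the actual Realeyes structures; it assumes the queue was seeded with a single element (so head and tail are never NULL), and a real multiprocessor implementation also has to think about memory ordering.

    #include <stddef.h>

    struct item {
        struct item *volatile next;
        /* ... payload fields ... */
    };

    struct queue {
        struct item *head;       /* touched only by the consumer */
        struct item *tail;       /* touched only by the producer */
    };

    /* Producer: finish filling in the new item, then publish it by setting
     * the current tail's next pointer as the very last step. */
    void queue_add(struct queue *q, struct item *it)
    {
        it->next = NULL;
        q->tail->next = it;      /* the publishing store */
        q->tail = it;
    }

    /* Consumer: the head may only be removed while another item follows it,
     * so the producer is never left updating a structure that has already
     * been taken off the queue. */
    struct item *queue_remove(struct queue *q)
    {
        struct item *first = q->head;
        if (first->next == NULL)
            return NULL;         /* the last item stays on the queue for now */
        q->head = first->next;
        return first;
    }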

Special handling is necessary for some queues. For example, if there is very little activity, a single structure could sit on a queue by itself for a long time (in computer cycles). For some queues, this is handled by adding a dummy structure behind the one waiting to be processed after a brief wait (maybe a hundredth of a second).

A side effect of the choice of processes over threads is that it is much easier to monitor a process than a thread. It is also quite a bit more straightforward to use a debugger on a process. So, all things considered, I recommend this over threads unless there are strong reasons against it.

Finally, I have to say that the Java code does use threads. However, they are treated like separate processes in that they don't share memory. All data is passed in arguments to the methods being called or in the method return value. This eliminates the most problematic aspects of making code thread-safe, but (I have discovered) not all of them. The other issues are memory-related, but it is memory that the application does not control, such as the window in which the application is displayed, or network connections.

Overall, I agree with Dr. Lee when he says that threads "are wildly nondeterministic. The job of the programmer is to prune away that nondeterminism." And I don't find it to be too much of a stretch when he continues that, "a folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs."

Later . . . Jim