Tuesday, November 18, 2008

Take Five

Jazz fans will recognize the title of this post as one of the most famous jazz pieces ever written. It was composed and performed by the Dave Brubeck Quartet and was part of the album Time Out, which contained several pieces in unusual time signatures.

This is one piece from the wide range of music I enjoy, which taken as a whole is called progressive music, although it used to be known simply as having eclectic taste. Regardless, I enjoy hearing boundaries being stretched. And when I need a break from programming and network data analysis, I noodle around on the piano, trying to stretch my own boundaries.

As a side note, I dropped out of college to pursue the dream of becoming a professional musician. When that bubble burst, I discovered a love of computers on a friend's Apple II, and went back to get a Bachelor's in Computer Science. When I was interviewing for my first job after college, I had to explain the gap, with some embarrassment. But the interviewer told me that some of their best developers were musicians, which I have found to be true over the years. From then on, I have been proud to tell people that I play piano and guitar.

Sometimes my noodling turns into a song. In the past, many such songs have disappeared into the ether when I stopped playing them. But because I do my development on Kubuntu Linux, I have access to the many Free/Open Source multimedia applications that have been developed. So after the latest Realeyes packages were up on SourceForge, I decided to save my current repertoire for posterity -- a personal 'Kilroy Was Here' carved on the wall of time.

First I transcribed the music. To do this, I used two applications in conjunction: NoteEdit and Lilypond. NoteEdit has a graphical interface that makes it possible to enter the music visually. Lilypond uses text files to engrave musical scores. NoteEdit exports Lilypond definition files, as well as MusiXTeX, ABC, PMX, MusicXML, and Midi files.

In the past, I had tried unsuccessfully to play Midi files using my SoundBlaster Live. I suspected that it was a sound font issue, but had not taken the time to research it. This project motivated me to deal with it, and it turned out to be amazingly easy. There is a program called sfxload (or in some cases, asfxload), which, in Kubuntu, is part of the awesfx package. It has a good man page, but the command I use is:
    sfxload -v -D -1 sound_font_file
and the sound font files are on the SoundBlaster installation disk with an 'SF2' extension.

With Midi working, I could transcribe my song and listen to it immediately, which made finding errors a lot easier than trying to edit the music itself. Also, it turns out that my music is a bit more rhythmically complex than I realized. There are several switches between 3/4 and 4/4, and even a few measures of 5/4 and 13/16 (seriously!).

As another side note, I remember the laments of musicians when electronic music equipment reached a point where it could be used to produce the sounds of acoustic instruments. However, my experience is that it takes a whole lot of work to create the tempo and volume dynamics of a live performer. I only put enough time into it to be able to verify that the notes and tempos were correct, and that took a couple of weeks just for a dozen piano pieces. So, between the amount of effort required to produce electronic music and the many benefits of live performances, I expect that we will be listening to performers for as long as we enjoy music.

When I exported to Lilypond, there were several errors, so I had to learn the syntax of the Lilypond definition files. NoteEdit puts the measure number in a comment, which makes cross-referencing much easier and keeps the learning curve shallower. Also, the Lilypond forums are excellent: one of my questions received two good answers in less than 24 hours. Since I consider myself a member of the Free/Open Source community, I created a small NoteEdit file with the errors I found, exported it to Lilypond, corrected the syntax, and opened a bug report that included both files.

Lilypond does a beautiful job with the music, exporting it to PostScript and PDF files. The only thing I was unable to do was to create a book of multiple pieces where each piece started on a new page. The examples I followed had a piece starting on the same page as the end of the previous piece if there was enough room.

Doing what I wanted might be possible, but I simply converted each piece from PDF to PNG using the ImageMagick convert command. I then loaded the PNG files in the Gimp. I have found the Gimp to be great for quick conversions and editing. For example, I used it to convert color pictures to grayscale for the Realeyes IDS System Manual.
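The conversion step is a one-liner per piece. Here is a sketch of it, written as a dry run that just prints the command (the file name is made up, and -density is what keeps the PNG from coming out blurry):

```shell
# Build the ImageMagick command for one piece (file name hypothetical).
# The -density option raises the rendering resolution of the PDF.
pdf=take_five.pdf
png=${pdf%.pdf}.png                     # take_five.png
cmd="convert -density 150 $pdf $png"
echo "$cmd"
```

Dropping the echo (or wrapping it in a `for pdf in *.pdf` loop) does the whole collection in one pass.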

With the pictures of the music cropped, I imported the images into an ODF file created in Open Office. This made it easy to lay out the book, including creating a title page and table of contents. And, of course, the entire manuscript could be exported to PDF format. I am self-publishing the collection at lulu.com and they prefer PDF files.

But I wanted people to hear the music as much as having it transcribed. So I started learning how to use Audacity. This was a very shallow learning curve. The most difficult part was getting my microphone to provide decent input. I had to buy a low-end pre-amplifier (about $70 US), and then after an hour of experimentation, it was working fine. The Audacity interface is extremely intuitive, so I was able to start recording right away. The only times I referred to the documentation were to find out what some of the more esoteric effects were supposed to do.

I used the following effects, in the order listed, for every piece:
  • Noise Removal

  • Amplify

  • Normalize
For some pieces, I pasted the best parts of multiple takes together. To make this work, I used additional effects:
  • Bass Boost

  • Change Tempo
The one piece that I attempted to sing required a lot of editing. To make it less painful than it would have been, I also used Change Pitch and Echo.

I have built a personal web site to showcase the SO Suite (word play on the titles). The recordings are in MP3 format. Unfortunately, the web site does not offer shell accounts, and it only supports a limited number of audio file formats, Ogg not among them. If anyone knows of a site that would host my amateur performances, I would be happy to upload the 23MB of Ogg files.

Now, even if I don't sell any manuscripts, at least I can share my music with family and friends. And it won't be totally lost if I stop playing it. Break time is over, got to get back to work.

Later . . . Jim

Thursday, October 30, 2008

More Results from Realeyes

For the past few weeks, I have been learning a lot about the site where the Realeyes pilot project is being run. After seeing several reports of incidents from Europe and Asia, it occurred to me that I could create a rule to monitor non-US IP addresses.

To do this, I got the IANA IPv4 Address Assignments, and created a list of the high order octet assigned to each of Europe, Asia, Africa, and Latin America. The rule was simply a SYN packet and any match on the first octet of the source address with a value in the list.
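The logic of the rule itself is trivial. A rough shell sketch of the first-octet check (the octet list here is illustrative, not the actual IANA assignments, and the real test is done in the Analysis Engine, not in shell):

```shell
# Sketch of the first-octet rule; this octet list is illustrative,
# not the actual IANA IPv4 address assignments.
NON_US_OCTETS="58 59 60 61 77 78 79 80 189 190"

is_non_us() {
    octet=${1%%.*}              # high-order octet of the source address
    case " $NON_US_OCTETS " in
        *" $octet "*) return 0 ;;
        *)            return 1 ;;
    esac
}

is_non_us 77.12.34.56 && echo "report: 77.12.34.56"
is_non_us 192.0.2.1   || echo "ignore: 192.0.2.1"
```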

At first, I simply turned it loose, which generated over 20,000 reports. I was able to reduce that quickly by filtering on the "Referer:" field. First were the requests referred by one of the site's own web servers. Then I found other sites, such as Google, that were referring browsers to the monitored network. These were all defined in a single Action, which was then defined in the Event with the 'NOT' flag set. This resulted in about 5,000 reports, which have been further reduced by adding some of the site's commonly requested web pages to the filter.

The rule is now: Any connection requested by an IP address in Europe, Asia, Africa, or Latin America that is not referred by a site server or one in a list of other servers, and is not requesting one of a list of web pages. If a match is found, the first 64k of both halves of the session are reported. I was thinking of adding a filter where the web server responds with a 200 message, but that could miss a successful exploit.
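As an illustration of the filtering (the actual logic lives in the Action definitions, and the host names below are made up), the Referer test amounts to something like:

```shell
# Keep only requests whose Referer is NOT a site server or a known
# good referrer -- host names here are hypothetical.
GOOD_REFERERS='www\.example\.edu|www\.google\.com'
cat > /tmp/requests.txt <<'EOF'
GET /index.html Referer: http://www.example.edu/portal
GET /admin.php Referer: http://badsite.example.net/x
GET /news.html Referer: http://www.google.com/search
EOF
suspects=$(grep -Ev "Referer: http://($GOOD_REFERERS)/" /tmp/requests.txt)
echo "$suspects"
```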

Of the reported sessions, many are web crawlers for various international search engines. A large number are being referred by other servers. And a fair number appear to be overseas students. Many of the web crawler and overseas student connections consisted of over 100 sessions. With a single click of the 'Ignore Source Address' option, I could close all the incidents for a source IP address without creating a report. This allowed me to reduce the reports by 2,000 - 3,000 fairly quickly.

And that left me with about 1,000 connections of 1 - 5 sessions each. It was easy to display the playback window, see the client request and the server response, and decide whether or not the activity was valid. Usually, the server responded with a 200 code and sent the requested page. I was able to check about 10 of these per minute, so it only took a couple of hours to run through the entire list.

As far as invalid activity, there have been several targeted scans. By this I mean that the requests are only sent to web servers, and they actually make HTTP requests. These were easy to see by sorting the reports on the source IP address and looking for connections to multiple servers.

The most interesting one was 'GET /manager/html'. This appears to be a Tomcat exploit which tries to gain access to the administrator account. Of the dozen web servers that received this request, all but one replied with 404 "Not Found". The other one replied with 401 "Unauthorized" and the source host then sent over 150 variations of the authorization code field. The codes were mixtures of numbers and mixed case letters that looked like they were taken from a table. Some were as long as 25 characters, while others were only 5 or 6 characters. Fortunately, none were successful.

Another interesting discovery was that one of the monitored site's web servers was being used to store data. An application to allow students to participate in workgroup activities had been broken into and data was stored for some of those sites that are links in spam. It was the response from the server that alerted me to this. I saw a list of keywords meant to generate a lot of hits in search engines. I was then able to report the full path of the request to the web administrator and the server was cleaned of this and a few other pages.

The lesson I take from this is that Realeyes is capable of collecting a broad range of data, filtering it effectively, and providing enough information to analysts to very quickly determine the severity level of incidents. The rules for monitoring can be customized and tuned for the site's requirements, giving analysts and administrators a deep view of their environment. And since that is what I set out to do with this project, I am quite pleased.

Later . . . Jim

Thursday, October 9, 2008

It's a Big Cloud

I have recently read several articles that comment on the issues surrounding 'cloud computing'. However, they all seem to be the proverbial blind men describing an elephant. Doc Searls covers more ground than most and promises a follow up discussion, but all of them tend to limit the issues to their own perspective.

I don't have a problem with their facts, just the level of incompleteness. First, I'd like a really good definition of 'cloud computing'. Since I have not seen one, I will take a shot at it. As a network guy, I have seen clouds in network diagrams for a long time, so I tend to build on that understanding to relate to the current state of the technology.

The essence of 'cloud computing' is that it extends the personal computer to utilize Internet resources. These days, a personal computer may be a desktop, a kiosk station, a laptop, a mini-laptop, a smart phone, or an appliance. The resources that can be used include services, such as weather or stock information, interactive applications, such as Email or social network sites, storage sites, such as Flickr, or computer-to-computer applications, such as tracking packages or vehicles.

Using this definition, it is obvious that 'cloud computing' is simply a buzzword. Anyone who makes online purchases, has an online Email account, or has joined a social networking site is participating in 'cloud computing'. Even if we add the requirement that the data involved would otherwise be stored on a local disk, all three of these examples meet the definition.

So what's the big deal? Well, as usual, social mores are behind the technology, and some of the discussion is about trying to catch up. Also, if the definition can be controlled, it can be sold as a new product, with the inescapable "caveat emptor" warnings from consumer advocates. With that in mind, I see the following issues involved in using this technology. Not surprisingly, many of them are the same as the issues of using computer technology in general, but the addition of the Internet puts a new spin on them:
  • Cost: The proponents of 'cloud computing' tend to tout this as a big plus. That sounds to me like they are trying to sell a web version of out-sourcing. The easiest argument against it would be that traditional out-sourcing has not proven to be a huge cost saver across the board.

    But the real issue is the question of, "Who is the target market?" I cannot imagine a retail company putting its inventory on storage managed by Google. So realistically, the target market is individual consumers. The type of data being handled is mostly Email, audio/video files, and blogs. For this, the online storage -- and backups -- are very cost effective.

    Will we ever see companies out-sourcing to web services/storage? I never say never, but I think it is a really hard sell. So I predict that it will eventually take hold, but in a limited way. Applications that help companies interact with their customers could be beneficial to both parties. And then there are the ones that no one has thought of yet.

  • Reliability: Adding the Internet to the equation makes reliability a huge issue. The components that must all be working are:

    • The personal computer
    • The local ISP
    • The cloud
    • The remote ISP
    • The remote services

    The further out you go, the higher the possibility of failure. But so what? How many times in your life have you missed a critical phone call or Email, where minutes or even hours made a difference? In my mind, the backups at the storage site are far superior to the procedures done by the majority of consumers, myself included. And this outweighs the few times that the site is inaccessible, and is more cost effective at the same time. Even downtime for businesses would not lead to their demise.

  • Access to data: The remaining items are where much of the current discussion is centered. There are some online Email services where it is difficult to retrieve Emails to the personal computer. But this is an issue that can be managed. It essentially boils down to read before you sign, and if you're the type who doesn't do this, well shame on you. Also, it would be pretty silly to not keep a copy of at least the most important data locally, such as photos, which adds to the cost. But the cost of losing it forever is even higher.

  • Privacy: If there is anyone who hasn't figured it out yet, let me put it as plainly as possible. Nothing on the Internet is private. All the privacy policies and laws in the world cannot stop someone who has a real desire to take whatever data they want and do whatever they want with it. Data that is more important, such as financial information, can be more carefully protected (this is where I get to plug Realeyes), but computer security is a matter of probabilities, not guarantees. As far as what the Internet companies do with your data, again, read before you sign. But if there is information that should never be exposed, it should never be accessible from the Internet.

  • Ownership: I believe that this is the reason that Richard Stallman said using 'cloud computing' is stupid. He has long championed freedom of information, not just program source code. The ownership policies of most sites promoting 'cloud computing' go against this in that they consider your information to be free, but not their own. Therefore, his position is consistent and reasonable. However, if you consider losing $1,000 in Las Vegas to be part of the fun, then I guess you can be forgiven for thinking he is a spoil sport.

    It is true that most of the sites that handle consumer data reserve the right to use that data as they please. Their literature basically says this is for promotional purposes. And Google finds keywords in Email to display ads. I have known people who take the dealer logo off their cars because they object to being a billboard. If that is you, you are probably going to be uncomfortable using these sites. But having a social networking account is not a right. You have to pay to play. Just remember the rule about Internet privacy, which is that there is none.

  • Security: This is not a rehash of the storage site's security policies. It is a serving of food for thought. The Internet is a virtual wild west. The quality of security from one site to the next varies widely, and there are many who are happy to take advantage of that. The more you use the Internet, the higher your chances of having a vulnerability exploited. So if you use the same personal computer for social networking that you use for banking, you may become a victim of identity theft. I can think of a couple of analogies here, and I expect that you can too. I would say that the same types of rules apply. At the very least, please keep your security patches up to date.
To 'cloud compute' or not, that is the question. Of course, if you are reading this, you have already answered it. Now it is simply a matter of degree. Are you going to participate fully, or limit your involvement to a few interests? The most important thing is to be aware of the issues. I hope that I have contributed some useful thoughts to that end.

Later . . . Jim

Wednesday, September 24, 2008

Realeyes in the Real World

For about a year, I have been running Realeyes in a pilot project at a local college. The first six months were spent making it scale to a real world environment. But for the past six months, I have been able to experiment with rules to achieve the objectives of the system:
  • Capture exploits: This one is obvious, but that doesn't make it easy.

  • Reduce false positives: This was an original design goal, and my testing shows a lot of promise.

  • Determine if exploits are successful: This was also an original design goal, and it too is showing promise.

  • Capture zero-day exploits: While this capability hasn't been designed into the system, the flexibility of the rules and the statistics collection are showing promise in accomplishing this. We have also found some incidents of ongoing invalid activity that was apparently flying under the radar.
There is not much I can say about writing rules to detect exploits. If you have never done it, you just can't imagine how hard it is. First, finding enough information to even start is very difficult. But even with decent information, picking out the critical pieces that will consistently reveal the exploit is much more of an art than a science. I have written a few dozen rules at this point, but certainly don't consider myself to be an expert. With that said, rules do get written and, after some trial and error, are fairly effective.

The holy grail of creating IDS rules is to increase the capture rate while simultaneously reducing the false positive rate. When I was on the front lines, monitoring the IDSes, there was always the fear that a major attack would be buried in the false positives. Therefore, when I started this project, my personal goal was to completely eliminate false positives. I may not succeed, but I figure that if I am shooting for 100%, I will get closer than if my target is a lot lower.

I had spent quite a bit of time thinking about and discussing this with others. The team I worked with used multiple IDSes and correlated the information from them. Our success rate was very good, compared to our counterparts in other agencies. So a part of the puzzle seemed to be collecting enough data to see the incident for what it really is.

As an analogy, imagine taking a sheet of paper and punching a hole in it with a pencil. Then try to read a magazine article through that hole. At the very least, I would want to make the hole bigger. But this is as far as the analogy takes me, because what I really need is context, and a single word or phrase doesn't do that for me.

But even using multiple IDSes wasn't the complete solution. Each IDS had limitations that reduced its contributions to the total picture. If a rule was producing too many false positives, then the rule had to be modified, often in a way that reduced its effectiveness. This meant that there were important pieces missing.

So the solution appeared to be: put the capabilities of all the IDSes in a single IDS and do the correlation at the point of data capture. And that is how the Realeyes Analysis Engine is designed. There are three levels of data analysis: Triggers, Actions, and Events. Each one feeds the next, and the correlation of multiple inputs gives a broader view of the activity.

Triggers are the smallest unit of data detected. They can be header data such as a source or destination port, a string such as "admin.php" or "\xeb\x03\x59\xeb\x05\xe8\xf8", or a special condition such as an invalid TCP option, handled by special code. These are correlated by Action definitions, where an Action definition may be the Triggers: 'Dest Port 80' AND 'Admin PHP'.

Actions are the equivalent of rules in most IDSes available today. It is the next level of correlation that is showing the promise of reaching the goals listed above: Events are the correlation of Actions, possibly in both halves of a TCP session. (Actually, Realeyes assumes that UDP has sessions defined by the same 4-tuple as TCP.) And this is where it gets interesting.

One of the first rules defined was to search for FTP brute force login attempts. This Event is defined by two Actions, the first being 'Dest Port 21' and 'FTP Password', the second being 'Source Port 21' and 'FTP login errors' (which is any of message 332, 530, or 532). The rule was defined to require more than three login attempts to allow for typos. It has reported a few dozen incidents and zero false positives. Considering that the string for the FTP password is "PASS ", I am quite pleased with this result.
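For the curious, the threshold logic amounts to something like this toy version run over a fake session transcript (the real correlation is done inside the Analysis Engine, of course):

```shell
# Toy version of the brute-force threshold over a fake FTP transcript.
cat > /tmp/ftp_session.txt <<'EOF'
PASS hunter2
530 Login incorrect.
PASS hunter3
530 Login incorrect.
PASS letmein
530 Login incorrect.
PASS qwerty
530 Login incorrect.
EOF
attempts=$(grep -c '^PASS ' /tmp/ftp_session.txt)
errors=$(grep -c '^530 ' /tmp/ftp_session.txt)
# More than three failed logins, to allow for ordinary typos
if [ "$attempts" -gt 3 ] && [ "$errors" -gt 3 ]; then
    echo "possible brute force: $attempts attempts, $errors errors"
fi
```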

A more recent rule was defined to detect an SQL Injection exploit. The exploit uses the 'cast' function to convert hexadecimal values to ASCII text. The rule's Triggers search for four instances of testing for specific values in a table column. The Action is defined to fire if any two of them are found in any order. Although the exploit is sent by a browser to a server, there is no port defined. This rule is reporting a couple hundred attempts per day, and none of them are false positives.

It really got exciting when the rule captured a server sending the exploit. It turned out that the server was simply echoing what it received from the browser in certain instances. However, the web admins wanted to know what web pages this was happening for, and this is where the third design goal is demonstrated. Both halves of these sessions were reported and displayed in the playback window. This gave us the full URL data being sent to the server, and the web admins were able to address the problem quickly.

To address the issue of detecting new exploits, Realeyes has a couple of somewhat unique features. The first is statistics collection. The information maintained for each session includes the number of bytes sent and received. When a session ends, these are added to the totals for the server port. Then, three times a day, the totals are reported to the database. This allows for the busiest ports to be displayed. Or the least busy, which might point to a back door.
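The port totals are just sums keyed on the server port. A toy version using fake per-session records (server port, then bytes for the session):

```shell
# Fake per-session records: server port, total bytes for the session.
cat > /tmp/sessions.txt <<'EOF'
80 5120
80 2048
22 900
443 4096
22 100
EOF
# Sum bytes per port, then sort to find the busiest (or least busy) port.
busiest=$(awk '{ total[$1] += $2 } END { for (p in total) print p, total[p] }' \
              /tmp/sessions.txt | sort -k2 -rn | head -n 1)
echo "busiest port: $busiest"
```

Sorting the other direction is what surfaces the rarely used ports that might point to a back door.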

But there are other statistics that can be collected, as well. It is possible to monitor a specific host or port and collect the total data for each session.

For example, I monitored UDP port 161, which is used for SNMP, and saw two host addresses using it that belonged to the site, but 3 or 4 a day that did not. Since there was not much traffic for the port, I simply created a rule to capture all data for it. This showed me that there were unauthorized attempts to read management data from devices in the network, but that none were successful.

Using the same technique, I monitored TCP port 22, used for SSH, and found several sessions that appeared to be attempts to steal private keys. I reported this to the admin of the target hosts, and while he had applied the most recent patches, I also suggested that he regenerate his private keys, to be on the safe side.

Another feature for discovering new exploits is time monitoring. This is setting a time range for all activity to a host or subnet to be captured. I defined rules to monitor the EMail servers between midnight and 5:00 am. The original intent was to watch for large bursts of outbound EMail to detect any site hosts being used for spam. We have found one of these.

But we have discovered several other things from this. First was a large amount of traffic using a port that the network admin thought had been discontinued. Second, there have been several attempts to have EMail forwarded through the site's servers. This may be a misconfiguration at the sender, or it may be an attempt to send spam through several hops to avoid detection. From this I created rules to monitor for the most common domains.

So far, there have not been any earth shattering discoveries (to my disappointment, but to the admins' relief). But as I said, the signs are promising that the system is capable of meeting the design goals. Until a couple of weeks ago, I had spent the majority of my time working on code. But I am starting to spend more time on rules. So I am looking forward to making some new discoveries. Stay tuned.

Later . . . Jim

Tuesday, September 23, 2008

My App Fails the LSB

My position on the Linux Standard Base has evolved. When I first heard about it, I was all for it. And while the LSB as a standard could still be useful to some, I now disagree with the goals of the LSB working group. To be sure, this post is not about dissing the Linux Foundation. They have many worthwhile projects.

What follows is my experience with the LSB Application Checker, my take on the purpose of the LSB, and my own suggested solution for installing applications on GNU/Linux distributions. The Realeyes application failed to certify using the checker v2.0.3, which certifies against the LSB v3.2. Everything that it called out could be changed to pass the tests, but I will only consider correcting a few of the 'errors'.

After building the Realeyes v0.9.3 release, I collected all executable files in a common directory tree, downloaded the LSB application checker, and untarred it. The instructions say to run the Perl script, app-checker-start.pl, and a browser window should open. The browser window did not open, but a message was issued saying that I should connect to http://myhost:8889. This did work, and I was presented with the Application Check screen.

There was a text box to enter my application name for messages and one to enter the files to be tested. Fortunately, there was a button to select the files, and when I clicked on it a window opened that let me browse my file system to find the directories where the files were located. For each file to be tested, I clicked on the checkbox next to it, and was able to select all of the files, even though they were not all in the same directory. Then I clicked on the Finish button and all 87 of the selected files were displayed in the file list window.

When I clicked on the Run Test button, a list of about a dozen tasks was displayed. Each was highlighted as the test progressed. This took less than a minute. Then the results were displayed.

There were four tabs on the results page:
  • Distribution Compatibility: There were 27 GNU/Linux distributions checked, including 2 versions of Debian, 4 of Ubuntu, 3 of openSUSE, 3 of Fedora, etc. Realeyes passed with warnings on 14 and failed on the rest.

  • Required Libraries: These are the external libraries required by the programs written in C. There were nine for Realeyes, and three (libcrypto, libssl, and libpcap) are not allowed by the LSB. This means that distros are not required to include the libs in a basic install, so they are not guaranteed to be available.

  • Required interfaces: These are the calls to functions in the libraries. There were almost a thousand in all, and the interfaces in the libraries not allowed by the LSB were called out.

  • LSB Certification: This is the meat of the report and is described in some detail below.

The test summary gives an overview of the issues:
  • Incorrect program loader: Failures = 11

  • Non-LSB library used: Failures = 4

  • Non-LSB interface used: Failures = 60

  • Bashism used in shell script: Failures = 21

  • Non-LSB command used: Failures = 53

  • Parse error: Failures = 5

  • Other: Failures = 53

The C executables were built on a Debian Etch system and use /lib/ld-linux.so.2 instead of /lib/ld-lsb.so.3. The non-LSB libraries and interfaces were described above, but there was an additional one. The Bashisms were all a case of either using the 'source' built-in command or using a test like:
  while (( $PORT < 0 )) || (( 65535 < $PORT )); do

which, in other Bourne shells, requires a '$(( ... ))'. The parse errors were from using the OR ("||") symbol.
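For the record, a portable rewrite of the test above is straightforward, since the POSIX test(1) numeric operators work in any Bourne shell:

```shell
# Portable version of the port-range check: [ -lt / -gt ] instead of
# bash's (( ... )), so it runs under dash and other Bourne shells.
PORT=70000
while [ "$PORT" -lt 0 ] || [ "$PORT" -gt 65535 ]; do
    echo "invalid port: $PORT" >&2
    PORT=8554                   # the real script would re-prompt here
done
echo "using port $PORT"
```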

The fixes for these are:
  • Use the recommended loader

  • Statically link the Non-LSB libraries

  • Use '.' instead of 'source'

  • Rework the numeric test and OR condition

So far, all of this is doable, sort of. But every time a statically linked library is updated, the app must be rebuilt and updates sent out. Also, the additional non-LSB library is used by another one. So I would have to build that library myself and statically link the non-LSB library (which happens to be part of Xorg) into it (and that one is part of GTK). The reason I am using it is that the user interface is built on the Eclipse SWT classes, which use the local graphics system calls to build widgets.
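Incidentally, the loader a binary requests is easy to inspect, since it is recorded in the ELF PT_INTERP segment. A quick check, using /bin/sh as a stand-in for one of my executables (readelf is part of binutils and may not be installed everywhere):

```shell
# Show the requested program loader, if readelf is available.
if command -v readelf >/dev/null 2>&1; then
    readelf -l /bin/sh 2>/dev/null | grep -i 'program interpreter' || true
fi
# Every ELF executable starts with the same four magic bytes: \177ELF
magic=$(head -c 4 /bin/sh | od -An -c | tr -d ' \n')
echo "magic: $magic"
```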

The non-LSB commands include several Debian specific commands (such as adduser), and for my source packages I had to rework the scripts to allow for alternatives (such as useradd). But the other disallowed commands are:
  • free: To display the amount of free memory

  • sysctl: To set system values based on the available memory

  • scp: Apparently the whole SSL issue is a can of worms

  • java

  • psql

Finally, all of the other 'errors' were "Failed to determine the file type", for these file types:
  • JAR

  • AWK

  • SQL

  • DTD and XML

Part of the problem with the LSB is that it has bitten off more than it can chew. Apparently Java apps are non-LSB compliant. So are apps written in PHP, Ruby, Erlang, Lisp, BASIC, Smalltalk, Tcl, Forth, REXX, S-Lang, Prolog, Awk, ... From my reading, Perl and Python are the only non-compiled languages that are supported by the LSB, but I don't know what that really means (although I heartily recommend to the Application Checker developers that they test all of the executables in their own app ;-). And I suspect that apps written in certain compiled languages, such as Pascal or Haskell, will run into many non-LSB library issues.

Then there are databases. Realeyes uses PostgreSQL, and provides scripts to build and maintain the schema. Because of changes at version 8, some of these scripts (those defining roles for table authorizations) only work for version 8.0+. The LSB Application Checker cannot guarantee that these will work on all supported distros because it didn't test them. I have heard that there is some consideration being given to MySQL, but from what I can tell, it is only about certifying MySQL itself, not scripts that build a schema in MySQL.
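One way to make such scripts degrade gracefully is a version guard along these lines; the psql invocation and the script name are assumptions for illustration, not the actual Realeyes tooling:

```shell
#!/bin/sh
# Sketch: apply the role-definition script only on PostgreSQL 8.0+.
roles_supported() {
    # Keep only the major version, e.g. "8.3.5" -> 8, and compare.
    [ "${1%%.*}" -ge 8 ]
}

# Fall back to "0" when psql is missing or no server is reachable.
SERVER_VERSION=$(psql -At -c "SHOW server_version" 2>/dev/null || echo 0)
if roles_supported "$SERVER_VERSION"; then
    psql -f define_roles.sql    # illustrative script name
else
    echo "skipping role definitions: PostgreSQL 8.0+ required" >&2
fi
```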

After all this kvetching, I have to say that the Application Checker application is very well written. It works pretty much as advertised, it is fairly intuitive, and it provides enough information to resolve the issues reported by tests. My question is, "Why is so much effort being put into this when almost no one is using it?"

An argument can be made that the LSB helps keep the distros from becoming too different from each other, and that without the promise of certified apps, the distros would not be motivated to become compliant. But I only see about a dozen distros on the list, with Debian noticeably absent. And yet, there is no more fragmentation in the GNU/Linux world now than there ever was.

My theory on why UNIX fragmented is that proprietary licenses prevented the sharing of information, which led to major differences in the libraries in spite of POSIX and other efforts to provide a common framework. In the GNU/Linux world, what reduces fragmentation is the GPL and other FOSS licenses, not the LSB. All distros use most of the same libraries, and differences between library versions are nowhere near as significant as every UNIX writing its libraries from scratch.

I have to confess, I couldn't care less whether Realeyes is LSB compliant, because it is licensed under the GPL. Any distro that would like to package it is welcome. In fact, I will help them. That resolves all of the dependency issues.

While I am not a conspiracy theorist, I do believe in the law of unintended consequences. And I have a nagging feeling that the LSB could actually be detrimental to GNU/Linux. The only apps that benefit from LSB compliance are proprietary apps. The theory behind being LSB compliant is that proprietary apps can be guaranteed a successful installation on any LSB compliant GNU/Linux distro. I'm not arguing against proprietary apps. If a company can successfully sell them for GNU/Linux distros, more power to them. However, what if proprietary libraries manage to sneak in? This is where the biggest threat of fragmentation comes from.

But even more importantly, one of the most wonderful features of GNU/Linux distros is updates, especially security updates. They are all available from the same source, using the same package manager, with automatic notifications. If the LSB is successful, the result is an end run around package managers, and users get to deal with updates in the Balkanized way of other operating systems. That is a step in the wrong direction.

The right direction is to embrace and support the existing distro ecosystems. There should be a way for application teams to package their own apps for multiple distros, with repositories for all participating distros. The packages would be supported by the application development team, but would be as straightforward to install and update as distro supported packages.

There is such a utility, developed by the folks who created CUPS. It is called the ESP Package Manager. It claims to create packages for AIX, Debian GNU/Linux, FreeBSD, HP-UX, IRIX, Mac OS X, NetBSD, OpenBSD, Red Hat Linux, Slackware Linux, Solaris, and Tru64 UNIX. If the effort that has gone into LSB certification were put into this project or one like it, applications could be packaged for dozens of distros.
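For a taste of the approach, EPM drives all of its output formats from a single product 'list' file; the fields and paths below are reconstructed from memory of EPM's documented format and are assumptions, not a tested example:

```
%product Realeyes IDS
%version 1.0
%license COPYING
%description Network data analysis system

f 0755 root root /usr/bin/realeyes ./realeyes
d 0755 root root /etc/realeyes -
```

As I understand the tool, a command along the lines of 'epm -f deb realeyes' or 'epm -f rpm realeyes' would then emit a native package for each target from that one description.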

And these would not just be proprietary apps. There are many FOSS apps that don't get packaged by distros for various reasons, and they could be more widely distributed. Since the distros would get more apps without having to devote resources to building packages, they should be motivated to at least cooperate with the project. And don't forget the notification and availability of updates.

As a developer and longtime user of GNU/Linux (since '95), I believe that all of the attempts to create a universal installer for GNU/Linux distros are misguided and should be discouraged. I say to developers, users, and the LSB working group, "Please use the package managers. A lot of effort has been put into making them the best at what they do."

Later . . . Jim