Saturday, August 30, 2008

Building a Debian GNU/Linux package

I recently built packages for the 0.9.3 release of Realeyes. These include both source and Debian GNU/Linux packages. For reasons that I have forgotten, I built the Debian packages before I started working on the source packages, but I'm glad that I did. Going through that process made me add several things that I would have overlooked, especially man pages.

I actually built an entire distribution as part of the process of creating my packages. It does not meet the requirements for re-distribution, but is very handy for laying down a fresh install with exactly what is needed to run Realeyes. I will provide the steps for that in another post.

The source is comparatively easy: once all of the files are collected, a tar file is created of the directory. The trick here is to verify that the C code will compile. I have read a lot of comments about how straightforward the standard configure/make installation procedure is. But from a developer's perspective, there are several issues, mainly having to do with autoconf and automake. I don't have enough space here to discuss these, and I really wouldn't have any wisdom to impart if I did, because I have only learned as much as I needed for my own packages.

Debian packaging is somewhat more complicated. A package that is included in a Debian distribution must go through fairly rigorous tests. There are several scripts for checking correct packaging procedures, including lintian and linda. These verify such conditions as:
  • executables are built correctly

  • man pages exist for every executable file

  • naming conventions are followed

  • Debian documentation is formatted correctly
I have seen a few HOWTOs on building a Debian package, and there is a huge amount of information on the Debian site. But I still had to piece together a working plan for myself, so I thought it would be worth sharing it. There may be better ways to do some of it, but I have written scripts that get me through the process with only a little manual effort. And even when there is an automated process, it is still good to know what is happening under the hood. So without further ado, I offer my experience.

Building a Debian Package


I. Read enough of the manuals to get a sense of how Debian packages are built, and then keep links to them for reference:

II. Create a working directory for the package

  • Get a package to use as a model, using the following commands to extract the package files and the Debian metadata files:

    apt-get -d -y --reinstall install package_name
    dpkg -x package_name package_dir
    cd package_dir
    dpkg -e ../package_name

  • Create a working directory, and under it make the directories to be installed for the package, even if there are no files to be saved in them. These may include:

    working_dir/etc/package_name
    working_dir/etc/init.d
    working_dir/usr/sbin
    working_dir/usr/share/package_name
    working_dir/usr/share/doc/package_name
    working_dir/usr/share/man
    working_dir/var/log/package_name

  • Make the Debian control directory with its files (as needed)

    working_dir/DEBIAN

    • control: This contains the description of the package, including dependencies, architecture, and the description used by aptitude or synaptic -- use the model to create this for the first time

    • conffiles: This contains any configuration files installed with the package -- I put mine in /etc/realeyes

    • preinst: This is a shell script that runs before the package is installed, if it exists

    • postinst: This is a shell script that runs after the package is installed, if it exists -- I use it to create user IDs

    • prerm: This is a shell script that runs before the package is de-installed, if it exists

    • postrm: This is a shell script that runs after the package is de-installed, if it exists

  • Populate the directories with the application files: Use the model to help understand what goes where
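
The layout above can be scripted. Here is a minimal sketch, assuming a shell function I am calling make_skeleton; the function name, the man8 subdirectory, the version number, the maintainer address, and the "realeyes" account name are all placeholders, and the real control fields should be copied from the model package:

```shell
# Sketch only: names and field values below are placeholders, not the
# actual Realeyes packaging.
make_skeleton() {
    local root="$1"
    local pkg="$2"

    # Directories the package installs, even if some stay empty.
    mkdir -p "$root/etc/$pkg" "$root/etc/init.d" "$root/usr/sbin" \
             "$root/usr/share/$pkg" "$root/usr/share/doc/$pkg" \
             "$root/usr/share/man/man8" "$root/var/log/$pkg" \
             "$root/DEBIAN"

    # Minimal control file; take the real values from the model package.
    cat > "$root/DEBIAN/control" <<EOF
Package: $pkg
Version: 0.0.1
Section: net
Priority: optional
Architecture: i386
Maintainer: Example Maintainer <maintainer@example.org>
Description: placeholder short description
 Placeholder long description, indented by one space.
EOF

    # postinst that creates a system account (account name is hypothetical).
    cat > "$root/DEBIAN/postinst" <<'EOF'
#!/bin/sh
set -e
if ! getent passwd realeyes >/dev/null; then
    adduser --system --group --no-create-home realeyes
fi
exit 0
EOF
    # Maintainer scripts must be executable.
    chmod 755 "$root/DEBIAN/postinst"
}
```

Calling make_skeleton working_dir package_name produces an empty tree that is ready to be populated with the application files.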

III. Use the maintainer tools to verify package acceptability
  • Build the package:
    cd working_dir
    dpkg-deb --build package_dir package_name
  • lintian/linda: Check for package discrepancies. Note that lintian and linda are not installed by default and must be installed separately. Lintian uses Perl and linda uses Python, so there may be several dependencies installed with them.
    lintian -i package_name > package_name.lintian
    linda package_name > package_name.linda
  • Fix all the problems. Here are some helpful hints from my experience:

    • Man pages: txt2man is a program that takes ascii text and converts it to a man page. It works for simple pages, but the resulting groff file may have to be edited manually in some cases. Use 'gzip -9' to compress man pages.

    • Compiled programs: Compiled programs must be stripped. Use the command:
      install -s
    • Identify all non-executable files in system directories (i.e., /etc/package_name) in the package's DEBIAN/conffiles.

    • Lintian provides the section in the Debian Policy manual that describes the requirement that was flagged.

  • Sign the package: Create a GPG key for the package and sign each package with the key
    gpg --gen-key
    dpkg-sig -s builder pkg.deb
    NOTES:

    • The public keyring is in $HOME/.gnupg/pubring.gpg

    • There should be a lot of entropy on the system to help the random number generator. Generating disk activity with grep -R abc /usr/* seems to work well.

    • Issue the command 'cat /proc/sys/kernel/random/entropy_avail' to find out how much entropy is currently available. It should be above 1,500.
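
Going back to the man page hint in the list above, the compression step is easy to forget when a page is regenerated. A small sketch of how it could be automated, assuming the pages live under usr/share/man in the package tree (compress_manpages is a name I made up for this example):

```shell
# Sketch: gzip every man page in the package tree that is not yet compressed.
compress_manpages() {
    local root="$1"
    # gzip -9 replaces each file with file.gz at maximum compression.
    find "$root/usr/share/man" -type f ! -name '*.gz' -exec gzip -9 {} +
}
```

Run it against the package directory just before dpkg-deb --build, so lintian sees only compressed pages.
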
At this point, the package can be installed using the dpkg command. However, if there are dependencies that must be installed, dpkg will issue a warning about them, but does not handle their installation. So if you want to go to the next level, here is what you have to do.

IV. Repository directories

A Debian repository has a relatively simple directory structure to maintain the packages and metadata about them. An installation ISO is basically a repository tree with just the stable packages. A custom repository tree can be added to the apt sources.list to be accessed just like officially maintained packages, with aptitude or synaptic.

In the top repository directory, the following are mandatory:
  • md5sum.txt: The list of all files in the tree with their md5 checksums

    To create the md5sum.txt file for my mini-distro, I wrote a script that ran in the top ISO directory. It did a recursive ls, ran md5sum on all regular files, and wrote the output to the md5sum.txt file. I keep that as a template and only update the files that change.

  • pool: The subdirectory where packages are kept. Under the pool directory, there are a few pre-defined directories where the different categories of packages are kept. Anyone who has edited a sources.list file has seen most of these:

    • main: Technically, these are packages that meet the Debian Free Software Guidelines (DFSG), but I think of them as the officially maintained packages

    • contrib: Contributed packages are DFSG, but depend on packages that are not -- I use this for my own packages, even though I don't have any non-DFSG dependencies

    • non-free: These are non-DFSG packages

    The structure of each of these is the same. The package directories are in subdirectories named with the first letter of the package name. The exception is libraries, which are under directories named lib plus the following letter (liba, libb, and so on), matching the first four characters of the library package name. Below these subdirectories are the directories with the actual package files.

  • dists: The dists directory contains the metadata about packages. There is a directory for the distribution, in this case, etch. In an ISO, there are also the directories, frozen, stable, testing, and unstable, which are links to the distribution directory. In a repository, these may have their own files. But for my purposes, I only include the distribution subdirectory.

    Under the distribution subdirectory are the following:

    • Release: This file describes the packages, including the architecture, the components, and contains md5sums for the package metadata files -- the file information also includes the file size, and since there are only a few of these, I created the original by hand

    • main: This directory contains the metadata about the main packages

    • contrib: This directory contains the metadata about the contrib packages

    The structure of main and contrib is the same, and again, I only use contrib. The architecture directories are below contrib, and in my case, there is only binary-i386. In the architecture directory there are three files:

    • Packages: This uses information from the DEBIAN/control file and adds such things as the full path of the package file

    • Packages.gz

    • Release: This contains metadata about the contrib directory

  • I also put a few optional files in the top level directory. These include a copy of the GPL (I use version 3), installation instructions, and the public key for the signed packages. The installation instructions explain how to add the public key so that aptitude and synaptic can validate the packages.
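
For anyone who wants to reproduce the layout, here is a rough sketch of the tree and of an md5sum.txt generator along the lines of the script I described. It is not my original script: it uses find rather than a recursive ls, the function names are invented for the example, and it assumes the etch/contrib/binary-i386 combination used in this post.

```shell
# Sketch: create the minimum repository tree described above.
make_repo_tree() {
    local top="$1"
    mkdir -p "$top/pool/contrib/r" "$top/dists/etch/contrib/binary-i386"
}

# Sketch: write md5sum.txt for every regular file under the top directory.
write_md5sums() {
    local top="$1"
    (
        cd "$top" || exit 1
        # Skip the list itself and its temporary copy, then sort by path.
        find . -type f ! -name md5sum.txt ! -name md5sum.txt.new \
            -exec md5sum {} + | LC_ALL=C sort -k 2 > md5sum.txt.new
        mv md5sum.txt.new md5sum.txt
    )
}
```

The result can be checked at any time by running md5sum -c md5sum.txt from the top directory.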

V. Add the packages
  • Copy the packages to the appropriate pool directory. In my case, this means copying them to:

      iso_dir/pool/contrib/r/realeyescomponent

  • Create an override file for the packages. This consists of a line for each package with the following information:

      package priority section

    In my case it looks like this:
    realeyesDB optional net
    realeyesDBD optional net
    realeyesGUI optional net
    realeyesIDS optional net
    The man page for dpkg-scanpackages (the next command) has a description of each field and says that the override file for official packages is in the indices directory on Debian mirrors.

  • Build the metadata:
    dpkg-scanpackages \
    pool/contrib/ override.etch.contrib > \
    dists/etch/contrib/binary-i386/Packages
    cd dists/etch/contrib/binary-i386
    gzip -c -9 Packages > Packages.gz
    cd ../..
    md5sum contrib/binary-i386/Packages* > md5.tmp
    ls -l contrib/binary-i386/Packages* >> md5.tmp
  • The file, md5.tmp, is edited to put the file size after the md5 checksum and before the file name, and the file listing lines are deleted. Then the file, Release, is edited to read in the file, md5.tmp, at the end and the duplicate lines are deleted.
    gpg --sign -ba -o Release.gpg Release
    cd ../..
    cp -a ~/.gnupg/pubring.gpg RE_pubring.gpg
    md5sum ./INSTALL* > md5sum.txt
    md5sum ./GPL* >> md5sum.txt
    md5sum ./dists/etch/Release >> md5sum.txt
    md5sum ./dists/etch/contrib/binary-i386/* >> md5sum.txt
    md5sum ./pool/contrib/r/*/* >> md5sum.txt

VI. Installing the package

The instructions for installing this package are:
  • Copy the Debian packages to a directory that will be used for the initial installation and future updates, such as /var/tmp/realeyes. Untar the packages:
    tar xvzf realeyes_debian.tar.gz
  • Change to the top level packages directory and add the public key file, RE_pubring.gpg, to the Debian trusted sources with the command:
    apt-key add RE_pubring.gpg
  • Edit the file /etc/apt/sources.list to add the line:
    deb file://install_dir/realeyes/ etch contrib
  • Update the package lists with one of the following methods:

    • On the command line enter:
      apt-get update
    • In aptitude, select Actions -> Update package list

  • Install the package using aptitude or synaptic
So there you have it. I hope it shortcuts the learning curve a little.

Later . . . Jim

Saturday, August 23, 2008

Security Unobscured

I just read this post about the state of information security. There are two main points that seem to be related, and a third that is implied.

First, Jason feels that security professionals are not being taken seriously by other IT professionals. Maybe it's because I live in the Washington, DC area, but I don't have that impression. However, if it is true at all, I can't help but think that it is because of some people in the field shouting, "The sky is falling," combined with an ever increasing number of real security breaches that are poorly addressed, ranging from web based exploits and phishing scams to stolen laptops containing the unencrypted personal information of thousands of people. It makes the general public wonder, "What do those security people do?"

I had a discussion with an IT professional the other day about a recent SQL injection exploit. Over time, the lower levels of the stack, including the operating system and server daemons, have become fairly hard targets, while applications are still pretty soft. And predictably, the attacks are moving up the stack. A large part of the problem is that many (if not most) programmers have little understanding of the infrastructure on which their application is built, in this case, databases. But it is left to the programmers to test the data for SQL injection attempts. Why isn't the security community clamoring for database APIs to include a higher degree of data assurance, as well as more security in other APIs?

And that segues into Jason's second point, "the market is flooded with these so called CISSP certified IS professionals". I get the impression that he considers them to be charlatans. And in the era of terrorist attacks and Sarbanes-Oxley, there are IT professionals and managers who are wary of buying IS snake oil. Referring back to my comment about database APIs, the IT security field must be seen to be solving security problems. Instead, vendors and consultants often seem to be peddling their own wares, while the security of people's information is a secondary concern. In fact for some, the increase of real threats almost seems to be a marketing tool.

And that brings me to the third point that Jason implied when he said, "I remember a day when security people were feared." Having been on both sides of this fence, I see a tension between IT security and performance, with performance almost always being the top priority. In my experience, the security people don't get their own budget; they get the leftovers from the system and network budget -- the exception being financial institutions and some government agencies. After all, security is a lot like insurance, and everyone prefers the cheapest insurance, especially when the economy is down.

While I agree with Jason that IT professionals should take the security professional's concerns seriously, I think it is even more important for security people to focus on solutions. Most IT people look at security breaches the same way they look at hardware failures, a problem to be solved as quickly as possible, so that the real work of their environment can continue. The security people will get more respect by helping to keep these to a minimum than by coming on as tyrannical hall monitors.

We all know that there is no security silver bullet; security requires that the people using and maintaining the systems do the right thing. But if they don't know what the right thing is, because of poor training, being unaware of potential hazards, or lack of information about their environment, the security professionals have to take some responsibility for it. Computer technology is still a relatively young field, and both innovation and expansion are changing the landscape every year. The job of security professionals will continue to be providing security solutions for current conditions by doing research, explaining the findings, and developing tools to help the people in the trenches.

My own approach is to give the IT people as much useful information as possible. The reason I started Realeyes was to reduce false positives and provide the admins with enough data to make quick decisions and increase awareness of what is happening in their networks. They know the systems and networks best, so it is my job to give them the best tools to do their job right.

If security people want attaboys, we have to provide more light and fewer shadows.


Later . . . Jim

Friday, August 22, 2008

Messages

I recently saw a question on Slashdot asking how to handle application messages. Most of the responses were along the lines of "only output what is important". Of course that implies that the programmer knows what is important to every user, which isn't very likely.

In Realeyes, the approach is different. First, there is a message for just about everything except normal data collection and analysis. This includes parsing configuration files, network connection activity, administrative commands, and, of course, errors. Each message is assigned a type code, such as critical, error, warning, informational, etc. Then, in the main configuration file, it is necessary to explicitly request that warnings and informational messages be logged.

This way, a newcomer to the application can see all messages to get a sense of if and how the program works. When the repetitious messages are no longer useful, they can be ignored. Although the warnings may be helpful in troubleshooting, they generally relate information about configuration issues that are already known, and could become a bit irritating, so the user is given the choice of recording them or not.

Of course, error messages are not optional. There is also a NOTE message type, likewise not optional, which is used for things like the startup and shutdown messages.
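
To make the idea concrete without reading the C source, here is a small shell sketch of the same filtering rule. It is not the Realeyes code: the LOG_WARNINGS and LOG_INFO variables stand in for the configuration file settings, and log_message and the type names are made up for the example.

```shell
# Hypothetical settings, standing in for entries in the main configuration file.
LOG_WARNINGS=${LOG_WARNINGS:-0}
LOG_INFO=${LOG_INFO:-0}

# Sketch: print a message only if its type is mandatory or was requested.
log_message() {
    local kind="$1"
    local text="$2"
    case "$kind" in
        CRITICAL|ERROR|NOTE) ;;                          # never optional
        WARNING) [ "$LOG_WARNINGS" = 1 ] || return 0 ;;  # logged only on request
        INFO)    [ "$LOG_INFO" = 1 ] || return 0 ;;      # logged only on request
        *)       return 1 ;;                             # unknown type code
    esac
    printf '%s %s\n' "$kind" "$text"
}
```

With both settings left at 0, log_message WARNING "..." prints nothing, while ERROR and NOTE messages always appear.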

If you are interested in seeing how this is coded, check out the files, RealeyesAE/src/rae_control.c and RealeyesAE/include/rae_messages.h, in the Realeyes subversion repository. In rae_control.c, the function, control_messages, handles writing the log file. And down about line 380, there is a message.

This brings up a couple of points. First, the application is actually a collection of processes. The child processes do not write messages to disk. Instead, they create the message in shared memory and put it on a queue using the macro, raeMESSAGE, which is defined in rae_messages.h. The parent process (called the manager) periodically checks the message queues and writes messages to the log files. It also has a shutdown function that is called even after a system interrupt, which prints messages one last time.
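
The single-writer arrangement can be illustrated in a few lines of shell. This is only an analogy: it uses a pipe where Realeyes uses queues in shared memory, and the child names and message format are invented for the example.

```shell
# Hypothetical child: emits a few tagged messages on stdout (the pipe)
# instead of writing to disk itself.
child() {
    local name="$1"
    local i
    for i in 1 2 3; do
        printf '%s|INFO|message %d\n' "$name" "$i"
    done
}

# Manager: the only process that writes to the log file.
manager() {
    local logfile="$1"
    local line
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$logfile"
    done
}

# Start two children in the background and let the manager drain the pipe.
run_demo() {
    local logfile="$1"
    { child collector & child analyzer & wait; } | manager "$logfile"
}
```

Because only the manager touches the file, the children never contend for it, and the manager sees end of input once every child has exited, which is roughly the role of the final flush at shutdown.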

Second, the message documentation is in the source code. I use doxygen to create program documentation. I found a way to create separate files for inline documentation and that is how the message logs are created. This makes it a lot easier to keep messages up to date, and to be sure that all messages are documented. Unfortunately, Dimitri did something after version 1.3.6 of doxygen, so that this no longer works. For now, I keep an older version of doxygen around, but eventually, I plan to create a script to handle it. In that, I would like to output messages in both HTML and ODF.

If this has been of interest and you have any tips for making maintenance of applications more efficient, please share.

Later . . . Jim

Monday, August 4, 2008

Program Security

Good security practices are multi-layered. The levels that are addressed in the Realeyes application are:
  • code vulnerabilities

  • program interaction

  • privileges and access

Code vulnerabilities are bugs that can be exploited to gain control of a program simply by interacting with it. Therefore secure programming starts with good coding practices. I saw David A. Wheeler present his HOWTO on secure programming practices, and highly recommend it. He covers issues in reading and writing files, how to prevent buffer overflows, user privileges, and much more. I was glad to have seen his presentation early in the design phase of the Realeyes project.

The only problem with his, and almost every other tutorial/reference I have ever read, is that it covers so much ground that each individual topic is a little short on detail. Even my favorite reference books by W. Richard Stevens leave a lot of code as an exercise for the reader. I am not going to write a tutorial, but the ways that issues specific to the Realeyes application have been handled can serve as detailed implementation examples. Where files are referenced, each file path is relative to the subversion repository for Realeyes. And BTW, I have a pretty good background in this, but I am sure that there are others who have more experience than I do, so I would be happy to hear any suggestions for improvements.

There are four components in the Realeyes application: the IDS, the database, the user interface, and the IDS-database interface (database daemon or DBD). Each has unique security issues, including interacting with other components.

The database is where user IDs are maintained. In the PostgreSQL database, it is possible to create groups that are granted specific access rights to each table. Then each user is assigned to a group and inherits those rights. The groups in the Realeyes database schema are defined in RealeyesDB/sql_roles/create_roles.sql and include:

  • realeyesdb_dbd: Only used by the DBD program to insert data from the IDS sensors and retrieve commands and new rules to be sent to the sensors

  • realeyesdb_analyst_ro: An analyst-read-only can view data and rules, and produce reports

  • realeyesdb_analyst: An analyst can view data and rules, create incident reports, and produce reports

  • realeyesdb_analyst_dr: An analyst-define-rules can do everything an analyst can, plus define new rules

  • realeyesdb_admin: An application administrator can do everything an analyst-define-rules can, plus create and modify new users and other application information

The user interface and DBD are written in Java, and connect to the database directly. The connection is always encrypted. So this layer of security is an administrative issue, to make the database host as secure as possible. The only additional feature offered here is the selection of a different listening port for the database than the default. The classes that interface with the database are in the files RealeyesDBD/DBD_Database.java and RealeyesGUI/Database.java.

One issue that was raised early in the pilot project I am running at a local college was how secure the captured data is. The data is stored in the database but not in raw form. The user interface reads this and reformats it, but does not print that to a file. An analyst could cut and paste the data, so that becomes a personnel issue. I have debated whether to provide the capability to write to file, but for the time being am leaning away from it. The user interface does generate summary reports and writes those to file, but that does not include any of the captured data.

The DBD connects to both the database and the IDS sensors. The IDS connection is optionally encrypted so that if it and the DBD are on the same host, the encryption overhead can be eliminated. Also, the address of the DBD host must be defined to the IDS sensor for the connection to be accepted, and the ports that are used are defined in the configuration. A sample configuration is in the file RealeyesDBD/sample_config/realeyesDBD.xml, and the code to parse it is in RealeyesDBD/RealeyesDBD.java.

One big issue I discovered regarding Java and encrypted connections is that, in JRE 1.4, it is possible to maintain multiple connections using the SocketChannel class, and it is possible to encrypt them using the SSLSocket class. However, it is not possible to do both at the same time. In JRE 1.5, the flaw is addressed, but the solution is very ugly. It essentially requires an application programmer to write a TCP/IP API. The argument for this is that Java might be used on networks other than TCP/IP, so the solution must be broad. Hopefully, JRE 1.6 will provide a solution for 99.99% of Java application programmers, and then the remaining 0.01% will still have a partial solution for their needs.

While I have considered porting the DBD to C++, the current code handles this in two ways. To begin with, there are two connections between the DBD and each IDS sensor. One is for data and the other is for control information. The data connection is handled by starting a thread that is dedicated to that connection, and that code is in the file RealeyesDBD/DBD_Handler.java. The control information is more sporadic, which is why I would have liked to use the SocketChannel selector. The workaround is to have the DBD poll each IDS sensor every 8 seconds if there has been no activity on the connection. The code for this is in the file RealeyesDBD/DBD_Control.java.

The IDS is a collection of C programs. These are started by running 'realeyesIDS', which spawns child processes. I discussed the reasons for choosing interprocess communication over threads in "Loose Threads". The main process, called the Manager, and all but one of the child processes run under the superuser ID. This is because they use a large shared memory buffer that, as currently built, can only be accessed by the superuser. The files that handle managing this buffer are:
  • RealeyesAE/src/rae_mem_ctl.c: Contains the code that the Manager uses to allocate, initialize, and do garbage collection for the buffer

  • RealeyesAE/src/rae_mem_mgmt.c: Contains the code that all processes use to allocate and free individual buffers

  • RealeyesAE/src/rae_lock_mgmt.c: Contains the code that all processes use to prevent memory locations from being changed incorrectly

The process that communicates with the DBD, called the Spooler, is designed with several security features. First, as the Spooler is started, it changes the user ID to one which has very limited access. It then changes the current directory to one that contains only the files it uses, and sets that to its root. The only communication from the Spooler to any of the other processes is through pipes, which means that it is serialized and straightforward to validate. Finally, the configuration file specifies the DBD host, and only connections from it are accepted.

The files that handle the Spooler communications are:
  • RealeyesIDS/data/rae_analysis.xml: The configuration file where the DBD host is defined (it is the manager's configuration file and contains a lot more, but the Spooler definitions are in their own section)

  • RealeyesIDS/src/rids_spooler.c: The Spooler initialization function where the user ID and directory are set is near the end of the spooler_init function

  • RealeyesIDS/src/rids_net_mgmt.c: The Spooler is the only process to use the network management functions, including the SSL setup, the listener which validates the connection request, and the exchange of data

Ultimately, the best way to program security is to think about how to exploit vulnerabilities in the code. And since the purpose of the Realeyes IDS is to detect exploits, I spend a lot of time thinking about it in general. So I have a fair amount of confidence that it is a good example of a securely coded network application.

Later . . . Jim