Thursday, June 4, 2009

Good Passwords

I have recently had reason to think about "good" passwords. To begin with, passwords are like keys. And weak passwords are like leaving your keys in the ignition of your car when you are out of it -- before long, it's going to be stolen.

But while there is a lot of talk about strong passwords, I have not heard a really usable way of creating them. And by usable, I mean one that typical computer users will actually use consistently. Of course, this is leading to an algorithm I thought of recently.

First, choose three words. How they are chosen doesn't really matter, as long as they are not ridiculously obvious. I think it would be OK to use a standard theme. As an example, while I have no interest in golf, my uncle loves it. So that will be the theme, and for my first password, I will pick:
    plaid

    birdie

    sand
Next, pick three numbers. Once these are chosen they will almost never change. The numbers will be substituted for a letter in the words. This could be the third letter of each word or the second from the last. For this example, I will pick the numbers 4-7-4 and the third letter.

Next choose another letter position, that is not the same as the previous one. This gets capitalized (while all the others are lower case). For the example, it will be the last letter.

So now create the password:

  • plaidbirdiesand

  • pl4idbi7diesa4d

  • pl4iDbi7diEsa4D
If the site requires puctuation, simply choose a punctuation mark and insert it between two of the words:
  • pl4iDbi7diE:sa4D
Now all the user has to do is remember the three words, which are meaningful to only him or her, and should be reasonably easy to remember, even with four or five different passwords. The transformation is the same for every password. So another example is:
  • sliceironcart

  • sl4ceir7nca4t

  • sl4cEir7Nca4T
While this might not be acceptable to super top secret government facilities or financial institutions, it should be sufficient for the majority of people. And it would be a whole lot better than many passwords being used now. If you agree, teach it to everyone you know who uses passwords. Then we can start working on making sure passwords are always encrypted.

Later . . . Jim

Monday, June 1, 2009

Handling Semaphores in C Programs

A while back, Carlo Schroder, over at LinuxToday.com, put out a request for articles on programming. Now that I have put the downloads for Realeyes IDS version 0.9.5 up on SourceForge, I get to have some fun answering her call.

What I have found in programming books, including those by the late W. Richard Stevens (which I turn to most often) is usually a good start, but never the whole story. But since this is not a general programming text, I will focus on a single issue in detail. This post will cover using semaphores.

First, a little background on locks, in case you have never used them. In *nix, process locks are implemented with the semaphore system calls. Since I use child processes that share memory, I have to implement semaphores. Threads use pthread_mutex calls, which do essentially what these functions do, and then some.

The most common reason for implementing locks is if you have multiple concurrently running processes or threads that have access to the same variable in memory. Obtaining or releasing a lock is guaranteed by the operating system to be completed without interruption. A single line of C code, such as
 if (flag & 4)
requires a minimum of two machine instructions:
  • Get the value of flag from memory into a register

  • Compare the value to zero
It is possible the thread running that code could be swapped out after getting the value, but before comparing it, and the value of flag could be changed by another thread, making the comparison invalid. By requiring every thread to get the flag lock before reading or writing the value of flag, only one thread accesses flag at a time.

The rule of thumb for the code while a lock is held is to do only what requires holding the lock, and no more. Often there are less than ten instructions between getting and releasing a lock. However, sometimes there are a couple dozen, because all of them require holding the lock. My memory management is an example of this, and I will try to cover it down the road.

OK, now for how I implement semaphores. For some reason, the caller is required to define the semun structure. This definition is taken from the semctl man page and is in the rae_lock_mgmt.h file.

union semun {
int val; /* Value for SETVAL */
struct semid_ds *buf; /* Buffer for IPC_STAT, IPC_SET */
unsigned short int *array; /* Array for GETALL, SETALL */
struct seminfo *__buf; /* Buffer for IPC_INFO */
};
All of the following code is from the rae_lock_mgmt.c file. First, I define a global to keep track of a held lock. This is done so that if there is an interrupt, such as a segmentation fault, while the lock is held, it can be released in the signal handler by calling rae_release_lock. The caller must set these fields to the address of the variables used to track the semaphore ID and index.
/* Pointer to currently held lock */
int *rae_held_lock = NULL;
int *rae_hl_index = NULL;
Before a lock can be used, it must be initialized. I use a lot of locks and have found that different Linux distributions have different defaults for the maximum number of semaphores an application may allocate. To keep the number of allocations down, I have grouped the locks by functionality, and each group gets a semaphore set (or array) which only uses a single semaphore ID. Therefore, the number of locks in the group is passed to the init function.
int rae_init_lock(int il_size)
{
int i, il_semid = -1;
union semun sem_union;
The semget call returns a new semaphore set. Then each lock in the array is initialized to a value of 1 to indicate that it is available. If that fails, the semaphore ID is set to a negative value, which is -1 if the semaphore set is released, and -2 if it is not.
  if ((il_semid = semget(IPC_PRIVATE, il_size, (0660|IPC_CREAT))) > 0)
{
sem_union.val = 1;
for (i=0; i < il_size; i++)
{
if (semctl(il_semid, i, SETVAL, sem_union) == -1)
{
if (semctl(il_semid, 0, IPC_RMID, sem_union) == -1)
il_semid = -2;
else
il_semid = -1;
}
}
}
return(il_semid);
} /* end rae_init_lock */
When the application shuts down, the locks must be freed. If not, they remain allocated by the system. You can run 'ipcs -s' to see the locks that are held. If an application fails to relaese a lock, you can run ipcrm (read the man page) as root to release it.

Notice that the location of the semaphore ID in the application is passed to this function, and it is set to zero by the function. This is because the sem_ctl command, IPC_RMID, ignores the index and simply removes the entire semaphore. Also, I prefer to do as much as possible in a function so the caller does not have to worry about the details. This way, when I call the same function from different places, I reduce the risk of forgetting to set something.

int rae_free_lock(int *fl_semid, int fl_idx)
{
int fl_stat = 0;
union semun sem_union;

if (*fl_semid == 0)
goto out;
sem_union.val = 1;
fl_stat = semctl(*fl_semid, fl_idx, IPC_RMID, sem_union);
*fl_semid = 0;
out:
return(fl_stat);
} /* end rae_free_lock */
When a lock is needed for a memory location, the get lock function is called with the lock identifier, which consists of the semaphore ID and the index in its array. I have added a wait flag to allow some locks to be conditional. In my memory management code, I have three available buffer queues, and if the lock on one is held, the caller can simply try the next one without waiting.

The semop call gets and releases a lock by subtracting the supplied value from the semaphore's value. What this means is that you, the programmer, are responsible for defining what value indicates a held or released lock. By keeping all of this logic in a pair of functions, you have control over how it is implemented. All of the examples I have seen use 1 to indicate the lock is available, and 0 to indicate that it is held. I can imagine how other values might be used, but it seems ridiculously complicated and prone to error.
int rae_get_lock(int gl_wait, int gl_semid, int gl_idx)
{
int gl_stat = 0;
struct sembuf sem_b;

if (gl_semid == 0)
{
gl_stat = -2;
goto out;
}
The semaphore buffer structure is defined in the system headers. This is where the semaphore array index and operation are set. And as I said before, this function allows the caller to wait or not, which is accomplished by using the semaphore flag bit, IPC_NOWAIT.

The SEM_UNDO flag bit is supposed to reverse the operation when the process terminates, which implies that if the process fails while holding a lock, the system will release it (but not free it). However, in my experience, that doesn't always work, so I have included the capability to do this in my interrupt handlers, as I mentioned above.

  sem_b.sem_num = (short int) gl_idx;
sem_b.sem_op = -1;
if (gl_wait & raeLOCK_WAIT)
sem_b.sem_flg = SEM_UNDO;
else if (gl_wait & raeLOCK_NWAIT)
sem_b.sem_flg = SEM_UNDO | IPC_NOWAIT;
else
{
gl_stat = -1;
goto out;
}
This is the heart of the function. Read the semop man page for a lot more detail, but the general idea is as follows. The semop system call will attempt to subtract 1 from the current value of the lock. If that value is 1, the operation occurs immediately. Otherwise, the call will wait or return with the errno value of EAGAIN if the lock is held. Of course, there is the possibility the call will fail entirely, which must be handled.

If the lock value is set to 0, this means the lock is obtained, and this function sets the semaphore ID and index in the global lock tracking variables.

  if ((gl_stat = semop(gl_semid, &sem_b, 1)) == -1)
{
if ((gl_wait & raeLOCK_NWAIT) && errno == EAGAIN)
gl_stat = 1;
else if (errno == EIDRM)
gl_stat = -2;
}
if (!gl_stat && rae_held_lock != NULL)
{
*rae_held_lock = gl_semid;
*rae_hl_index = gl_idx;
}
out:
return(gl_stat);
} /* end rae_get_lock */
This is the reverse of the get lock function, in that it adds one to the lock value. There is no wait flag for releasing a lock, so only the semaphore ID and index are supplied. If the lock value is set to 1, the lock is released and this function clears the semaphore ID and index in the global lock tracking variables.
int rae_release_lock(int rl_semid, int rl_idx)
{
int rl_stat = 0;
struct sembuf sem_b;

sem_b.sem_num = (short int) rl_idx;
sem_b.sem_op = 1;
sem_b.sem_flg = SEM_UNDO;
rl_stat = semop(rl_semid, &sem_b, 1);
if (!rl_stat && rae_held_lock != NULL)
{
*rae_held_lock = 0;
*rae_hl_index = 0;
}
return(rl_stat);
} /* end rae_release_lock */
This set of functions makes using semaphores as easy as:
  • Init lock

  • Get lock

  • Release lock

  • Free lock
Of course, the caller code must be well thought out to prevent a deadly embrace. That is accomplished by keeping the code using the Get and Release calls as simple as possible, and making sure the instructions between them absolutely require the lock.

Later . . . Jim

Monday, May 25, 2009

Realeyes IDS 0.9.5 Released

There is a new download available for the Realeyes IDS version 0.9.5. Read the release notes for details, but essentially, this release is about new user interface features. The screenshots have been updated and there are two new demos, one on installation and configuration, and the other on the new features. There are links to all of this on the technology page

The college where I have been running a pilot project was able to provide an upgraded host system. This led to the discovery of a couple of issues with the IDS sensor software that I had not seen before. One was the signal handler for user interrupts, and the other was in releasing locks if shutting down. Both of these are fixed, although I might make some further improvements before the next release.

The big news is that the college security software faculty heard about the pilot project and is going to use Realeyes in the curriculum. This means that I am now running the system in a lab environment. And all of this means it gets tested more thoroughly. I did find a couple configurations that would not run successfully, so hopefully, the installation process is more solid.

I am hoping that this will help build the community around the project. I have had some interest expressed in working on code, as well as testing it in other environments. So it is moving in the right direction, and with a little luck and some elbow grease, the next release will be 1.0.

Later . . . Jim

Wednesday, April 29, 2009

Command Line vs. GUI Reality Check

I was reading Kyle Rankin and Bill Childer's Point/Counterpoint column on "Mutt vs. Thunderbird" in Linux Journal over breakfast the other day. It mostly boiled down to the perennial text vs. graphical user interface argument. And since I don't have a strong opinion about either mail client, it got me to thinking about the real difference between the two interfaces.

Before I dive into the firepit, let me explain that I spent the first half of my career in the IBM mainframe world, which meant writing Job Control Language (JCL) to submit batch jobs. Part of that time, I spent maintaining a TCP/IP stack written in assembler language. Compared to that, the distance between the command line and the GUI is much smaller than their advocates seem to realize.

To be honest, I didn't use GUIs for a quite a while after they became available. I didn't fight them so much as I just didn't find them intuitive or efficient. However, as usability studies improved the interfaces and processor speeds increased their capabilities, I gradually came to appreciate what they provide.

I think that much of the bias in favor of GUIs comes from the saying, "A picture is worth a thousand words." And certainly, pictures of cityscapes or family reunions or battlefields convey much more information than even the best description. However, imagine having to draw a picture to say, "I'm thirsty," or, "Turn left at the next light," or even to describe how to use a graphical user interface to a first time user. Very often, a few words are more efficient than any picture.

Obviously, each interface has its strengths and weaknesses. So what I was wondering, as I read Bill and Kyle, was whether it is possible to generalize the strengths of each. I'm not suggesting that I have it completely figured out, but I think there are, in fact, some common elements for the situations in which each is superior to the other.

To begin with, I am assuming that the interface in question has gone through enough testing and real-world use to be very good at what it is supposed to do. The Linux/Unix command line's many utilities that can be piped or scripted to create miniature applications is the best example of that interface. And I prefer the GNU collection because of the extra arguments that have been added by developers scratching an itch. Meanwhile, the current state of the art web browsers are good examples of GUIs.

Command line interfaces are best for data that is concise and reasonably well known in advance. For example, comparing the dates of two copies of the same file in different directories:
    ls -l /a/b/c/file /d/e/f/file
Or finding information about running processes:
    ps aux | grep program
Or starting and stopping daemons:
    /etc/init.d/daemon start
    /etc/init.d/daemon stop
The last two examples are mostly used by administrators, and I think that the command line is especially appropriate for that type of work. However, as a developer, I find that the most repetitive tasks I perform are done so most efficiently on the command line. And I personally almost never use a file browser, preferring the flexibility of the command line.

Graphical user interfaces are best for displaying somewhat large amounts of data that needs to be analyzed to determine what to do next. When a web page is loaded in a browser, it is usually not known in advance what link(s) will be selected. A window with a helpful design to guide the user to the proper selection and a mouse pointer that can be moved directly to the appropriate target is a highly effective use of this interface.

With this in mind, I started thinking about the use of both interfaces in the Realeyes IDS.

The installation is performed by executing several shell scripts on the command line. Many of the responses have a default, which allows the Enter key to be pressed for a majority of the responses. And where that is not possible, simple information, such as an IP address or a single name, is entered. To begin with, a graphical environment is not required. And second, there is no switching between the keyboard and the mouse. This makes deploying a new sensor or updating the application very fast after the first couple of times.

In contrast, the analysis window displays large amounts of information that may not be selected in a sequential order. Having the data in a scrollable list allows me to focus on one column, while watching another with my peripheral vision. This allows me to see patterns that might not be apparent in a text display.

Another advantage of the GUI comes when I want to view the session data. This is done by right clicking on a report and selecting Playback from the popup menu displayed. When the reports are sorted by Event, all of those with the same Event are similar enough that a quick glance at the session data is sufficient to determine whether it warrants more attention. Then a single click closes the window. This means that I can often rip through reports, spending only 2 or 3 seconds on most of them.

The GUI also provides tabs that allow me to search for trends or get summary reports without losing my focus on the original display. And the status frames automatically notify me if there is something that needs attention, without my having to query for it.

There are some administrative functions that could be performed from the command line as easily as the GUI, and in the early versions of the Realeyes IDS, it was the only way to do that. However, having them incorporated into the GUI is very convenient.

The downside of this is a lack of flexibility. In order for a capability to be available, there must be code in the GUI application. The command line gives an administrator complete control of maintenance procedures, and under certain circumstances, this is the only option.

From a design perspective, the choice of command line vs. GUI seems pretty straightforward. First, how quickly does the code need to be produced? Second, which interface makes the user most productive? While there is plenty of room for different points of view on the answers to these questions, it is simply not true that one is always better than the other.

Later . . . Jim

Tuesday, March 17, 2009

Database Security

I saw the following on the webappsec list at Security Focus:

| I've heard this preached before.
|
| Using JDBC properly can help protect against SQL Injection.
|
| What protections does JDBC provide?
|
| Does java encode the input to not be malicious?
|
| I'm curious where in the java source/libraries does jdbc help
| to mitigate malicious input when using jdbc.
|
This preach is applicable for any programming language. It
all depends on how well you have done input & output
validation. As in what input you expect & what input is
malicious for your app. If all goes well you can make SQL
injection very difficult or even impossible . The reason I
say difficult, because it all depends on how well the SQL
injection is crafted. As far as I recollect I don't think
JDBC or for that case even java gives you predefined class
for doing that. But there is quite a possibility that some
one on the internet must have surely written these classes.
--
Taufiq
http://www.niiconsulting.com/products/iso_toolkit.html

I don't disagree with Taufiq's assessment. However, I do disagree with his acceptance of the status quo. I wrote a rant on this blog responding to a complaint that security professionals are not taken seriously. In it, I pointed out that the security industry should promote improving the security climate, not just react to it with solutions 'for a price'. The example I gave was *DBC libraries.

The JDBC package, java.sql, does not supply any security parsing. This is not the real workhorse, but it should at least provide a method for this. Each database supplies a jar that java.sql classes call to access the specific database. This is where security parsing must be handled.

The thing is that parsing input is tricky. The first step is to validate that the input is correct for the column data type. This is reasonably straightforward for simple types like integer and varchar. But the way different databases support binary data and very large fields is not consistent. There is also support for non-standard data, such as PostgreSQL's support for the inet data type.

The JDBC Connection interface includes the getMetaData method, which returns the information supplied by the specific database library, some of which is unique to that database. There are not only differences between databases, it is even possible that there are differences between versions of the same database. This could be an issue for an application because:

    Some DatabaseMetaData methods return lists of information in the form of ResultSet objects. Regular ResultSet methods, such as getString and getInt, can be used to retrieve the data from these ResultSet objects.
All unique information must be verified for every version of the database supported. And if you are supporting multiple databases, it is that much more difficult.

The next step is to escape all characters that have special meaning, such as single quote and backslash. But again, each database has its own special characters that must be accounted for, such as ampersand in Oracle, and the E'...' escape pattern in PostgreSQL.

Update: Eric Kerin points out in his comments that the PreparedStatement interface does this, and after some testing I have found that this is the case. My excuse is that there is nothing in the javadoc for the SQL package or the PreparedStatement interface that explains this. Instead the documentation promotes it for optimizing frequently used statements. See my reply below for further responses to comments.

Also, there is a good article on this issue at the Open Web Application Security Project, which I found by googling for java and "sql injection".


The current situation places the responsibility for security on the thousands of application programmers, who must now dig into the internals of the database(s!) on the backend of their applications. If instead, the database development teams provided a parser for each field of data, it would be possible to determine if the input would result in a message, like this one that I was able to create from testing various input sequences:

    WARNING: nonstandard use of \' in a string literal

    ERROR: relation "foobar" does not exist

    STATEMENT: select foo from foobar

I'm still working on parsing that construct and reworking it in a way that does not reject the data out of hand, because it might be a legitimate description of an Event, or possibly a Trigger definition. I am fortunate, because the input is not read from the network. You might not be so lucky.

And before I leave this topic/rant, I must point out that application programmers need to work closely with their DBAs to be sure that permissions are set on tables to allow only as much access as absolutely necessary and no more. If you don't have a DBA and/or maintain the database yourself, you need to become very familiar with the levels of GRANTing access and the use of roles to at least limit the damage when SQL injection attacks succeed. In my own experience, as well as reports from others, the attacks on applications, and databases especially, is continuing to increase.

If anyone is interested in the database security in the Realeyes UI, check out the Database and ValidatorDBForm modules, and then see how they are used in any of the Window*Base.java modules at the UI subversion repository. The ValidatorDBForm class includes the InterfaceAdminExtendedProcessing interface to do extra contextual error checking, which really is the job of the application. There are some pretty good examples of its use in the WindowsRules(Triggers/Actions/Events).java modules.

I'm pretty sure I'm talking for all application developers when I say (as a security guy), "Hey database developers, a little help!"

Later . . . Jim