Wednesday, November 4, 2009

PostgreSQL File Corruption

The college where I am running my pilot project was able to let me have a faster, dual-core CPU. To my chagrin, I started getting more errors than before. But it really isn't that surprising, considering that the IDS is pretty intense, and the system is also running PostgreSQL and one or two Java apps.

I fixed the main problems in my code, so from a debugging perspective, it was a good test. And I was happy to see that, with the dual cores, I could run the user interface without shutting down the IDS. However, Xorg started hanging occasionally, and there was no choice but to do a hard reset.

This was just an annoyance until I started getting PostgreSQL errors, such as "Could not open file pg_clog/000N", which caused me to lose several days' worth of reports. For a pilot project, that is not critical, but it certainly raised a flag. So, I am going to document what I have done as well as what I have found from others.

First, back up your data. For my database, it is sufficient to run pg_dump to create scripts to insert data into the tables. But there are also options for creating compressed archives and later restoring them with pg_restore.

Unfortunately, the most recent backup I had was from a month before, so I wanted to do something about the pg_clog file. Here is what I did:
  1. I tried to run pg_dump, but that caused a really bad error which resulted in the partition being remounted in Read Only mode. At that point, I had no choice but to run fsck and reboot.

  2. With the file system errors fixed, I was able to run pg_dump and save all but a couple dozen reports. I then tried the REINDEX TABLE command, but without the pg_clog file, it failed.

  3. I was forced to use the DROP TABLE command on the table with the bad index, and then used the original CREATE TABLE script and the backup script to restore the data.

  4. Unfortunately, the performance accessing that table and another one with a relationship to it was horrible. So I ended up taking another backup, deleting all of the Incidents tables, recreating them, and then restoring the data.
So that's my story. But I was hopeful that there was a better method, so I have done some searching and here is what I have found from others' experience:
  • If you still have a live database, running "SELECT ctid FROM tab WHERE ..." for the records with unreasonable values might tell you which blocks are corrupted. The value before the comma is the block number, which, when multiplied by 8192 (assuming you're using 8K blocks), gives the file offset of the page. To find the file containing the block, run "SELECT relfilenode FROM pg_class WHERE relname = 'tablename';". The answer is a number that is also the filename, such as 16384. Note that if the file offset is over 1G, you would be looking for a file named 16384.N, where N is the gigabyte chunk.

  • Create an empty file with the command "touch /POSTGRESDIR/pg_clog/000N". Next, fill the file with zeros in 8K blocks using the command "dd bs=8k count=1 < /dev/zero >> /usr/local/pgsql/data/pg_clog/000N", repeating it until the offset is covered. If there are other files in pg_clog, create a file of all zeros the same size as those.

  • If you want to try to narrow down where the corruption is, you can experiment with commands like "SELECT ctid,* from big_table offset N limit 1;"

  • Use pg_resetxlog (located in /usr/lib/postgresql/8.3/bin/pg_resetxlog under Debian/Ubuntu)

  • Dump and reload the data on another machine. A problem that can appear is data that violates constraints (like NOT NULL). One approach is to remove all the constraints and add them back one by one, cleaning out the data that violates each.

  • You can set client_min_messages in postgresql.conf to DEBUG to get more information.
As you can see, there is no magic wand command to recover your data. But hopefully, this will give you a fighting chance.

Later . . . Jim

Thursday, July 16, 2009

Managing Realeyes Memory

I have been at work on a new project that I hope to announce soon. But at the moment I need a break to let the algorithm for a particularly tricky function germinate. So I am going to describe how I do memory management in the Realeyes IDS.

First, I have to say that this is not a generic memory manager. It is specific to my application. It may be possible to adapt it to other applications, but the key word here is 'adapt'. However, it will hopefully give anyone who is considering doing their own memory management some food for thought.

The reason I do my own memory management is to avoid fragmentation. The Realeyes IDS manages a lot of sessions simultaneously, so memory has to be used as efficiently as possible. If a buffer were allocated for exactly the size of a packet's data, the overall buffer space would develop lots of pockets of unusable space. But setting the size of data buffers to the largest allowed by the Internet Protocol would also be inefficient, because there are a huge number of tinygrams in network traffic.

The solution is a compromise. I allocate fixed size buffers in sizes that are designed to waste as little space as possible. The smallest is 64 bytes. The next is 105 bytes, and the next, 128 bytes. So if 56 bytes are needed, the first size buffer is allocated. If 65 bytes are needed, the second. And if 108 bytes are needed, the third.

If you did a double take on the 105 byte buffer size, there is a method to my madness. These buffers are kept in pools of 8 Kilobytes. A 64 byte buffer pool will hold 128 buffers, and a 128 byte buffer pool will hold 64 buffers. Both of these fit exactly into the 8K pool with no wasted space. To fine tune this a bit, I found a buffer size between them that wastes almost no space. Allocating 78 buffers of 105 bytes each uses 8190 bytes, which wastes only 2 bytes of an 8K pool.

Here is the complete list of buffer sizes: 64, 105, 128, 195, 256, 390, 512, 744, 1024, 2048, 4096, 8192. If larger sizes are needed, multiple adjacent pools are allocated, up to 64K. Again, this is specific to the Realeyes IDS application, which can guarantee that no buffer larger than 64K will ever be requested.

When the application is initialized, a huge buffer (many megabytes) is allocated and divided into 8K pools. Then, when a buffer is requested, if no pool has yet been assigned for the appropriate buffer size, the next available pool is assigned to provide buffers of that size only, and the first buffer in the pool is returned to the requester. If a pool already exists for the buffer size and has free buffers, a buffer from that pool is returned.

To handle the requests, each allocated pool is kept on one of three queues for that buffer size. The reason there are three is that the entire queue must be locked while buffers are being allocated or freed. The Realeyes IDS functions to handle semaphores allow for a caller to request an immediate return if a lock is already held. This means that if one of the pool queues is in use, the caller can try the next one. The rae_mem_mgmt.c module keeps track of the last queue accessed and uses a simple round robin method to check the next queue.

So far, so good. But there are still some loose ends. What if all of the buffers in a pool are in use, and that pool is at the head of the pool queue? For that matter, what if the first 1,000 pools in a queue have no available buffers? This is where the manager comes in.

For each pool size there are Full and Free queues. The manager periodically (about 500 times a second) checks each of the available queues, removes all pools that have no available buffers, and puts them on the Full queue for that buffer size. Pools on the Full queue that have available buffers are put on the Free queue. And pools that are on the Free queue are divvied up so that each available queue has approximately the same number of pools.

There are a few other steps in managing the queues, which is done in rae_mem_ctl.c. If an allocated pool has all of its buffers freed, it is put on a general available queue to be reused for possibly a different buffer size. Also, there is a queue at each buffer size for full pools that have not had available buffers for a period of time. This is only checked once a second to see if buffers have been freed.

So does it work? In the pilot project I have been running for over a year, the statistics I have collected show that the average number of sessions being managed simultaneously is around 20,000. Assuming an average size of 16K per session, that is 325M of data, plus the application data about each session. And that is just a snapshot. There are many Gigabytes of data being examined over the course of an hour. When the IDS does run out of buffers (I'm working on it, OK?!?), it is because the application hasn't released them when it should, not because the memory management is bogging down.

So that's the essence of how memory management is handled in the Realeyes IDS. However, because the application uses multiple processes instead of threads, the memory must be shared. I will cover that in a future post.

Later . . . Jim

Thursday, June 4, 2009

Good Passwords

I have recently had reason to think about "good" passwords. To begin with, passwords are like keys. And weak passwords are like leaving your keys in the ignition of your car when you are out of it -- before long, it's going to be stolen.

But while there is a lot of talk about strong passwords, I have not heard a really usable way of creating them. And by usable, I mean one that typical computer users will actually use consistently. Of course, this is leading to an algorithm I thought of recently.

First, choose three words. How they are chosen doesn't really matter, as long as they are not ridiculously obvious. I think it would be OK to use a standard theme. As an example, while I have no interest in golf, my uncle loves it. So that will be the theme, and for my first password, I will pick:
    plaid

    birdie

    sand
Next, pick three numbers. Once these are chosen they will almost never change. The numbers will be substituted for a letter in the words. This could be the third letter of each word or the second from the last. For this example, I will pick the numbers 4-7-4 and the third letter.

Next choose another letter position, that is not the same as the previous one. This gets capitalized (while all the others are lower case). For the example, it will be the last letter.

So now create the password:

  • plaidbirdiesand

  • pl4idbi7diesa4d

  • pl4iDbi7diEsa4D
If the site requires punctuation, simply choose a punctuation mark and insert it between two of the words:
  • pl4iDbi7diE:sa4D
Now all the user has to do is remember the three words, which are meaningful to only him or her, and should be reasonably easy to remember, even with four or five different passwords. The transformation is the same for every password. So another example is:
  • sliceironcart

  • sl4ceir7nca4t

  • sl4cEir7Nca4T
While this might not be acceptable to super top secret government facilities or financial institutions, it should be sufficient for the majority of people. And it would be a whole lot better than many passwords being used now. If you agree, teach it to everyone you know who uses passwords. Then we can start working on making sure passwords are always encrypted.

Later . . . Jim

Monday, June 1, 2009

Handling Semaphores in C Programs

A while back, Carla Schroder, over at LinuxToday.com, put out a request for articles on programming. Now that I have put the downloads for Realeyes IDS version 0.9.5 up on SourceForge, I get to have some fun answering her call.

What I have found in programming books, including those by the late W. Richard Stevens (which I turn to most often) is usually a good start, but never the whole story. But since this is not a general programming text, I will focus on a single issue in detail. This post will cover using semaphores.

First, a little background on locks, in case you have never used them. In *nix, process locks are implemented with the semaphore system calls. Since I use child processes that share memory, I have to implement semaphores. Threads use pthread_mutex calls, which do essentially what these functions do, and then some.

The most common reason for implementing locks is if you have multiple concurrently running processes or threads that have access to the same variable in memory. Obtaining or releasing a lock is guaranteed by the operating system to be completed without interruption. A single line of C code, such as
 if (flag & 4)
requires a minimum of two machine instructions:
  • Get the value of flag from memory into a register

  • Compare the value to zero
It is possible the thread running that code could be swapped out after getting the value, but before comparing it, and the value of flag could be changed by another thread, making the comparison invalid. By requiring every thread to get the flag lock before reading or writing the value of flag, only one thread accesses flag at a time.

The rule of thumb for the code while a lock is held is to do only what requires holding the lock, and no more. Often there are fewer than ten instructions between getting and releasing a lock. However, sometimes there are a couple dozen, because all of them require holding the lock. My memory management is an example of this, and I will try to cover it down the road.

OK, now for how I implement semaphores. For some reason, the caller is required to define the semun structure. This definition is taken from the semctl man page and is in the rae_lock_mgmt.h file.

union semun {
    int val;                    /* Value for SETVAL */
    struct semid_ds *buf;       /* Buffer for IPC_STAT, IPC_SET */
    unsigned short int *array;  /* Array for GETALL, SETALL */
    struct seminfo *__buf;      /* Buffer for IPC_INFO */
};
All of the following code is from the rae_lock_mgmt.c file. First, I define a global to keep track of a held lock. This is done so that if there is an interrupt, such as a segmentation fault, while the lock is held, it can be released in the signal handler by calling rae_release_lock. The caller must set these fields to the address of the variables used to track the semaphore ID and index.
/* Pointer to currently held lock */
int *rae_held_lock = NULL;
int *rae_hl_index = NULL;
Before a lock can be used, it must be initialized. I use a lot of locks and have found that different Linux distributions have different defaults for the maximum number of semaphores an application may allocate. To keep the number of allocations down, I have grouped the locks by functionality, and each group gets a semaphore set (or array) which only uses a single semaphore ID. Therefore, the number of locks in the group is passed to the init function.
int rae_init_lock(int il_size)
{
    int i, il_semid = -1;
    union semun sem_union;
The semget call returns a new semaphore set. Then each lock in the array is initialized to a value of 1 to indicate that it is available. If that fails, the semaphore ID is set to a negative value, which is -1 if the semaphore set is released, and -2 if it is not.
    if ((il_semid = semget(IPC_PRIVATE, il_size, (0660|IPC_CREAT))) > 0)
    {
        sem_union.val = 1;
        for (i = 0; i < il_size; i++)
        {
            if (semctl(il_semid, i, SETVAL, sem_union) == -1)
            {
                if (semctl(il_semid, 0, IPC_RMID, sem_union) == -1)
                    il_semid = -2;
                else
                    il_semid = -1;
            }
        }
    }
    return(il_semid);
} /* end rae_init_lock */
When the application shuts down, the locks must be freed. If not, they remain allocated by the system. You can run 'ipcs -s' to see the locks that are held. If an application fails to release a lock, you can run ipcrm (read the man page) as root to release it.

Notice that the location of the semaphore ID in the application is passed to this function, and it is set to zero by the function. This is because the semctl command, IPC_RMID, ignores the index and simply removes the entire semaphore set. Also, I prefer to do as much as possible in a function so the caller does not have to worry about the details. This way, when I call the same function from different places, I reduce the risk of forgetting to set something.

int rae_free_lock(int *fl_semid, int fl_idx)
{
    int fl_stat = 0;
    union semun sem_union;

    if (*fl_semid == 0)
        goto out;
    sem_union.val = 1;
    fl_stat = semctl(*fl_semid, fl_idx, IPC_RMID, sem_union);
    *fl_semid = 0;
out:
    return(fl_stat);
} /* end rae_free_lock */
When a lock is needed for a memory location, the get lock function is called with the lock identifier, which consists of the semaphore ID and the index in its array. I have added a wait flag to allow some locks to be conditional. In my memory management code, I have three available buffer queues, and if the lock on one is held, the caller can simply try the next one without waiting.

The semop call gets and releases a lock by subtracting the supplied value from the semaphore's value. What this means is that you, the programmer, are responsible for defining what value indicates a held or released lock. By keeping all of this logic in a pair of functions, you have control over how it is implemented. All of the examples I have seen use 1 to indicate the lock is available, and 0 to indicate that it is held. I can imagine how other values might be used, but it seems ridiculously complicated and prone to error.
int rae_get_lock(int gl_wait, int gl_semid, int gl_idx)
{
    int gl_stat = 0;
    struct sembuf sem_b;

    if (gl_semid == 0)
    {
        gl_stat = -2;
        goto out;
    }
The semaphore buffer structure is defined in the system headers. This is where the semaphore array index and operation are set. And as I said before, this function allows the caller to wait or not, which is accomplished by using the semaphore flag bit, IPC_NOWAIT.

The SEM_UNDO flag bit is supposed to reverse the operation when the process terminates, which implies that if the process fails while holding a lock, the system will release it (but not free it). However, in my experience, that doesn't always work, so I have included the capability to do this in my interrupt handlers, as I mentioned above.

    sem_b.sem_num = (short int) gl_idx;
    sem_b.sem_op = -1;
    if (gl_wait & raeLOCK_WAIT)
        sem_b.sem_flg = SEM_UNDO;
    else if (gl_wait & raeLOCK_NWAIT)
        sem_b.sem_flg = SEM_UNDO | IPC_NOWAIT;
    else
    {
        gl_stat = -1;
        goto out;
    }
This is the heart of the function. Read the semop man page for a lot more detail, but the general idea is as follows. The semop system call will attempt to subtract 1 from the current value of the lock. If that value is 1, the operation occurs immediately. Otherwise, the call will wait or return with the errno value of EAGAIN if the lock is held. Of course, there is the possibility the call will fail entirely, which must be handled.

If the lock value is set to 0, this means the lock is obtained, and this function sets the semaphore ID and index in the global lock tracking variables.

    if ((gl_stat = semop(gl_semid, &sem_b, 1)) == -1)
    {
        if ((gl_wait & raeLOCK_NWAIT) && errno == EAGAIN)
            gl_stat = 1;
        else if (errno == EIDRM)
            gl_stat = -2;
    }
    if (!gl_stat && rae_held_lock != NULL)
    {
        *rae_held_lock = gl_semid;
        *rae_hl_index = gl_idx;
    }
out:
    return(gl_stat);
} /* end rae_get_lock */
This is the reverse of the get lock function, in that it adds one to the lock value. There is no wait flag for releasing a lock, so only the semaphore ID and index are supplied. If the lock value is set to 1, the lock is released and this function clears the semaphore ID and index in the global lock tracking variables.
int rae_release_lock(int rl_semid, int rl_idx)
{
    int rl_stat = 0;
    struct sembuf sem_b;

    sem_b.sem_num = (short int) rl_idx;
    sem_b.sem_op = 1;
    sem_b.sem_flg = SEM_UNDO;
    rl_stat = semop(rl_semid, &sem_b, 1);
    if (!rl_stat && rae_held_lock != NULL)
    {
        *rae_held_lock = 0;
        *rae_hl_index = 0;
    }
    return(rl_stat);
} /* end rae_release_lock */
This set of functions makes using semaphores as easy as:
  • Init lock

  • Get lock

  • Release lock

  • Free lock
Of course, the caller code must be well thought out to prevent a deadly embrace. That is accomplished by keeping the code using the Get and Release calls as simple as possible, and making sure the instructions between them absolutely require the lock.

Later . . . Jim

Monday, May 25, 2009

Realeyes IDS 0.9.5 Released

There is a new download available for the Realeyes IDS version 0.9.5. Read the release notes for details, but essentially, this release is about new user interface features. The screenshots have been updated and there are two new demos, one on installation and configuration, and the other on the new features. There are links to all of this on the technology page.

The college where I have been running a pilot project was able to provide an upgraded host system. This led to the discovery of a couple of issues with the IDS sensor software that I had not seen before. One was in the signal handler for user interrupts, and the other was in releasing locks during shutdown. Both of these are fixed, although I might make some further improvements before the next release.

The big news is that the college security software faculty heard about the pilot project and is going to use Realeyes in the curriculum. This means that I am now running the system in a lab environment. And all of this means it gets tested more thoroughly. I did find a couple of configurations that would not run successfully, so hopefully, the installation process is more solid.

I am hoping that this will help build the community around the project. I have had some interest expressed in working on code, as well as testing it in other environments. So it is moving in the right direction, and with a little luck and some elbow grease, the next release will be 1.0.

Later . . . Jim

Wednesday, April 29, 2009

Command Line vs. GUI Reality Check

I was reading Kyle Rankin and Bill Childers's Point/Counterpoint column on "Mutt vs. Thunderbird" in Linux Journal over breakfast the other day. It mostly boiled down to the perennial text vs. graphical user interface argument. And since I don't have a strong opinion about either mail client, it got me to thinking about the real difference between the two interfaces.

Before I dive into the firepit, let me explain that I spent the first half of my career in the IBM mainframe world, which meant writing Job Control Language (JCL) to submit batch jobs. Part of that time, I spent maintaining a TCP/IP stack written in assembler language. Compared to that, the distance between the command line and the GUI is much smaller than their advocates seem to realize.

To be honest, I didn't use GUIs for quite a while after they became available. I didn't fight them so much as I just didn't find them intuitive or efficient. However, as usability studies improved the interfaces and processor speeds increased their capabilities, I gradually came to appreciate what they provide.

I think that much of the bias in favor of GUIs comes from the saying, "A picture is worth a thousand words." And certainly, pictures of cityscapes or family reunions or battlefields convey much more information than even the best description. However, imagine having to draw a picture to say, "I'm thirsty," or, "Turn left at the next light," or even to describe how to use a graphical user interface to a first time user. Very often, a few words are more efficient than any picture.

Obviously, each interface has its strengths and weaknesses. So what I was wondering, as I read Bill and Kyle, was whether it is possible to generalize the strengths of each. I'm not suggesting that I have it completely figured out, but I think there are, in fact, some common elements for the situations in which each is superior to the other.

To begin with, I am assuming that the interface in question has gone through enough testing and real-world use to be very good at what it is supposed to do. The many Linux/Unix command line utilities that can be piped or scripted into miniature applications are the best example of that interface. And I prefer the GNU collection because of the extra arguments that have been added by developers scratching an itch. Meanwhile, the current state of the art web browsers are good examples of GUIs.

Command line interfaces are best for data that is concise and reasonably well known in advance. For example, comparing the dates of two copies of the same file in different directories:
    ls -l /a/b/c/file /d/e/f/file
Or finding information about running processes:
    ps aux | grep program
Or starting and stopping daemons:
    /etc/init.d/daemon start
    /etc/init.d/daemon stop
The last two examples are mostly used by administrators, and I think that the command line is especially appropriate for that type of work. However, as a developer, I find that the most repetitive tasks I perform are done so most efficiently on the command line. And I personally almost never use a file browser, preferring the flexibility of the command line.

Graphical user interfaces are best for displaying somewhat large amounts of data that needs to be analyzed to determine what to do next. When a web page is loaded in a browser, it is usually not known in advance what link(s) will be selected. A window with a helpful design to guide the user to the proper selection and a mouse pointer that can be moved directly to the appropriate target is a highly effective use of this interface.

With this in mind, I started thinking about the use of both interfaces in the Realeyes IDS.

The installation is performed by executing several shell scripts on the command line. Many of the responses have a default, which allows the Enter key to be pressed for a majority of the responses. And where that is not possible, simple information, such as an IP address or a single name, is entered. To begin with, a graphical environment is not required. And second, there is no switching between the keyboard and the mouse. This makes deploying a new sensor or updating the application very fast after the first couple of times.

In contrast, the analysis window displays large amounts of information that may not be selected in a sequential order. Having the data in a scrollable list allows me to focus on one column, while watching another with my peripheral vision. This allows me to see patterns that might not be apparent in a text display.

Another advantage of the GUI comes when I want to view the session data. This is done by right clicking on a report and selecting Playback from the popup menu displayed. When the reports are sorted by Event, all of those with the same Event are similar enough that a quick glance at the session data is sufficient to determine whether it warrants more attention. Then a single click closes the window. This means that I can often rip through reports, spending only 2 or 3 seconds on most of them.

The GUI also provides tabs that allow me to search for trends or get summary reports without losing my focus on the original display. And the status frames automatically notify me if there is something that needs attention, without my having to query for it.

There are some administrative functions that could be performed from the command line as easily as the GUI, and in the early versions of the Realeyes IDS, it was the only way to do that. However, having them incorporated into the GUI is very convenient.

The downside of this is a lack of flexibility. In order for a capability to be available, there must be code in the GUI application. The command line gives an administrator complete control of maintenance procedures, and under certain circumstances, this is the only option.

From a design perspective, the choice of command line vs. GUI seems pretty straightforward. First, how quickly does the code need to be produced? Second, which interface makes the user most productive? While there is plenty of room for different points of view on the answers to these questions, it is simply not true that one is always better than the other.

Later . . . Jim

Tuesday, March 17, 2009

Database Security

I saw the following on the webappsec list at Security Focus:

| I've heard this preached before.
|
| Using JDBC properly can help protect against SQL Injection.
|
| What protections does JDBC provide?
|
| Does java encode the input to not be malicious?
|
| I'm curious where in the java source/libraries does jdbc help
| to mitigate malicious input when using jdbc.
|
This preach is applicable for any programming language. It
all depends on how well you have done input & output
validation. As in what input you expect & what input is
malicious for your app. If all goes well you can make SQL
injection very difficult or even impossible . The reason I
say difficult, because it all depends on how well the SQL
injection is crafted. As far as I recollect I don't think
JDBC or for that case even java gives you predefined class
for doing that. But there is quite a possibility that some
one on the internet must have surely written these classes.
--
Taufiq
http://www.niiconsulting.com/products/iso_toolkit.html

I don't disagree with Taufiq's assessment. However, I do disagree with his acceptance of the status quo. I wrote a rant on this blog responding to a complaint that security professionals are not taken seriously. In it, I pointed out that the security industry should promote improving the security climate, not just react to it with solutions 'for a price'. The example I gave was *DBC libraries.

The JDBC package, java.sql, does not supply any security parsing. It is not the real workhorse, but it should at least define a method for this. Each database supplies a jar that the java.sql classes call to access the specific database, and that is where security parsing must be handled.

The thing is that parsing input is tricky. The first step is to validate that the input is correct for the column data type. This is reasonably straightforward for simple types like integer and varchar. But the way different databases support binary data and very large fields is not consistent. There is also support for non-standard data, such as PostgreSQL's support for the inet data type.

The JDBC Connection interface includes the getMetaData method, which returns the information supplied by the specific database library, some of which is unique to that database. There are not only differences between databases, it is even possible that there are differences between versions of the same database. This could be an issue for an application because:

    Some DatabaseMetaData methods return lists of information in the form of ResultSet objects. Regular ResultSet methods, such as getString and getInt, can be used to retrieve the data from these ResultSet objects.
All unique information must be verified for every version of the database supported. And if you are supporting multiple databases, it is that much more difficult.

The next step is to escape all characters that have special meaning, such as single quote and backslash. But again, each database has its own special characters that must be accounted for, such as ampersand in Oracle, and the E'...' escape pattern in PostgreSQL.

Update: Eric Kerin points out in his comments that the PreparedStatement interface does this, and after some testing I have found that this is the case. My excuse is that there is nothing in the javadoc for the SQL package or the PreparedStatement interface that explains this. Instead the documentation promotes it for optimizing frequently used statements. See my reply below for further responses to comments.

Also, there is a good article on this issue at the Open Web Application Security Project, which I found by googling for java and "sql injection".


The current situation places the responsibility for security on the thousands of application programmers, who must now dig into the internals of the database(s!) on the backend of their applications. If instead, the database development teams provided a parser for each field of data, it would be possible to determine if the input would result in a message, like this one that I was able to create from testing various input sequences:

    WARNING: nonstandard use of \' in a string literal

    ERROR: relation "foobar" does not exist

    STATEMENT: select foo from foobar

I'm still working on parsing that construct and reworking it in a way that does not reject the data out of hand, because it might be a legitimate description of an Event, or possibly a Trigger definition. I am fortunate, because the input is not read from the network. You might not be so lucky.
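A minimal version of that kind of check (my own sketch, not the actual ValidatorDBForm logic) simply flags input containing the backslash-quote sequence that produced the WARNING above, so the application can examine it in context rather than rejecting it out of hand or passing it straight through:

```java
public class InputCheck {
    // Flag input containing the nonstandard \' sequence that produced
    // the WARNING above. A true result means "examine this in context",
    // not "reject it" -- it might be a legitimate Event description.
    static boolean needsReview(String input) {
        return input.contains("\\'");
    }

    public static void main(String[] args) {
        System.out.println(needsReview("it\\'s a trigger"));
        System.out.println(needsReview("plain description"));
    }
}
```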

And before I leave this topic/rant, I must point out that application programmers need to work closely with their DBAs to be sure that permissions on tables allow only as much access as is absolutely necessary. If you don't have a DBA and/or maintain the database yourself, you need to become very familiar with the levels of GRANTing access and the use of roles, to at least limit the damage when SQL injection attacks succeed. In my own experience, as well as in reports from others, attacks on applications, and especially on databases, are continuing to increase.

If anyone is interested in the database security in the Realeyes UI, check out the Database and ValidatorDBForm modules, and then see how they are used in any of the Window*Base.java modules at the UI subversion repository. The ValidatorDBForm class includes the InterfaceAdminExtendedProcessing interface to do extra contextual error checking, which really is the job of the application. There are some pretty good examples of its use in the WindowsRules(Triggers/Actions/Events).java modules.

I'm pretty sure I'm talking for all application developers when I say (as a security guy), "Hey database developers, a little help!"

Later . . . Jim

Saturday, March 14, 2009

Java Search

I think I'm finally getting the hang of Object Oriented programming. I have been working on the user interface to provide all administration from it and to add quite a bit of usability.

Over the past few days, I added search to the playback window. Because the playback window has two frames (Text classes in Java), this is a bit trickier than your standard text search. To begin with, I am allowing the search to be limited to one or the other frame, as well as using both. This means that the search class has to be aware of each frame.

To be able to highlight the text, the Text class actually needs to be defined as StyledText. When the text is found, the replaceStyleRanges method is called to highlight it. For now, I am leaving it highlighted, thinking that it is more helpful to be able to see all of the found selections. The current found text is displayed in reverse video by using the setSelection method. This one has to be reset by setting the selection range to zero before setting the new selection.
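The index management underneath all of this is plain Java, separate from the SWT calls. Here is a sketch (with my own hypothetical names; the real code is in the WindowPlayback*.java modules) of collecting every match offset, so that replaceStyleRanges can highlight each one while setSelection marks the current match:

```java
import java.util.ArrayList;
import java.util.List;

public class SearchIndex {
    // Collect the start offset of every non-overlapping occurrence of
    // the search string. Each offset plus the search length gives the
    // range that replaceStyleRanges highlights in the StyledText, and
    // setSelection marks the current one in reverse video.
    static List<Integer> findAll(String text, String search) {
        List<Integer> offsets = new ArrayList<>();
        if (search.isEmpty()) {
            return offsets;
        }
        int from = 0;
        int at;
        while ((at = text.indexOf(search, from)) >= 0) {
            offsets.add(at);
            from = at + search.length();
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(findAll("the cat and the hat", "the"));
    }
}
```

With both frames searched, the same helper runs once per frame and the results are merged, which is where most of the index bookkeeping comes in.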

I thought about being able to use a single find window to search multiple playback windows, but this made my head hurt. However, it did seem friendly to share the search strings between playback windows. So I created a string array in the global variables class, and store the strings there. I even save them to the preference store so that they are maintained over application restarts.
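The shared list itself can be modeled as a small most-recently-used array (a sketch with hypothetical names; in the real code it lives in the global variables class and gets written to the preference store):

```java
import java.util.ArrayList;
import java.util.List;

public class SearchHistory {
    private final int capacity;
    private final List<String> strings = new ArrayList<>();

    SearchHistory(int capacity) {
        this.capacity = capacity;
    }

    // Move a repeated search string to the front; drop the oldest
    // entry when the list is full. All playback windows share one
    // instance, so each sees the searches made in the others.
    void add(String s) {
        strings.remove(s);
        strings.add(0, s);
        if (strings.size() > capacity) {
            strings.remove(strings.size() - 1);
        }
    }

    List<String> all() {
        return strings;
    }
}
```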

If you are interested in this code, check out the WindowPlayback*.java source modules in the subversion repository. The preference store is defined and initialized in the Globals.java and MainWindow.java modules.

The beauty of the OOP style is that almost all of my code is spent managing indexes. The heavy lifting is done by other classes and their methods. So I am hopeful that these GUI enhancements will be finished pretty soon, and I will build another set of download packages by the end of the month.

Later . . . Jim

Monday, February 23, 2009

Whatcha Doin'?

As soon as I put the latest download on SourceForge, I started working on the user interface. I am hoping to bring it up to a version 1.0 level of usability. At the rate I am going, I expect to have a download ready in about a month.

So that's what's coming. But what I barely mentioned was a new feature in the last package, which I added a week or so before it was built. The reason was that I wanted to see how well it worked in the pilot project before explaining it. But now I have done that, and I am pretty happy with it.

The feature is a comparison of the client and server session size. When a TCP session ends, the client data is multiplied by a factor, and if that is larger than the server data, it is reported. The multipliers are specified by port, so each protocol can be handled uniquely.

In the pilot, I have set port 80 to have a multiplier of 1, just as a proof of concept. There have not been too many reports on it, and those can be grouped into three categories:
  • Document uploads: This was what I was hoping to see. And I believe that it means if there is an exploit that loads a rootkit, that would be reported, as well.

  • Large cookies: I never realized how much is saved in some cookies. I have seen several (Facebook, I'm looking at you!) that are over 4Kbytes.

  • Miscellaneous: The rest of the reports are usually client requests for many files, but some of them don't get sent for some reason.
Again, this is mostly a proof of concept feature and I hope to expand on it down the road. But it gives me the sense that the Realeyes IDS is capable of detecting behavior. I think that's pretty cool.
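The check itself is simple. Here is a sketch (with my own hypothetical names) of the comparison described above, with the per-port multipliers that let each protocol be handled uniquely:

```java
import java.util.HashMap;
import java.util.Map;

public class SessionSizeCheck {
    // Multipliers are specified by port, so each protocol can be
    // handled uniquely; in the pilot, port 80 has a multiplier of 1.
    private final Map<Integer, Integer> multipliers = new HashMap<>();

    void setMultiplier(int port, int multiplier) {
        multipliers.put(port, multiplier);
    }

    // When a TCP session ends, report it if the scaled client data is
    // larger than the server data. Ports with no multiplier set are
    // not checked at all.
    boolean shouldReport(int port, long clientBytes, long serverBytes) {
        Integer m = multipliers.get(port);
        return m != null && clientBytes * m > serverBytes;
    }
}
```

A document upload, or an exploit pushing a rootkit, makes the client side unusually large relative to the server side, which is what trips the report.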

Later . . . Jim

Saturday, January 24, 2009

New Realeyes IDS Release and Demos

There is a new download available for Realeyes v0.9.4. Read the release notes for details, but basically there are several fixes and a few new features.

Unfortunately, while testing the packages, I discovered that the previous release had a ridiculous error that caused the system to fail if only IPv4 or only IPv6 was chosen to be monitored. My lame excuse for this is that I made some significant changes to the database and spent a lot of time testing those. When I modified the install script to allow for alternate combinations of IPv4 and IPv6, I 'assumed' that the original case would work.

Another problem I discovered is that between the last release (Sept. 2008) and now, the Eclipse project archived the version of the SWT libraries that I was pointing to from the installation instructions and the Microsoft Windows installer. That has been corrected, and hopefully won't change again.

Essentially it comes down to people and their (my) limitations. I was planning to add a page to the project web site to encourage developers, but I made it a priority after finding that embarrassing glitch.

I am very pleased to announce that I have created several demos of the application. The main links go to YouTube, but there are Ogg Theora versions available as well, and, if anything, these are better than the YouTube versions. Assuming you have the mplayer plugin installed in your browser, you should get a video that fills the window, which makes it much easier to see than the little viewer on the YouTube page.

The demos were created using the GTK version of recordMyDesktop. This is a simple application, but it does what it is supposed to very well. The only technical issue I had was that the output file name needs to be set before the recording.

The biggest problem I had was in my performance. I wrote outlines for the demos, but discovered that even just moving the cursor around, much less opening and closing windows, caused me to lose track of what I was saying. So I eventually wrote out scripts, which I read nearly word-for-word. And even at that, there are noticeable pauses in a couple of places. If possible, I recommend that two people work together: one talking, the other manipulating the screen.

Anyhow, I hope these demos give people an idea of the power of this system. I am finding quite a bit that is interesting at the college where I am running a pilot project. I will be writing about that soon.

Later . . . Jim