
Wednesday, November 4, 2009

PostgreSQL File Corruption

The college where I am running my pilot project was able to let me have a faster, dual-core CPU. To my chagrin, I started getting more errors than before. But it really isn't that surprising, considering that the IDS is pretty intense, and the system is also running PostgreSQL and one or two Java apps.

I fixed the main problems in my code, so from a debugging perspective, it was a good test. And I was happy to see that, with the dual cores, I could run the user interface without shutting down the IDS. However, Xorg started hanging occasionally, and there was no choice but to do a hard reset.

This was just an annoyance until I started getting PostgreSQL errors, such as "Could not open file pg_clog/000N", which caused me to lose several days' worth of reports. For a pilot project, that is not critical, but it certainly raised a flag. So I am going to document what I have done, as well as what I have found from others.

First, back up your data. For my database, it is sufficient to run pg_dump to create scripts that insert the data into the tables. But pg_dump also has options for creating compressed archives, which can later be restored with pg_restore.
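A minimal sketch of the archive route, assuming a database named realeyes and a backup directory of your choosing (both hypothetical). pg_dump's -Fc option writes a compressed custom-format archive that pg_restore can read later:

```sh
#!/bin/sh
# Sketch: dated, compressed backup of one PostgreSQL database.
# The database name and backup directory are hypothetical examples.

# backup_file DBNAME DIR -> prints DIR/DBNAME-YYYYMMDD.dump
backup_file() {
    printf '%s/%s-%s.dump\n' "$2" "$1" "$(date +%Y%m%d)"
}

# backup_db DBNAME DIR -> runs pg_dump in custom format (-Fc), prints the file name
backup_db() {
    out=$(backup_file "$1" "$2")
    pg_dump -Fc -f "$out" "$1" && printf '%s\n' "$out"
}

# Example (not run here):
#   backup_db realeyes /var/backups
#   pg_restore -d realeyes /var/backups/realeyes-20091104.dump   # into an empty database
```

Dating the file name keeps older backups around, which would have helped in my case, where the most recent one was a month old.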

Unfortunately, the most recent backup I had was from a month before, so I wanted to do something about the pg_clog file. Here is what I did:
  1. I tried to run pg_dump, but that caused a really bad error, which resulted in the partition being remounted read-only. At that point, I had no choice but to run fsck and reboot.

  2. With the file system errors fixed, I was able to run pg_dump and save all but a couple dozen reports. I then tried the REINDEX TABLE command, but without the pg_clog file, it failed.

  3. I was forced to use the DROP TABLE command on the table with the bad index, and then used the original CREATE TABLE script and the backup script to restore the data.

  4. Unfortunately, performance when accessing that table, and another one with a relationship to it, was horrible. So I ended up taking another backup, deleting all of the Incidents tables, recreating them, and then restoring the data.
So that's my story. But I hoped there was a better method, so I did some searching, and here is what I found from others' experience:
  • If you still have a live database, you can run "SELECT ctid FROM tab WHERE ..." for the records with unreasonable values; that might tell you which blocks are corrupted. The value before the comma is the block number, which, multiplied by 8192 (assuming you're using 8 KB blocks), gives the file offset of the page. To find the file that holds the block, run "SELECT relfilenode FROM pg_class WHERE relname = 'tablename';". The answer is a number that is also the file name, such as 16384 (the file is under the data directory's base/ subdirectory, in a directory named after the database's OID). Note that if the file offset is over 1 GB, you would be looking for a file named 16384.N, where N is the number of the gigabyte chunk, and the offset is counted from the start of that chunk.

  • Recreate the missing pg_clog file: create an empty file with the command "touch /POSTGRESDIR/pg_clog/000n". Next, fill the file with zeros in 8 KB blocks using the command "dd bs=8k count=1 < /dev/zero >> /POSTGRESDIR/pg_clog/000n" (for example, /usr/local/pgsql/data/pg_clog/000n), repeating it until the needed offset is covered. If there are other files in pg_clog, instead create an all-zero file the same size as those.

  • If you want to try to narrow down where the corruption is, you can experiment with commands like "SELECT ctid,* from big_table offset N limit 1;"

  • Use pg_resetxlog (located in /usr/lib/postgresql/8.3/bin/pg_resetxlog under Debian/Ubuntu)

  • Dump and reload the data on another machine. One problem that can appear is data that violates constraints (like NOT NULL). Remove all the constraints, then add them back one by one, cleaning out the data that violates each one.

  • You can set client_min_messages in postgresql.conf to one of the DEBUG levels (DEBUG1 through DEBUG5) to get more information.
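The arithmetic in the first two suggestions can be scripted. This is a minimal sketch, assuming the default 8 KB block size and 1 GB segment files (both are build-time settings), and the function names are hypothetical. ctid_location turns the block number from a ctid into the segment file and the byte offset within it, and zero_fill repeats the dd step, appending 8 KB blocks of zeros until the file is larger than a given offset:

```sh
#!/bin/sh
# Sketch, assuming 8 KB blocks and 1 GB segments (PostgreSQL defaults).

# ctid_location RELFILENODE BLOCK [BLOCK_SIZE]
# Prints "<file name> <byte offset within that file>" for the block
# number taken from a ctid such as (200000,3).
ctid_location() {
    bs=${3:-8192}
    blocks_per_seg=$(( 1073741824 / bs ))   # blocks in one 1 GB segment file
    seg=$(( $2 / blocks_per_seg ))
    offset=$(( ($2 % blocks_per_seg) * bs ))
    if [ "$seg" -eq 0 ]; then
        printf '%s %s\n' "$1" "$offset"     # first segment has no .N suffix
    else
        printf '%s.%s %s\n' "$1" "$seg" "$offset"
    fi
}

# zero_fill FILE OFFSET
# Creates FILE if needed and appends 8 KB blocks of zeros until the file
# is larger than OFFSET (the same as repeating the dd command by hand).
zero_fill() {
    touch "$1" || return 1
    # wc output is left unquoted so any padding spaces are dropped
    while [ $(wc -c < "$1") -le "$2" ]; do
        dd bs=8k count=1 if=/dev/zero >> "$1" 2>/dev/null || return 1
    done
}
```

For example, ctid_location 16384 200000 prints 16384.1 564658176, so the page starts 564,658,176 bytes into file 16384.1. To match the size of an existing pg_clog file, pass that file's size minus one as OFFSET.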
As you can see, there is no magic wand command to recover your data. But hopefully, this will give you a fighting chance.

Later . . . Jim

Tuesday, September 23, 2008

My App Fails the LSB

My position on the Linux Standard Base has evolved. When I first heard about it, I was all for it. The LSB as a standard could be useful to some, but now I disagree with the goals of the LSB working group. To be clear, this post is not about dissing the Linux Foundation. They have many worthwhile projects.

What follows is my experience with the LSB Application Checker, my take on the purpose of the LSB, and my own suggested solution for installing applications on GNU/Linux distributions. The Realeyes application failed to certify using the checker v2.0.3, which certifies against the LSB v3.2. Everything that it called out could be changed to pass the tests, but I will only consider correcting a few of the 'errors'.

After building the Realeyes v0.9.3 release, I collected all executable files in a common directory tree, downloaded the LSB application checker, and untarred it. The instructions say to run the Perl script, app-checker-start.pl, and a browser window should open. The browser window did not open, but a message was issued saying that I should connect to http://myhost:8889. This did work, and I was presented with the Application Check screen.

There was a text box to enter my application name for messages and one to enter the files to be tested. Fortunately, there was a button to select the files, and when I clicked on it a window opened that let me browse my file system to find the directories where the files were located. For each file to be tested, I clicked on the checkbox next to it, and was able to select all of the files, even though they were not all in the same directory. Then I clicked on the Finish button and all 87 of the selected files were displayed in the file list window.

When I clicked on the Run Test button, a list of about a dozen tasks was displayed. Each was highlighted as the test progressed. This took less than a minute. Then the results were displayed.

There were four tabs on the results page:
  • Distribution Compatibility: There were 27 GNU/Linux distributions checked, including 2 versions of Debian, 4 of Ubuntu, 3 of openSUSE, 3 of Fedora, etc. Realeyes passed with warnings on 14 and failed on the rest.

  • Required Libraries: These are the external libraries required by the programs written in C. There were nine for Realeyes, and three (libcrypto, libssl, and libpcap) are not allowed by the LSB. This means that distros are not required to include the libs in a basic install, so they are not guaranteed to be available.

  • Required Interfaces: These are the calls to functions in the libraries. There were almost a thousand in all, and the interfaces in the libraries not allowed by the LSB were called out.

  • LSB Certification: This is the meat of the report and is described in some detail below.

The test summary gives an overview of the issues:
  • Incorrect program loader: Failures = 11

  • Non-LSB library used: Failures = 4

  • Non-LSB interface used: Failures = 60

  • Bashism used in shell script: Failures = 21

  • Non-LSB command used: Failures = 53

  • Parse error: Failures = 5

  • Other: Failures = 53

The C executables were built on a Debian Etch system and use /lib/ld-linux.so.2 instead of /lib/ld-lsb.so.3. The non-LSB libraries and interfaces were described above, but there was an additional one. The bashisms were all a case of either using the 'source' built-in command or using a test like:
  while (( $PORT < 0 )) || (( 65535 < $PORT )); do

which other Bourne shells do not support; they need arithmetic expansion ('$(( ... ))') or the test command instead. The parse errors were from using the OR ("||") symbol.

The fixes for these are:
  • Use the recommended loader

  • Statically link the Non-LSB libraries

  • Use '.' instead of 'source'

  • Rework the numeric test and OR condition
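A minimal POSIX sh sketch of the last two fixes, assuming the script prompts for a port until it gets a valid one. The names valid_port and ask_port, and the config file realeyes.conf, are hypothetical. The case statement rejects empty, negative, and non-numeric input, and the test command replaces the (( )) comparisons and the OR:

```sh
#!/bin/sh
# Replaces:  while (( $PORT < 0 )) || (( 65535 < $PORT )); do
# and uses "." instead of "source" (e.g.  . ./realeyes.conf).

# valid_port VALUE -> succeeds only for an integer from 0 to 65535
valid_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;     # empty, negative, or non-numeric
    esac
    [ "$1" -le 65535 ]
}

# ask_port -> prompts until a valid port is entered, then prints it
ask_port() {
    PORT=-1
    while ! valid_port "$PORT"; do
        printf 'Port (0-65535): ' >&2
        read -r PORT || return 1
    done
    printf '%s\n' "$PORT"
}
```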

So far, all of this is doable, sort of. But every time a statically linked library is updated, the app must be rebuilt and the updates sent out. Also, the additional non-LSB library is used by another library (part of GTK), so I would have to build that library myself and statically link the non-LSB library (which happens to be part of Xorg) into it. The reason I am using them is that the user interface is built on the Eclipse SWT classes, which use the local graphics system calls to build widgets.

The non-LSB commands include several Debian specific commands (such as adduser), and for my source packages I had to rework the scripts to allow for alternatives (such as useradd). But the other disallowed commands are:
  • free: To display the amount of free memory

  • sysctl: To set system values based on the available memory

  • scp: Apparently the whole SSL issue is a can of worms

  • java

  • psql
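Allowing for alternatives in a script can look like the following sketch, which prefers Debian's adduser and falls back to useradd elsewhere. The function name, the DRY_RUN variable, and the choice of options are assumptions for illustration:

```sh
#!/bin/sh
# add_system_user NAME
# Creates a system account with no home directory, using adduser on
# Debian-based systems and useradd elsewhere. Prints the command first;
# if DRY_RUN is non-empty, it only prints it.
add_system_user() {
    if [ -f /etc/debian_version ] && command -v adduser >/dev/null 2>&1; then
        set -- adduser --system --no-create-home "$1"
    elif command -v useradd >/dev/null 2>&1; then
        set -- useradd -r -M "$1"
    else
        echo "add_system_user: neither adduser nor useradd found" >&2
        return 1
    fi
    printf '%s\n' "$*"
    [ -n "$DRY_RUN" ] || "$@"
}
```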

Finally, all of the other 'errors' are: Failed to determine the file type, for the file types:
  • JAR

  • AWK

  • SQL

  • DTD and XML

Part of the problem with the LSB is that it has bitten off more than it can chew. Apparently Java apps are non-LSB compliant. So are apps written in PHP, Ruby, Erlang, Lisp, BASIC, Smalltalk, Tcl, Forth, REXX, S-Lang, Prolog, Awk, ... From my reading, Perl and Python are the only non-compiled languages that are supported by the LSB, but I don't know what that really means (although I heartily recommend to the Application Checker developers that they test all of the executables in their own app ;-). And I suspect that apps written in certain compiled languages, such as Pascal or Haskell, will run into many non-LSB library issues.

Then there are databases. Realeyes uses PostgreSQL, and provides scripts to build and maintain the schema. Because of changes at version 8, some of these scripts (roles defining table authorizations) only work for version 8.0+. The LSB Application Checker cannot give me a guarantee that these will work on all supported distros because it didn't test them. I have heard that there is some consideration being given to MySQL, but from what I can tell, it is only to certifying MySQL, not scripts to build a schema in MySQL.

After all this kvetching, I have to say that the Application Checker application is very well written. It works pretty much as advertised, it is fairly intuitive, and it provides enough information to resolve the issues reported by tests. My question is, "Why is so much effort being put into this when almost no one is using it?"

An argument can be made that the LSB helps keep the distros from becoming too different from each other and without the promise of certified apps, the distros would not be motivated to become compliant. But I only see about a dozen distros on the list, with Debian being noticeably absent. And yet, there is no more sign of fragmentation in the GNU/Linux world than there ever was.

My theory on why UNIX fragmented is that proprietary licenses prevented the sharing of information which led to major differences in the libraries, in spite of POSIX and other efforts to provide a common framework. In the GNU/Linux world, what reduces fragmentation is the GPL and other FOSS licenses, not the LSB. All distros are using most of the same libraries, and the differences in versions are not nearly as significant as every UNIX having libraries written from scratch.

I have to confess, I couldn't care less whether Realeyes is LSB compliant, because it is licensed under the GPL. Any distro that would like to package it is welcome. In fact, I will help them. That resolves all of the dependency issues.

While I am not a conspiracy theorist, I do believe in the law of unintended consequences. And I have a nagging feeling that the LSB could actually be detrimental to GNU/Linux. The only apps that benefit from LSB compliance are proprietary apps. The theory behind being LSB compliant is that proprietary apps can be guaranteed a successful installation on any LSB compliant GNU/Linux distro. I'm not arguing against proprietary apps. If a company can successfully sell them for GNU/Linux distros, more power to them. However, what if proprietary libraries manage to sneak in? This is where the biggest threat of fragmentation comes from.

But even more importantly, one of the most wonderful features of GNU/Linux distros is updates, especially security updates. They are all available from the same source, using the same package manager, with automatic notifications. If the LSB is successful, the result is an end run around package managers, and users get to deal with updates in the Balkanized way of other operating systems. That is a step in the wrong direction.

The right direction is to embrace and support the existing distro ecosystems. There should be a way for application teams to package their own apps for multiple distros, with repositories for all participating distros. The packages would be supported by the application development team, but would be as straightforward to install and update as distro supported packages.

There is such a utility, developed by the folks who created CUPS. It is called the ESP Package Manager. It claims to create packages for AIX, Debian GNU/Linux, FreeBSD, HP-UX, IRIX, Mac OS X, NetBSD, OpenBSD, Red Hat Linux, Slackware Linux, Solaris, and Tru64 UNIX. If the effort that has gone into LSB certification were put into this project or one like it, applications could be packaged for dozens of distros.

And these would not just be proprietary apps. There are many FOSS apps that don't get packaged by distros for various reasons, and they could be more widely distributed. Since the distros would get more apps without having to devote resources to building packages, they should be motivated to at least cooperate with the project. And don't forget the notification and availability of updates.

As a developer and longtime user of GNU/Linux ('95), I believe that all of the attempts to create a universal installer for GNU/Linux distros are misguided and should be discouraged. I say to developers, users, and the LSB working group, "Please use the package managers. A lot of effort has been put into making them the best at what they do."

Later . . . Jim

Saturday, August 30, 2008

Building a Debian GNU/Linux package

I recently built packages for the 0.9.3 release of Realeyes. These include both source and Debian GNU/Linux packages. For reasons that I have forgotten, I built the Debian packages before I started working on the source packages, but I'm glad that I did. Going through that process made me add several things that I would have overlooked, especially man pages.

I actually built an entire distribution as part of the process of creating my packages. It does not meet the requirements for re-distribution, but is very handy for laying down a fresh install with exactly what is needed to run Realeyes. I will provide the steps for that in another post.

The source package is comparatively easy: once all of the files are collected, a tar file is created of the directory. The trick here is to verify that the C code will compile. I have read a lot of comments about how straightforward the standard configure/make installation procedure is. But from a developer's perspective, there are several issues, mainly having to do with autoconf and automake. I don't have enough space here to discuss them, and I really don't have any wisdom to impart if I did, because I have only learned as much as I needed for my own packages.
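A minimal sketch of that step, assuming an autoconf-style tree with a configure script at its top. The function names and the name-version naming are illustrative. The second function unpacks the tarball in a scratch directory and runs configure and make there, which catches files that were left out of the tarball:

```sh
#!/bin/sh
# make_source_tarball SRC_DIR NAME VERSION
# Copies SRC_DIR to NAME-VERSION/ in a scratch area and writes
# NAME-VERSION.tar.gz in the current directory.
make_source_tarball() {
    out="$PWD/$2-$3.tar.gz"
    staging=$(mktemp -d) || return 1
    cp -R "$1" "$staging/$2-$3" &&
        (cd "$staging" && tar -czf "$out" "$2-$3")
    status=$?
    rm -rf "$staging"
    return $status
}

# test_build TARBALL
# Unpacks the tarball in a scratch directory and runs ./configure && make,
# so anything missing from the tarball shows up as a build failure.
test_build() {
    case $1 in /*) tarball=$1 ;; *) tarball="$PWD/$1" ;; esac
    scratch=$(mktemp -d) || return 1
    dir=$(basename "$1" .tar.gz)
    (cd "$scratch" && tar -xzf "$tarball" && cd "$dir" && ./configure && make)
    status=$?
    rm -rf "$scratch"
    return $status
}
```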

Debian packaging is somewhat more complicated. A package that is included in a Debian distribution must go through fairly rigorous tests. There are several scripts for checking correct packaging procedures, including lintian and linda. These verify such conditions as:
  • executables are built correctly

  • man pages exist for every executable file

  • naming conventions are followed

  • Debian documentation is formatted correctly
I have seen a few HOWTOs on building a Debian package, and there is a huge amount of information on the Debian site. But I still had to piece together a working plan for myself, so I thought it would be worth sharing it. There may be better ways to do some of it, but I have written scripts that get me through the process with only a little manual effort. And even when there is an automated process, it is still good to know what is happening under the hood. So without further ado, I offer my experience.

Building a Debian Package


I. Read enough of the manuals to get a sense of how Debian packages are built, and then keep links to them for reference.

II. Create a working directory for the package

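  • Get a package to use as a model, using the following commands to download it (apt-get -d saves the .deb in /var/cache/apt/archives) and extract the package files and the Debian metadata files (dpkg -e puts the metadata in package_dir/DEBIAN):

    apt-get -d -y --reinstall install package_name
    cp /var/cache/apt/archives/package_name_*.deb .
    dpkg -x package_name_*.deb package_dir
    cd package_dir
    dpkg -e ../package_name_*.deb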

  • Create a working directory, and under it make the directories to be installed for the package, even if there are no files to be saved in them. These may include:

    working_dir/etc/package_name
    working_dir/etc/init.d
    working_dir/usr/sbin
    working_dir/usr/share/package_name
    working_dir/usr/share/doc/package_name
    working_dir/usr/share/man
    working_dir/var/log/package_name

  • Make the Debian control directory with its files (as needed)

    working_dir/DEBIAN

    • control: This contains the description of the package, including dependencies, architecture, and the description used by aptitude or synaptic -- use the model to create this for the first time

    • conffiles: This contains any configuration files installed with the package -- I put mine in /etc/realeyes

    • preinst: This is a shell script that runs before the package is installed, if it exists

    • postinst: This is a shell script that runs after the package is installed, if it exists -- I use it to create user IDs

    • prerm: This is a shell script that runs before the package is de-installed, if it exists

    • postrm: This is a shell script that runs after the package is de-installed, if it exists

  • Populate the directories with the application files: Use the model to help understand what goes where
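The directory setup in step II can be scripted. This is a minimal sketch with a hypothetical function name; every value written to DEBIAN/control (version, dependency, maintainer, description) is a placeholder, and the model package is still the best guide to what belongs there. It also sets the permissions dpkg-deb checks: the DEBIAN directory must be between 0755 and 0775, and maintainer scripts must be executable.

```sh
#!/bin/sh
# make_skeleton WORKING_DIR PACKAGE
# Creates the installed-directory tree and a DEBIAN control directory.
# All field values written to DEBIAN/control are placeholders.
make_skeleton() {
    w=$1
    p=$2
    for d in "etc/$p" etc/init.d usr/sbin "usr/share/$p" \
             "usr/share/doc/$p" usr/share/man "var/log/$p" DEBIAN; do
        mkdir -p "$w/$d" || return 1
    done
    chmod 0755 "$w/DEBIAN"      # dpkg-deb wants 0755..0775 here

    cat > "$w/DEBIAN/control" <<EOF
Package: $p
Version: 0.0.1
Section: net
Priority: optional
Architecture: i386
Depends: libc6
Maintainer: Your Name <you@example.org>
Description: short one-line summary
 Longer description, each line indented by one space.
EOF

    # Maintainer script stub (e.g. where user IDs get created); must be executable.
    printf '#!/bin/sh\nset -e\n# create user IDs here\nexit 0\n' > "$w/DEBIAN/postinst"
    chmod 0755 "$w/DEBIAN/postinst"
}
```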

III. Use the maintainer tools to verify package acceptability
  • Build the package from the directory that contains working_dir:
    dpkg-deb --build working_dir package_name.deb
  • lintian/linda: Check for package discrepancies. Note that lintian and linda are not installed by default and must be installed separately. Lintian uses Perl and linda uses Python, so several dependencies may be installed with them.
    lintian -i package_name.deb > package_name.lintian
    linda package_name.deb > package_name.linda
  • Fix all the problems. Here are some helpful hints from my experience:

    • Man pages: txt2man is a program that takes ASCII text and converts it to a man page. It works for simple pages, but the resulting groff file may have to be edited manually in some cases. Use 'gzip -9' to compress man pages.

    • Compiled programs: Compiled programs must be stripped. One way is to install them with the -s option, for example:
      install -s -m 0755 program working_dir/usr/sbin/program
    • Identify all non-executable files in system directories (i.e., /etc/package_name) in the package's DEBIAN/conffiles.

    • Lintian provides the section in the Debian Policy manual that describes the requirement that was flagged.

  • Sign the package: Create a GPG key for the package and sign each package with the key
    gpg --gen-key
    dpkg-sig -s builder pkg.deb
    NOTES:

    • The public keyring is in $HOME/.gnupg/pubring.gpg

    • Key generation needs a lot of entropy on the system for the random number generator; generating disk activity with something like grep -R abc /usr/* seems to work well

    • Issue the command 'cat /proc/sys/kernel/random/entropy_avail' to find out how much entropy is currently available; it should be above 1,500
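Two of the fixes above lend themselves to a script. In this minimal sketch (the function names are hypothetical), fix_manpages compresses every man page under the working directory with gzip -9, and make_conffiles writes DEBIAN/conffiles. Note that it lists every regular file under etc/, including init scripts, which Debian policy also treats as conffiles; add ! -perm -u+x to the find command if you only want the non-executable files described above.

```sh
#!/bin/sh
# fix_manpages WORKING_DIR
# Compresses every uncompressed man page under usr/share/man with gzip -9.
fix_manpages() {
    find "$1/usr/share/man" -type f ! -name '*.gz' -exec gzip -9 {} \;
}

# make_conffiles WORKING_DIR
# Writes DEBIAN/conffiles: one absolute path per regular file under etc/.
make_conffiles() {
    (cd "$1" && find etc -type f | sort | sed 's|^|/|') > "$1/DEBIAN/conffiles"
}
```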
At this point, the package can be installed using the dpkg command. However, if it has dependencies that are not already installed, dpkg will report them but will not install them. So if you want to go to the next level, here is what you have to do.

IV. Repository directories

A Debian repository has a relatively simple directory structure to maintain the packages and metadata about them. An installation ISO is basically a repository tree with just the stable packages. A custom repository tree can be added to the apt sources.list to be accessed just like officially maintained packages, with aptitude or synaptic.

In the top repository directory, the following are mandatory:
  • md5sum.txt: The list of all files in the tree with their md5 checksums

    To create the md5sum.txt file for my mini-distro, I wrote a script that ran in the top ISO directory. It did a recursive ls, ran md5sum on all regular files, and wrote the output to the md5sum.txt file. I keep that as a template and only update the files that change.

  • pool: The subdirectory where packages are kept. Under the pool directory, there are a few pre-defined directories where the different categories of packages are kept. Anyone who has edited a sources.list file has seen most of these:

    • main: Technically, these are packages that meet the Debian Free Software Guidelines (DFSG), but I think of them as the officially maintained packages

    • contrib: Contributed packages meet the DFSG, but depend on packages that do not -- I use this for my own packages, even though I don't have any non-DFSG dependencies

    • non-free: These are non-DFSG packages

    The structure of each of these is the same. The package directories are in subdirectories named with the first letter of the package name. The exception is libraries, which are in subdirectories named "lib" plus the next letter (such as libc), which is the prefix of the library package name. Below these subdirectories are the directories with the actual package files.

  • dists: The dists directory contains the metadata about packages. There is a directory for the distribution, in this case etch. In an ISO, there are also the directories frozen, stable, testing, and unstable, which are links to the distribution directory. In a repository, these may have their own files. But for my purposes, I only include the distribution subdirectory.

    Under the distribution subdirectory are the following:

    • Release: This file describes the packages, including the architecture, the components, and contains md5sums for the package metadata files -- the file information also includes the file size, and since there are only a few of these, I created the original by hand

    • main: This directory contains the metadata about the main packages

    • contrib: This directory contains the metadata about the contrib packages

    The structure of main and contrib is the same, and again, I only use contrib. The architecture directories are below contrib, and in my case, there is only binary-i386. In the architecture directory there are three files:

    • Packages: This uses information from the DEBIAN/control file and adds such things as the full path of the package file

    • Packages.gz

    • Release: This contains metadata about the contrib directory

  • I also put a few optional files in the top level directory. These include a copy of the GPL (I use version 3), installation instructions, and the public key for the signed packages. The installation instructions explain how to add the public key so that aptitude and synaptic can validate the packages.
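A minimal sketch of two pieces of that layout, with hypothetical function names: pool_prefix applies the first-letter rule (or "lib" plus the next letter) to pick a package's pool subdirectory, and make_md5sum_txt regenerates md5sum.txt from the top of the tree in one pass, as an alternative to the recursive-ls script described above:

```sh
#!/bin/sh
# pool_prefix PACKAGE -> subdirectory under pool/<component>/
# e.g. realeyesIDS -> r, libpcap0.8 -> libp
pool_prefix() {
    case $1 in
        lib?*) printf 'lib%s\n' "$(printf '%s' "$1" | cut -c4)" ;;
        *)     printf '%s\n' "$(printf '%s' "$1" | cut -c1)" ;;
    esac
}

# make_md5sum_txt TOP_DIR
# Writes TOP_DIR/md5sum.txt with "checksum  ./path" for every regular
# file in the tree, except md5sum.txt itself, sorted by path.
make_md5sum_txt() {
    (cd "$1" &&
        find . -type f ! -name md5sum.txt -exec md5sum {} + |
        sort -k 2 > md5sum.txt)
}
```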

V. Add the packages
  • Copy the packages to the appropriate pool directory. In my case, this means copying them to:

      iso_dir/pool/contrib/r/realeyescomponent

  • Create an override file for the packages. This consists of a line for each package with the following information:

      package priority section

    In my case it looks like this:
    realeyesDB  optional net
    realeyesDBD optional net
    realeyesGUI optional net
    realeyesIDS optional net
    The man page for dpkg-scanpackages (the next command) has a description of each field and says that the override file for official packages is in the indices directory on Debian mirrors.

  • Build the metadata:
    dpkg-scanpackages \
    pool/contrib/ override.etch.contrib > \
    dists/etch/contrib/binary-i386/Packages
    cd dists/etch/contrib/binary-i386
    gzip -c -9 Packages > Packages.gz
    cd ../..
    md5sum contrib/binary-i386/Packages* > md5.tmp
    ls -l contrib/binary-i386/Packages* >> md5.tmp
  • The file md5.tmp is edited to put the file size after the md5 checksum and before the file name, and the file listing lines are deleted. Then the Release file is edited to read in md5.tmp at the end, and the duplicate lines are deleted. Finally, Release is signed and the top-level md5sum.txt is rebuilt:
    gpg --sign -ba -o Release.gpg Release
    cd ../..
    cp -a ~/.gnupg/pubring.gpg RE_pubring.gpg
    md5sum ./INSTALL* > md5sum.txt
    md5sum ./GPL* >> md5sum.txt
    md5sum ./dists/etch/Release >> md5sum.txt
    md5sum ./dists/etch/contrib/binary-i386/* >> md5sum.txt
    md5sum ./pool/contrib/r/*/* >> md5sum.txt
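The hand edit of md5.tmp can be avoided by printing the lines directly in the form Release uses: a leading space, the checksum, the size, and the path relative to dists/etch. This is a minimal sketch with a hypothetical function name, run from dists/etch; the header part of Release is assumed to be kept in a separate file, shown here as the hypothetical Release.head:

```sh
#!/bin/sh
# release_md5 FILE... -> MD5Sum section for a Release file
release_md5() {
    echo "MD5Sum:"
    for f in "$@"; do
        sum=$(md5sum < "$f" | cut -d ' ' -f 1)
        size=$(wc -c < "$f")
        # $size is unquoted so any padding from wc is dropped
        printf ' %s %s %s\n' "$sum" $size "$f"
    done
}

# Example, run in dists/etch (Release.head is a hypothetical file holding
# header lines such as Origin, Suite, Architectures, and Components):
#   { cat Release.head
#     release_md5 contrib/binary-i386/Packages contrib/binary-i386/Packages.gz \
#                 contrib/binary-i386/Release
#   } > Release
#   gpg --sign -ba -o Release.gpg Release
```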

VI. Installing the package

The instructions for installing this package are:
  • Copy the Debian packages to a directory that will be used for the initial installation and future updates, such as /var/tmp/realeyes. Untar the packages:
    tar xvzf realeyes_debian.tar.gz
  • Change to the top level packages directory and add the public key file, RE_pubring.gpg, to the Debian trusted sources with the command:
    apt-key add RE_pubring.gpg
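  • Edit the file /etc/apt/sources.list to add the following line, where install_dir is the absolute path of the directory containing realeyes (for example, /var/tmp, which gives file:///var/tmp/realeyes/):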
    deb file://install_dir/realeyes/ etch contrib
  • Update the package lists with one of the following methods:

    • On the command line enter:
      apt-get update
    • In aptitude, select Actions -> Update package list

  • Install the package using aptitude or synaptic
So there you have it. I hope it shortcuts the learning curve a little.

Later . . . Jim