Sunday, March 6, 2011

Database performance in practice

For a project at work, I wanted to put some syslog messages into a database so you would be able to query the dataset easily and most importantly, quickly. To my surprise, this was not as easy as I thought it would be. Database engines are not the magic tools they promise to be.

The main problem with syslog data is its volume. The relevant logging lines of several machines combined, gathered over three years time was about 50 million records. You would think a database would be able to hold 50 million records ...

MongoDB is a fairly recent new high performance database engine. MongoDB is "web-scale" meaning that you can spread load over multiple servers. I have only one server so web-scale doesn't apply in this case. MongoDB doesn't understand SQL, but instead it uses a Javascript interface. I don't like SQL much so initially I was quite happy with MongoDB. There is some controverse around MongoDB because it defers commits to disk (which means undetected data loss in case of a power outage) in order to gain performance but in this case I could live with it and this wasn't a show stopper.
It comes with nice Python bindings so developing the app was quickly done. MongoDB doesn't require you to define any schemas or anything at all so it's very easy to get something working quickly.
Tests with a months worth of syslog data were satisfactory so I decided to load up all the data, with records dating back as long as three years. This took a while so I let it run overnight.
Queries now took a long time to complete. Unacceptably long. To cut down on query time, I took the extreme measure to make a subset of the data that contained only three months worth of data.
I read that querying on datetime objects was slow, so I broke it down to year/month/day numbers and used that instead. When adding these columns, the database files on disk blew up from 50 gigs to 120 gigs. Disk space was not really an issue on this system, but really, 120 gigabytes?
Memory usage was much worse though, to the point it became unbearable. MongoDB uses the mmap() system call to map on-disk files into memory, with the unfortunate consequence that when you do a costly query, it will consume all system memory. Linux has a-OK memory management so your system won't die on the spot, but it was clearly having a hard time. Since I didn't want to dedicate the full box to MongoDB, this was a no-go and Mongo had to go.

MySQL is a famous free relational database and would have been many a person's first choice anyway, so I decided to give it a go. One of the reasons I don't like SQL databases is that you have sit down and take the time to set it up. Create the database, the user(s), set passwords, write the schema etcetera before you can start. What you get in return for this investment is that you can give certain users or apps read-only rights to the database (which is kind of important when dealing with logging data) and you can easily review what attributes and data types you used to create the tables with.
Tests with a months worth of syslog data were satisfactory so I decided to load up all the data, with records dating back as long as a year. This was practically impossible. Inserting rows into MySQL goes alright for a while and then it slows down, and slows down more and more until the point were it takes minutes to insert data into the database! I tried improving this with transactions (do multiple inserts and commit) and it didn't help!
Fifty M records is too much for MySQL. MySQL's on-disk and in-memory footprint were formidable, but who cares when you can't even put some data in your database. You can do tricks with partitioning in MySQL but I didn't try it because it creates new issues to consider. Someone said I should use PostgreSQL, but I was already too tired to try.

'Proprietary' solution
I was convinced that a modern PC should be able to handle 50 million records, so I decided to do it the hardcoded way using a packed struct and dump that into a binary file. Database lovers cringed and called me an idiot but I let them. Using some shortcuts, it was possible to cram a syslog line into just 48 bytes — and this includes two magic bytes to detect possible data corruption. The total amounts to 2.5 gigabytes which takes about half a minute to read through from disk, but the subset is only a few hundred megs and is processed much quicker.
Because the data can be packed so small, it's no problem doing a linear search to find what you are looking for. A linear scan through memory may be an inefficient way of searching but it finishes in a split second so no further optimisations are necessary at this point. Mind you, there is no indexing going on because I'm not trying to imitate a database engine.
What's also cool about this data is that when you are searching by date, you can skip through the data using a binary partitioning method (just like bsearch() does). Moreover, data can be easily partitioned by writing to a new data file every month.
The major drawback of this solution is of course development time. It's written in C for performance reasons and as a consequence the program is not yet fully featured, and it will take considerable effort before it is.

Databases are generic tools that keep you from writing lots of code that have to do with storing and searching through data. In that respect, databases are like what scripting languages are compared to low-level programming languages. You can outperform (or maybe in this case I should say, outmaneuver) them but whether it's a smart thing to go that way depends.

The given problem can probably be solved using MySQL by dynamically creating a new partition each month and adapting the application to work with that. I'm pretty confident that it will work but this too will require some more work to get done.

What bothers me is that database engines are overkill for the small data store problems (where you would typically use some ASCII flat format file to put all records in), while at the same time these heavyweight tools cripple under the load when you put a lot of records into them. All of a sudden you have to be a database expert for solving a seemingly simple problem.

Update (March 14, 2011)
I set up a MySQL db with the archive storage engine and partitioned it by month. It's currently loaded with 150 million records (!) and counting ... It's holding up nicely. We will get some more experience with it in the coming weeks, but so far, I'm quite impressed with it especially because it is running on a rather cheap and simple PC server.