Monday, November 14, 2011

Thread safety and thread specific data

Every once in a while I work again on my Pythonesque / Objective-Cish library for standard C. One of the goals of this library is simplicity; ease of use. One way of getting an interface that is easy to use is to use static variables. Statics keep the state and save it for later. A negative side-effect of statics is, however, that they break thread safety. In a multi-threaded program you cannot rely on static variables like this, because there is only one copy of each static and every thread reads and writes that same memory.
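
To make the problem concrete, here is a minimal sketch of the kind of static-based interface meant here. The function int_to_str() is hypothetical and not taken from the library; it returns a pointer to a static buffer, so every caller (and every thread) ends up looking at the same 32 bytes.

```c
#include <stdio.h>

/* hypothetical helper: convenient to call, but it keeps its result in a static */
const char *int_to_str(int value) {
    static char buf[32];    /* one buffer, shared by all callers and all threads */

    snprintf(buf, sizeof(buf), "%d", value);
    return buf;
}
```

Two calls return the same address, so the second call overwrites the result of the first. In a single thread that is merely a pitfall; with two threads calling it at the same time it is a data race.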

Thread safety is an important issue. A library that is not thread-safe can only work correctly when used in serial code. Today's computers have multiple cores, and tomorrow's computers will have many (and I mean many) cores. Although in some cases it is alright to use a forking model, it is generally not a bad thing to use threading whenever and wherever you can.

For synchronized access to global variables there are mutexes, semaphores and condition variables. For access to static variables we need something totally different. Each thread should have its own copy of the static variable, because the static keeps state and the state may be different for each thread. This is called thread-specific data. The creators of the POSIX thread library recognized the need for an interface to handle thread-specific data, so they included some functions for this in their API.

The answer to the problem of converting static variables into thread-specific data lies in using the functions pthread_key_create(), pthread_getspecific() and pthread_setspecific().
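#include <pthread.h>
#include <stdlib.h>     /* malloc(), free() */

void error(void);       /* application-defined error handler; assumed not to return */

pthread_key_t key_buf;
static int key_created = 0;     /* pthread_key_t is opaque, so track creation separately */

void func(void) {
/* static char buf[64]; old code; not thread-safe */

    /* NULL means this thread has not stored a buffer yet */
    char *buf = pthread_getspecific(key_buf);
    if (buf == NULL) {
        if ((buf = malloc(64)) == NULL)
            error();

        if (pthread_setspecific(key_buf, buf) != 0)
            error();
    }
    /* ... do something ... */
    /* ... save state into buf ... */
}

/* call from serial code or with a mutex held, before any thread calls func() */
void initialize(void) {
    if (key_created)
        return;

    /* free() is the destructor for each thread's buffer */
    if (pthread_key_create(&key_buf, free) != 0)
        error();

    key_created = 1;
}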

In the code example above, key_buf is a key that is used to get access to the thread-specific data. The key must first be created (or initialized), and that must happen exactly once. This can be done in a serial part of the code, or you can put a mutex lock around the code that creates the key. Once the key has been created, it exists for all threads, even for threads that will be created later.
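
Besides those two options, POSIX also offers pthread_once(), which runs an initialization routine exactly once no matter how many threads call it. Below is a minimal sketch of that approach, in case it fits your setup. The names key_once, key_status, create_key() and get_buf() are made up for this example.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t key_buf;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int key_status;                  /* result of pthread_key_create() */

/* runs exactly once, in whichever thread gets here first */
static void create_key(void) {
    key_status = pthread_key_create(&key_buf, free);
}

/* returns this thread's 64-byte buffer, or NULL on failure */
char *get_buf(void) {
    char *buf;

    if (pthread_once(&key_once, create_key) != 0 || key_status != 0)
        return NULL;

    buf = pthread_getspecific(key_buf);
    if (buf == NULL) {
        /* first call in this thread: allocate a zero-filled buffer */
        buf = calloc(1, 64);
        if (buf != NULL && pthread_setspecific(key_buf, buf) != 0) {
            free(buf);
            buf = NULL;
        }
    }
    return buf;
}
```

With this in place no separate initialization call is needed; the first call to get_buf() in any thread creates the key.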

In func() we obtain the thread-specific buffer using the key for that thread-specific data. pthread_getspecific() returns NULL if this thread has not stored anything under the key yet; in that case we allocate a buffer and store it as this thread's specific data.

What if the thread exits? We allocated a buffer for this thread, so would it leak memory? The answer is no; when we created the key, we passed free() as second argument to pthread_key_create(). The second argument is the destructor. When a thread exits, the destructor gets called with that thread's value as its argument, provided the value is not NULL. Note that the destructor is bound to the key, so you must use each unique key in the same way for every thread. (Which makes perfect sense, but I'm just tellin' ya).
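
To see the destructor at work, here is a small sketch that starts two threads and counts how often the destructor runs. Everything in it (run_demo(), worker(), destroy_buf() and the counter) is invented for this demonstration; the destructor wraps free() only so that it can count.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t demo_key;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int destroyed = 0;           /* number of destructor calls so far */

/* destructor: receives the exiting thread's value for demo_key */
static void destroy_buf(void *ptr) {
    free(ptr);
    pthread_mutex_lock(&count_lock);
    destroyed++;
    pthread_mutex_unlock(&count_lock);
}

static void *worker(void *arg) {
    char *buf = malloc(64);

    (void)arg;
    if (buf != NULL && pthread_setspecific(demo_key, buf) != 0)
        free(buf);                  /* could not store it; avoid a leak */
    return NULL;                    /* thread exits; destructor runs if a value was stored */
}

/* returns the number of destructor calls, or -1 on error */
int run_demo(void) {
    pthread_t threads[2];
    int i;

    if (pthread_key_create(&demo_key, destroy_buf) != 0)
        return -1;

    for (i = 0; i < 2; i++)
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            return -1;

    /* destructors have finished by the time pthread_join() returns */
    for (i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);

    pthread_key_delete(demo_key);
    return destroyed;
}
```

Barring allocation failures, run_demo() should return 2: one destructor call per worker thread. The calling thread never stored a value under the key, so no destructor runs for it.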

Concluding, thread-specific data is the perfect mechanism for keeping state in threads, where you would normally – in purely serial code – use static variables.