Monday, November 14, 2011

Thread safety and thread specific data

Every once in a while I work again on my Pythonesque / Objective-Cish library for standard C. One of the goals of this library is simplicity; ease of use. One of the ways of coming to an interface that is easy to use is the use of static variables. Statics keep the state and save it for later. A negative side-effect of statics is, however, that they break thread safety. In a multi-threaded program you can not have static variables because every thread works on the same memory.

Thread safety is an important issue. A library that is not thread-safe can only work correctly when used in serial codes. Today's computers have multiple cores, and tomorrow's computers will have many (and I mean many) cores. Although in some cases it is alright to use a forking model, it is generally not a bad thing to use threading whenever and wherever you can.

For synchronized access to global variables there are mutexes, semaphores and condition variables. For access to static variables we need something totally different. Each thread should have its own copy of the static variable, because the static keeps state and the state may be different for each thread. This is called thread-specific data. The creators of the POSIX thread library recognized the need for an interface to handle thread-specific data, so they included some functions for this in their API.

The answer to the problem of converting static variables into thread-specific data lies in using the functions pthread_key_create(), pthread_getspecific() and pthread_setspecific().
#include <pthread.h>

pthread_key_t key_buf = (pthread_key_t)0L;

void func(void) {
/* static char buf[64]; old code; not thread-safe */

    char *buf = pthread_getspecific(key_buf);
    if (buf == NULL) {
        if ((buf = (char *)malloc(64)) == NULL)

        if (pthread_setspecific(key_buf, buf) != 0)
    ... do something ...
    ... save state into buf ...

void initialize(void) {
    if (key_buf != (pthread_key_t)0L)

    if (pthread_key_create(&key_buf, free) != 0)
In the code example above, key_buf is a key that is used to get access to the thread-specific data. The key must first be created (or initialized). This can be done in a serial part of the code, or you can put a mutex lock around the code that creates the key. Once the key has been created, it exists for all threads, even for threads that will be created later.
In func() we obtain the thread-specific buffer using the key for that thread-specific data. If this thread did not have its own allocated buffer yet, allocate it and assign it to this thread's specific data.
What if the thread exits? We allocated a buffer for this thread, so would it leak memory? The answer is no; when we created the key, we passed free() as second argument to pthread_key_create(). The second argument is the destructor, it gets called when the thread exits. Note that the destructor is bound to the key, so you must use each unique key in the same way for every thread. (Which makes perfect sense, but I'm just tellin' ya).

Concluding, thread-specific data is the perfect mechanism for keeping state in threads, where you would normally – in purely serial code – use static variables.

Sunday, November 6, 2011

Portal rendering engine (or not)

All this thinking and writing about the id Tech 5 engine in my last post brought back a lot of memories of DOOM and Quake. It sparked an old desire in me to develop a 3D engine for a first person game. As computers have become faster over the years, making a DOOM like game now is a lot easier than it was in 1993. On the other hand, people also expect to see more in a game now than back then. It shouldn't be your goal to be better than Rage or write the next Crysis. As an amateur, you can not set the bar that high for yourself, or you will fail. Yet it is still a challenge to create an engine that will allow you to walk smoothly through fairly large data sets.

The odd thing about DOOM is that it's hard to comprehend how its internals work because it's not really 3D and uses complex trickery rather than straightforward OpenGL to draw the scene. In 1993 there were no hardware 3D graphics cards for PC and even worse, the strongest CPUs in that day had a hard time doing 3D math. Therefore, DOOM employs techniques that you would not use nowadays.
I find DOOM interesting though for thought experiments. It works with a 2D map, which simplifies things greatly. The map may be 2D, but it is all vector based and it does include height. Starting out in 3D from nothing with little experience at all is a daunting task. Therefore I say it is a-okay to start out in a vector based 2D world and work up from there.

Imagine a floor plan of a building and having to visualize a first person view, walking through that building. Note that I start out by considering only a single floor and not a full apartment tower block.
Suppose the player is looking down a corridor. Left and right are closed doors to offices. Further on, an office door is open on the left side. At the end, the corridor continues to the right.
The engine may have loaded the entire map into memory, but there is no reason to calculate nor draw any geometry for rooms that are not visible. So, you want to take some scissors and clip away the parts of the map that can not be seen from the current player's position. It's easy to do when you look at it, but how can we do this in a computer program?
The hardest part of the whole rendering engine is determining what walls are visible and what walls are not, from any position on the map, looking into any direction.

The answer lies in dividing the map up into sectors. It is not based on a rectangular grid, a sector is merely a collection of walls. The corridor itself is such a sector. The office room that is on the left is a sector. The part at the end, where the corridor takes a right turn, that's a new sector. These sectors are connected through doors, but also through invisible doors that are always open. They are just lines on the map. These are portals. You can see through them and you can walk through them. Portals can also be small windows with bars, if you like.

Back to our rendering engine. The player is in sector 1. The engine considers drawing the walls in sector 1. During this process, it also encounters a portal to sector 2 (office room) and a portal to sector 3 (corridor takes a right turn). The player can see inside these sectors so the engine has to take sector 2 and 3 into consideration as well.

Only small parts of sector 2 and 3 should be visible (as we are standing in the corridor), so optimizations can be made. Before drawing the geometry at all, the engine performs view frustum culling. Since you can only see what is inside the portal, the viewing frustum for the sector is adjusted to the portal itself. The line of sight from the viewpoint (of the player) to the boundaries of the portal now form the new viewing frustum for looking from one sector into the other; from the corridor into the next room. This will cull any walls in sector 2 that are not visible and only draw walls that are visible from the player's point of view.

Sector 3 is another corridor which connects to more rooms and more portals. The other portals will be culled as described above. Hence it doesn't matter how much geometry is beyond sector 3, it will not be considered.

In a game engine, you can put this to other use as well. For example, to activate monsters: an enemy soldier in the next area might be walking in circles while whistling a tune, so that the player can hear him while he's not seeing him yet. This kind of games ties enemies down to the areas where they live on the map, if the player rushes past them on to the next sector, they are typically literally out of view and not being considered (or controlled) by the game engine any more.

Now for the real bonus. The geometry of a building is static, so unless you have moving walls there is no further computation involved. Simply send the geometry to the video hardware and the scene is being rendered.

So, is everything great? Not really. In theory you can adjust the view frustum to the portal to see into the other sector, but in practice it's not super easy to do in 3D. The portal may have an irregular shape which complicates matters.
Another practical issue is where to put portals on the map. It's easy for a human level designer to draw them into a 2D map, but it gets hard for 3D maps. Moreover, it's incredibly hard to develop a program that does this automatically. You can use a binary space partitioning (BSP) method but it tends to cut up a map into many more pieces than a human would, creating many more portals. Although not incorrect, but having many portals in a scene means having to do more calculations on the CPU, which is something we were trying to avoid.

You will find no references to the term portal in technical descriptions of how the Quake engine works. Quake maps are partitioned into clusters (sectors) and each cluster carries onboard a potentially visible set (PVS) in the form of a bit array that tells the engine what other clusters are visible from this cluster. It then proceeds to do coarse grained view frustum culling using bounding boxes, and finally rendering the geometry. OpenGL will take care of backface culling.
The PVS are precomputed and the maps have been optimally segmented so that Quake never has to bother with portals. It is well possible that portals are used in the preprocessing step, although they might as well use a costly brute force method like casting rays of light and see what walls are lit.

For id's latest game Rage, I realized that it's largely based on the same engine. Rage's texturing is awesome (read my previous post) but the way the corridors and halls work are still the same as before. DOOM3 and Rage feature very high polygon count environments but the trick is, they are still simple blocky corridors. A player can never climb on or crawl under pipes that run along the walls, revealing the fact that the player is in reality walking through a crude maze of invisible walls with some very detailed visuals situated behind those plexiglass walls.
The game's rendering engine deals with a small collection of flat walls per sector and selects which walls to render. Each flat wall has a property that points to the complex geometry that the player is about to see. The engine then sends that geometry to the GPU and the result is a highly detailed 3D scene for a very low cost of CPU power.

For further reading on portal engines I recommend flipcode's articles on Building a 3D Portal Engine. I have to say it is doubtful that any modern game today uses a true portal renderer. That aside, it is an interesting way of looking at the problem and flipcode's articles are a fun and educational read.