Monday, July 20, 2015

Solving the mother of all sudoku puzzles

This post is all about sudoku, the venerable Japanese puzzle. Even though the hype lies far behind us, every once in a while I get bored and return to solving a few puzzles. But then one day I was given an exceptionally hard to beat sudoku puzzle. Even after hours of staring it down, I wasn’t getting anywhere. Hmmm… I was not going to be belittled by some stupid puzzle! And so I decided to cheat a little and crack it by writing a sudoku solver.

Sudoku is a numbers game, but there is no calculation involved whatsoever. The ‘game board’ is a set of 9×9 tiles, each containing a number from 1 to 9. The numbers in a row must be unique, however, as well as the numbers in each column, and the numbers in each 3×3 sub block.
Some numbers are given; it is up to you (the player) to fill in the gaps. Once all the numbers are filled in according to the given rules, the puzzle is solved. In general, a sudoku puzzle has one, and only one, unique solution.


First of all, let me mention that brute-forcing it is a definitive no-no. There are countless possibilities that do not lead to the solution. I figured that the total number of permutations would be around 9×9! (over 3 million), but two researchers have calculated that there are 5,472,730,538 (almost 5.5 billion) essentially different sudoku grids. The number of raw permutations is much larger however, in the range 6.67×1021.
Trying them all is going to take forever, even on the fastest computer available today. We need to be smarter than that—why not simply play the game and see where it gets us?

In order to let the computer play, we are going to need a couple of routines:
  1. load a given sudoku puzzle
  2. check whether puzzle is correct
  3. find cell with only one possibility left
It’s easy enough to store the puzzle into a 9×9 array of cells. Each cell holds a number, but also has a property saying whether it’s a ‘solid’ cell, or can be filled in by the player.
Checking the correctness of the puzzle is a three-step process: check all the rows, columns, and 3×3 blocks. Finding a cell with only one possibile solution also checks all the rows, columns, and 3×3 blocks.

This is a reasonable start for a solver program, but note that it will only solve the most naive beginner-level sudoku puzzles. The problem is, there will be cells that have more than just one option, and this solver is just not good enough. For the super evil sudoku, we are going to have to do better than this.

Rather than finding a cell with only one possible solution left, let’s take a look at what possible solutions there are for a blank cell. A blank cell may have any number between 1 and 9, unless that number is already present in that row, column, or 3×3 block. Any already filled-in numbers are removed from the set of possible solutions for the cell. The remaining set holds a number of possiblities, of which one is the solution for this cell. Let’s try them all, but because we want to have the solution as quickly as possible, start with the cell that has the smallest set. So if there is a cell with only one or two possible solutions, try that one first before moving on to harder to crack cells.

As you are trying out possible solutions, the numbers lock into place and they can no longer be part of solution sets in other cells that are in the same row, column or block. Slowly, the puzzle is being solved. But we may still hit a wall and have to go back to the point where we tried a wrong possibilty; this happens because not all attempts lead to a perfect puzzle. Many configurations are invalid, and need to be discarded.

Going back is easily done by means of backtracking. Backtracking works by recursion. I find it easy to think about in terms of functional programming; in functional programming, objects are immutable, they can not be changed. If you wish to change the state of an object, then that is a new object, albeit a modified clone of the object.
So whenever trying a solution to a cell, clone the whole puzzle. Put that puzzle object onto a stack. Now recurse, and try solving the next cell. If the puzzle is invalid, backtrack: pop the stack and return. As the recursive function is returning, it will continue trying the next solution to the cell. Notice how cells that ‘work’ are followed-up on and lead to the next step in solving the puzzle, and solutions that don’t work tend to cut off excess work.

I implemented this solver in Python. Implementation detail: cloning is done with deepcopy.copy(). My aging computer solves the evil sudoku in under half a second. On the Raspberry Pi 2 it takes 3.5 seconds. Remember that brute-forcing would take forever. You don’t need a fast computer (nor a low-level programming language) to pull this off. What’s important is choosing the right algorithm.

Backtracking is the algorithm for this kind of puzzles, where a partial list of candidates could lead to a solution. In fact, after finding a solution, the algorithm may continue to run and find all solutions. In a bold move, I handed it a completely blank 9×9 grid. It started spewing out all solutions to sudoku … each and every sudoku there is, ever was, or ever will be. And after a while, I hit Ctrl-C.

Sunday, June 14, 2015

Live Coding & Hotloading

This marks this blog's 100th post. Well, it's number 98 really, but two posts have been held back forever as draft. Anyway, I have been saving something special—a magic trick to impress your friends with. I saw this trick being performed live by an oldskool game programmer on Twitch, and it's called “live coding”. It enables you to type in some C code, and immediately see the results onscreen, even while the game is running. How can that be? It's a kind of magic.

Computers aren't any good at performing magic tricks, so something must be happening behind the scenes. As you may know, games run in a loop. What you see onscreen, is the result of frames being rendered by the render loop. With live coding, we dynamically load newly compiled code, and execute it. So, we are effectively changing the game code as it is running. The result is as if we are using C as an internal scripting language.
This is made possible by dynamic linking; rather than statically linking executable code, a shared object is loaded and linked at run-time. The coolio term for dynamic linking is “hotloading”.

Hotloading requires the game to be written with a certain structure; the game loop is in the main program, while the actual gameplay functions are dynamic, they are compiled into a shared object. It is a loadable module, like a plugin. The main program will check whether a newer shared object file is present in the directory, and if so, load it.

The functions for loading a shared object file at run-time are platform-specific. On UNIX (think Linux, OSX) you use dlopen(), dlsym(), dlclose().
On Windows, use LoadLibrary(), GetProcAddress(), and FreeLibrary().
Or we can use SDL to do this in a portable way:
typedef void (*Func)(void *);

    if (handle != NULL) {
        SDL_UnloadObject(handle);
        handle = NULL;
    }
    handle = SDL_LoadObject(filename);
    if (handle == NULL) {
        error("failed to load %s", filename);
    }
    func = (Func)SDL_LoadFunction(handle, "game");
    if (func == NULL) {
        error("failed to find symbol 'game'");
    }
The code shown will load a shared object file, and find the function named game. Afterwards, we can call func(), effectively calling game().

It's not all that easy though as there are some caveats. You will want to add live coding capability in the early stages of development; adding it later on is going to be painful. I will try to explain why.

Notice that we obtained a pointer to a function by searching for the symbol, ie. the function name. Mind that on Windows, that works both ways; if you want to call a function from the game code that resides in the main part, you need to know where that function is. Hence the desire to structure the program such that we need to know about as few functions as possible.
I had no problem whatsoever on UNIX, where calling functions back and forth like that works naturally, out of the box. If you develop on UNIX and forget about this bitsy detail, you will be in a world of pain when you want to port to Windows later on.

Secondly, all global variables must be declared in the main part of the program. You can't put them in the shared object, because that memory is going to be unloaded and replaced once the new version gets loaded. This means that you also can not have any static variables in the module. If you do, the variable will lose its value when the module is unloaded and the game will misbehave.
This asks for a coding style where you invariably pass a pointer to a big bad struct that contains everything, and I do mean everything, that should be globally accessible to the entirety of the program.
Personally, I'm not fond of this style at all—say what you will, but I like having statics in my modules.
In turn, this quickly leads to doing your own memory management. Note that it is a myth that live coding requires you do your own memory management.

Another myth is you can't use C++ with live coding. C++ compilers do name mangling, throwing a spanner in the works. You can work around this by making sure that the interface is in plain C, and does not rely on C++. Entry point functions should be declared extern "C".

Because all globals live in a big struct, you can't use live coding for making large changes. You can't add new globals. Once you touch the layout of the big memory structure, the program will crash. And if it doesn't crash, you will be overwriting fields in memory unintentionally.

My programmin’ brother dismissed live coding as a gimmick. It is super useful for tweaking small parameters like monster speed, hit points, and such. But all these parameters shouldn't be hard-coded in the first place; they should be tunables in the game engine that developers can play with as the engine is running.

Despite its drawbacks, seeing live coding in action is most impressive. It allows you to experiment in a playground. It turns good old C into a scripting language, and you must admit, that's all we ever wanted.

Sunday, May 17, 2015

Loading PNG images with libpng

For textures in games I'm often inclined to quickly grab some old TGA (targa) loader. The code works, and thus I've been dragging it along for years. Another often used format is Windows BMP. Writing an image loader can be quite an undertaking, which is because of the huge variety of features a single image format may support. Lucky for us, we don't have to write a PNG loader from scratch; let's use the libpng library.

About the PNG image format
Nearly 20 years ago the PNG (Portable Network Graphics) image format was invented for the Web. Up till then, the most used image formats were GIF and JPEG. GIF supports only 256 colors, and uses a patented compression method. JPEG does not support transparency, and uses lossy compression, resulting in visual artefacts.
PNG improved greatly on this by supporting 32-bit color, transparency, and lossless compression. Moreover, it's absolutely patent-free and freely available. Since it was invented for the Web and not photography or printing press, PNG lacks alternate color schemes like CMYK, and you won't find the image format being used for those applications. It's a good fit for our purposes, however.

The PNG file format consists of a header followed by a number of chunks. The most basic chunks are PLTE (palette) and IDAT (data), but there are many other kinds of chunks, for example for storing color information and metadata, like last modified time. The image data is compressed with the deflate algorithm, which is the algorithm used by zip/unzip. Deflate does lossless compression, and more importantly, the algorithm is not patented. It is present in zlib, which is also freely available.
However, the image lines are filtered by one of five possible filter types to improve the compression rate. Many kinds of pixel formats are supported. Finally, PNG supports interlacing, so pixels aren't necessarily stored neatly in sequence.
This is just to illustrate that writing a PNG loader from scratch is not easily done. We don't have so much time on our hands, and will therefore be using the library.

A walkthrough of the code
Let's load up some pixels — should be easy, right? So you open up the manual for libpng, only to find that it starts with a whopping 20 pages listing just function headers. That's right, there are about 260 different functions in libpng. We won't need to use them all, but unfortunately, loading a PNG image is not quite straightforward. But let's get down to it.
Our PNG_Image class will have height, width, a bytes per pixel member, a color format (for OpenGL), and a pixel buffer. Following is the annotated code needed to load a PNG image. The method will take a filename and return true on success.

In the top the file, include the appropriate headers:
#include <png.h>
#include <gl.h>

First off, we will want to open a .png file and check the header for the “PNG” signature.
    FILE *f = fopen(filename, "rb");
    if (f == nullptr) {
        return false;
    }

    // check that the PNG signature is in the file header

    unsigned char sig[8];
    if (fread(sig, 1, sizeof(sig), f) < 8) {
        fclose(f);
        return false;
    }
    if (png_sig_cmp(sig, 0, 8)) {
        fclose(f);
        return false;
    }

Next, we need to create two data structures: png_struct and png_info. Many of libpng's functions will require us to pass in these structs. You can think of png_struct as a “PNG object” that holds state for various functions. It won't hold any image data, however. The png_info struct will hold metadata like image height and width. Creating these structs allocates memory from the heap; if anything goes wrong we must take care to deallocate them, or else we'd be leaking memory.
[Also mind that we still have an open file].
    // create two data structures: 'png_struct' and 'png_info'

    png_structp png = png_create_read_struct(PNG_LIBPNG_VER_STRING, nullptr, nullptr, nullptr);
    if (png == nullptr) {
        fclose(f);
        return false;
    }

    png_infop info = png_create_info_struct(png);
    if (info == nullptr) {
        png_destroy_read_struct(&png, nullptr, nullptr);
        fclose(f);
        return false;
    }

Mind that plain C has no exceptions. To work around this, libpng has an error handling mechanism that works with setjmp(). Imagine decompressing a corrupted image, and you don't want the program to crash. What libpng does is, it jumps back to the point where you issued the setjmp() call. So this here is just some cleanup code.
    // set libpng error handling mechanism
    if (setjmp(png_jmpbuf(png))) {
        png_destroy_read_struct(&png, &info, nullptr);
        fclose(f);

        if (pixels != nullptr) {
            delete [] pixels;
            pixels = nullptr;
        }
        return false;
    }

Q: Where did that pixels array come from?
A: That's our pixel buffer member variable. It won't be allocated until later. The construction with setjmp() is turning the code inside out.

We are just getting started here. Let's gather some basic information: image height, width, and such.
    // pass open file to png struct
    png_init_io(png, f);

    // skip signature bytes (we already read those)
    png_set_sig_bytes(png, sizeof(sig));

    // get image information
    png_read_info(png, info);

    w = png_get_image_width(png, info);
    h = png_get_image_height(png, info);

We're only interested in RGB and RGBA data, so other bit depths and color types are converted. So even if the image is in grayscale or whatever, we will get RGB data when reading the image. It's nice that libpng can do these conversions on the fly.
    // set least one byte per channel
    if (png_get_bit_depth(png, info) < 8) {
        png_set_packing(png);
    }

PNG may have a separate transparency chunk. Convert it to alpha channel.
    // if transparency, convert it to alpha
    if (png_get_valid(png, info, PNG_INFO_tRNS)) {
        png_set_tRNS_to_alpha(png);
    }

Next, get the color type. The format variable is meant as a parameter for OpenGL texturing. If the format can not be determined, give up and return an error.
    switch(png_get_color_type(png, info)) {
        case PNG_COLOR_TYPE_GRAY:
            format = GL_RGB;
            png_set_gray_to_rgb(png);
            break;

        case PNG_COLOR_TYPE_GRAY_ALPHA:
            format = GL_RGBA;
            png_set_gray_to_rgb(png);
            break;

        case PNG_COLOR_TYPE_PALETTE:
            format = GL_RGB;
            png_set_expand(png);
            break;

        case PNG_COLOR_TYPE_RGB:
            format = GL_RGB;
            break;

        case PNG_COLOR_TYPE_RGBA:
            format = GL_RGBA;
            break;

        default:
            format = -1;
    }
    if (format == -1) {
        png_destroy_read_struct(&png, &info, nullptr);
        fclose(f);
        return false;
    }

This is how to get the number of bytes per pixel, regardless of channels and bit depth:
    bpp = (uint)png_get_rowbytes(png, info) / w;

[That bpp variable is bytes per pixel, not bits per pixel].

We don't want to receive any interlaced image data. Even if the image is interlaced, we want to have the ‘final’ image data. There's a func for that:
    png_set_interlace_handling(png);

We are done reading the header (info) chunk. You need to tell libpng you want to get ahead and skip the rest of the chunk.
    png_read_update_info(png, info);

Now we are ready to go read the image data. We deferred allocating the pixel buffer up till now (this is how the manual says you should do it).
Annoyingly enough, png_read_image() expects an array of pointers to store rows of pixel data. So you need to construct both a pixel buffer and a temporary array of pointers to the rows in that pixel buffer. And then you can read the image data.
    // allocate pixel buffer
    pixels = new unsigned char[w * h * bpp];

    // setup array with row pointers into pixel buffer
    png_bytep rows[h];
    unsigned char *p = pixels;
    for(int i = 0; i < h; i++) {
        rows[i] = p;
        p += w * bpp;
    }

    // read all rows (data goes into 'pixels' buffer)
    // Note that any decoding errors will jump to the
    // setjmp point and eventually return false
    png_read_image(png, rows);

You actually need to read the end of the PNG data. This will read the final chunk in the file, and check that the file is not corrupted.
    png_read_end(png, nullptr);

Finally, clean up and return true, indicating success.
    png_destroy_read_struct(&png, &info, nullptr);
    fclose(f);
    return true;

Well ... that was a little odd, but we managed to load some pixels. Not included in this tutorial: we can pass the pixel buffer on to OpenGL to create textures.

Opinion or plain justified criticism
libpng provides tons of functionality for working with PNG images. Be that as it may, there is a lot not to like here. For one thing, loading an image is a ridiculous, unwieldy process. Conspicuously absent is a simple and stupid png_load() function that does all that listed above, now left as an exercise by the developers of libpng.
All I wanted to do was load an image, but libpng required me to learn everything there is to know about the file format, the library, and then some.
At any rate, do the dance and it will work. If it's too much for you, there's always SDL_Image.

Monday, April 27, 2015

Fast Pseudo Random Number Generator

Games need a good pseudo random number generator (PRNG). Flipping cards, rolling dice, determining hit points, choosing whether to fire a bullet or not, all these are random choices. Random is a strange phenomenon though. The set of numbers 1,1,1,1 is just as random as 4,2,1,3 — even though I carefully hand-picked both sequences, the latter looks more random. A shuffled list of numbers is perceived as being more random than a list of truly random numbers.

It is a well-known fact (or actually, myth) that computers do not do random numbers; instead they are computed and therefore they are predictable. This was true until computer scientists found that real-world noise like typing speed, mouse movement, networking interrupts, and hard drive rotation statistics may be used for entropy: a source for random number generation. This is only a source though and the produced random number sequences may still be predictable depending on the algorithm used. Cryptographically safe RNGs aim to make it impossible to guess the next random number. There is a kind of chicken-and-egg relation here: crypto functions need good RNGs to be able to scramble the input; the scrambled output appears as random as any random data. Therefore the crypto function can be used as a RNG.

Anyway, there are some problems when it comes to games. Fast-paced action games may make hundreds of random choices every frame. At 60 frames per second, we don't want to take too much time generating random numbers. Therefore cryptographically safe RNGs are off limits, we are going to have to settle with a more predictable PRNG.
Another problem is that at 60 fps, we might quickly drain the entropy pool. We can't afford having to wait for new entropy, and we don't want the operating system to run out. So even though modern operating systems can supply fairly truthful random numbers, we are not going to bother and stick with “decent enough” instead; we are going at it old-skool and calculate the pseudo random numbers by ourselves.

PRNG requirements for games:
  • it must be fast
  • it must show good distribution
  • not repeating numbers often
  • no cryptographically safeness required

In them old days, we would use rand() and seed it by calling srand(). rand() is a notoriously bad PRNG and you should not use it nowadays. The BSD man page even calls it “bad random number generator”. It has a low period, meaning that after a while the same prefix starts to repeat. This is a bad habit for a PRNG.
I personally have bad experiences with rand(). My Tetris clone would happily generate the same block over and over again, giving some really bad gameplay experiences. I started handing out bonuses for this, but it didn't make it more fun. The game notably improved when I switched to a different PRNG.
The modern, new and improved equivalent of rand() is random(). It should be noted however that there might be a portability issue here; random() may not be available on your platform of choice.

Mersenne Twister
A popular modern PRNG is the Mersenne Twister. It is named after the Mersenne prime number that it uses. It is a well respected PRNG, and an easy choice for C++ programmers, as it is included in the C++11 standard library. The C++11 way to get some random numbers is:
#include <random>

    std::random_device rd;     // only used once for seeding
    std::mt19937 rng(rd());    // select Mersenne Twister
    std::uniform_int_distribution<int> dist(0, 1000);
    int n = dist(rng);
The Mersenne Twister is relatively slow for game code. Apparently it badly stresses the CPU cache.

XorShift
The most simple and fastest decent PRNG there is, is the XorShift algorithm. It simply does a couple of bit-shift and exclusive-OR operations, and that's it. It has four internal state variables that you should initialize (in a more or less random manner, as you see fit).
uint32_t xorshift128(void) {
    // algo taken from wikipedia page
    uint32_t t = state.x ^ (state.x << 11);
    state.x = state.y;
    state.y = state.z;
    state.z = state.w;
    state.w = state.w ^ (state.w >> 19) ^ t ^ (t >> 8);
    return state.w;
}
XorShift is incredibly simple, but good enough for games. In that respect, it is a perfect fit for our purposes. There are XorShift* and XorShift+ variants, that use multiplication and addition respectively, and produce 64 bits values.

Statistical Analysis
In a feeble attempt to prove that XorShift suffices, I calculated the standard deviation for a series of numbers between 0 and 1000 generated by rand(), random(), std::mt19937, and xorshift128(). They all produce a standard deviation around 285, meaning that they all exhibit the same fairly high spread of values. In that sense, we can say that probably they all produce sets of random values usable for games. This is by no means an extensive analysis of the PRNGs, of course. There exists a test suite named TESTU01 for testing PRNGs, but I'll leave it up to the math and computer scientists to play around with that.

For now, I'm happy to have found that the XorShift PRNG is exceptionally fast, while still producing decently enough pseudo random numbers. It is a perfect fit for games.

Two interesting links for further reading: