tag:blogger.com,1999:blog-44535930411295501162024-03-13T01:51:36.268+01:00The Developer's CryYet another blog by a hobbyist programmer.Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comBlogger99125tag:blogger.com,1999:blog-4453593041129550116.post-24186277372986519892015-07-20T18:34:00.000+02:002015-07-20T18:39:03.508+02:00Solving the mother of all sudoku puzzles<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://4.bp.blogspot.com/-VyrzQvxcr7s/SSltl2KM2eI/AAAAAAAAAE4/V4Vi3p9Zkmw/s1600/routing_bitmask.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
This post is all about sudoku, the venerable Japanese puzzle. Even though
the hype lies far behind us, every once in a while I get bored and return to
solving a few puzzles. But then one day I was given an exceptionally hard
to beat sudoku puzzle. Even after hours of staring it down, I wasn’t getting
anywhere. Hmmm… I was not going to be belittled by some stupid puzzle!
And so I decided to cheat a little and crack it by writing a sudoku solver.</div>
<div style="text-align: justify;">
<br />
Sudoku is a numbers game, but there is no calculation involved whatsoever.
The ‘game board’ is a set of 9×9 tiles, each containing a number from
1 to 9. The numbers in a row must be unique, however, as well as the numbers
in each column, and the numbers in each 3×3 sub block. <br />
Some numbers are given; it is up to you (the player) to fill in the gaps.
Once all the numbers are filled in according to the given rules, the puzzle
is solved. In general, a sudoku puzzle has one, and only one, unique solution.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-i_Y6ExKeygQ/Va0gnLZ3NdI/AAAAAAAADi8/moTAL5SwbrA/s1600/evil_sudoku.png" imageanchor="1"><img border="0" src="http://4.bp.blogspot.com/-i_Y6ExKeygQ/Va0gnLZ3NdI/AAAAAAAADi8/moTAL5SwbrA/s1600/evil_sudoku.png" /></a></div>
<br />
First of all, let me mention that brute-forcing it is a definitive no-no.
There are countless possibilities that do not lead to the solution. I figured
that the total number of permutations would be around 9×9! (over 3 million),
but two researchers have calculated that there are 5,472,730,538 (almost
5.5 billion) essentially different sudoku grids. The number of raw
permutations is much larger however, in the range 6.67×10<sup>21</sup>.<br />
Trying them all is going to take forever, even on the fastest computer
available today. We need to be smarter than that—why not simply <i>play</i> the
game and see where it gets us?<br />
</div>
<div style="text-align: justify;"><br />
In order to let the computer play, we are going to need a couple of routines:</div>
<ol style="text-align: justify;">
<li>load a given sudoku puzzle</li>
<li>check whether puzzle is correct</li>
<li>find cell with only one possibility left</li>
</ol>
<div style="text-align: justify;">
It’s easy enough to store the puzzle into a 9×9 array of cells. Each
cell holds a number, but also has a property saying whether it’s a ‘solid’
cell, or can be filled in by the player. <br />
Checking the correctness of the puzzle is a three-step process: check all
the rows, columns, and 3×3 blocks. Finding a cell with only one possibile
solution also checks all the rows, columns, and 3×3 blocks.</div>
<div style="text-align: justify;">
<br />
This is a reasonable start for a solver program, but note that it will only
solve the most naive beginner-level sudoku puzzles. The problem is, there
will be cells that have more than just one option, and this solver is just
not good enough. For the super evil sudoku, we are going to have to do better
than this.</div>
<div style="text-align: justify;">
<br />
Rather than finding a cell with only one possible solution left, let’s
take a look at what possible solutions there are for a blank cell.
A blank cell may have any number between 1 and 9, unless that number
is already present in that row, column, or 3×3 block. Any already
filled-in numbers are removed from the set of possible solutions for the cell.
The remaining set holds a number of possiblities, of which one is <i>the</i>
solution for this cell. Let’s try them all, but because we want to have
the solution as quickly as possible, start with the cell that has the smallest
set. So if there is a cell with only one or two possible solutions, try that
one first before moving on to harder to crack cells.</div>
<div style="text-align: justify;">
<br />
As you are trying out possible solutions, the numbers lock into place and
they can no longer be part of solution sets in other cells that are in the
same row, column or block. Slowly, the puzzle is being solved. But we may
still hit a wall and have to go back to the point where we tried a wrong
possibilty; this happens because not all attempts lead to a perfect puzzle.
Many configurations are invalid, and need to be discarded.</div>
<div style="text-align: justify;">
<br />
Going back is easily done by means of backtracking. Backtracking works by
recursion. I find it easy to think about in terms of functional programming;
in functional programming, objects are immutable, they can not be changed.
If you wish to change the state of an object, then that is a new object,
albeit a modified clone of the object. <br />
So whenever trying a solution to a cell, clone the whole puzzle. Put that
puzzle object onto a stack. Now recurse, and try solving the next cell.
If the puzzle is invalid, backtrack: pop the stack and return. As the recursive
function is returning, it will continue trying the next solution to the
cell. Notice how cells that ‘work’ are followed-up on and lead to the next
step in solving the puzzle, and solutions that don’t work tend to cut off
excess work.</div>
<div style="text-align: justify;">
<br />
I implemented this solver in Python. Implementation detail: cloning is done with
<span style="font-family: "Courier New",Courier,monospace;">deepcopy.copy()</span>. My aging computer solves the evil sudoku in under half a
second. On the Raspberry Pi 2 it takes 3.5 seconds. Remember that
brute-forcing would take forever. You don’t need a fast computer (nor a
low-level programming language) to pull this off. What’s important is choosing
the right algorithm.</div>
<div style="text-align: justify;">
<br />
Backtracking is <i>the</i> algorithm for this kind of puzzles, where a partial
list of candidates could lead to a solution. In fact, after finding a
solution, the algorithm may continue to run and find <i>all</i> solutions.
In a bold move, I handed it a completely blank 9×9 grid. It started
spewing out all solutions to sudoku … each and every sudoku there is,
ever was, or ever will be. And after a while, I hit Ctrl-C.</div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-50888456698378922292015-06-14T13:50:00.000+02:002015-06-14T17:10:06.090+02:00Live Coding & Hotloading<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://2.bp.blogspot.com/-x_tk5euXJn4/SXyF8AskaaI/AAAAAAAAAFQ/xAPD3xIPz5Y/s1600/openal.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
This marks this blog's 100th post. Well, it's number 98 really, but two posts have been held back forever as draft. Anyway, I have been saving something special—a magic trick to impress your friends with. I saw this trick being performed live by an oldskool game programmer on Twitch, and it's called “live coding”. It enables you to type in some C code, and immediately see the results onscreen, even while the game is running. How can that be? It's a kind of magic.</div>
<br />
<div style="text-align: justify;">
Computers aren't any good at performing magic tricks, so something must be happening behind the scenes. As you may know, games run in a loop. What you see onscreen, is the result of frames being rendered by the render loop. With live coding, we dynamically load newly compiled code, and execute it. So, we are effectively changing the game code as it is running. The result is as if we are using C as an internal scripting language.<br />
This is made possible by dynamic linking; rather than statically linking executable code, a shared object is loaded and linked at run-time. The coolio term for dynamic linking is “hotloading”.</div>
<br />
<div style="text-align: justify;">
Hotloading requires the game to be written with a certain structure; the game loop is in the main program, while the actual gameplay functions are dynamic, they are compiled into a shared object. It is a loadable module, like a plugin. The main program will check whether a newer shared object file is present in the directory, and if so, load it.</div>
<br />
<div style="text-align: justify;">
The functions for loading a shared object file at run-time are platform-specific. On UNIX (think Linux, OSX) you use <span style="font-family: "Courier New",Courier,monospace;">dlopen()</span>, <span style="font-family: "Courier New",Courier,monospace;">dlsym()</span>, <span style="font-family: "Courier New",Courier,monospace;">dlclose()</span>.<br />
On Windows, use <span style="font-family: "Courier New",Courier,monospace;">LoadLibrary()</span>, <span style="font-family: "Courier New",Courier,monospace;">GetProcAddress()</span>, and <span style="font-family: "Courier New",Courier,monospace;">FreeLibrary()</span>.<br />
Or we can use SDL to do this in a portable way:</div>
<blockquote class="tr_bq">
typedef void (*Func)(void *);<br />
<br />
if (handle != NULL) {<br />
SDL_UnloadObject(handle);<br />
handle = NULL;<br />
}<br />
handle = SDL_LoadObject(filename);<br />
if (handle == NULL) {<br />
error("failed to load %s", filename);<br />
}<br />
func = (Func)SDL_LoadFunction(handle, "game");<br />
if (func == NULL) {<br />
error("failed to find symbol 'game'");<br />
}</blockquote>
<div style="text-align: justify;">
The code shown will load a shared object file, and find the function named <span style="font-family: "Courier New",Courier,monospace;">game</span>. Afterwards, we can call <span style="font-family: "Courier New",Courier,monospace;">func()</span>, effectively calling <span style="font-family: "Courier New",Courier,monospace;">game()</span>.</div>
<br />
<div style="text-align: justify;">
It's not all that easy though as there are some caveats. You will want to add live coding capability in the early stages of development; adding it later on is going to be painful. I will try to explain why.</div>
<br />
<div style="text-align: justify;">
Notice that we obtained a pointer to a function by searching for the symbol, ie. the function name. Mind that on Windows, that works both ways; if you want to call a function from the game code that resides in the main part, you need to know where that function is. Hence the desire to structure the program such that we need to know about as few functions as possible.</div>
<div style="text-align: justify;">
I had no problem whatsoever on UNIX, where calling functions back and forth like that works naturally, out of the box. If you develop on UNIX and forget about this bitsy detail, you will be in a world of pain when you want to port to Windows later on.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Secondly, all global variables must be declared in the main part of the program. You can't put them in the shared object, because that memory is going to be unloaded and replaced once the new version gets loaded. This means that you also can not have any <span style="font-family: "Courier New",Courier,monospace;">static</span> variables in the module. If you do, the variable will lose its value when the module is unloaded and the game will misbehave.</div>
<div style="text-align: justify;">
This asks for a coding style where you invariably pass a pointer to a big bad struct that contains everything, and I do mean everything, that should be globally accessible to the entirety of the program.<br />
Personally, I'm not fond of this style at all—say what you will, but I like having statics in my modules.</div>
<div style="text-align: justify;">
In turn, this quickly leads to doing your own memory management. Note that it is a <i>myth</i> that live coding <i>requires</i> you do your own memory management.</div>
<div style="text-align: justify;">
<br />
Another myth is you can't use C++ with live coding. C++ compilers do name mangling, throwing a spanner in the works. You can work around this by making sure that the interface is in plain C, and does not rely on C++. Entry point functions should be declared <span style="font-family: "Courier New",Courier,monospace;">extern "C"</span>.</div>
<br />
<div style="text-align: justify;">
Because all globals live in a big struct, you can't use live coding for making large changes. You can't add new globals. Once you touch the layout of the big memory structure, the program will crash. And if it doesn't crash, you will be overwriting fields in memory unintentionally.</div>
<br />
<div style="text-align: justify;">
My programmin’ brother dismissed live coding as a gimmick. It is super useful for tweaking small parameters like monster speed, hit points, and such. But all these parameters shouldn't be hard-coded in the first place; they should be tunables in the game engine that developers can play with as the engine is running.</div>
<br />
<div style="text-align: justify;">
Despite its drawbacks, seeing live coding in action is most impressive. It allows you to experiment in a playground. It turns good old C into a scripting language, and you must admit, that's all we ever wanted.</div>
<div style="text-align: justify;">
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-9021749978658525012015-05-17T22:05:00.000+02:002015-05-17T22:05:03.260+02:00Loading PNG images with libpng<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://2.bp.blogspot.com/-WUjeaclsvcs/TmPG5bZji4I/AAAAAAAADFA/q3GM6ke1IWI/s1600/virtual_in_virtual.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
For textures in games I'm often inclined to quickly grab some old TGA (targa) loader. The code works, and thus I've been dragging it along for years. Another often used format is Windows BMP. Writing an image loader can be quite an undertaking, which is because of the huge variety of features a single image format may support. Lucky for us, we don't have to write a PNG loader from scratch; let's use the libpng library.<br />
<br />
<span style="color: #b45f06;"><b>About the PNG image format</b></span><br />
Nearly 20 years ago the PNG (Portable Network Graphics) image format was invented for the Web. Up till then, the most used image formats were GIF and JPEG. GIF supports only 256 colors, and uses a patented compression method. JPEG does not support transparency, and uses lossy compression, resulting in visual artefacts.<br />
PNG improved greatly on this by supporting 32-bit color, transparency, and lossless compression. Moreover, it's absolutely patent-free and freely available. Since it was invented for the Web and not photography or printing press, PNG lacks alternate color schemes like CMYK, and you won't find the image format being used for those applications. It's a good fit for our purposes, however.<br />
<br />
The PNG file format consists of a header followed by a number of chunks. The most basic chunks are PLTE (palette) and IDAT (data), but there are many other kinds of chunks, for example for storing color information and metadata, like last modified time. The image data is compressed with the deflate algorithm, which is the algorithm used by zip/unzip. Deflate does lossless compression, and more importantly, the algorithm is not patented. It is present in zlib, which is also freely available.<br />
However, the image lines are filtered by one of five possible filter types to improve the compression rate. Many kinds of pixel formats are supported. Finally, PNG supports interlacing, so pixels aren't necessarily stored neatly in sequence.<br />
This is just to illustrate that writing a PNG loader from scratch is not easily done. We don't have so much time on our hands, and will therefore be using the library.<br />
<br />
<span style="color: #b45f06;"><b>A walkthrough of the code</b></span><br />
Let's load up some pixels — should be easy, right? So you open up the manual for libpng, only to find that it starts with a whopping 20 pages listing just function headers. That's right, there are about 260 different functions in libpng. We won't need to use them all, but unfortunately, loading a PNG image is not quite straightforward. But let's get down to it.</div>
<div style="text-align: justify;">
Our <span style="font-family: "Courier New",Courier,monospace;">PNG_Image</span> class will have height, width, a bytes per pixel member, a color format (for OpenGL), and a pixel buffer. Following is the annotated code needed to load a PNG image. The method will take a filename and return <span style="font-family: "Courier New",Courier,monospace;">true</span> on success.<br />
<br />
In the top the file, include the appropriate headers:<br />
<blockquote class="tr_bq">
#include <png.h><br />
#include <gl.h></blockquote>
<br />
First off, we will want to open a .png file and check the header for the “PNG” signature.</div>
<blockquote class="tr_bq">
FILE *f = fopen(filename, "rb");<br />
if (f == nullptr) {<br />
return false;<br />
}<br />
<br />
// check that the PNG signature is in the file header<br />
<br />
unsigned char sig[8];<br />
if (fread(sig, 1, sizeof(sig), f) < 8) {<br />
fclose(f);<br />
return false;<br />
}<br />
if (png_sig_cmp(sig, 0, 8)) {<br />
fclose(f);<br />
return false;<br />
}</blockquote>
<br />
<div style="text-align: justify;">
Next, we need to create two data structures: <span style="font-family: "Courier New",Courier,monospace;">png_struct</span> and <span style="font-family: "Courier New",Courier,monospace;">png_info</span>. Many of libpng's functions will require us to pass in these structs. You can think of <span style="font-family: "Courier New",Courier,monospace;">png_struct</span> as a “PNG object” that holds state for various functions. It won't hold any image data, however. The <span style="font-family: "Courier New",Courier,monospace;">png_info</span> struct will hold metadata like image height and width. Creating these structs allocates memory from the heap; if anything goes wrong we must take care to deallocate them, or else we'd be leaking memory.<br />
<span style="color: #666666;">[Also mind that we still have an open file].</span></div>
<blockquote class="tr_bq">
// create two data structures: 'png_struct' and 'png_info'<br />
<br />
png_structp png = png_create_read_struct(PNG_LIBPNG_VER_STRING, nullptr, nullptr, nullptr);<br />
if (png == nullptr) {<br />
fclose(f);<br />
return false;<br />
}<br />
<br />
png_infop info = png_create_info_struct(png);<br />
if (info == nullptr) {<br />
png_destroy_read_struct(&png, nullptr, nullptr);<br />
fclose(f);<br />
return false;<br />
}</blockquote>
<br />
<div style="text-align: justify;">
Mind that plain C has no exceptions. To work around this, libpng has an error handling mechanism that works with <span style="font-family: "Courier New",Courier,monospace;">setjmp()</span>. Imagine decompressing a corrupted image, and you don't want the program to crash. What libpng does is, it jumps back to the point where you issued the <span style="font-family: "Courier New",Courier,monospace;">setjmp()</span> call. So this here is just some cleanup code.</div>
<blockquote class="tr_bq">
// set libpng error handling mechanism<br />
if (setjmp(png_jmpbuf(png))) {<br />
png_destroy_read_struct(&png, &info, nullptr);<br />
fclose(f);<br />
<br />
if (pixels != nullptr) {<br />
delete [] pixels;<br />
pixels = nullptr;<br />
}<br />
return false;<br />
}</blockquote>
<br />
<div style="text-align: justify;">
Q: Where did that <span style="font-family: "Courier New",Courier,monospace;">pixels</span> array come from?<br />
A: That's our pixel buffer member variable. It won't be allocated until later. The construction with <span style="font-family: "Courier New",Courier,monospace;">setjmp()</span> is turning the code inside out.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We are just getting started here. Let's gather some basic information: image height, width, and such.</div>
<blockquote class="tr_bq">
// pass open file to png struct<br />
png_init_io(png, f);<br />
<br />
// skip signature bytes (we already read those)<br />
png_set_sig_bytes(png, sizeof(sig));<br />
<br />
// get image information<br />
png_read_info(png, info);<br />
<br />
w = png_get_image_width(png, info);<br />
h = png_get_image_height(png, info);</blockquote>
<br />
<div style="text-align: justify;">
We're only interested in RGB and RGBA data, so other bit depths and color types are converted. So even if the image is in grayscale or whatever, we will get RGB data when reading the image. It's nice that libpng can do these conversions on the fly.</div>
<blockquote class="tr_bq">
// set least one byte per channel<br />
if (png_get_bit_depth(png, info) < 8) {<br />
png_set_packing(png);<br />
}</blockquote>
<br />
<div style="text-align: justify;">
PNG may have a separate transparency chunk. Convert it to alpha channel.</div>
<blockquote class="tr_bq">
// if transparency, convert it to alpha<br />
if (png_get_valid(png, info, PNG_INFO_tRNS)) {<br />
png_set_tRNS_to_alpha(png);<br />
}</blockquote>
<br />
<div style="text-align: justify;">
Next, get the color type. The <span style="font-family: "Courier New",Courier,monospace;">format</span> variable is meant as a parameter for OpenGL texturing. If the format can not be determined, give up and return an error.</div>
<blockquote class="tr_bq">
switch(png_get_color_type(png, info)) {<br />
case PNG_COLOR_TYPE_GRAY:<br />
format = GL_RGB;<br />
png_set_gray_to_rgb(png);<br />
break;<br />
<br />
case PNG_COLOR_TYPE_GRAY_ALPHA:<br />
format = GL_RGBA;<br />
png_set_gray_to_rgb(png);<br />
break;<br />
<br />
case PNG_COLOR_TYPE_PALETTE:<br />
format = GL_RGB;<br />
png_set_expand(png);<br />
break;<br />
<br />
case PNG_COLOR_TYPE_RGB:<br />
format = GL_RGB;<br />
break;<br />
<br />
case PNG_COLOR_TYPE_RGBA:<br />
format = GL_RGBA;<br />
break;<br />
<br />
default:<br />
format = -1;<br />
}<br />
if (format == -1) {<br />
png_destroy_read_struct(&png, &info, nullptr);<br />
fclose(f);<br />
return false;<br />
}</blockquote>
<br />
<div style="text-align: justify;">
This is how to get the number of bytes per pixel, regardless of channels and bit depth:</div>
<blockquote class="tr_bq">
bpp = (uint)png_get_rowbytes(png, info) / w;</blockquote>
<br />
<div style="text-align: justify;">
<span style="color: #666666;">[That <span style="font-family: "Courier New",Courier,monospace;">bpp</span> variable is <i>bytes</i> per pixel, not bits per pixel].</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We don't want to receive any interlaced image data. Even if the image is interlaced, we want to have the ‘final’ image data. There's a func for that:</div>
<blockquote class="tr_bq">
png_set_interlace_handling(png);</blockquote>
<br />
<div style="text-align: justify;">
We are done reading the header (info) chunk. You need to tell libpng you want to get ahead and skip the rest of the chunk.</div>
<blockquote class="tr_bq">
png_read_update_info(png, info);</blockquote>
<br />
<div style="text-align: justify;">
Now we are ready to go read the image data. We deferred allocating the pixel buffer up till now (this is how the manual says you should do it).</div>
<div style="text-align: justify;">
Annoyingly enough, <span style="font-family: "Courier New",Courier,monospace;">png_read_image()</span> expects an array of pointers to store rows of pixel data. So you need to construct both a pixel buffer and a temporary array of pointers to the rows in that pixel buffer. And then you can read the image data.</div>
<blockquote class="tr_bq">
// allocate pixel buffer<br />
pixels = new unsigned char[w * h * bpp];<br />
<br />
// setup array with row pointers into pixel buffer<br />
png_bytep rows[h];<br />
unsigned char *p = pixels;<br />
for(int i = 0; i < h; i++) {<br />
rows[i] = p;<br />
p += w * bpp;<br />
}<br />
<br />
// read all rows (data goes into 'pixels' buffer)<br />
// Note that any decoding errors will jump to the<br />
// setjmp point and eventually return false<br />
png_read_image(png, rows);</blockquote>
<br />
<div style="text-align: justify;">
You actually need to read the end of the PNG data. This will read the final chunk in the file, and check that the file is not corrupted.</div>
<blockquote class="tr_bq">
png_read_end(png, nullptr);</blockquote>
<br />
<div style="text-align: justify;">
Finally, clean up and return true, indicating success.</div>
<blockquote class="tr_bq">
png_destroy_read_struct(&png, &info, nullptr);<br />
fclose(f);<br />
return true;</blockquote>
<br />
<div style="text-align: justify;">
Well ... that was a little odd, but we managed to load some pixels. Not included in this tutorial: we can pass the pixel buffer on to OpenGL to create textures.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;"><b>Opinion or plain justified criticism</b></span></div>
<div style="text-align: justify;">
libpng provides tons of functionality for working with PNG images. Be that as it may, there is a lot not to like here. For one thing, loading an image is a ridiculous, unwieldy process. Conspicuously absent is a simple and stupid <span style="font-family: "Courier New",Courier,monospace;">png_load()</span> function that does all that listed above, now left as an exercise by the developers of libpng.</div>
<div style="text-align: justify;">
All I wanted to do was load an image, but libpng required me to learn everything there is to know about the file format, the library, and then some.</div>
<div style="text-align: justify;">
At any rate, do the dance and it will work. If it's too much for you, there's always SDL_Image.</div>
<div style="text-align: justify;">
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-45333265908536624372015-04-27T06:10:00.001+02:002015-04-27T06:54:28.126+02:00Fast Pseudo Random Number Generator<div style="text-align: justify;">
<img border="0" src="http://4.bp.blogspot.com/-VyrzQvxcr7s/SSltl2KM2eI/AAAAAAAAAE4/V4Vi3p9Zkmw/s1600/routing_bitmask.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" />Games need a good pseudo random number generator (PRNG). Flipping cards, rolling dice, determining hit points, choosing whether to fire a bullet or not, all these are random choices. Random is a strange phenomenon though. The set of numbers 1,1,1,1 is just as random as 4,2,1,3 — even though I carefully hand-picked both sequences, the latter looks more random. A shuffled list of numbers is perceived as being more random than a list of truly random numbers.<br />
<br />
It is a well-known fact (or actually, myth) that computers do not do random numbers; instead they are computed and therefore they are predictable. This was true until computer scientists found that real-world noise like typing speed, mouse movement, networking interrupts, and hard drive rotation statistics may be used for entropy: a source for random number generation. This is only a source though and the produced random number sequences may still be predictable depending on the algorithm used. Cryptographically safe RNGs aim to make it impossible to guess the next random number. There is a kind of chicken-and-egg relation here: crypto functions need good RNGs to be able to scramble the input; the scrambled output appears as random as any random data. Therefore the crypto function can be used as a RNG.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Anyway, there are some problems when it comes to games. Fast-paced action games may make hundreds of random choices every frame. At 60 frames per second, we don't want to take too much time generating random numbers. Therefore cryptographically safe RNGs are off limits, we are going to have to settle with a more predictable PRNG.
</div>
<div style="text-align: justify;">
Another problem is that at 60 fps, we might quickly drain the entropy pool. We can't afford having to wait for new entropy, and we don't want the operating system to run out. So even though modern operating systems can supply fairly truthful random numbers, we are not going to bother and stick with “decent enough” instead; we are going at it old-skool and calculate the pseudo random numbers by ourselves.
</div>
<br />
PRNG requirements for games:<br />
<ul>
<li>it must be fast</li>
<li>it must show good distribution</li>
<li>not repeating numbers often</li>
<li>no cryptographically safeness required</li>
</ul>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In them old days, we would use <span style="font-family: "Courier New",Courier,monospace;">rand()</span> and seed it by calling <span style="font-family: "Courier New",Courier,monospace;">srand()</span>. <span style="font-family: "Courier New",Courier,monospace;">rand()</span> is a notoriously bad PRNG and you should not use it nowadays. The BSD man page even calls it “bad random number generator”. It has a low period, meaning that after a while the same prefix starts to repeat. This is a bad habit for a PRNG.</div>
<div style="text-align: justify;">
I personally have bad experiences with <span style="font-family: "Courier New",Courier,monospace;">rand()</span>. My Tetris clone would happily generate the same block over and over again, giving some really bad gameplay experiences. I started handing out bonuses for this, but it didn't make it more <i>fun</i>. The game notably improved when I switched to a different PRNG.</div>
<div style="text-align: justify;">
The modern, new and improved equivalent of <span style="font-family: "Courier New",Courier,monospace;">rand()</span> is <span style="font-family: "Courier New",Courier,monospace;">random()</span>. It should be noted however that there might be a portability issue here; <span style="font-family: "Courier New",Courier,monospace;">random()</span> may not be available on your platform of choice.<br />
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;"><b>Mersenne Twister</b></span><br />
A popular modern PRNG is the Mersenne Twister. It is named after the Mersenne prime number that it uses. It is a well respected PRNG, and an easy choice for C++ programmers, as it is included in the C++11 standard library. The C++11 way to get some random numbers is:</div>
<blockquote class="tr_bq">
#include <random><br />
<br />
std::random_device rd; // only used once for seeding<br />
std::mt19937 rng(rd()); // select Mersenne Twister<br />
std::uniform_int_distribution<int> dist(0, 1000);<br />
int n = dist(rng);</blockquote>
The Mersenne Twister is relatively slow for game code. Apparently it badly stresses the CPU cache.<br />
<br />
<span style="color: #b45f06;"><b>XorShift</b></span><br />
<div style="text-align: justify;">
The most simple and fastest decent PRNG there is, is the XorShift algorithm. It simply does a couple of bit-shift and exclusive-OR operations, and that's it. It has four internal state variables that you should initialize (in a more or less random manner, as you see fit).</div>
<blockquote class="tr_bq">
uint32_t xorshift128(void) {<br />
// algo taken from wikipedia page<br />
uint32_t t = state.x ^ (state.x << 11);<br />
state.x = state.y;<br />
state.y = state.z;<br />
state.z = state.w;<br />
state.w = state.w ^ (state.w >> 19) ^ t ^ (t >> 8);<br />
return state.w;<br />
}</blockquote>
<div style="text-align: justify;">
XorShift is incredibly simple, but good enough for games. In that respect, it is a perfect fit for our purposes. There are XorShift* and XorShift+ variants, that use multiplication and addition respectively, and produce 64 bits values.</div>
<br />
<div style="text-align: justify;">
<span style="color: #b45f06;"><b>Statistical Analysis</b></span></div>
<div style="text-align: justify;">
In a feeble attempt to prove that XorShift suffices, I calculated the standard deviation for a series of numbers between 0 and 1000 generated by <span style="font-family: "Courier New",Courier,monospace;">rand()</span>, <span style="font-family: "Courier New",Courier,monospace;">random()</span>, <span style="font-family: "Courier New",Courier,monospace;">std::mt19937</span>, and <span style="font-family: "Courier New",Courier,monospace;">xorshift128()</span>. They all produce a standard deviation around 285, meaning that they all exhibit the same fairly high spread of values. In that sense, we can say that probably they all produce sets of random values usable for games. This is by no means an extensive analysis of the PRNGs, of course. There exists a test suite named TESTU01 for testing PRNGs, but I'll leave it up to the math and computer scientists to play around with that.</div>
<br />
<div style="text-align: justify;">
For now, I'm happy to have found that the XorShift PRNG is exceptionally fast, while still producing decently enough pseudo random numbers. It is a perfect fit for games.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Two interesting links for further reading:</div>
<ul>
<li><a href="http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx" target="_blank">Eternally Confuzzled about <span style="font-family: "Courier New",Courier,monospace;">rand()</span></a></li>
<li><a href="http://www.drdobbs.com/quick-and-portable-random-number-generat/184403024" target="_blank">Dr.Dobb's on Lehmer numbers</a></li>
</ul>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-10499001194548128412015-03-08T13:37:00.000+01:002015-03-08T13:53:50.612+01:00Wrapping Worlds<div class="p1" style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-UGddqZyccug/VEvK7djLh4I/AAAAAAAADe8/jixVpO0YeAI/s1600/escher_sphere.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<span class="s1">In the classic game <i>Pac-Man</i>, there is a tunnel in the middle of the level, </span><span class="s1">when you go in on one side, you emerge on the other side of the screen. </span><span class="s1">It plays an important part in gameplay, as you can use it to evade the </span>fast red ghost.</div>
<div class="p1" style="text-align: justify;">
<span class="s1">In the classic <i>Defender</i>, if you fly your spaceship far enough to the right </span><span class="s1">(or left, as you can go both ways), space will wrap around and you will </span>eventually return to the same point. The game world of <i>Defender</i> is cylindrical.</div>
<div class="p2" style="text-align: justify;">
<span class="s1"></span><br /></div>
<div class="p1" style="text-align: justify;">
<span class="s1">You will find this technique of wrapping worlds a lot in classic 2D games. </span><span class="s1">It doesn't work well for 3D games, because of their level of realism and </span><span class="s1">immersiveness, it feels weird if a 3D world wraps. <span style="color: #666666;">[The irony is that the </span></span><span class="s1"><span style="color: #666666;">real world does seem to wrap around, as observed by a traveller on the ground]</span>. </span><span class="s1">On the other hand, I always I feel weird when reaching the edge of the obviously </span>square world in 3D games.</div>
<div class="p1" style="text-align: justify;">
<span class="s1">Anyway, wrapping worlds are a lot of fun for 2D retro style games, and </span><span class="s1">I couldn't find much information on the net how to go about and implement such a thing. </span>So I devised a couple of methods and wrote them down here.</div>
<div class="p2" style="text-align: justify;">
<span class="s1"></span><br /></div>
<div class="p1">
<span class="s1"><b><span style="color: #b45f06;">1. The straightforward way</span></b></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1">Wrapping is simply achieved by using the modulo operator. </span><span class="s1">If the player moves off the map, her new position is mod the world width. </span><span class="s1">Consequently, when a player moves off the map on the right, she re-emerges </span>on the left. That was easy.</div>
<div class="p1" style="text-align: justify;">
<a href="http://4.bp.blogspot.com/-LBfzmUE1FX4/VPw56j65oRI/AAAAAAAADgw/fyBSf5o5Bz4/s1600/modulo.png" imageanchor="1" style="clear: right; display: inline !important; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://4.bp.blogspot.com/-LBfzmUE1FX4/VPw56j65oRI/AAAAAAAADgw/fyBSf5o5Bz4/s1600/modulo.png" /></a><span class="s1">But not so fast. A monster chasing the player will effectively see the player </span><span class="s1">being teleported from the right side of the map, to the far left side of the map. </span><span class="s1">The monster will now change direction, and has to travel all the way across </span><span class="s1">the map in order to get near the player again. </span><span class="s1">So to fix this, you might set a flag that a monster is not allowed to turn </span>if ... But there is more. <span class="s1">A similar problem exists for monsters that live in the left side of the map. </span><span class="s1">They will not see that the player is actually close nearby, until the player </span>is teleported in over the right edge of the map. By now it's becoming clear that wrapping is not as easy as it looks.</div>
<div class="p2" style="text-align: justify;">
<span class="s1"></span><br /></div>
<div class="p2">
</div>
<div class="p1">
<span class="s1"><b><span style="color: #b45f06;">2. Overscan areas</span></b></span></div>
<div class="p2" style="text-align: justify;">
<span class="s1">Rather than having the map just be the map, allow the player (and everything else) </span><span class="s1">to exist in a boundary outside the world. The boundary area is like an overscan area.</span><br />
<a href="http://1.bp.blogspot.com/-CBO-rVgHXlA/VPw6AkZIq1I/AAAAAAAADhI/YytGIlhLW0c/s1600/overscan.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://1.bp.blogspot.com/-CBO-rVgHXlA/VPw6AkZIq1I/AAAAAAAADhI/YytGIlhLW0c/s1600/overscan.png" /></a><span class="s1">This region is a copy of the other side of the world. Once the player ventures </span>too far, teleport everything in the overscan area to the other side.</div>
<div class="p2" style="text-align: justify;">
<span class="s1">The overscan area should at least be a screen wide to work properly. </span>Maps should also be at least two screens wide. <span class="s1">This method is relatively costly on memory because it duplicates large parts of </span>the map.</div>
<div class="p1" style="text-align: justify;">
<span class="s1"></span><br /></div>
<div class="p2">
<span class="s1"><b><span style="color: #b45f06;">3. Far offset</span></b></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1">The previous method still has a problem with the direction of monsters that are </span><span class="s1">outside the screen.</span><br />
<a href="http://4.bp.blogspot.com/-V-5jcj0dy1s/VPw6AvUnKwI/AAAAAAAADg4/F4inOtu4sV8/s1600/far_offset.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://4.bp.blogspot.com/-V-5jcj0dy1s/VPw6AvUnKwI/AAAAAAAADg4/F4inOtu4sV8/s1600/far_offset.png" /></a><span class="s1">A way to solve this is to have the game take place at a far offset. </span><span class="s1">The problem with wrapping is partly because we're too close to zero. By translating </span><span class="s1">the origin to, say, 100,000, we're largely preventing the problem of monsters heading </span><span class="s1">the wrong way. It's mostly a band-aid though as the situation still does exist. Eventually the player may reach the zero line, at which everything needs to be rebased at the far offset again. It is </span><span class="s1">unlikely that any player will venture so far that she wrapped around the </span>world a dozen times, but nevertheless possible.<br />
<span class="s1"></span></div>
<div class="p1">
<span class="s1">
</span></div>
<div class="p1" style="text-align: justify;">
<span class="s1"></span><br /></div>
<div class="p1">
<span class="s1"><b><span style="color: #b45f06;">4. Sliding window</span></b></span></div>
<div class="p2" style="text-align: justify;">
<span class="s1">Let the area around the player be a sliding window that glides over the map. </span><span class="s1">The area should at least be what's visible on screen, but may be larger to allow </span>off-screen monsters to come in and attack.<span class="s1"></span></div>
<div class="p1" style="text-align: justify;">
<a href="http://3.bp.blogspot.com/-wMngFSA39rc/VPw6Aqiu6-I/AAAAAAAADg8/3NpPLIhXbDk/s1600/sliding_window.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://3.bp.blogspot.com/-wMngFSA39rc/VPw6Aqiu6-I/AAAAAAAADg8/3NpPLIhXbDk/s1600/sliding_window.png" /></a><span class="s1">The sliding window uses ‘stripping’ to phase in and phase out relevant parts </span><span class="s1">of the map. For example, as the window slides to the right, a vertical strip </span><span class="s1">slides into the window on the right, and a vertical strip slides out of view on the left. </span><span class="s1">This works particularly well for tiled maps. Monsters can be frozen into the tile map, </span>and come to life as soon as they come into view. <span class="s1">The coordinates of alive objects are relative to the sliding window (they are no longer in </span>‘map’ space) and therefore there is no directional problem when the world wraps.</div>
<div class="p2" style="text-align: justify;">
<br />
<span class="s1"></span></div>
<div class="p1" style="text-align: justify;">
<span class="s1">I implemented each of these and they all work. What seems to me the best method however is </span><span class="s1">the sliding window technique. Even for games that are not tile based at all, you may </span><span class="s1">want to consider adding a tile grid anyway. The sliding window technique makes a clear </span><span class="s1">distinction between objects that are alive, and objects that are not worth spending time on. </span><span class="s1">You can enjoy a massive world with millions of objects without breaking a sweat. </span>And what's fun, the world wraps.</div>
<div class="p1">
</div>
<div class="p2" style="text-align: justify;">
<span class="s1"></span><br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-24719072768426346152015-02-11T22:05:00.000+01:002015-02-11T22:05:44.518+01:00Rock solid frame rates with SDL2<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-UlAqZS--hYU/SCIALEeDLII/AAAAAAAAAB0/yS2qlEy0WbU/s1600/SDL_logo.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
For fast-paced action games you need to have a steady framerate. If the framerate is not steady, you are likely to see stutter. It's a periodic wobble, jerky movement, as if a frame was skipped. It's very annoying, and once you see it, you can't unsee it. I ran into this problem and for a moment there, I just blamed SDL, heh! In forums online, I found more people complaining about stutter with SDL. The accepted answer on StackOverflow for this problem was wrong. Game dev forums did turn up some good tips. Finally, I was able to piece it together. Apparently this topic is not well understood by many—and the ones who do get it right don't seem to get just what the problem is. It has indeed to do with frame rate, but it's not skipping. Something else is up.</div>
<br />
<b><span style="color: #b45f06;">Loop the loop</span></b><br />
<div style="text-align: justify;">
Games run in a loop. You update some monsters, do the rendering, and cap the frame rate by delaying until it's time to display the next frame. It's likely to be programmed something like this:</div>
<blockquote>
move_monsters();<br />
render();<br />
<br />
float freq = SDL_GetPerformanceFrequency();<br />
<br />
now = SDL_GetPerformanceCounter();<br />
secs = (now - last_time) / freq;<br />
SDL_Delay(FRAMERATE - secs * 1000);<br />
<br />
swap_buffers();</blockquote>
<div style="text-align: justify;">
<span style="color: #666666;">NB. You might also use <span style="font-family: "Courier New",Courier,monospace;">SDL_GetTicks()</span>. SDL 2 offers new performance counter functions. These have much greater precision than <span style="font-family: "Courier New",Courier,monospace;">SDL_GetTicks()</span>.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We've subtracted the time that the game was busy, so to complete a frame we need to wait the remaining time. This is exactly where the bug is, odd as it may seem. To prove it, try timing <span style="font-family: "Courier New",Courier,monospace;">SDL_Delay()</span> calls in a loop:</div>
<blockquote>
float freq = SDL_GetPerformanceFrequency();<br />
<br />
t0 = SDL_GetPerformanceCounter();<br />
SDL_Delay(30);<br />
t1 = SDL_GetPerformanceCounter();<br />
<br />
printf("time slept: %.2f msecs", (t1 - t0) / freq * 1000.0f);</blockquote>
Result:<br />
<blockquote class="tr_bq">
time slept: 31.51 msecs<br />
time slept: 31.31 msecs<br />
time slept: 34.06 msecs<br />
time slept: 32.15 msecs<br />
time slept: 34.54 msecs<br />
time slept: 33.07 msecs</blockquote>
<div style="text-align: justify;">
But we asked to sleep only 30 milliseconds! How can this be?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The problem is not so much with SDL as with the operating system. This can be proved by replacing <span style="font-family: "Courier New",Courier,monospace;">SDL_Delay()</span> with <span style="font-family: "Courier New",Courier,monospace;">usleep()</span>, and the problem persists. Even if you try the insanely precise <span style="font-family: "Courier New",Courier,monospace;">nanosleep()</span> call, the problem persists. Sleeping is inaccurate—despite the fact that modern computers are very capable of doing high precision timing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Some try fixing this by sleeping a little less, and then doing a busy loop to get as close to the sleep time as possible. This almost works, but it it isn't perfect and you will still see stutter. The non-deterministic nature of a multitasking operating system gets in our way.</div>
<br />
<div style="text-align: justify;">
<b><span style="color: #b45f06;">Thou shall wait</span></b></div>
<div style="text-align: justify;">
So, the system is incapable of sleeping an exact amount of time. How about not sleeping at all.</div>
<div style="text-align: justify;">
The trick of sleeping until the next frame is probably a spin-off of the waiting for vsync trick. But sleeping isn't the same thing, and it doesn't work because it doesn't necessarily bring you in sync with anything. So instead of sleeping, we are going to wait for vsync.</div>
<blockquote>
sdl_window = SDL_CreateWindow(title, x, y, w, h, SDL_WINDOW_OPENGL);<br />
gl_context = SDL_GL_CreateContext(sdl_window);<br />
SDL_SetSwapInterval(1);</blockquote>
<div style="text-align: justify;">
This is for use with OpenGL. It will enable the swap interval (wait for vsync) before swapping the render buffers. You can remove any <span style="font-family: "Courier New",Courier,monospace;">SDL_Delay()</span> calls, because during the swap interval the program already sleeps. If you are using the SDL render functions, it's done like this:</div>
<blockquote class="tr_bq">
renderer = SDL_CreateRenderer(win, -1,<br />
SDL_RENDERER_ACCELERATED|SDL_RENDERER_PRESENTVSYNC);</blockquote>
<div style="text-align: justify;">
Now <span style="font-family: "Courier New",Courier,monospace;">SDL_RenderPresent()</span> will wait for vsync before presenting the buffer.</div>
<br />
<div style="text-align: justify;">
<b><span style="color: #b45f06;">Moving on</span></b></div>
<div style="text-align: justify;">
That's only half of the story. The problem of stutter is not solely caused by an inaccurate sleep timer. It stutters because movement is too rigid:</div>
<blockquote>
new_pos = old_pos + speed;</blockquote>
<div style="text-align: justify;">
This sort of code <i>always</i> advances the position, no matter whether time advanced evenly or not. So in the case where time did not advance perfectly in line with the desired rate (which is impossible, as we saw earlier), this code will produce stutter.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To solve this problem, we must write code that can handle variable timings. It works just like in physics class:</div>
<blockquote>
float delta_time = (now - t0) * (float)freq;<br />
<br />
new_pos = old_pos + speed * delta_time;</blockquote>
<div style="text-align: justify;">
With <span style="font-family: "Courier New",Courier,monospace;">SDL_GetPerformanceCounter()</span> and <span style="font-family: "Courier New",Courier,monospace;">SDL_GetPerformanceFrequency()</span> you can calculate the time step for this frame in seconds. Note that <span style="font-family: "Courier New",Courier,monospace;">delta_time</span> holds a small value, so crank up the speed. If your game world is defined in meters, you can simulate things rather faithfully with speed in meters per second.</div>
<br />
<div style="text-align: justify;">
<b><span style="color: #b45f06;">At last</span></b></div>
<div style="text-align: justify;">
Lo and behold, our frame rate is now rock steady and scrolling is super smooth. It's locked down at 60 fps, with very low CPU usage, even on my five year old computer. Like with Columbus' egg, it's easy once you know how.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
My advice for testing is to create a large checkerboard and scrolling that across the screen. If it stutters, you will see it. Also try disabling the vsync just for fun, and watch the fps counter go nuts.</div>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-61669298671460612072015-01-11T17:09:00.002+01:002015-01-11T17:09:56.454+01:00Memory Management: the SLUB allocator<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://2.bp.blogspot.com/-Y5eDrp9ZcTU/TqQFZypLRiI/AAAAAAAADFQ/xjOCBc_xN0g/s1600/exception1.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Last time I wrote about the SLAB allocator, a method for caching objects in a pool. And then I read about SLUB, the slab allocator that is present in modern versions of the Linux kernel, and got a bit carried away with it. SLUB is brilliant due to its simplicity, and like the Linux kernel, by now I have thrown out my SLAB code in favor of SLUB. That's how it goes with the software lifecycle; out with the old, in with the new. Descriptions of how these allocators work often either do not go into enough detail, or are way too cryptic to understand, so I'll do my best to shed some light.<br />
<br />
Pool allocators basically cache objects of the same size in an array. Because we are talking low-level memory allocators here, the absolute size of the array is always going to be a single page. A single page is 4 kilobytes. <span style="color: #666666;">[Yes, I know hugepages can be 2 megabytes, and sometimes even a gigabyte, but for the sake of simplicity we'll work with regular 4k pages here]</span>.<br />
To keep track of what array slots are in use and what slots are free, they keep a free list. The free list is just a list of indices saying which slots are free. It is metadata that has to be kept somewhere. The difference between SLAB and SLUB is the way they deal with that metadata (and boy, what a difference it makes).<br />
<br />
In SLAB, the freelist is kept at the beginning of the memory page. The size of the freelist depends on the number of objects that can fit in the page. When the objects are small, more objects can fit into the page and the freelist will be larger than when the object size is large, not as many objects can fit in the page.<br />
SLUB is much more clever than this. SLUB decides to keep the metadata elsewhere, and uses the full page for object storage. The result is that SLUB can do something that SLAB can not: it can store four objects of 1k each in a 4k page, or two objects of 2k, or one object of 4k in size. This is something that SLAB could never do.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-dC2hPVG1YHA/VLKQxJAet2I/AAAAAAAADfk/yjSd2G-dJjs/s1600/slub.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-dC2hPVG1YHA/VLKQxJAet2I/AAAAAAAADfk/yjSd2G-dJjs/s1600/slub.png" height="172" width="320" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But what about the metadata? SLUB puts the free pointers <i>inside</i> the memory slots. The objects are free, so this memory is available for us to use.<br />
The other metadata (like, where is the first free object? What is my object size?) is kept in a Page struct. All pages in memory are tracked by an array of Page struct. Beware the confusion here, a Page struct is not the actual memory page, it just represents the metadata for that memory page.<br />
<br />
All we did was take that little bit of metadata from each slab, and store it elsewhere. The net result is that we can make use of memory much more effectively than before.<br />
<br />
In our last implementation we used C++ templates to make things type safe. While this makes sense from a C++ programmer's perspective, it doesn't make any sense from a memory allocator's perspective. The memory allocator has bins for <i>sizes</i>, not for <i>types</i>. This alone is reason enough not to code it using templates. Stroustrup himself said that object allocation and object creation are separate, and there's no arguing with that.<br />
Therefore the SLUB is implemented as a plain C code, and it turns out very straightforward and easy to understand. You have 4k of memory, the number of objects you can store equals 4k divided by <span style="font-family: "Courier New",Courier,monospace;">object_size</span>. The first free object slot is <span style="font-family: "Courier New",Courier,monospace;">page_struct.free_index</span>. When you allocate an object, return the address of <span style="font-family: "Courier New",Courier,monospace;">slot[page_struct.free_index]</span>. Update the <span style="font-family: "Courier New",Courier,monospace;">free_index</span> and zero the object's memory, of course. And that's all there is to it, really.<br />
<br />
The engineers at Sun actually took into account CPU cachelines and had slab queues per CPU. This may have been a good idea in 1992 but becomes kind of crazy when you realize that today's supercomputers comprise tens (if not hundreds) of thousands processor cores.<br />
SLAB also needs to take into account object alignment, which is automatic in SLUB as long as the object size is a multiple of the CPU's word size, and this is already taken care of by the compiler when it packs and pads a struct. At the end of the day, SLAB is unnecessarily complicated, over-engineered. SLUB says: <span style="color: red;">K</span>eep <span style="color: red;">I</span>t <span style="color: red;">S</span>imple, <span style="color: red;">S</span>tupid!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><span style="color: #b45f06;">Caches</span></b><br />
When the slab runs out of free memory slots, you will want to get a new page and initialize it as a new slab. Likewise, when a slab is entirely free, you will want to release the page. This is trivially implemented using <span style="font-family: "Courier New",Courier,monospace;">malloc()</span> and <span style="font-family: "Courier New",Courier,monospace;">free()</span>. It's more fun however to actually cache these pages in a linked list. The pointers for the linked list can again be members of Page struct, it's metadata after all.</div>
<div style="text-align: justify;">
The Linux kernel optimizes for speed by keeping linked lists of full, partially full, and empty pages. When an object is allocated or freed, it always looks at the partially full list first. If after allocation the page is now full, the page is moved to the full list. If after freeing an object the page is now empty, the page is moved to the empty list.<br />
<br />
You may wonder why it holds on to empty pages rather than releasing them immediately. The answer is that getting a new page may well take a long time. So, the empty list is an insurance policy that the cache can always quickly get a page.<br />
If the cache is empty, it will have to get a new page from an other allocator. The Linux kernel implements the buddy allocator for managing blocks of pages. Implementing buddy allocator code is a bit complicated (and fun), I might write about that some other time.<br />
Anyway, slab allocators are fascinating, and it's nice to know how they work. You can't talk about SLAB without talking about SLUB, and noting that simplicity beats complexity.</div>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-85678404174641974142014-12-21T23:42:00.001+01:002014-12-22T18:42:11.932+01:00Memory Management: Slab allocator<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://2.bp.blogspot.com/-Y5eDrp9ZcTU/TqQFZypLRiI/AAAAAAAADFQ/xjOCBc_xN0g/s1600/exception1.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Last time we saw we could speed up memory allocations dramatically by using a pool allocator. I had a nagging feeling though. My pool allocator uses a <span style="font-family: "Courier New",Courier,monospace;">std::vector</span>, and it's doing all kinds of things behind the scenes. Resizing on the fly is cheating and undesired. Another thing that's certainly not right is that we're manually calling the destructor, while the object is alive and well, and present in the vector. This ultimately leads to the double destruction of objects once the pool is destructed; the destructor of the pool destructs the vector, which in turn destructs all objects it holds—effectively destructing objects <i>twice</i>. Double destruction is not a problem when your destructors are coded neatly, but it is something that normally is not supposed to happen. That's a big <span style="font-family: "Courier New",Courier,monospace;">FIXME</span> remark that needs to be addressed.<br /><br />
</div>
<div style="text-align: justify;">
As a proposed fix for the problem, let's do away with <span style="font-family: "Courier New",Courier,monospace;">std::vector</span> altogether. Let's change the vector into a fixed size array. Upon initialization, the space for the array is preallocated, but no objects are created, no constructors are being run. When an object is allocated from the array, its constructor is invoked via placement <span style="font-family: "Courier New",Courier,monospace;">new</span>. When the (fixed size) array runs out space, allocate a new ‘subpool’ and chainlink them together as a linked list.<br />
When an object is freed, you can look at the address and work out which array slot it came from. When all is well, manually call the destructor. If that pointer came from somewhere else, generate a fatal error.<br /><br />
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-kD4Uv6p3YxM/VJc2Pz8DiPI/AAAAAAAADfQ/Dwe_Fa2D5Do/s1600/linked_arrays.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-kD4Uv6p3YxM/VJc2Pz8DiPI/AAAAAAAADfQ/Dwe_Fa2D5Do/s1600/linked_arrays.png" height="131" width="320" /></a>
<br />
<br />
</div>
<div style="text-align: justify;">
This is a <i>slab</i> allocator. There is an opportunity here to implement garbage collection, and it's not as difficult as it may seem. All the allocator needs to do is keep track of which slots are allocated, and which slots are free—and it already needs to do that anyway. Once we get that right, garbage collection is simply destroying any objects that are still lingering around when the pool is destructed.<br /><br />
</div>
<div style="text-align: justify;">
In order to track which slots are free, you could keep a small bitmap. If there are eight slots per slab, the bitmap is as small as one single byte. If you choose a single 64-bit <span style="font-family: "Courier New",Courier,monospace;">int</span> for a bitmap, the slab can hold 64 slots. That's a big slab. Personally, I went for sixteen objects in a slab.<br />
Although a bitmap is perfectly okay for tracking free slots, you'll need to run through lots of bit shifting loops for the bookkeeping. The performance can be greatly improved by employing a different technique.<br /><br />
</div>
<div style="text-align: justify;">
I know the Linux kernel internally uses a slab allocator, and therefore I took a peek in the book <i>Understanding the Linux Kernel</i>. It's in there alright, but unfortunately I couldn't make any sense of it. This stuff is hard. Or maybe not so much, I did find a rather good explanation on the web in the documentation on <a href="https://www.kernel.org/doc/gorman/html/understand/understand011.html" target="_blank">kernel.org</a>.<br /><br />
</div>
<div style="text-align: justify;">
What happens in the kernel is that each slab has a freelist, which is an array of integers (since we store only small numbers, these can be bytes rather than 64-bit <span style="font-family: "Courier New",Courier,monospace;">int</span>s). Each entry in the array represents an index to the next free slot. So, the index functions somewhat as a pointer to a list of free objects.<br />
The final entry in the freelist array is -1, marking the end, when the slab runs out of memory. Additionally, there is an integer <span style="font-family: "Courier New",Courier,monospace;">free_slot</span> which denotes the first free slot. Since this is always the index of the first free slot, all the allocator does is access this number when an object is requested. It doesn't even have to search a free block, it already knows which object slot is free.<br />
It's not described exactly in the kernel documentation, but at this point you should mark the freelist slot as being in use. Again, this is important for garbage collection.<br />
<span style="color: #666666;">[In the kernel source the member is actually called <span style="font-family: "Courier New",Courier,monospace;">free</span>, which in my code would inconveniently clash with the <span style="font-family: "Courier New",Courier,monospace;">std::free()</span> function, as well as my <span style="font-family: "Courier New",Courier,monospace;">Slab<T>::free()</span> method. Even though C++ has namespacing I still went with <span style="font-family: "Courier New",Courier,monospace;">free_slot</span> instead.]</span>
<br /><br />
</div>
<div style="text-align: justify;">
For illustration purposes, consider what happens when we have a slab allocator with four object slots. Upon initialisation:<br />
</div>
<div>
<blockquote class="tr_bq">
free_slot: 0<br />
free_list: { 1, 2, 3, END }</blockquote>
</div>
<div>
Now, let's allocate an object.<br />
</div>
<div>
<blockquote class="tr_bq">
free_slot: 1<br />
free_list: { IN_USE, 2, 3, END }</blockquote>
</div>
<div>Let's allocate one more.<br /></div>
<div>
<blockquote class="tr_bq">
free_slot: 2<br />
free_list: { IN_USE, IN_USE, 3, END }</blockquote>
</div>
<div style="text-align: justify;">
When memory is freed, look at the address to find out to which slab it belongs, and which slot it is. The <span style="font-family: "Courier New",Courier,monospace;">free_slot</span> pointer points at the first next free object; insert this number into the freelist slot. This effectively marks the slot as free again. If the slot was not marked as in use, something would be terribly wrong and the program should abort immediately. The <span style="font-family: "Courier New",Courier,monospace;">free_slot</span> pointer now becomes the slot number itself, thus the first free object is the object we just freed. Note how the freelist works in a LIFO fashion.<br /><br />
</div>
<div style="text-align: justify;">
When we free the first allocated object, the result is:<br />
</div>
<div>
<blockquote class="tr_bq">
free_slot: 0<br />
free_list: { 2, IN_USE, 3, END }</blockquote>
</div>
<div style="text-align: justify;">
If we want to garbage collect at this point, we can easily see that the object in slot #1 is still hanging around. We can garbage collect by freeing it. The situation then becomes as follows:<br />
</div>
<div>
<blockquote class="tr_bq">
free_slot: 1<br />
free_list: { 2, 0, 3, END }</blockquote>
</div>
<div style="text-align: justify;">
The freelist now appears shuffled, but that doesn't matter; it operates as a LIFO queue. If we now allocate an object again, it fetches the object in slot #1.<br /><br />
</div>
<div style="text-align: justify;">
Managing the freelist like this is deceivingly simple. This slab allocator greatly outperforms our previous solution.<br />
Combining everything said here, the data structure in code is:<br />
</div>
<div>
<blockquote class="tr_bq">
template <typename T><br />
struct Slab {<br />
Slab<T>* next;<br />
signed short free_slot;<br />
signed char free_list[NUM_OBJECTS];<br />
T objects[NUM_OBJECTS];<br />
};</blockquote>
</div>
<div style="text-align: justify;">
What I initially did here is use fixed size arrays. In the Linux kernel however, the only constant is the size of the memory page for the slab. It calculates how many objects of the desired object size can fit onto that page, taking into account the size of the metadata and necessary alignment. It crams as many objects as possible into a page, but note that the precise amount is different for every type T. For instance, many small objects of type A may fit into a single memory page, while another slab can hold a few large objects of type B.<br /><br />
</div>
<div style="text-align: justify;">
The Linux kernel improves efficiency of the allocator even more by keeping three separate lists to manage slabs:
lists for empty, full, and partially full slabs. On top of that, it keeps caches of slabs, so it always has quick access to them in case of need.<br />
I simply couldn't resist the temptation, so the updated version is:
</div>
<div>
<blockquote class="tr_bq">
template <typename T><br />
struct Slab {<br />
Slab<T>* prev, *next;<br />
unsigned short num_objects, num_free;<br />
unsigned short free_slot;<br />
// compiler may insert padding here<br />
// free_list is directly behind the struct in memory<br />
// objects are behind the free_list in memory<br />
};<br />
<br />
Slab<T>* empty, *full, *partial;</blockquote>
</div>
<div style="text-align: justify;">
<span style="color: #666666;">[Now, <span style="font-family: "Courier New",Courier,monospace;">num_objects</span> isn't really a property of the slab; it's a property of the slab type. So why is it implemented as such? The reason is it's not a straightforward <span style="font-family: "Courier New",Courier,monospace;">constexpr</span> due to alignment, and the freelist having a maximum limit (we're using bytes as slot index, that's only 8 bits). The slab is written as a self-contained object allocator, and therefore it knows how many objects it can contain. In the Linux kernel, each slab also knows its object size. Since we're using C++ templates here, we can simply get away with <span style="font-family: "Courier New",Courier,monospace;">sizeof(T)</span>.]</span><br /><br />
</div>
<div style="text-align: justify;">
And you know what they say, safety first. The metadata is kept inside the slab, as well as the objects that are handed out to the application side of things. A badly behaving application may well overwrite other objects, or worse, our freelist. To guard against this, my slab allocator implements red zoning and object markers; magic numbers are inserted in between, and they are checked every single time.<br />
For game code, the extra checks are <span style="font-family: "Courier New",Courier,monospace;">#ifdef</span>-ed out in a release build, but for other kinds of applications it's probably best to leave them in anyway.<br />
Moreover, be mindful that a memory allocator should always zero the memory: not only when handing out objects, but also when they are freed. If you don't, you will get a massive Heartbleed disaster—the baddest security bug of 2014, the decade, and maybe of all computing history.<br /><br />
</div>
<div style="text-align: justify;">
Happy holidays, and be careful with fireworks on New Year's Eve. You will need those fingers to do some more programming in 2015.<br />
</div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-11521802906654029142014-11-30T12:38:00.000+01:002014-11-30T12:38:13.218+01:00Memory Management: Pool Allocator<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://2.bp.blogspot.com/-Y5eDrp9ZcTU/TqQFZypLRiI/AAAAAAAADFQ/xjOCBc_xN0g/s1600/exception1.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
I want to get back to game programming, so I had another look at the quad-tree. I did a post on quad-trees in August last year. Remember that it's a technique for grouping objects together by chopping up 2D spaces into four quarters. Each interesting quarter is again divided into four parts. This basically goes on until <span style="font-family: "Courier New",Courier,monospace;">max_depth</span> is reached. The purpose of this space partitioning is grouping objects for the sake of collision testing; objects that are near each other may collide and are worth testing for collisions. Objects that are not near each other can not possibly collide, so we skip those when collision testing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Having lots of moving objects, it's easiest to construct the quad-tree on every frame, or game tick. Now imagine that they are slow moving objects. If the objects are moving slowly, chances are that the quad-tree will look exactly the same at every frame. So why would you spend all that time building up and tearing down the same quad-tree on every frame? Unfortunately, we have to, and trying to bend an existing quad-tree into shape is much more difficult than simply rebuilding it from scratch. But we can optimize in another way.<br />
The process of rebuilding the quad-tree takes a lot of <span style="font-family: "Courier New",Courier,monospace;">malloc()</span> and <span style="font-family: "Courier New",Courier,monospace;">free()</span> calls, and they are not cheap functions. If you want 60 fps, you will want to spend your time better than calling <span style="font-family: "Courier New",Courier,monospace;">malloc()</span> and <span style="font-family: "Courier New",Courier,monospace;">free()</span> all the time.<br />
Rather than trying to reuse the entire quad-tree, let's reuse the timber that it's made of. Let's grab the objects that represent the quad-tree nodes and leafs from a pool allocator.
</div>
<div style="text-align: justify;">
<br />
A pool allocator is simply a stack of pointers to objects in memory. Rather than freeing the memory, you push the object onto a stack. Next, when you need to allocate an object, you pop it from the stack. If the stack runs out of memory, you simply allocate a new object using <span style="font-family: "Courier New",Courier,monospace;">malloc()</span>.<br />
A pool allocator only allocates objects for a specific data type. All the objects in the pool are of the same type and size. By no means try to make it a generic allocator, or you'll be writing you own fragmenting memory manager.<br />
<br />
In C, you might have:
</div>
<blockquote class="tr_bq">
static QuadTree *qt_pool[MAX_QT_POOL];<br />
static int qt_pool_idx = 0;<br />
<br />
QuadTree *qt_alloc(void);<br />
void qt_free(QuadTree *);</blockquote>
<div style="text-align: justify;">
<br />
Things get quite interesting when you move to C++. First of all, C++ has a <span style="font-family: "Courier New",Courier,monospace;">std::vector</span> class that you can use as a dynamic array, meaning that you can grow the pool as large as is necessary without having to worry too much about what the exact limit should be.<br />
Secondly, C++ has <span style="font-family: "Courier New",Courier,monospace;">template</span> classes for generic programming. A templated Pool class can work with all kinds of types: QuadTrees, Monsters, Bullets, Explosions, etcetera.<br />
Thirdly, C++ classes have constructors and destructors. In order to do this right, the pool allocator must properly construct and destruct objects when they are allocated and freed. In C++, <i>placement</i> <span style="font-family: "Courier New",Courier,monospace;">new</span> takes some already allocated memory, and calls the constructor for it. And in case you're wondering, manually calling a destructor requires no special trickery.</div>
<blockquote class="tr_bq">
template <typename T><br />
class Pool {<br />
public:<br />
// insert boilerplate here ...<br />
<br />
T* alloc(void) {<br />
if (v_.empty()) {<br />
// allocate new object<br />
return new T();<br />
}<br />
<br />
// take last object in pool<br />
size_t idx = v_.size() - 1;<br />
<br />
// use placement new to construct<br />
T* t = new (v_[idx]) T();<br />
<br />
// remove item from vector<br />
v_.erase(v_.begin() + idx);<br />
<br />
// return object pointer<br />
return t;<br />
}<br />
<br />
void free(T* t) {<br />
v_.push_back(t);<br />
// destruct t<br />
t->~T();<br />
}<br />
<br />
private:<br />
std::vector<T*> v_;<br />
};</blockquote>
<div style="text-align: justify;">
This code works with raw pointers, and you can get philosophical and argue that these should be <span style="font-family: "Courier New",Courier,monospace;">std::unique_ptr<T></span> objects, that are <span style="font-family: "Courier New",Courier,monospace;">std::move()</span>d around. And with C++14, you can use <span style="font-family: "Courier New",Courier,monospace;">std::make_unique<T>()</span>.<br />
<span style="color: #666666;">[I didn't bother because personally, I'm no fan of <span style="font-family: "Courier New",Courier,monospace;">std::unique_ptr</span>. I think the syntax is horrid, and I wish they would have invented a new keyword instead for strong referencing pointers.]</span><br />
<br /></div>
<div style="text-align: justify;">
Growing the vector means doing implicit memory allocations, something that we were trying to avoid in the first place..! Luckily, this will only happen the first time around; the pool will grow to a certain size and then stay that size as objects start being reused. An optimization would be to reserve chunks of space in advance, so that the vector need not be resized as often.<br />
You can avoid working with pool arrays altogether, by chain linking objects to a free-list. This requires the objects to have a pointer member though (to make the chain link), and this breaks the generic nature of the Pool template class that is shown above. It would be an elegant solution for plain C however.<br />
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-75733798869158503332014-10-25T18:26:00.000+02:002014-10-26T12:31:41.465+01:00Round Balls and Flat Disks<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-UGddqZyccug/VEvK7djLh4I/AAAAAAAADe4/odZqjFJVefU/s1600/escher_sphere.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
I used to call <span style="font-family: "Courier New",Courier,monospace;">gluSphere()</span> to render simple planets in 3D space programs. For reasons beyond me, this function has been deprecated. Apparently, all GLU functions have been scrapped in the newer OpenGL standard. I've been told by an expert that OpenGL maintains backward compatibility by using version numbering, meaning that you should still be able to use old OpenGL 2, as long as you don't try to mix it with OpenGL 4 functions. GLU and GLUT are not really part of the OpenGL standard so that's where the story ends, I suppose. The problem is, I found myself being unable to compile older programs anymore after upgrading the operating system — totally unexpected, an unpleasant surprise. Searching for an answer, I found StackOverflow. The answer: “Do the math.”<br />
<br />
I had to scratch my head a little there. I have to admit, it's been a while since I ‘did the math’ on anything, really. A visit to MathWorld turned up some nice formulas. Now, how to put that into code.</div>
<br />
<b><span style="color: #b45f06;">Implementing gluSphere()</span></b><br />
<div style="text-align: justify;">
As you can see in wireframe mode, the gluSphere is built up from quads. Clearly, triangle strips speed up the process. It should be possible to use a single triangle strip, but I didn't bother trying to figure that one out. So I went for stacked triangle strips that wrap around the sphere. This method uses slices and stacks (similar to lines of longitude and latitude) just like <span style="font-family: "Courier New",Courier,monospace;">gluSphere()</span> does.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Anyway, the math. We need to calculate all the vertices for the triangle strips that make up the sphere. A sphere is like rotating a vector 360 degrees around in a horizontal plane (left, right), while rotating a vector 180 degrees up and down in a vertical plane. We'll call the first angle alpha and the second beta. The size of the sphere is determined by radius <span style="font-family: "Courier New",Courier,monospace;">r</span>. Any point (x, y, z) on the sphere is given by:</div>
<blockquote class="tr_bq">
x = r ⋅ cos α ⋅ sin β<br />
y = r ⋅ cos β<br />
z = r ⋅ sin α ⋅ sin β</blockquote>
One caveat is that the angles should be given in radians rather than degrees for the C math library, so use <span style="font-family: "Courier New",Courier,monospace;">M_PI</span>. Another is the OpenGL coordinate system; you may have to work with -z, or like in my case, I had to exchange the Y and Z-axis to get the desired result.<br />
<div style="text-align: justify;">
<br />
Knowing this, calculating all vertices is easy. Setup a double loop that calculates all coordinates, taking into a account the number of stacks and slices that you wish to have.</div>
<blockquote class="tr_bq">
for(beta = 0.0; beta <= M_PI; beta += M_PI/num_stacks) {<br />
for(alpha = 0.0; alpha <= 2.0*M_PI; alpha += 2.0*M_PI/num_slices) {<br />
calculate x, y, z<br />
store in vertex array<br />
}<br />
}</blockquote>
<b><span style="color: #b45f06;">Calculating Sphere Normals</span></b><br />
The get good lighting on the sphere, we need to know the normal vectors. The normal at each vertex is given by <span style="font-family: "Courier New",Courier,monospace;">normalize(x,y,z)</span>. That's easy.<br />
<br />
<b><span style="color: #b45f06;">Calculating Sphere Texture Coordinates</span></b><br />
Having planets without texture is not fun. Texture coordinates run from 0.0 to 1.0.<br />
<blockquote class="tr_bq">
x = alpha / (2.0 * M_PI)<br />
y = (M_PI - beta) / M_PI</blockquote>
<br />
<b><span style="color: #b45f06;">Implementing gluDisk()</span></b><br />
<div style="text-align: justify;">
For the rings of Saturn, it's easy to use a gluDisk.
<span style="font-family: "Courier New",Courier,monospace;">gluDisk()</span> renders a flat disk, with an inner hole in the center. In math terms, the disk uses a circle formula, and there are two radii (one for the outer circle, one for the inner hole). So a gluDisk is really nothing but a single triangle strip that loops around, and its vertices are determined by two circles with radius <span style="font-family: "Courier New",Courier,monospace;">r1</span> and <span style="font-family: "Courier New",Courier,monospace;">r2</span>.<br />
The vertices are given by:</div>
<blockquote class="tr_bq">
// inner vertex<br />
ax = r1 ⋅ cos α<br />
ay = r1 ⋅ sin α<br />
az = 0.0<br />
<br />
// outer vertex<br />
bx = r2 ⋅ cos α<br />
by = r2 ⋅ sin α<br />
bz = 0.0</blockquote>
<b><span style="color: #b45f06;">Calculating Disk Normals</span></b><br />
<div style="text-align: justify;">
The disk is really a 2D object in 3D space. The normals point to +Z, they are all defined as (0, 0, 1).<br />
<br />
<b><span style="color: #b45f06;">Calculating Disk Texture Coordinates</span></b><br />
Texturing a disk is not at all like texturing a sphere. When you put a texture on a gluDisk, it's like sticking a label onto a DVD. You can take a rectangular image and put it onto the circular disk without distorting the image or wrapping it around the object. The texture is mapped to the bounding rectangle of the disk, which is two times the outer radius.</div>
<blockquote class="tr_bq">
x = ax / (2.0 * r2) + 0.5<br />
y = ay / (2.0 * r2) + 0.5</blockquote>
<div style="text-align: justify;">
Do the same for <span style="font-family: "Courier New",Courier,monospace;">bx</span> and <span style="font-family: "Courier New",Courier,monospace;">by</span>. Note the plus one half, you want to be in the center of the texel or else OpenGL may show artifacts.</div>
<br />
<b><span style="color: #b45f06;">Implementing gluCylinder()</span></b><br />
<div style="text-align: justify;">
I did not implement <span style="font-family: "Courier New",Courier,monospace;">gluCylinder()</span>, but with the information given for <span style="font-family: "Courier New",Courier,monospace;">gluDisk()</span> you should be able to pull it off. It's just two circles and an added Z-axis.<br />
<br />
<b><span style="color: #b45f06;">Roundup</span></b><br />
So, that's it. A bit of simple math, but still a sizable amount of work to get spheres going again in OpenGL. It's something that any serious 3D programmer should be able to do, but I still feel a bit bummed that they just deprecated those really useful functions.<br />
<br />
On a final note I'd like to mention that there is another way of creating spheres, one that presumably looks better. Imagine an icosahedron, and subdividing each face into more triangles to enhance resolution. You can subdivide as many times as you like, and at a certain point it will be a good approximation of a sphere. This method is described in the <a href="http://www.glprogramming.com/red/chapter02.html#name8" target="_blank">Red Book</a>.</div>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-38283740986441232842014-09-22T13:29:00.000+02:002014-09-22T13:29:37.935+02:00How To strcpy()<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://4.bp.blogspot.com/-YcnKZNgl7MQ/UAKN-Env0RI/AAAAAAAADU8/1q4tVe_Nhmo/s1600/c_flames.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
I've written a lot about <span style="font-family: "Courier New",Courier,monospace;">strcpy()</span> already, the tiny but famous C library function for copying strings. There is a lot to say about its (in)security, and ways to improve on that. I didn't realize before that implementing your own <span style="font-family: "Courier New",Courier,monospace;">strcpy()</span> function could be part of a job interview for a programmer position. If I were asked to rewrite <span style="font-family: "Courier New",Courier,monospace;">strcpy()</span>, I would at least put in an <span style="font-family: "Courier New",Courier,monospace;">abort()</span> (for lack of exceptions) when passing in a <span style="font-family: "Courier New",Courier,monospace;">NULL</span> pointer. Moreover, I would add a length parameter for doing bounds checking — albeit that might be considered cheating, because it would change the function declaration; the signature of the function.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Rather than zooming in on this rather old topic, I'm going to be lazy for once and just link to this excellent post that tells the story of the solutions given by job applicants for the problem of writing your own <span style="font-family: "Courier New",Courier,monospace;">strcpy()</span> function. It's a must read, hilarious, but also somewhat alerting.</div>
<br />
<a href="http://davidavraamides.net/blog/2005/05/07/strcpy/" target="_blank">How Do I Copy Thee? Let Me Count the Ways</a> by David Avraamides<br />
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-26450821171895476412014-08-03T15:46:00.000+02:002014-08-03T15:46:50.120+02:00A Bugs Life<div class="separator" style="clear: both; text-align: justify;">
<img border="0" src="http://3.bp.blogspot.com/-ha4cm-xLii4/UPvgiUGpzWI/AAAAAAAADXQ/rmi9m79paEY/s1600/bug_small.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Software bugs are everywhere. This shouldn't be so bad, except that software is everywhere — from your office computer, to your cell phone, home router, television set, car engine, and the list goes on and on. Heck, even an electric toothbrush has a tiny microcontroller programmed with firmware in it. Most of this software has been tested and tried, but bugs keep popping up nevertheless.</div>
<div style="text-align: justify;">
Last week I kept a bug diary. The dreadful results are described below.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Saturday, July 26</span> - FaceTime would not connect. After updating iOS the problem was resolved. Of course, “you must always update to the latest version”, but mind you, this problem never existed before. User experience: Everything was working fine, and then all of a sudden it didn't.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Sunday, July 27</span> - Tried the OS X Yosemite Beta. Okay, it's a beta release. Reported a bug where the GUI was actually messing up the screen. Sadly couldn't take a screenshot, because it would redraw the screen, effectively clearing things up. Luckily, the bug did reproduce every time.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Monday, July 28</span> - Didn't go near my computer, but today a colleague sent in a bug report for a piece of software of mine. My bad. One of those cases of “this should never happen”. A set of dictionary keys holds only unique items, so how could it be that an item appeared twice?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Tuesday, July 29</span> - Encountered trouble trying to build the PCRE regular expression library with gcc 4.9.1 on Linux. Somehow it wouldn't pass confidence tests. Reportedly <i>did</i> work with older gcc; smelled an awful lot like a <i>compiler</i> problem.</div>
<div style="text-align: justify;">
Found a recent online tirade by Linus Torvalds, in which he called gcc, I quote, “pure and utter *shit*.” Turns out the compiler's code generator emits buggy code.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Wednesday, July 30</span> - The pedestrian lights near the office stayed on “Walk” all day long. Nearly got run over by a BMW, as the driver hit the gas when he got the green light. Could have been caused by a loose wire somewhere, but I like to think it was a software bug.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Thursday, July 31</span> - Twitter app on iPhone kept crashing upon loading a particular news article. Who knows, maybe it was ad malware trying to infect the system.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #b45f06;">Friday, August 1</span> - At last, a small victory. Fixed an endless loop problem in the RADIUS PAM module. Found some additional related issues, such as by the looks of it is a <strike>broken</strike> <strike>stupid</strike> unusual implementation of our RADIUS server.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Special bonus bug of the week (Linux): Exiting an interactive bash shell and getting “Illegal instruction” on the screen. This is most likely caused by stack corruption.</div>
<div style="text-align: justify;">
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-13861617112798709142014-07-06T14:22:00.000+02:002014-07-07T11:53:58.276+02:00Game On<article>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-UlAqZS--hYU/SCIALEeDLII/AAAAAAAAAB0/yS2qlEy0WbU/s1600/SDL_logo.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
I have a PS3 game controller lying around, so I got the idea to add support for it to an old SDL game code. This code was still using SDL 1.2, which is now a thing of the past. So the first thing to do was quickly porting it over to SDL 2, a more modern game programming API. It even includes a GameController API, and because of that, programming support for game controllers with SDL is remarkably easy.<br />
<br />
<span style="color: #b45f06;"><b>Initialising initialisation</b></span><br />
For initialisation, call <span style="font-family: "Courier New",Courier,monospace;">SDL_Init()</span> with extra flag <span style="font-family: "Courier New",Courier,monospace;">SDL_INIT_GAMECONTROLLER</span>. Additionally, you must open the game controller. The game controller is found by first querying the system for the number of connected <i>joysticks</i>.
</div>
<blockquote class="tr_bq">
SDL_GameController *ctrl = NULL;<br />
<br />
SDL_Init(SDL_INIT_VIDEO|SDL_INIT_GAMECONTROLLER);<br />
<br />
int num = SDL_NumJoysticks();<br />
if (num <= 0) {<br />
printf("no joy connected\n");<br />
return false;<br />
}<br />
if (!SDL_IsGameController(0)) {<br />
printf("it's not a game controller\n");<br />
return false;<br />
}<br />
ctrl = SDL_GameControllerOpen(ctrl);<br />
if (ctrl == NULL) {<br />
printf("open failed: %s\n", SDL_GetError());<br />
return false;<br />
}<br />
name = SDL_GameControllerName(ctrl);<br />
printf("game controller: %s\n", name);</blockquote>
This code is just an example. It assumes the first connected joystick is indeed the game controller that the player wants to use. This is of course ridiculous, and you should really loop over all joysticks and open as many connected devices as possible.<br />
<div style="text-align: justify;">
<br />
<span style="color: #b45f06;"><b>Mapping the mappings</b></span><br />
In them old days, supporting game controllers was ... problematic. The thing is, every game controller is different and there is no common standard among manufacturers for numbering buttons and analog sticks. Even a simple D-pad is different between different controllers — think XBox360, PS3, PS4, but even PS2, Logitech, OUYA game controller? SDL 2 solves this incompatibility by mapping the raw device button ids to generic SDL enums. You do need to provide the correct mappings, these can be imported from a flat-file game controller database.</div>
<blockquote class="tr_bq">
SDL_GameControllerAddMappingsFromFile("gamecontrollerdb.txt");<br />
<br />
if (SDL_GameControllerMapping(ctrl) == NULL) {<br />
printf("no mapping for %s\n", name);<br />
SDL_GameControllerClose(ctrl);<br />
ctrl = NULL;<br />
return false;<br />
}</blockquote>
Finally, enable events so that SDL will generate events for us to collect in the main event loop.
<br />
<blockquote class="tr_bq">
SDL_GameControllerEventState(SDL_ENABLE);</blockquote>
<div style="text-align: justify;">
<br />
<span style="color: #b45f06;"><b>Moving forward</b></span><br />
When SDL game controller events are enabled, you will receive events of type <span style="font-family: "Courier New",Courier,monospace;">SDL_CONTROLLERAXISMOTION</span> in the event loop. You will find that SDL sends lots of these events, even when just holding the controller. This is just noise on the sensor, and the analog sticks are quite sensitive. Merely touching already registers movement. The ‘axis values’ are a signed 16-bit value, between -32768 and 32767. The SDL documentation says values between -3200 and 3200 are in the ‘dead zone’ where the sticks should be considered at rest.<br />
<br />
The <span style="font-family: "Courier New",Courier,monospace;">SDL_CONTROLLERAXISMOTION</span> event boils down to this:
</div>
<blockquote class="tr_bq">
event.caxis.which connected joystick number<br />
event.caxis.axis axis number<br />
event.caxis.value amount of movement</blockquote>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-I2n0PPWksXQ/U7k3xc4tuiI/AAAAAAAADa4/rlwmpPbozYA/s1600/dualshock3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-I2n0PPWksXQ/U7k3xc4tuiI/AAAAAAAADa4/rlwmpPbozYA/s1600/dualshock3.png" height="225" width="400" /></a><br />
<span style="font-size: small;"><span style="color: #666666;">Click image to enlarge</span></span>
</div>
<br />
<div style="text-align: justify;">
Analog triggers L2 and R2 register as axis 4 and 5, but they only report value 32767 when pressed, acting like digital buttons. This could be a driver issue (I'm on a Mac) or SDL not properly supporting the triggers. I'm not sure.
</div>
<div style="text-align: justify;">
<br />
<span style="color: #b45f06;"><b>Benjamin Buttons</b></span><br />
SDL generates <span style="font-family: "Courier New",Courier,monospace;">SDL_CONTROLLERBUTTONDOWN</span> and <span style="font-family: "Courier New",Courier,monospace;">SDL_CONTROLLERBUTTONUP</span> events. Inspect <span style="font-family: "Courier New",Courier,monospace;">event.cbutton.button</span> to see which button was pressed.<br />
The button codes are hard to find in the documentation, but look in <span style="font-family: "Courier New",Courier,monospace;">SDL_gamecontroller.h</span> to find the enums. The API is modeled after an XBox type controller, for PlayStation the button ‘A’ is a cross, button ‘Y’ is a triangle, <i>etcetera</i>. The ‘guide’ button is the PS logo button.
</div>
<blockquote class="tr_bq">
SDL_CONTROLLER_BUTTON_INVALID = -1<br />
SDL_CONTROLLER_BUTTON_A<br />
SDL_CONTROLLER_BUTTON_B<br />
SDL_CONTROLLER_BUTTON_X<br />
SDL_CONTROLLER_BUTTON_Y<br />
SDL_CONTROLLER_BUTTON_BACK<br />
SDL_CONTROLLER_BUTTON_GUIDE<br />
SDL_CONTROLLER_BUTTON_START<br />
SDL_CONTROLLER_BUTTON_LEFTSTICK<br />
SDL_CONTROLLER_BUTTON_RIGHTSTICK<br />
SDL_CONTROLLER_BUTTON_LEFTSHOULDER<br />
SDL_CONTROLLER_BUTTON_RIGHTSHOULDER<br />
SDL_CONTROLLER_BUTTON_DPAD_UP<br />
SDL_CONTROLLER_BUTTON_DPAD_DOWN<br />
SDL_CONTROLLER_BUTTON_DPAD_LEFT<br />
SDL_CONTROLLER_BUTTON_DPAD_RIGHT</blockquote>
<div style="text-align: justify;">
<br />
<span style="color: #b45f06;"><b>Any other business</b></span><br />
I have a single player game so programmatically, I took the easy road and just map all buttons and stick movement to key presses, internally translating to a keyboard joystick. This works, but it's not good enough for multiplayer games. It also loses the ability to properly react to stick acceleration; you might want to scale the axis values to have a proper difference between large and small stick movement.<br />
<br />
Some tutorials suggest you ignore the ‘dead zone’, but this is wrong because when you fully ignore those events, you are unable to detect when someone suddenly lets go of the joystick. Instead, the dead zone should be mapped to zero movement. You can also keep track of sticks entering/leaving the dead zone, and react correspondingly.<br />
<br />
The DualShock 3 controller has pressure sensitive buttons. SDL has no support for measuring the amount of pressure. Nor does it support the sixaxis motion sensor. SDL 2 can do rumble however via its Haptic API.<br />
<br />
The cool thing about SDL is that it also ports to other platforms, maybe most notably the Steam OS. Since I'm on OS X I also had a quick look at how to do this joystick & game controller thing in pure Cocoa — not so easy at all. I'm with SDL 2 on this one.<br />
<br /></div>
<span style="color: #b45f06;"><b>Links</b></span><br />
<a href="https://github.com/gabomdq/SDL_GameControllerDB" target="_blank">SDL 2 GameControllerDB file</a><br />
<a href="https://wiki.libsdl.org/APIByCategory" target="_blank">SDL 2 API by Category</a><br />
<a href="https://gist.github.com/hlung/8385683" target="_blank">How to connect a PS3 controller</a><br />
<br />
</article>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-68980183707412146932014-06-08T13:13:00.001+02:002014-06-08T13:23:23.192+02:00Go Rust Swift, Vala<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://3.bp.blogspot.com/-EoH1lyKKtyc/TQU5SPX1NgI/AAAAAAAAC2Y/EC0IgRFSTBI/s1600/Escher_Chameleon.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
In the WWDC keynote speech, Apple announced their new programming language named Swift. It looks like a scripting language, but it actually compiles to binary, giving you lots of performance. For years, I've been griping about the need for a programming language that's as easy as scripting, while delivering decently high performance. Lately, we've seen a number of new languages that do exactly this. So, it's time to discuss a few of these new, modern languages.</div>
<br />
<span style="color: #b45f06;"><b>Go</b></span><br />
<div style="text-align: justify;">
Go was developed at Google and was first publicly released at the end of 2009. It's no secret that I'm a bit of a fan of Go, and I'll admit that of the four new languages discussed here, it's the only one I actually wrote a decent amount of code in.<br />
Go is awesome for parallel programming. Goroutines and communication channels are plain brilliant. In fact, it completely changes the way you think about parallel programming. Go's duck typing system is marvelous, an enormous sigh of relief after decades of OOP and I just want to hug the creators of Go.<br />
And yet, it has not yet become my primary language of choice. A big reason for that is mandatory CamelCasing. In Go, lower/upper CamelCasing actually controls whether the symbol is being exported or not; a really awkward design decision (golang's only mishap?).<br />
Go is great for systems programming (ie. for <span style="font-family: Times,"Times New Roman",serif; font-variant: small-caps;">UNIX</span> tools, services), however ... it's as clunky as C when trying to parse a config file, and its <span style="font-family: "Courier New",Courier,monospace;">flag</span> command-line argument parser module isn't great either. Then I tried making a game in Go and simply couldn't get started because there were no stable bindings for the needed libs at hand at the time.<br />
Go has a very “C feel” to it, every time I code in it I feel like I'm constructing a tiny, delicate machine. Everything about Go is excellently engineered. Some clever men with beards really thought things through.</div>
<div style="text-align: justify;">
What will golang's popularity be, five years from now? We shall see if it's truly to replace C/C++ as top systems programming language.</div>
<br />
<span style="color: #b45f06;"><b>Rust</b></span><br />
<div style="text-align: justify;">
Mozilla's language named Rust started in 2006, but didn't go public until 2010. Rust is quite impressive, but the tutorial kind of lost me once it started on heap allocation, object ownership of memory, control of mutability, references, and confusing syntax involving glyphs. This leads to some pretty horrific code that shouldn't have been that way, in my opinion. Secretly, Rust is trying hard to be like C++.<br />
The tutorial has some notes on features that are currently missing or buggy. In that respect, Rust is not ready for prime time yet. It does show a lot of potential and it did manage to create a buzz in the open source world, so check back once it hits version 1.0.</div>
<br />
<span style="color: #b45f06;"><b>Swift</b></span><br />
<div style="text-align: justify;">
Apple's Swift language was announced only last week and is said to have been in development since 2010. At a glance, it's suspiciously a lot like Rust. This is particularly interesting given the fact that Rust is sponsored by rival company Samsung. What's the true story behind this?</div>
<div style="text-align: justify;">
Anyway, Swift looks awesome, and it's already miles ahead of Rust. Swift builds upon a foundation laid by Objective-C, all the way from NextStep to OS X and iOS. Cocoa and Cocoa Touch bindings are in place and ready to use. Under the hood Swift uses reference counting for garbage collection, and here lies a major pitfall; you should take care not to create any cyclic references, and/or know the difference between weak and unowned references. If you get this wrong, memory will leak just as it did in Obj-C. <span style="color: #666666;">[Note that the same is true for Rust and Vala].</span></div>
<div style="text-align: justify;">
Swift lives in an Apple-only world, which should be no surprise — I suppose this compares to Microsoft's strategy regarding Visual Basic, C#, .NET.<br />
There is no doubt that Swift is going to be huge on iOS, and probably also on OS X. Nobody likes Obj-C (apart from some weird guys like me). The matter of truth is, Obj-C is outdated, it has just been obsoleted by Swift. Two years from now, no one will be writing new applications in Obj-C.</div>
<br />
<span style="color: #b45f06;"><b>Vala</b></span><br />
<div style="text-align: justify;">
Last but not least, there is Vala, which has been around since 2006 already. Vala is a Java/C#/Mono-like language for developing GNOME applications. It is said Elementary OS's main apps are built in Vala. The Vala compiler converts Vala code to plain C with GObject, which is in turn compiled by a generic C compiler. In a sense, Vala is just syntactical sugar, removing all the bad things that C has (the tedious task of programming intricate details), while keeping all the good (high performance).<br />
Its syntax will be appealing to mostly Java and C# programmers. Personally, I get lost in the soup of keywords. A Vala source can be like a C++ code in disguise. There are just too many keywords; the designers forgot that less is more. It's probably too late now, but my wish for Vala is to be stripped down, simplified, made easier to work with.<br />
Nevertheless it is a great choice for developing GNOME applications, and there is Elementary OS to prove it. Currently it looks like Vala's success is tied to that of GNOME, and it's a bit of a shame because it deserves more attention from a broader audience.<br />
Vala wins the prize for “Best chosen name for new programming language” simply because googling it lands you on the right pages right away.</div>
<br />
<span style="color: #b45f06;"><b>Links</b></span><br />
<ul>
<li><a href="http://golang.org/doc/effective_go.html" target="_blank">Effective Go</a></li>
<li><a href="http://doc.rust-lang.org/tutorial.html" target="_blank">Rust tutorial</a></li>
<li><a href="https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/TheBasics.html" target="_blank">Swift Basics</a></li>
<li><a href="https://wiki.gnome.org/Projects/Vala/Tutorial" target="_blank">Vala tutorial</a></li>
</ul>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-62743441645931210512014-05-29T14:10:00.004+02:002014-05-29T14:16:04.479+02:00Programming is hard<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://3.bp.blogspot.com/-W6mtAwWC-jc/Sj_naZWj2-I/AAAAAAAACFE/KjGIPykg76w/s1600/garbage.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
I have always seen programming as something creative, like an art. And like in art, it is hard work to create something beautiful. It's hard to get the details right. If you get it right, then people will like the result, and of course, some may not because tastes differ and you can't make everybody happy. But aside from people's reaction to the picture painted, a painter also gets his satisfaction from mere crafting, using this technique here and that trick there, knowing the internals of his work.<br />
Picture a musician, selecting an instrument to play, a rhythm, writing a melody.<br />
<br />
That sure is a romantic view of programming. In reality, it's more of a purely technical process. When you want to accomplish goal G you are going to have to make work A and B together. This might be done with technique C. This is engineering. Apply a trick here, and it is called a hack. Hacks can be dirty hacks, or they can be clever hacks. In a security context, they might even be dangerous hacks.<br />
<br />
Some say building applications is like building a house, and there should be blueprints for building software as well. Houses are built from small, simple building blocks. All houses have walls and a roof, a kitchen, a living and a bedroom, yet each house is unique. For each room, you can zoom in and work out the details. Users have the option of changing the wallpaper. And of course, a construction error may take down the entire house.<br />
It's a nice metaphore that works well, but I suspect that whoever thought up that idea is not a programmer.<br />
<br />
Programming is a mind game. The programmer builds a model in his mind, writes down the code and tries it. If it doesn't work, she tries to find out where it doesn't work. You may use a debugger or sketch it on paper but mostly the problem is cracked in your mind. Like a game of chess, it takes a lot of concentration trying to think six moves ahead, a single mistake will play out badly sooner or later.<br />
<br />
Bugs are rarely ever “happy little accidents” like in Bob Ross paintings. They range from apparently random program crashes to worldwide Heartbleed scale disaster. Slowly the world is waking up, it has become apparent that eventually all software is flawed. Consequently, using the internet is as big a risk as a running across a highway, naked.<br />
<br />
Programming is hugely underestimated. Programming is hard. It is really, really hard. The human mind was not made for this job. It is incredible that we have computers at all. Steve Jobs said that everyone should learn to program, because it teaches you to <i>think</i>. I agree a little, but it's like saying “Everyone can paint!” Or, “everyone should learn how to build a house.”<br />
<br />
The best way to keep out bugs? <span style="color: red;">Keep It Simple, Stupid.</span> Simplify, simplify. Write code in a clear, consistent style. Add useful comments. Sometimes comment on why you did it this way. Test, test, test. Eat your own dog food.<br />
<br />
There is a nice test to see if you've written good code. Open up a source file that you wrote six months ago. If you can't make out how it works within five minutes, then it's just not good enough. Now open up a source code that <i>someone else</i> wrote.<br />
<br />
Boom. Oh, the horror.<br />
<br />
Note that even this test is flawed, because you might not be seeing what you <i>think</i> you are seeing. Think about it. Programming is hard. And so is building houses.<br />
</div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-65460376186429411282014-04-07T15:18:00.000+02:002014-04-07T16:01:59.133+02:00In exceptional conditions<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-oy-1dOSPT4A/Spw8Ud0gc6I/AAAAAAAACFk/6rojRVSOu9c/s1600/esher_knot2.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
On the topic of condition variables, wikipedia says: <span style="background-color: #f3f3f3;">“In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become true.”</span></div>
<div style="text-align: justify;">
This is commonly used in producer/consumer-like problems.</div>
<br />
The code for programming such a monitor looks like this:<br />
<blockquote class="tr_bq">
lock(&mutex);<br />
while (!predicate) {<br />
condition_wait(&mutex);<br />
}<br />
consume(&item);<br />
unlock(&mutex);</blockquote>
<div style="text-align: justify;">
This can be implemented, for example, with <span style="font-family: "Courier New",Courier,monospace;">pthread_mutex_lock()</span> and <span style="font-family: "Courier New",Courier,monospace;">pthread_cond_wait()</span>. If you would wrap things into a class, it might look like the following:</div>
<blockquote class="tr_bq">
mutex.lock();<br />
while (!predicate) {<br />
cond.wait(mutex);<br />
}<br />
consume(item);<br />
mutex.unlock();</blockquote>
<div style="text-align: justify;">
However, in a language that uses exceptions, this code would be flawed. Suppose that <span style="font-family: "Courier New",Courier,monospace;">consume()</span> (or even <span style="font-family: "Courier New",Courier,monospace;">cond.wait()</span>, however unlikely) throws an exception, then the mutex remains locked. Therefore, in a language like Java, you should write:</div>
<blockquote class="tr_bq">
mutex.lock();<br />
try {<br />
while (!predicate) {<br />
cond.wait(mutex);<br />
}<br />
consume(item);<br />
} finally {<br />
mutex.unlock();<br />
}</blockquote>
<div style="text-align: justify;">
This ensures the mutex gets unlocked, even in case an exception occurs.<br />
In Python, the mutex is part of the condition object. This simplifies things somewhat by hiding the mutex under the hood:</div>
<blockquote class="tr_bq">
cond.acquire()<br />
try:<br />
while not predicate:<br />
cond.wait()<br />
consume(item)<br />
finally:<br />
cond.release()</blockquote>
C++ does not have a <span style="font-family: "Courier New",Courier,monospace;">finally</span> keyword — it doesn't need it, because it has RAII (“Resource Acquisition Is Initialisation”). In C++, it should be coded as:<br />
<blockquote class="tr_bq">
std::mutex mx;<br />
{<br />
std::unique_lock<std::mutex> lk(mx);<br />
while (!predicate) {<br />
cond.wait(lk);<br />
}<br />
consume(item);<br />
}</blockquote>
<div style="text-align: justify;">
Note that the destructor of <span style="font-family: "Courier New",Courier,monospace;">std::unique_lock</span> will ensure that the mutex is unlocked. The destructor will run when the stack unwinds, which is also done when an exception occurs.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Although entirely correct, and arguably the most elegant solution, I have to say I find the C++ code unintuitive and difficult to follow. The reason is that there are no explicit statements in this code block for dealing with locking; the mutex is locked by the constructor and unlocked by the destructor, that run <i>automagically</i>. It is almost as if the mutex variable isn't doing anything at all. Annoyingly, this code can not be refactored to resemble the other code fragments, because of RAII. And you are being forced to write it this way now that threads, mutexes, and condition variables are part of the <span style="font-family: "Courier New",Courier,monospace;">std</span> library; there is no other way in which a condition variable can be used because the other way would be incorrect in C++.<br />
This is one of those cases where C++ shines but somehow it doesn't feel gratifying.</div>
<div style="text-align: justify;">
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-89465561341036928162014-03-16T12:27:00.000+01:002014-03-16T12:27:28.398+01:00Python multiprocessing: I did it my way<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://3.bp.blogspot.com/-UJRBrK4wDFc/TLGhs3lLinI/AAAAAAAAC1g/E-tpSxjjv44/s1600/stressful_worker.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Python has a wonderful <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span> module for parallel execution. It's well known that Python's <span style="font-family: "Courier New",Courier,monospace;">threading</span> module does not do <i>real</i> multithreading due to the Global Interpreter Lock (GIL). Hence <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span>, which forks a pool of workers so you can efficiently run tasks in parallel.</div>
<br />
In principle:<br />
<blockquote class="tr_bq">
p = multiprocessing.Process(target=func)<br />
p.start()<br />
p.join()</blockquote>
In practice, I have had weird issues with <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span>. Two notable issues:<br />
<br />
<ul>
<li>program does not start any processes at all, and just exits</li>
<li>program raises an exception in QueueFeederThread at exit</li>
</ul>
<br />
<div style="text-align: justify;">
Apparently the <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span> module has its problems, and although there are fixes in the most recent versions, I can't rely on that when the users of my program are running on operating systems that don't always include the latest greatest Python.</div>
<br />
<div style="text-align: justify;">
So, a decision had to be made. I ditched the <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span> module and replaced it with my own, that calls <span style="font-family: "Courier New",Courier,monospace;">os.fork()</span>. The resulting code is much easier to handle than with <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span>, too.</div>
<div style="text-align: justify;">
Note that <span style="font-family: "Courier New",Courier,monospace;">os.fork()</span> does not port to Microsoft Windows. My target platform is <span style="font-family: Times,"Times New Roman",serif; font-variant: small-caps;">UNIX</span> anyway.</div>
<br />
Pseudo-code:<br />
<blockquote class="tr_bq">
parallel(func, work_array):<br />
for i < numproc:<br />
fork()<br />
if child:<br />
work_items = part_of(work_array)<br />
for item in work_items:<br />
func(item)<br />
<br />
# child exits<br />
exit(0)<br />
<br />
wait_for_child_procs()</blockquote>
<div style="text-align: justify;">
So, a pool of <span style="font-family: "Courier New",Courier,monospace;">numproc</span> workers is spawned. Each of the child processes do their part of the work given in <span style="font-family: "Courier New",Courier,monospace;">work_array</span>. No communication is needed here because <span style="font-family: "Courier New",Courier,monospace;">fork()</span> causes the child to be a copy of the parent process, and thus getting a copy of <span style="font-family: "Courier New",Courier,monospace;">work_array</span>.</div>
<div style="text-align: justify;">
This is the simplest kind of parallel programming in <span style="font-family: Times,"Times New Roman",serif; font-variant: small-caps;">UNIX</span>. Surprisingly, this bit of code works better for me than the <span style="font-family: "Courier New",Courier,monospace;">multiprocessing</span> module — which supposedly does the same thing, under the hood.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Having decent language support for parallel programming is of utmost importance in today's world, where having a quadcore CPU is no exception; practically all modern computers are multi-core machines. A modern programming language should offer some easy mechanisms to empower the programmer, enabling you to take advantage of the hardware at hand.</div>
<div style="text-align: justify;">
Proper multithreading is exceptionally hard (if not impossible) to do for interpreted languages. This is a fact of life. Using a forking model, you can still get good parallellism.<br />
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-36708239984481483392014-02-24T14:27:00.000+01:002014-02-24T14:48:54.979+01:00MD5 and libs and licenses<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://3.bp.blogspot.com/-6c4npUT3_AE/T7ED7oSEJCI/AAAAAAAADNU/6iN1Xua12ec/s1600/matrix_code.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
I keep running into code that looks like this:<br />
<blockquote class="tr_bq">
#define S21 5<br />
#define S22 9<br />
#define S23 14<br />
#define S24 20<br />
GG ( a, b, c, d, in[ 1], S21, UL(4129170786)); /* 17 */<br />
GG ( d, a, b, c, in[ 6], S22, UL(3225465664)); /* 18 */<br />
GG ( c, d, a, b, in[11], S23, UL( 643717713)); /* 19 */<br />
GG ( b, c, d, a, in[ 0], S24, UL(3921069994)); /* 20 */<br />
GG ( a, b, c, d, in[ 5], S21, UL(3593408605)); /* 21 */<br />
GG ( d, a, b, c, in[10], S22, UL( 38016083)); /* 22 */<br />
GG ( c, d, a, b, in[15], S23, UL(3634488961)); /* 23 */<br />
/* ... etcetera ... */<br />
HH ( ... )<br />
II ( ... )</blockquote>
<div style="text-align: justify;">
This is an excerpt of C code for the MD5 algorithm as implemented by Ron Rivest, published in RFC-1321 in 1992. What's wrong with this picture? Well, not so much, except that there is this nice OpenSSL library that is present practically everywhere. The OpenSSL library provides this functionality and the code has been reviewed by dozens of people and is being used by millions. </div>
<br />
Using libssl for MD5 digests is easy:<br />
<blockquote class="tr_bq">
#include <openssl/md5.h><br />
<br />
unsigned char digest[16];<br />
MD5_CTX ctx;<br />
<br />
MD5_Init(&ctx);<br />
MD5_Update(&ctx, buf, buf_len);<br />
/* multiple calls to MD5_Update() may be made */<br />
MD5_Final(digest, &ctx);</blockquote>
Link with <span style="font-family: "Courier New",Courier,monospace;">-lssl</span> and you're done.<br />
<br />
<div style="text-align: justify;">
On OSX, something is up. OSX no longer ships OpenSSL. Instead, Apple now provides their “CommonCrypto API”, which is, ironically, not so common. An ugly trick to get OpenSSL MD5 code to work with CommonCrypto:</div>
<blockquote class="tr_bq">
#include <CommonCrypto/CommonDigest.h><br />
<br />
#define MD5_CTX CC_MD5_CTX<br />
#define MD5_Init CC_MD5_Init<br />
#define MD5_Update CC_MD5_Update<br />
#define MD5_Final CC_MD5_Final<br />
<br />
#ifdef MD5_DIGEST_LENGTH<br />
#undef MD5_DIGEST_LENGTH<br />
#define MD5_DIGEST_LENGTH CC_MD5_DIGEST_LENGTH<br />
#endif</blockquote>
<div style="text-align: justify;">
Why did Apple do this, you may ask. The reason is probably software licensing; OpenSSL's dualistic license is problematic in particular for apps on iOS devices. Apple could probably have kept OpenSSL on the Mac, but they didn't.</div>
<div style="text-align: justify;">
(* It's either this or the NSA forced them into putting some weakened PRNGs into CommonCrypto).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Coming full circle, you would be justified <i>not</i> to use a common library in cases where you run into issues with the software license. There are many free and open software licenses, and many of them have quirks. This is a major annoyance in software development, and something to consider when using libraries.</div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-65815080911131918692014-01-19T14:30:00.000+01:002014-01-19T18:07:29.360+01:00One and one are eleven<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-dRYSCCoZEd8/UHm78LoXQ-I/AAAAAAAADWI/akBbEZBhCCI/s1600/learned_c%252B%252B.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Oh C++, why art thou so hard? Thou giveth us nought but grievance and pain. Often I've cried out because C++ just wouldn't play nice. The language is too hard to learn, and it couldn't do anything you could do in plain C. But I've actually started to like C++. I have come to appreciate it a lot. I suppose it ages like fine wine. C++11 brings a lot of good to the table, well, as long as your compiler is up to par.</div>
<br />
<div style="text-align: justify;">
Rather than go over all the features of C++11 (there's enough of that on the web to go 'round), I want to show you a neat trick. Lo and behold.</div>
<blockquote class="tr_bq">
template <typename... Tpack><br />
void go(Tpack&&... args) {<br />
std::function<void()> func = std::bind(std::forward<Tpack>(args)...);<br />
std::thread(trampoline, func);<br />
}</blockquote>
<div style="text-align: justify;">
This defines a <span style="font-family: "Courier New",Courier,monospace;">go()</span> function that allows you to fire up any function (with arguments!) as a thread. The thing with <span style="font-family: "Courier New",Courier,monospace;">Tpack</span> is a variadic template, which is a type-safe kind of varargs. The call to <span style="font-family: "Courier New",Courier,monospace;">std::bind</span> turns it into a <span style="font-family: "Courier New",Courier,monospace;">std::function</span> object, which is a kind of functor. <span style="font-family: "Courier New",Courier,monospace;">std::thread</span> creates a new thread and launches a trampoline function that will invoke the function functor for us. Okay, the actual implementation is a little different, I have simplified it here for the sake of example. For one thing, you should catch the <span style="font-family: "Courier New",Courier,monospace;">std::system_error</span> exception that is thrown in case the operating system fails to launch the thread.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So now we can easily run any function as a thread. Next, I wanted <span style="font-family: "Courier New",Courier,monospace;">go()</span> to accept my custom <span style="font-family: "Courier New",Courier,monospace;">Functor</span> class as well. Normally you would go about that doing that like so:</div>
<blockquote class="tr_bq">
void go(const Functor&);<br />
void go(Functor&);</blockquote>
<div style="text-align: justify;">
Guess what, it didn't work! The result was a (not so) glorious Segmentation fault. Tracing it revealed that the compiler was favoring the <span style="font-family: "Courier New",Courier,monospace;">go(Tpack&&...)</span> variant over the specific Functor functions. But no worries, template specialization to the rescue:</div>
<blockquote class="tr_bq">
// specialized templates for passing in a Functor<br />
// const Functor& is only used for const functions or<br />
// explicitly const Functors<br />
// in most cases go(Functor&) is used<br />
template <><br />
inline void go<const Functor&>(const Functor& f) {<br />
do_something_with(f);<br />
}<br />
<br />
template <><br />
inline void go<Functor&>(Functor& f) {<br />
do_something_with(f);<br />
}</blockquote>
<div style="text-align: justify;">
This weird looking stuff is C++ dark magick, written in the black of night by the light of a candle. If you look past all the confusing glyphs, it's simply telling the compiler to call the right function for what kind of argument we have.</div>
<br />
<div style="text-align: justify;">
All of this wasn't quite possible to this extent before C++11. Other than impressing your girlfriend with this trickery (not!), C++11 comes with a most important change: the move constructor. This is something you <i>MUST</i> learn. It's hard to believe, but before eleven it was not possible to simply return instances from a function, because the destructor would kick in — you would return a copy and the original object would be destroyed. C++11 fixes this long standing problem by handing you the opportunity to move the content of the original to the copy, allowing you to return objects, like in any sane programming language. Actually, the move constructor is much more important than showing off template and thread trickery, but Brian and me really wanted to bring you the <span style="font-family: "Courier New",Courier,monospace;">go()</span> function this time.</div>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-71227852819304697442013-12-29T14:54:00.000+01:002013-12-29T15:02:58.117+01:00The Yearly Countdown<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://3.bp.blogspot.com/-tYCe-dZro0E/R7b8PhwK52I/AAAAAAAAAA8/yJI1YtiBZ2A/s1600/prague_clock.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
End of year, December 31st. Every year millions of families world-wide gather round the tv-set just before midnight to watch the clock tick tick tick away the final seconds of the year. The clock strikes twelve and we yell “Happy New Year!” while outside fireworks burst, lighting up the sky.</div>
<br />
<span style="color: #e69138;"><b>It's about time</b></span><br />
<div style="text-align: justify;">
For some years already I run a desktop widget that counts down the year. I made it with Dashcode when I was just messing around on a boring <i>January</i> day. After nearly a year, I found an interesting bug: it started counting up in the next year..! Just like a regular clock does.</div>
<div style="text-align: justify;">
Anyway, it's fun saying things like “99 days left!” out of the blue and see people reacting “what was that all about?”.</div>
<br />
<div style="text-align: justify;">
The year 2012 had a leap second so there was another bug: it was off by a second. This was totally unnecessary had I implemented it properly ... Just saying that dealing with time can be tricky.</div>
<br />
<span style="color: #e69138;"><b>Two minutes to midnight</b></span><br />
<div style="text-align: justify;">
Counting down the days until <i>year+1 Jan 1 00:00:00</i> is fun and not hard to do. Be mindful though that the naive implementation of doing <span style="font-family: "Courier New",Courier,monospace;">sleep(1)</span> in a loop is plain wrong. You will find your family and friends yelling out Happy New Year a split second before the counter reaches zero. Why is that? The thing is, you forgot about milliseconds, microseconds, and nanoseconds. This is illustrated by the following timeline:</div>
<blockquote class="tr_bq">
23:59:56.852 countdown program start<br />
sleep 1<br />
23:59:57.852 sleep 1<br />
23:59:58.852 sleep 1<br />
23:59:59.852 sleep 1<br />
00:00:00.000 Happy New Year! on tv<br />
00:00:00.852 countdown program reacts too late</blockquote>
<div style="text-align: justify;">
Here, the program starts at an offset of 852 milliseconds causing it to react too late; it practically misses the strike of midnight. 852 milliseconds feels like almost a full second, so the countdown program just isn't good enough.</div>
<div style="text-align: justify;">
A better way to do it is to use either <span style="font-family: "Courier New",Courier,monospace;">usleep()</span> or <span style="font-family: "Courier New",Courier,monospace;">nanosleep()</span> to sleep fractions of a second and round it off to an (near) exact tick of a second.</div>
<br />
<span style="color: #e69138;"><b>Sleep with one eye open</b></span><br />
<div style="text-align: justify;">
Although you can do nanosecond sleeps, the practical accuracy of your software clock is about 15 milliseconds. This is because of timeslicing in multitasking operating systems. Really? Well, yes and no. Modern computers have a High Precision Event Timer (HPET) that can be used to do more accurate timing of events. This is entirely meant for doing things like performance measurements and not so much for wallclock time keeping.</div>
<br />
<span style="color: #e69138;"><b>No time to waste</b></span><br />
<div style="text-align: justify;">
A countdown timer is pretty neat and not so hard to write. It gets more fun when you add a big LED display in OpenGL graphics. I suppose it's too late now to get it approved in the App store and get rich before 2014 starts, but we can always try again next year.</div>
<br />
<div style="text-align: justify;">
Best wishes! and if you want to read more about high precision timing, see this excellent blog post by another code monkey:</div>
<a href="http://tdistler.com/2010/06/27/high-performance-timing-on-linux-windows">http://tdistler.com/2010/06/27/high-performance-timing-on-linux-windows</a><br />
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-85417847698791289292013-11-27T21:53:00.000+01:002013-11-27T22:24:41.311+01:00C++ shared_ptr (or really C++ standard mess)<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://1.bp.blogspot.com/-dRYSCCoZEd8/UHm78LoXQ-I/AAAAAAAADWI/akBbEZBhCCI/s1600/learned_c%252B%252B.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
The C++ shared_ptr is a pointer to an allocated instance that ensures the instance is deleted after the last shared_ptr to the object goes out of scope (or is explicitly deleted). It's a great mechanism for doing some behind-the-scenes automated memory management. You can point multiple shared_ptr's to the same object and not have too much difficulty in managing the memory of the object that it's pointing at.</div>
<br />
<div style="text-align: justify;">
The shared_ptr first appeared in the boost library. Once upon a time we would write:</div>
<blockquote class="tr_bq">
#include <boost/shared_ptr.hpp><br />
<br />
boost::shared_ptr<T> ptr;</blockquote>
<div style="text-align: justify;">
<br />
Later, a committee found shared_ptr so cool, they said “we want that too” and incorporated it into the C++ library. The shared_ptr became part of the TR1 extension (first Technical Report), and we would write:</div>
<blockquote class="tr_bq">
#include <tr1/memory><br />
<br />
std::tr1::shared_ptr<T> ptr;</blockquote>
<div style="text-align: justify;">
<br />
For about a decade, work on the next C++ standard continued under the name C++0x. At some point, the gcc team at GNU recognized that TR1 would soon become part of the next C++ standard. So they already incorporated TR1 into the <span style="font-family: "Courier New",Courier,monospace;">std</span> namespace, but you would have to invoke the compiler with a special flag because it wasn't quite standard yet:</div>
<blockquote class="tr_bq">
#include <memory><br />
<br />
std::shared_ptr<T> ptr;<br />
<br />
g++ -std=c++0x</blockquote>
<div style="text-align: justify;">
<br />
Not that long ago, the new standard arrived as ‘C++11’. The compilers and library were updated. There was more to C++11, and it was a new standard after all, so GNU added a new flag:</div>
<blockquote class="tr_bq">
#include <memory><br />
<br />
std::shared_ptr<T> ptr;<br />
<br />
g++ -std=c++11</blockquote>
<div style="text-align: justify;">
<br />
At this very moment, things should have been good now since we have C++11. In practice however, we're in bad luck. Every platform ships their own ‘current’ version of the compiler, and it works differently every time. Older compilers choke on C++11, and newer compilers that prefer C++11 choke on the TR1 stuff. In order to write C++ code that actually compiles across multiple platforms, you have to:</div>
<ul>
<li>use autoconf (yuck)</li>
<li>use ugly ifdefs to get the correct includes</li>
<li>use typedefs, or hack the <span style="font-family: "Courier New",Courier,monospace;">std</span> namespace, or import entire namespaces</li>
<li>use the correct compiler flags, and mind which flag to use for what version of <span style="font-family: "Courier New",Courier,monospace;">g++</span></li>
</ul>
It's either this or telling people “Your compiler is too old!”.<br />
We can only hope that time passes quickly and that TR1 will soon be out of the picture, only a vague memory of an obscurity in the distant past. Until then, ...<br />
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-90935896332592610212013-08-18T23:45:00.000+02:002013-08-18T23:52:16.191+02:00The quad-tree<div class="separator" style="clear: both;">
<img border="0" src="http://4.bp.blogspot.com/-mRUgXh00Zdw/TzaZFXzLHYI/AAAAAAAADHg/8rxeTYBxuvs/s1600/treeview.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Six years ago (which seems like an eternity already) I made a 2D arcade shooter in which you walk around the screen, zapping monsters. Each level is just one screen, and once you clear it, it's on to the next one. For collision detection, it checked every monster against the player, and every bullet against every monster. The game ran fine – especially on today's computers with near <i>infinite</i> power – but in terms of efficiency, it's quite bad to do that many collision tests. To make things better, I followed the good advice of the game programming gurus and added in a quad-tree.</div>
<br />
<div style="text-align: justify;">
Objects can only collide when they are near each other. So if an object is far away, you don't even want to test whether it might collide. Furthermore, if a group of objects is far away, you don't want to test for collisions with any member of that group. A quad-tree enables skipping large amounts of objects, thus bringing down the number of collision tests to perform.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The quad-tree is a space partitioning “device”; it divides 2D space up in compartments, automatically grouping objects together. It helps you select only the relevant group of objects for a possible collision and keeping the count of collision tests down to a minimum.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This post is not a tutorial on quad-trees, see below for a link to a good tut at gamedev.net. I had some additional thoughts on quad-trees.</div>
<br />
<span style="color: #e69138;"><b>Considerations</b></span><br />
<div style="text-align: justify;">
People often resort to using a uniform grid, also known as <i>binning</i>. With this technique, imagine putting a regular grid over the screen and adding objects to the bin (grid cell) that they are in. It's easy, but this is a poor solution that has all kinds of issues. For example when an object is on a cell boundary, it can be in multiple cells at the same time. If an object is so large that it spans many cells, there will be a big problem. If the player is at the edge of a cell, you'll have to check the adjacent cell as well. If the player is at the edge of the world, take care not go out of bounds. All in all this technique just isn't very good, and it wastes memory when the world is large but sparsely populated. The quad-tree however, is elegant and efficient and works in all cases.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Some people seem to think that a quad-tree can only deal with fixed size problems, i.e. the root node should be able to hold all objects that are in the world. This is so not true (!) If your quad-tree implementation uses arrays, you are doing it wrong. It is much more efficient to chainlink objects together using pointers. Like so:</div>
<blockquote class="tr_bq">
q.objlist => obj1.neighbor => obj2.neighbor => NULL</blockquote>
This scales dynamically, without cost, without growing arrays or allocating any extra memory at all.<br />
<br />
<span style="color: #e69138;"><b>Other uses for quad-trees</b></span><br />
<div style="text-align: justify;">
The quad-tree is ideal for static geometry like wall segments. Since the walls don't change, you can generate the quad-tree only once when loading the level and keep it around for testing against while the game is running. For moving walls and destructible environments it's probably easiest to use a separate, second quad-tree. (* If you are <strike>a graficks wizard</strike> clever, you may use BSP trees for this purpose. Quad-trees are much easier to use though.)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Another ideal use of quad-trees is for view frustum culling in 3D engines. If a node falls outside the view, then all of its leaves will not be visible either, culling a large number of vertices at once. This is typically used in terrain rendering, but also works for other kinds of scenes.</div>
<div style="text-align: justify;">
Taking it one step further, you can use the same technique for doing occlusion culling. From a nearby object you can create an occlusion frustum. The nodes that are in the occlusion frustum area are not visible and can be culled away. If it's a large building or a mountain that is in the view, it will pay off to do this little extra work. Occlusion culling is an advanced topic, and I'm not convinced that modern games actually do this.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
It's trivially easy to extend the quad-tree code to an octree for use with 3D data.</div>
<div style="text-align: justify;">
<br /></div>
<span style="color: #e69138;"><b>Quad-trees in science</b></span><br />
<div style="text-align: justify;">
Cosmologists like doing simulations of colliding galaxies. This involves simulating gravitational forces between large numbers of stars (or bodies). Because each and every body interacts with each other body, this is called an <i>n</i>-body problem. One way of implementing such a simulation is the Barnes-Hut algorithm, which uses a quad-tree to group bodies together. Barnes-Hut may treat groups of bodies as a single body by using its center of mass, and thus having a single gravitational force. It's easy to see how the use of a quad-tree greatly speeds up such a simulation.</div>
<br />
<span style="color: #e69138;"><b>Links</b></span><br />
<ul>
<li><a href="http://gamedev.tutsplus.com/tutorials/implementation/quick-tip-use-quadtrees-to-detect-likely-collisions-in-2d-space/">Quadtrees for collision detection</a> at GameDev.net</li>
<li><a href="http://arborjs.org/docs/barnes-hut">Barnes-Hut algorithm</a> by Ventimiglia & Wayne, who teach at Princeton University (not game programming class)</li>
<li><a href="http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation">Barnes-Hut simulation</a> at Wikipedia, with cool pictures</li>
</ul>
<br />
<ul>
</ul>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-58598120394543730242013-07-15T14:19:00.000+02:002013-07-15T14:23:54.138+02:00Programming mkdir -p<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://4.bp.blogspot.com/-mRUgXh00Zdw/TzaZFXzLHYI/AAAAAAAADHg/8rxeTYBxuvs/s1600/treeview.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
It's midsummer so let's do something light and easy for a change. In UNIX there is the <span style="font-family: "Courier New",Courier,monospace;">mkdir -p</span> command, meaning “create directory <i>and</i> all leading (parent) directories”. It is a convenient command with which you can quickly create a new directory tree. In programming, there is the <span style="font-family: "Courier New",Courier,monospace;">mkdir()</span> system call. This system call creates new filesystem directories, but it does not, however, automatically create any new leading parent directories for you. An attempt to do so is an error: <span style="font-family: "Courier New",Courier,monospace;">ENOENT</span>, or “No such file or directory”. How can we programmatically make deep directories?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A naive answer is to call <span style="font-family: "Courier New",Courier,monospace;">system("mkdir -p my/deep/path")</span>. While not incorrect, it comparatively uses lots of resources, but even if we don't care about that, it poses the interesting question: “Well, how does the <span style="font-family: "Courier New",Courier,monospace;">mkdir</span> shell command do it?”</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A possible approach is to to break the path argument up into pieces, and for each piece create the new subdirectory. Something along these lines:</div>
<blockquote class="tr_bq">
break up "my/deep/path" into "my", "deep", "path"<br />
mkdir "my"<br />
mkdir "my/deep"<br />
mkdir "my/deep/path"</blockquote>
<div style="text-align: justify;">
For example, in Python:</div>
<blockquote class="tr_bq">
<div style="text-align: justify;">
def mkdir_p(path):<br />
arr = path.split(os.sep)<br />
newpath = ''<br />
for elem in arr:<br />
if not newpath:<br />
newpath = elem<br />
if not elem:<br />
newpath = '/'<br />
else:<br />
newpath = os.path.join(newpath, elem)<br />
<br />
if os.path.exists(newpath):<br />
continue<br />
<br />
os.mkdir(newpath) </div>
</blockquote>
<div style="text-align: justify;">
It has some checks for correctly handling relative and absolute paths, and it gets the job done. It performs poorly however for long paths that already exist for the most part. For example, for <span style="font-family: "Courier New",Courier,monospace;">mkdir_p("/home/walter/src/python/ blog/2013/mkdir_p")</span> it would do seven loops already before creating a new directory. Surely we can do better.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If we could work backwards, the performance could be improved. First see if the deep path exists, take a step back, see if that path exists, and if it does, rewind and create the deeper path. This is achieved by recursion.</div>
<blockquote class="tr_bq">
def mkdir_p(path):<br />
if not path:<br />
return<br />
<br />
if os.path.exists(path):<br />
return<br />
<br />
mkdir_p(os.path.dirname(path))<br />
os.mkdir(path)</blockquote>
<div style="text-align: justify;">
The recursive version is wonderfully terse and like often with recursive functions, it may warp your mind and take some effort to grasp.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I confess I used the non-recursive implementation for years. Ironically, Python already has a function to do this:</div>
<blockquote class="tr_bq">
os.makedirs(path)</blockquote>
<div style="text-align: justify;">
I had a peek in the source of Python's <span style="font-family: "Courier New",Courier,monospace;">os</span> module, and <span style="font-family: "Courier New",Courier,monospace;">os.makedirs()</span> indeed uses the recursive implementation. Mind that like <span style="font-family: "Courier New",Courier,monospace;">os.mkdir()</span>, <span style="font-family: "Courier New",Courier,monospace;">os.makedirs()</span> will throw an exception if the path already exists — unlike the <span style="font-family: "Courier New",Courier,monospace;">mkdir -p</span> shell command that doesn't complain if the directory is already there.</div>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-44819737159638584132013-06-24T15:46:00.000+02:002013-06-24T15:46:15.453+02:00Object(ive)-oriented programming in plain C<div class="separator" style="clear: both; text-align: justify;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://4.bp.blogspot.com/-YcnKZNgl7MQ/UAKN-Env0RI/AAAAAAAADU8/1q4tVe_Nhmo/s1600/c_flames.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
<div style="text-align: justify;">
Last time I wrote about a safe string type implementation in a C library, and mentioned liboco. It's a library that provides some fundamental building blocks like a safe string type, a safe array type, and a map (or dictionary). In liboco, everything is a reference counted object, and arrays and maps store objects. liboco takes concepts from Objective-C, and its basic function compares to Foundation Kit. Like in Objective-C, reference counting is done manually using retain and release. It even has a memory pool with which you can deallocate objects in a deferred fashion using <span style="font-family: "Courier New",Courier,monospace;">autorelease</span>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
How does it work? Plain C is not object oriented, it has no classes, no constructors, no inheritance, etcetera. But we can implement object orientness in C. In fact, the first C++ compiler was a translator that produced a mangled plain C source code as output.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As you might suspect, a safe string type would be implemented as a struct with member fields for a <span style="font-family: "Courier New",Courier,monospace;">char*</span> pointer to the data and an integer length. Now take a step back, and say that everything should be an object. This means a string would be derived from an object, and inherit the base object's properties. In plain C, we do this by including the struct of the base object in the struct of the derived object.</div>
<blockquote class="tr_bq">
typedef struct {<br />
object_t obj;<br />
<br />
char *str;<br />
int length;<br />
} string_t;</blockquote>
<div style="text-align: justify;">
What does the base object type look like? As said, every object is reference counted, so it should include a reference count.</div>
<blockquote class="tr_bq">
typedef {<br />
int refcount;<br />
} object_t;</blockquote>
<div style="text-align: justify;">
It doesn't seem much, but we've already laid down a basis. We can create custom types and derive them from object. We could even create a new type and derive it from the <span style="font-family: "Courier New",Courier,monospace;">string_t</span> type, if we wanted to.</div>
<div style="text-align: justify;">
There is no automatic constructor, but like in Objective-C when we allocate an object of a certain type, we should initialize it through an init function. This is its constructor.</div>
<blockquote class="tr_bq">
string_t *s = init_string(calloc(1, sizeof(string_t)));</blockquote>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;">calloc()</span> zeroes out the memory for us. Alloc and init are two distinct operations. If you combine them into a single function, you'll run into trouble later (when we look at inheritance) so keep them separate.</div>
<div style="text-align: justify;">
Since I don't like the syntax of what we now have, alloc and init is wrapped as:</div>
<blockquote class="tr_bq">
string_t *s = new_string();</blockquote>
<div style="text-align: justify;">
Note that every object instance is addressed by a pointer to the object. This isn't strange when you realize that in plain C, strings are char pointers and arrays are referenced by a pointer to the array.</div>
<div style="text-align: justify;">
The reference count of a newly allocated object is 1. In a reference counting system, you don't delete instances like you do in C++. Instead, you let go of them by calling <span style="font-family: "Courier New",Courier,monospace;">release()</span>. Releasing an object instead of bluntly deleting it will start to make sense once you put objects into containers.</div>
<blockquote class="tr_bq">
release(s);</blockquote>
<div style="text-align: justify;">
The <span style="font-family: "Courier New",Courier,monospace;">release()</span> function lowers the reference count, and when the reference count drops to zero it will destruct the object and free the memory. To keep an object around, increase its reference count by calling <span style="font-family: "Courier New",Courier,monospace;">retain()</span>. When you put an object into a container, like an array or a map, the container automatically retains the object, which is another way of saying that it holds a strong reference to the object. The container keeps the object around — until the container itself is released.</div>
<br />
<div style="text-align: justify;">
Functions like <span style="font-family: "Courier New",Courier,monospace;">retain()</span>, <span style="font-family: "Courier New",Courier,monospace;">release()</span> work on any type derived from object. But C is a strictly typed language ... so you'd need to typecast down to <span style="font-family: "Courier New",Courier,monospace;">object_t</span> every time to satisfy the compiler, or else you'd get a ton of warnings. This issue is solved by declaring the functions with <span style="font-family: "Courier New",Courier,monospace;">void*</span> arguments:</div>
<blockquote class="tr_bq">
void retain(void *);<br />
void release(void *);</blockquote>
<div style="text-align: justify;">
Now, if you don't supply a valid object, the compiler will not complain and these functions will happily trash around. So to make it safe we need some kind of runtime type checking. Wouldn't that be incredibly slow? No, it's just a single if-statement. We actually already need to know the type of the object for another reason: when the reference count drops to zero, how would <span style="font-family: "Courier New",Courier,monospace;">release()</span> know what destructor function to call? It knows because every object holds run-time type information (RTTI).</div>
<blockquote class="tr_bq">
typedef struct {<br />
objtype_t *objtype;<br />
int refcount;<br />
} object_t;</blockquote>
<div style="text-align: justify;">
Mind that in C++, the compiler holds type information at compile time. In C++, RTTI may or may not be available during run-time depending on whether the class is polymorphic and whether RTTI is enabled during compilation. What's fun is that RTTI also allows us to easily implement type introspection.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
What information is in the object type structure? It holds a pointer to the destructor function. <span style="font-family: "Courier New",Courier,monospace;">release()</span> will call this function before freeing the allocated instance.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Other than just the destructor, we can also include a constructor. Mind that in Objective-C, you have to manually call the super classes init function to correctly initialize an instance of a derived class. This is also the case in Python. In C++, it automatically calls the base class constructors.</div>
<div style="text-align: justify;">
At first I actually made it work this way using the default constructor only. There is a compelling case for having manual control anyway, and <i>not</i> including a constructor in the object's type definiton.</div>
<div style="text-align: justify;">
It's because of parameters: constructors often take parameters so it makes sense to write them this way. This means a constructor of an object can have any number of parameters and there is no strict declaration that fits all types. Thus, automatic calling of base class constructors goes out the window.</div>
<div style="text-align: justify;">
As slight added advance, it will be more clear how to do multiple inheritance.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As said, the whole thing is a reference counting system. So when we copy an array, the contents of the copy point at the same elements as the original, and each element has had its reference count increased by one. In order to make a deep copy, we need a copy constructor. The copy constructor is a common idiom in C++, but it's not as well-known in Objective-C. Objective-C has the <span style="font-family: "Courier New",Courier,monospace;">NSCopying </span>protocol, and you implement a function <span style="font-family: "Courier New",Courier,monospace;">-copyWithZone:</span> that returns a newly allocated instance; the copied object.</div>
<div style="text-align: justify;">
liboco adds the copy constructor as a special kind of constructor (analog to C++), and likewise it automatically calls any base class copy constructors.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
C++11 adds a move constructor, with which you can move values from one instance to another under the hood. This is typically needed for doing return by value efficiently.</div>
<div style="text-align: justify;">
We have no need for this feature. The main reason is that objects are pointers, and pointers already efficiently return by value. Another reason is that we do not automatically destroy instances when they go out of scope, like C++ does. It's a manual reference counting system, so the programmer decides when an object is released or not.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Now that we have our own typing system, this gives way to adding a feature that is not present in Objective-C nor C++ (but is present in Python and Go), and it is printing an object. For printing any object we need a method that converts an object to a string.</div>
<div style="text-align: justify;">
Additionally, we need a special printing function that calls the string conversion method when requested.</div>
<blockquote class="tr_bq">
print("object type: %T\n", o);<br />
print("object value: %v\n", o);</blockquote>
<div style="text-align: justify;">
It's quite powerful; this allows us to print the value of any type of object with the format string specifier ‘%v’.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
It's also possible to add operator functions. Although plain C does not allow redefinition of operators, they can be simulated using function calls. This would be pretty annoying for hand-written codes however, but could be useful for machine generated codes.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I haven't touched upon overloading. It gets close but doesn't become real OOP until it can do function overloading, which enables polymorphism. Function overloading can be implemented by including a table of methods (function pointers) in the object's type definition. Calling a method would resemble this chain:</div>
<blockquote class="tr_bq">
self->objtype->methods.method(self, parameters);</blockquote>
<div style="text-align: justify;">
This is how virtual function calls work in C++. A derived type has a copy of the table of its base class. In Objective-C, it works a little different. There, invoking a method means “sending a message”. What happens is, the table is searched for the method, and if the method is not found, the table of the superclass will be searched. This way, all methods of the superclass are inherited by the subclass.</div>
<div style="text-align: justify;">
Both ways can be simulated in plain C, but there is the tiresome work of writing out the virtual tables. Moreover, the first argument to the method would always have to be <span style="font-family: "Courier New",Courier,monospace;">void*</span> to prevent compiler warnings.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Finally, a word on using liboco. Programming in C with liboco is a lot like programming in Objective-C, and like in Objective-C, you practically leave plain C behind. You are now working with this new API, and the code is full of calls into this library, which is your foundation. Adding a new type for some kind of data structure means implementing it as an object, and writing the destructor, copy constructor, and string representation methods. This can be a drag, but it's part of what liboco is. Having an array and map type at your disposal in which any object can be placed, is a great win. You can quickly prototype something in Python and convert it to C rather easily. Wouldn't it be great if we could do automated machine translation from a scripting language to C and then compile it to native code? It's maybe something to work on at a later time.</div>
<br />Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.comtag:blogger.com,1999:blog-4453593041129550116.post-72508597033534483652013-05-13T14:33:00.000+02:002013-05-15T20:29:15.693+02:00More on unsafe C strings: A followup<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<img border="0" src="http://3.bp.blogspot.com/-EoH1lyKKtyc/TQU5SPX1NgI/AAAAAAAAC2Y/EC0IgRFSTBI/s1600/Escher_Chameleon.png" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;" /></div>
Recently John Carmack, lead programmer of DOOM, wrote in a tweet: <i>“Considering that 8 bit BASICs of the 70s had range checked and garbage collected strings, it is amazing how much damage C has done”</i>. This is in line with my previous posts, and it's fairly remarkable that C stuck around so long without having this fixed. There have been many updates to the language, but this issue was simply not addressed — at least, not until C11. C11 includes the <span style="font-family: "Courier New",Courier,monospace;">strcpy_s()</span> and <span style="font-family: "Courier New",Courier,monospace;">strcat_s()</span> safe string functions. It feels a bit too little, too late but, at last, something has been done about it.<br />
<br />
Are our troubles over now? Not really, C11 is still too new. My system does not have the new safe string functions so I can't use them. Adoption of the new standard takes time. Documentation (and school books) have to be updated. Next, the old <span style="font-family: "Courier New",Courier,monospace;">strcpy()</span> and <span style="font-family: "Courier New",Courier,monospace;">strcat()</span> have to be deprecated and finally, be eradicated from our source codes.</div>
<div style="text-align: justify;">
The bounds checking problem continues to exist for arrays.<br />
<br />
Other than C11 I want to expand on two other possibilities for fixing the issue. The first solution is using an external library, the second is choosing a different programming language.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #e69138;"><b>1. Using an external library</b></span></div>
<div style="text-align: justify;">
Rather than just supplying a safe string copying function, I mean using a library that supplies a safe string type and associated functions. As is often the case, this library would be written in C itself. One problem is that you have to rigorously adapt to using the library's string functions, and resist the temptation of using the old <span style="font-family: "Courier New",Courier,monospace;">char*</span> pointers for quick and dirty strings anyway.</div>
<div style="text-align: justify;">
Another problem is that system functions such as <span style="font-family: "Courier New",Courier,monospace;">open()</span>, <span style="font-family: "Courier New",Courier,monospace;">write()</span>, etcetera expect <span style="font-family: "Courier New",Courier,monospace;">char*</span> pointers, so you keep converting between the safe string type and the <span style="font-family: "Courier New",Courier,monospace;">char*</span> pointer.</div>
<div style="text-align: justify;">
Typical library solutions allocate strings on the heap, giving an extra performance hit. Using a library means marrying that library; your whole program source code will look different and be tightly coupled to it. Switching libraries will have a big impact and involve lots of work.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
My github project <span style="font-family: "Courier New",Courier,monospace;">liboco</span> is a C library that implements a safe string type, as well as an array and a map type, and a few other things. It follows the Objective-C design principle that everything is a reference counted object. Like in Objective-C, this means manual reference counting. It isn't bad, but can be problematic if you are not used to it. Hard to track down memory leaks may happen. Other than that, <span style="font-family: "Courier New",Courier,monospace;">liboco</span> is pretty neat.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Comparably, I have written <span style="font-family: "Courier New",Courier,monospace;">oolib</span> for C++ which is much easier to use because it does automatic reference counting under the hood. This brings us to the next point.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #e69138;"><b>2. Choosing a different programming language</b></span></div>
<div style="text-align: justify;">
It is fair to say that the problem of unsafe strings and having no bounds checks is intrinsic to C. It is a low-level language, after all. So if you want that extra bit of safety, you have to let go of C and choose a different language.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
C++ fixes strings with <span style="font-family: "Courier New",Courier,monospace;">std::string</span>. Its interface annoys me, but anyway, C++ is a complicated language and it's really hard to learn. Even after thoroughly studying C++ I'm not sure I would recommend the language to anyone. Sure, classes are useful, but you almost have to be an academic to master C++.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Other languages more popular than C on github are: PHP, Python, Java, JavaScript — all byte code interpreted languages. JavaScript is a total mess, but anyway, these languages by design lack the punch that C can deliver in terms of performance. The process of byte code interpreting is essentially emulation, and emulation is slow. I bet that even the laziest, dumbest compiler would produce a faster running code than these byte code interpreters.<br />
The argument of performance appears debatable, “do you really need that kind of performance?” The answer, in short, is: Yes. I have personally seen scripts run for hours, where the equivalent written in C ripped through it in half a minute. A speedup by a factor of 100 is no exception. Compiled languages offer vastly superior performance.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There is Objective-C, but it is different enough to say it changes everything. Choosing Objective-C in practice means marrying OpenStep's Foundation Kit. It is a good choice if you are Apple-only and intend to keep it that way. Objective-C has manual reference counting, which is alright but kind of tough to get right. Apple's compiler however, sports automatic referencing counting (ARC) which does some clever source code analysis.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Whatever happened to Pascal? Although a rather verbose language (consider <span style="font-family: "Courier New",Courier,monospace;">PROCEDURE / BEGIN / END</span> versus <span style="font-family: "Courier New",Courier,monospace;">void { }</span>), Pascal wasn't so bad — except from the fact that it has the worst string type of all (!). Although Pascal strings are safe, they have an absolute maximum length of 255 bytes. In modern times, with unicode encoding, this means that for some languages only 63 characters can be stored in a single Pascal string, making it a totally useless language by today's standards.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Go deserves honorable mention. It is an easy to learn language with cool core features, and it compiles to native code. Even though I'm not fond of its mandatory CamelCasing, golang is really great. Go was designed as a systems programming language, and it shows. There are some Plan9 peculiarities like the command-line flags parser not being GNU <span style="font-family: "Courier New",Courier,monospace;">getopt()</span>like, and the tar archiving module doesn't grok GNU tar extensions. Other than that, Go has great potential.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Truthfully, I don't want a language that strays too far from C. Interestingly, although many modern languages are depicted “C like”, none of them are plain C with an added safe string type. I suppose it is hard to change C. For one thing, a ‘string’ is a fairly high-level concept that doesn't fit nicely in a low-level language.</div>
<div style="text-align: justify;">
<br /></div>
Walterhttp://www.blogger.com/profile/12455311338607724493noreply@blogger.com