Sunday, December 16, 2012

oolib gets notifications

Apple's Cocoa API has an interesting feature named the Notification Center. It enables UI components to communicate by posting messages to whoever is interested. I suppose it's a bit like Twitter inside a computer program; objects may blurt out messages regardless of who is listening, and objects may show interest in receiving messages from other objects.

Notifications allow for loose coupling between objects. Suppose you have two classes A and B, and you want to make a state change in an instance of class A known to an instance of class B. The traditional way of doing that is to have A call a method in B. Now, A depends on B, and they are tightly coupled.
With notifications, the situation changes. A announces its state change to the world. B listens to state change messages, so it picks up the state change. There is no interdependency between the two; they are just handling messages. You can add a third class C that reacts to the message in another way, without touching class A.

This is particularly useful for GUI widgets; for example, a button announces it's been pressed, a scroll bar announces it's been moved, and a movie player announces it's finished playing.

For oolib, my personal C++ framework, I wanted to have this functionality too. It basically works like this: an object may register itself as an observer of certain messages. Whenever that message is sent, the observer will be notified.
class MyObserver : public Observer {
public:
    void notify(const char *event) {
        print("observer: %s", event);
    }
};

const char *Event1 = "This is event #1";
const char *Event2 = "This is event #2";

int main(int argc, char *argv[]) {
    MyObserver o;
    add_observer(o, Event1);
    add_observer(o, Event2);

    notify(Event1);
    notify(Event2);
    return 0;
}
The global notify() function will see to it that observer->notify() will be called for all registered observers.

There is one big difference between oolib's notify() and Cocoa's NSNotificationCenter: Cocoa works with Objective-C, and allows you to supply a selector—which is a function pointer, so it's essentially a callback mechanism.
With oolib, you are required to inherit from Observer and implement the virtual method notify(), which is more like ‘the C++ way’ of doing things.

Finally, I should mention the Qt framework. Qt has a feature called signals and slots, which allows you to connect a slot (class method) to a signal, which can be emitted by some object. It looks like Qt works by virtue of pointers to methods... which is peculiar because C++ does not deal well with pointers to methods. The Qt folks actually use a meta-compiler that generates the code needed to make it work.

This programming trick is also known as the observer pattern.

Sunday, December 2, 2012

Starry, starry night

The holidays are nearing, and it always makes me ponder the universe. Maybe it's because the stars come out in the cold and clear dark winter evenings, or maybe it's because of the unusually bright star in the Christmas story. Anyway, it gave me the idea of creating something like a classic 3D star field. What if you used actual star positions for this star field? Would you see something like you do on Star Trek if you flew through it at warp speed?

I started out by searching for star catalogs. These are plain text databases with lots and lots of scientific data about the stars.
Star databases are a bit of an annoyance; each has its own format, and they all contain different fields. Some contain parallax data (which is needed for calculating the distances of stars), but most don't. Some contain the visible color of stars, others don't. But most annoying of all, they don't use the same unique identifiers; the star Procyon A is named HIP 37279 in the Hipparcos catalog, HD 61421 in the Henry Draper catalog, and Gliese 280 in the Gliese catalog, while its Bayer/Flamsteed designation is 10 Alpha Canis Minoris.
The Tycho-2 star catalog contains over 2.5 million stars, but unfortunately it is worthless for making a 3D model because it lacks parallax data. So I took the HYG database, which is a modified version of the Hipparcos database, and combined it with usable data from the SKY2000 star catalog, plus a short list of well-known brightest stars. This involved some awk and Python scripting to cross reference the databases and calculate the 3D positions.

Star positions are usually given in right ascension (RA) and declination (Dec). These are angles, like latitude and longitude, though RA is traditionally written in hours, minutes, and seconds. Given the angles and the parallax, you can calculate an x,y,z position in space, and this is what we need for a 3D star field (a small sketch of this conversion follows below). These coordinates have the Earth at the origin. You might want to use galactic coordinates instead, which put the Sun at the origin.
Now, because all of this stuff is turning, turning in outer space, the positions of stars change (unlike what I was told as a child). Star databases contain the position at a point in time (referred to as the epoch) and you can adjust the position using the proper motion of stars. Frankly, I didn't bother, but if you want to know the correct position of a star as it is today, you are going to have to do some extra math.
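To make the conversion concrete, here is a minimal sketch in C. It assumes RA in hours, Dec in degrees, and the parallax in arcseconds (catalogs often give milliarcseconds, so check your data); the function name and units are my own choices, not taken from any catalog's documentation.
#include <math.h>

/* Convert right ascension (hours), declination (degrees) and parallax
   (arcseconds) to an equatorial x,y,z position in parsecs. */
void star_to_xyz(double ra_hours, double dec_deg, double parallax_arcsec,
                 double *x, double *y, double *z)
{
    double ra = ra_hours * (M_PI / 12.0);      /* 24 hours = 2*pi radians */
    double dec = dec_deg * (M_PI / 180.0);     /* degrees to radians */
    double dist = 1.0 / parallax_arcsec;       /* distance in parsecs */

    *x = dist * cos(dec) * cos(ra);
    *y = dist * cos(dec) * sin(ra);
    *z = dist * sin(dec);
}
Run this over every star in the combined catalog and you have the vertex data for the star field.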

[Left image: the constellation Orion. Right image: the data set seen from a distance of 4 kpc.]

So, I put all coordinates into OpenGL and the result is pretty neat. In the left image you can clearly see Orion. The red star in the top-left of Orion is Betelgeuse. All the way to the left from Betelgeuse is Procyon. The image on the right shows what you see from a distance of 4 kiloparsecs. It's like a bubble with a disc through the center. This is just what the data set looks like though; in reality, it probably does not have this shape as we are part of a much, much larger Milky Way.

This is what almost 220 thousand stars look like on a computer screen, and you can look around and point at stars in real-time.

Fun facts:
  • The furthest bright star, just outside the bubble, is Deneb at 4754.8 light years.
  • Most stars around us are red dwarfs. Hence the pinkish color of the bubble.
  • When you travel to another star, the night sky looks different.
  • OpenGL vertex buffer objects (VBO) kick ass.

I was amazed that my (now three-year-old) computer had no problems whatsoever in dealing with this data. Once again, I find that the modern PC is an amazingly powerful computer. Although I'm no scientist and no astronomer, I was able to put together a pretty realistic 3D model of our stellar neighborhood in my attic.

Sunday, November 11, 2012

A Pool Allocator or The (un)importance of Code Optimization

In school I learned to use dynamic memory allocation, and to use linked lists to keep track of dynamically allocated objects. The idea was that arrays are static and fixed-size, while linked lists can keep growing dynamically, virtually without limit. The linked list also doesn't use memory that it doesn't need, and there are some nice performance characteristics when inserting and removing items from a linked list.

The malloc() library function takes memory from the heap, which is a region of memory set aside for dynamic allocations. Whenever malloc() runs out of heap space, it will call sbrk() to grow the heap by asking the operating system kernel for extra memory pages. Modern malloc() implementations actually use mmap() instead, which works by virtue of virtual memory and demand paging. It's all pretty sophisticated.

How different were things for classic arcade game machines, tiny computers with very small system memory. There was practically no operating system at all; it was just the game code itself running on the bare metal. Arcade games like Space Invaders and Galaga did not have the luxury of a malloc() library function. They were not even written in C, but in the machine's assembly language. The way to allocate monsters and bullets was to use an array in the bss segment—an area of system RAM sitting just outside where the program code was loaded, or rather, copied from EEPROM to RAM when the machine was switched on.

Things were so much simpler then, but even today I like allocating monsters and bullets from a ‘free’ list backed by a statically allocated array. Such a pool allocator gives a very clear picture of what is going on and how things work.

Even though the pool allocator has a hard maximum on how many objects may be allocated, it will never deny you memory if you stay within that limit. No other task in a multitasking operating system can steal this memory away from you, making it a very robust memory allocator.
In terms of performance, the pool allocator always completes in a fixed time, which is important for games that need constant performance.
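For illustration, here is a minimal sketch of such a pool allocator in C. The bullet struct and its fields are made up for the example; the point is the statically allocated array and the free list threaded through it.
#include <stddef.h>

#define MAX_BULLETS 64

typedef struct bullet {
    float x, y, dx, dy;
    struct bullet *next;            /* links free slots together */
} bullet_t;

static bullet_t pool[MAX_BULLETS];  /* lives in the bss segment */
static bullet_t *free_list;

void init_pool(void) {
    int i;

    for (i = 0; i < MAX_BULLETS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[MAX_BULLETS - 1].next = NULL;
    free_list = &pool[0];
}

bullet_t *alloc_bullet(void) {
    bullet_t *b = free_list;

    if (b != NULL)
        free_list = b->next;        /* unlink the first free slot */
    return b;                       /* NULL when the pool is exhausted */
}

void free_bullet(bullet_t *b) {
    b->next = free_list;            /* push the slot back onto the free list */
    free_list = b;
}
Allocation and deallocation are a couple of pointer moves each, so they always complete in constant time, and the memory is there from program start.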

Discussions about memory usage and performance are becoming pointless in this day and age, when we have massive amounts of memory and CPU power at our disposal even on mobile devices. Because computers used to be frustratingly slow machines, you had to push the limits to get the most out of them. Writing efficient code was the challenge, and every time you managed to squeeze out a few cycles more it was an achievement well earned.

Nowadays this hardly applies anymore. Although I have to say, lately I had some scripting code that did heavy data processing. I rewrote it in C++ with multi-threading, and boy, did that make a difference in run time.

Monday, October 29, 2012

File wrapper class in C++ (2)

Last time I showed how to implement a File class. In that solution Brian (pictured on the right) and I used a second class as a wrapper around a standard FILE pointer. The two classes worked together by means of a shared_ptr. The shared_ptr ensured that the file would not be closed until there were no more references to the file wrapper object. Reread the last post to see what I'm talking about.

A reader of this blog pointed out that there is another solution, one that turns out to be far more idiomatic. The constructor of shared_ptr accepts a functor instance, called a deleter. This functor will be called when the reference count drops to zero and the managed object is to be destructed. The resulting code looks like this:
#include <cstdio>
#include <tr1/memory>

class File {
public:

    // conversion constructor
    File(FILE *f) : stream_(std::tr1::shared_ptr<FILE>(f,
        FileDeleter() )) { }

    // copy constructor copies the shared_ptr
    File(const File& f) : stream_(f.stream_) { }

private:
    std::tr1::shared_ptr<FILE> stream_;

    // local functor closes the FILE upon destruction   
    class FileDeleter {
    public:
        void operator()(FILE *f) const {
            if (f != NULL)
                std::fclose(f);
        }
    };
};
Here I nested the FileDeleter class inside File. You may also choose to put it outside the enclosing class, or it can even be implemented as a struct with an operator()().

All in all, this solution is more elegant and more idiomatic than the previous one.

Saturday, October 13, 2012

File wrapper class in C++

Last week my good friend Brian (see picture on the right) and I encountered an interesting problem: wrapping up a file descriptor in a File class. Making class wrappers in C++ generally isn't all that hard, but this particular one was a lot more tricky than we had expected.

What we wanted to have was a File class that would act the following way:
  • based on standard C's FILE*
  • factory function open() creates a File, and opens it
  • destructor of File closes the open file
So that you could write code like this:
    File f = open("filename");
    f.write("hello\n");

    // destructor closes the file
You might code open() like this:
File open(const char *filename) {
    File f;
    f.open(filename);
    return f;
}
Unfortunately, this code doesn't work, as it has a bug. Do you see it already?
The thing is, when you return f, the value f gets copied [by means of the copy constructor] and then the destructor gets called on the original f, because it goes out of scope. Since our destructor closes the open file, the result is that the file is closed once open() returns. Read that last sentence again, argh.

There are two issues at play here:
1. The copy constructor doesn't work because you can't properly copy an open FILE* object.
2. Exiting the function causes the destructor to run, closing the file — despite the fact that we are returning the instance.

Can I have the attention of the class
How to solve this problem, and no ugly hacks, please. A good solution lies in using shared_ptr. You can copy a shared_ptr and safely return it, and its internal reference count will cause the File instance to stick around. The File object will not be destructed until it really needs to be.
[More specifically, the shared_ptr keeps a pointer to dynamically allocated memory, which never auto-destructs like instances on the stack do. The copy constructor and operator=() of shared_ptr happily copy the pointer, but while doing so it increments a reference counter that keeps track of how many shared_ptr instances are currently referencing the dynamically allocated object. The destructor of the shared_ptr decrements the reference count, but does not delete the pointer until the refcount drops to zero. So, a shared_ptr can be used to 'keep objects around' for as long as they're needed].

Beware, you can't simply dump a FILE* into a shared_ptr, since the FILE* may be allocated with malloc() rather than with new. And besides, a FILE* opened (or allocated) with fopen() must be deallocated using fclose(). Therefore we wrap the FILE* into another new class named FileStream and use that class with the shared_ptr.
#include <cstdio>
#include <tr1/memory>

class FileStream {
public:
    FileStream(FILE *f) : f_(f) { }

    ~FileStream() {
        if (f_ != NULL)
            std::fclose(f_);
    }

    ...

private:
    FILE *f_;
};

class File {
public:

    // copy constructor copies the shared_ptr
    File(const File& f) : stream_(f.stream_) { }

    ...

private:
    std::tr1::shared_ptr<FileStream> stream_;
};

File open(const char *);
Wrapping up
This is one of the reasons why C++ is so hard. In any other language, you'd just return the basic File object and be done with it. Even a simple thing like returning a value is hard in C++. You need a degree in Computer Science to even attempt it.
[It makes sense when you realize that upon exiting the subroutine, the stack is cleaned and therefore the destructor is called. But we're returning a value that is on that stack ... so the value needs to be copied to another place, first].

Of course, I should've used the std::fstream class to begin with ... but std::fstream doesn't give me fdopen() nor popen(). It feels so unfinished. The documentation doesn't say whether its destructor closes the stream. I didn't feel like fixing fstream, so I went with FILE*.

C++ is a nice language but sometimes trivial things can get quite complex. Situations like this remind you that coding in C++ is really hard.

Saturday, September 8, 2012

What if Python had a strict mode

Python, our most-beloved byte-code interpreted language, recognizes only a few errors at compile time. Being a dynamic language, it simply doesn't see errors until the interpreter tries to run the code, only to find out that it isn't going to work, and throws an exception. This delayed way of reporting errors can be very frustrating. A Python program may run fine for weeks and then suddenly go BOOM as it runs into an exception. Wouldn't it be nice if Python could catch trivial programming errors at compile time? What if Python had a strict mode?

Perl, ‘that other’ byte-code interpreted language, has a strict mode in which the compiler is, well, more strict about code constructs that may well be programming mistakes. In Python, there is no such thing as a strict mode. But suppose there was, what would it be like? It would surely be nice if the following could be caught at compile-time.
  • syntax errors, including indentation errors;
  • undeclared local variables: this expresses the need for a local keyword that declares local variables. Alternatively, Python could have an operator := that declares and assigns a new local variable, like Go has. (In addition, do away with the global keyword. New variables outside a function should be treated as global in that module or class scope);
  • stricter type checking: reusing a variable as a variable of a different type is an error. This would eliminate the bad coding style of reusing the same variable name for a different purpose;
  • type errors: certain operations can't be done on certain types; e.g, the divide operation does not exist for the string type;
  • format string checking: check the number of arguments for format strings;
  • format string checking: check the types of the arguments for format strings.
Note that these suggestions would change Python from a dynamically typed language to a statically typed one — which is something we would like to avoid. On the other hand, it paves the way to more far reaching changes:
  • strict checking of function parameter types: function parameters should specify type;
  • strict checking of function return types: functions should have a return type.
Python does have typing, but you seldom explicitly state the type of a variable. There are a few cases where you do explicitly specify the type. For example, when creating a new class: class MyType(object) declares a class of type MyType with parent class object. When catching an exception, you specify what type of exception: e.g. except KeyError.

Wrapping up
Python is a dynamically typed language. This is one of its key strengths; you can do things in Python which are not possible in other languages due to their typing restrictions. The dynamic nature of the language allows for quickly throwing some code together. However, particularly when maintaining larger code bases this becomes more of a problem. Python does not offer the option of a strict mode, like there is in Perl. I feel that Python could benefit from having one.

A lint-like tool for checking your Python code is PyChecker.

Sunday, August 26, 2012

Crash course in ncurses

The UNIX terminal is an archaic text interface to the computer. It evolved from teletype machines, which were essentially keyboards with a line printer attached. Any text I/O would be directly printed on paper. Later, the paper was replaced by a monitor. If you want to create a terminal program that has a text interface anything fancier than just lines of text scrolling up the screen, you are likely to go with ncurses. ncurses is a library for creating text mode user interfaces.

A word of warning: like with all things that are new, learning to use ncurses is very much a step-by-step process. ncurses is not a magical tool for easily crafting great UIs. It has the same 1970s style interface as the rest of the UNIX API that we all love.
In this post I will throw a lot of library function names at you with only a minimal description of what they do. Trust me, this is all you need to get you kickstarted with ncurses, but do check out the man pages afterwards to get more deeply involved.

Start off by including <ncurses.h>. First thing in main(), initialize the library with a call to initscr(). To deinitialize, call endwin(). You should always call endwin() upon program termination to prevent leaving the terminal in a state that the UNIX shell (or rather, the user) doesn't like. By the way, why endwin() isn't named deinitscr(), we may never know.

initscr() initializes a global variable in the library named stdscr. It's a pointer to an ncurses window that represents the terminal screen. It is not really the terminal screen, but think of it as a back buffer; you do your operations on the back buffer, then call refresh() to get output to the screen. Because of this, do not use printf() to display text. Instead, use the wprintw() function to print to a window and call refresh() at an appropriate time afterwards. To print text at a fixed position on the screen or window, use mvwprintw(). You can print text with attributes and in other colors using attron() and attroff(). For example, attron(A_BOLD) enables the bold attribute. Clear the screen with clear(). To get the screen dimensions, call the macro getmaxyx(stdscr, height, width).

The cursor can be freely controlled. You can hide or show it using curs_set(visible), and you can move it around with move(y, x).

You can get input key presses with getch(). But before that, you should put the terminal in the right mode, usually during program initialization. Normally, the terminal is in cooked mode, meaning that a lot of processing has already been done before the pressed key was passed on to the running process. For example, the user might suspend the process by hitting Ctrl-Z.
By default, input is line buffered. This means that the user has to hit return before the key is passed to the process. To keep this from happening, set cbreak mode (break after each character). Setting cbreak mode is easy: call cbreak().
By default, getch() will echo the characters to the screen. If you don't want this, call noecho(). What's really nice, getch() responds to nearly all keys on your keyboard, including the arrow keys, page up, page down, and so on. The symbolic constants for keys are named KEY_xxx. They are listed in the man page for curs_getch, or do a grep for "KEY_" in /usr/include/ncurses.h.
The function keys are not enabled by default. Set keypad(stdscr, true) to enable them.
getch() will implicitly call refresh() as necessary.

Call raw() to set the terminal in raw mode. In raw mode, interrupt and flow control characters like Ctrl-C, Ctrl-Z, and the like behave like any other key press without producing a signal. My personal preference is to set cbreak, noecho, and keypad, but there are cases in which raw mode is well-suited too.
You may be familiar with the tcsetattr() function that manipulates the terminal mode. Word of advice: don't bother using it—the ncurses functions are easier and more comprehensible.
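Putting the pieces so far together, a minimal ncurses program might look like this sketch (build with -lncurses). It only prints a message and waits for a key; the message and layout are arbitrary.
#include <ncurses.h>

int main(void) {
    initscr();                  /* initialize the library and stdscr */
    cbreak();                   /* pass keys to us without waiting for return */
    noecho();                   /* don't echo typed characters */
    keypad(stdscr, TRUE);       /* enable arrow keys, function keys, etc. */
    curs_set(0);                /* hide the cursor */

    int height, width;
    getmaxyx(stdscr, height, width);

    attron(A_BOLD);
    mvwprintw(stdscr, height / 2, (width - 21) / 2, "press any key to quit");
    attroff(A_BOLD);
    refresh();                  /* copy the back buffer to the terminal */

    getch();                    /* returns KEY_xxx codes for special keys */

    endwin();                   /* always restore the terminal on exit */
    return 0;
}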

ncurses allows you to work with windows. A window is a rectangular screen area. Allocate a new window with newwin(), and deallocate it with delwin(). To neatly close a window and erase its content, use wborder() and call wrefresh(), before finally deallocating it with delwin().
Subwindows are created using derwin() [as in ‘derived window’]. I prefer derwin() over subwin() because derwin() uses coordinates relative to its parent window.
Subwindows are windows just like any other ncurses window, they are all of type WINDOW pointer. Again, deallocate with delwin().
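As a sketch (assuming the initialization from the example above has already been done), working with windows and subwindows looks like this; the sizes and positions are arbitrary:
WINDOW *win = newwin(10, 40, 5, 10);    /* height, width, y, x */
box(win, 0, 0);                         /* draw a default border */
mvwprintw(win, 1, 1, "hello from a window");
wrefresh(win);

WINDOW *sub = derwin(win, 4, 20, 2, 2); /* child window, relative to win */
mvwprintw(sub, 0, 0, "a subwindow");
wrefresh(sub);
getch();

/* blank out the border as described above, redraw, then deallocate */
wborder(win, ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ');
wrefresh(win);
delwin(sub);
delwin(win);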

Windows are confined to the screen area. To create windows larger than the screen, or even off-screen windows, use pads. A pad window is allocated using newpad(), and deallocated using ... delwin(). Because a pad can only have a small visible portion onscreen, you should use prefresh() rather than wrefresh() to redisplay pad windows.

This is ncurses in a nutshell. It should be just enough to get you started. Don't forget to call refresh()! Frankly, I just couldn't get used to ncurses' odd parameter order of "h,w,y,x", so I wrapped everything in my own interface. People say it's better to learn the standard API, but hey. ncurses is not exactly UIKit for terminal based apps. But fair enough, it's fun being able to handle the arrow keys in a UNIX program, and to print text at any screen position without having to write ANSI escape sequences.

For a more elaborate instruction with code examples and everything, please see the NCURSES Programming HOWTO at The Linux Documentation Project.

Saturday, August 11, 2012

Multibyte and wide character strings in C (addendum)

Last time I wrote about multibyte character strings in C, and said that the easiest way to deal with them is to convert them to wide character strings. Unfortunately, there is an issue with the wide character wchar_t type; it just so happens that its size is 32 bits on UNIX (and UNIX-like) platforms, while it is only 16 bits wide on the Windows platform. On UNIX mbstowcs() converts to a UTF-32 string, and on Windows mbstowcs() converts to a UTF-16 string. What this means is that everything I talked about in last week's post is quite alright on UNIX and not so cool on Windows. I'm a UNIX programmer and I don't work on Windows, but I do care about portability across platforms, and wchar_t is hopelessly broken across platforms.

So, what is going on? The C standard actually says that the size of wchar_t is compiler dependent, and that portable code should not use wchar_t. Oddly enough, C does provide a complete set of functions for handling wide character strings (!). Since wchar_t is not defined as a portable type, then a) how are we supposed to work with strings and unicode, and b) what is it doing in the standard in the first place?
The origin of the problem stems from the fact that Unicode started out with just 16-bit code points, but it was later realized that a few more bits were going to be needed. Hence the jump to 32 bits. By that time, Microsoft was long happily using a 16-bit wchar_t. When others started supporting unicode, they implemented wchar_t as a 32-bit value so it could hold a single UTF-32 character. Portability ended right there.
Consequently, wchar_t was a good idea that turned out to be a failure.

Today, if you want to work around this problem, you are going to have to work with uint32_t for characters and roll your own string functions, including your own UTF-8 encoding and decoding functions. It's pretty sad. There is a bit of good news on the horizon; the new ISO C11 standard includes two new data types, char16_t and char32_t, and associated conversion functions. Missing, however, are string handling functions for these types. Basically, it is discouraged to use strings in their UTF-32 form.
There is no compiler today that fully implements C11. These new character types are also present in C++11, and recent versions of the g++ and clang++ compilers do support them.

Tuesday, August 7, 2012

Multibyte and wide character strings in C

Over a century ago, man transmitted messages over a wire using a five-bit character encoding scheme. Much later, the ASCII table became the standard for encoding characters, using seven bits per character. ASCII is nice for representing English text but it doesn't work well for other languages, so nowadays we have unicode. Unicode defines code points (numbers) for characters and symbols of every language there is, or was. Documents aren't written in raw unicode; we use UTF-8 encoding.

UTF-8 encoding is a variable length encoding and uses one to four bytes to encode characters. This means some characters (like the ones in the English alphabet) will be represented by a single byte, while others may take up to two, three, or four bytes.
For C programmers using char pointers this means:
  • strlen() does not return the number of characters in a string;
  • strlen() does return the number of bytes in the string (minus terminating nul character)
  • buf[pos] does not address a character;
  • buf[pos] addresses a byte in the UTF-8 stream
  • buf[32] does not reserve space for 31 characters;
  • buf[32] might reserve space for only 7 characters ...
  • strchr() really searches for a byte rather than a character
If you want to be able to address and manipulate individual characters in multibyte character strings, the best thing you can do is to convert the string to wide character format and work with that. A wide character is a 32-bit character.

The operating system is configured to work with a native character encoding set (which is often UTF-8, but could be something else). All I/O should be done using that encoding. So if you do a printf(), print the multibyte character string.

During initialization of your program (like in main()), set the locale. If you forget to do this, the string conversion may not work properly.
setlocale(LC_ALL, "");
Converting a multibyte character string to a wide character string:
mbstowcs(wstr, str, n);
Converting a wide character string back to a multibyte character string:
wcstombs(str, wstr, n);
One problem with these functions is estimating the buffer size. Either play it safe and assume each character takes four bytes, or write a dedicated routine that correctly calculates the needed size.
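As a small sketch of how the pieces fit together: passing NULL as the destination makes mbstowcs() just count the wide characters needed (at least on POSIX systems), which solves the buffer size estimation. The example string is arbitrary UTF-8 written out as byte escapes.
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");              /* use the environment's native encoding */

    const char *str = "h\xc3\xa9llo w\xc3\xb6rld";  /* UTF-8 multibyte string */

    size_t n = mbstowcs(NULL, str, 0);  /* count wide characters needed */
    if (n == (size_t)-1) {
        perror("mbstowcs");
        return 1;
    }

    wchar_t *wstr = malloc((n + 1) * sizeof(wchar_t));
    mbstowcs(wstr, str, n + 1);         /* convert, including the nul terminator */

    printf("%zu bytes, %zu characters\n", strlen(str), n);
    free(wstr);
    return 0;
}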

It's fun seeing your program being able to handle Chinese, Japanese, etcetera. For more on the subject, these two pages are highly recommended:

Monday, July 30, 2012

oolib: link with -loo

Just when it started to look like July was becoming the ‘month of Go’, I made a dramatic shift to C++. I don't even like C++ very much, but in fact Go made me go back to defeat an old enemy. Once upon a time I decided that programming in C and C++ should be easier [than it is] and set out to develop a library that allows you to have safe strings, automatically growing and bounds checking arrays, and dictionaries (associative arrays) in C. The holy grail of programming: the simplicity of Python but with the power of C. It turned out to be harder than I imagined, and resulted in thousands of lines of library code that were never actually meaningfully used. At some point I realized that such a beast is better implemented in C++ than C, and continued the holy mission, only to fail. And then came the intermezzo with Go.

Go is a funny language. There is just something about it that makes you think differently about programming. Go doesn't care much about object orientation; all that matters is whether a type (struct) implements the right methods to satisfy an interface. Another thing Go does right is multithreading and communication using channels. It makes things easy. A lot of things in Go are easy, yet powerful.
It does have its quirks; I don't find Go code as easy to read as it should be. Go code typically has a lot of typecasts (or type conversions) because of its strict typing. And not totally unimportant, my colleagues don't like Go because they don't know it and stubbornly don't want to know it. That may be their problem, but it's also a problem of Go.

Back to libobjects, which by now had three implementations in plain C and two others in C++. But something was wrong with it, and that something was the complexity of loose typing. The whole library leaned on the Objective-C-ish idea that everything is a reference counted Object, and that arrays and dictionaries would be able to hold any type, just like they do in Python and in Objective-C. But looking at Go, it seems you can do perfectly well without loose typing. In Go, arrays, maps, and channels are all defined explicitly for a single type. And that's okay; things only get better with strict typing.

I took the golden advice of a co-worker that the library should be implemented using the STL. I don't like writing large programs using the STL, but it's chock-full of convenient standard templates perfectly fit for the job.
So for oolib, the new incarnation of the object library, I settled with C++'s template syntax: Array<int> is an array of integers and Dict<String> is a dictionary of strings. Add the Python string interface and stir in a bit of goroutines and channels, and oolib looks pretty nice. Just build with a C++ compiler and link with -loo.

Behind the scenes, oolib leans heavily on shared_ptr. The shared_ptr is a shortcut for having reference counted objects, but what's really strange is that you can't write efficient C++ code without it. Now consider that shared_ptr did not exist when C++ first entered the scene.
A consequence is that when you assign an array variable to another array, they both point at the same backing store. That's exactly what happens in Python too, so for now I'll leave it that way.

Performance-wise, oolib delivers what you expect it to: it's faster than Python, faster than Go, and a fraction slower than pure C code.

A snippet of oolib code (that shows only a few features):
#include "oolib.h"

using namespace oo;

int main(int argc, char *argv[]) {
    if (argc <= 1) {
        print("usage: listdir [directory]");
        return 1;
    }

    Array<String> a;

    if (listdir(argv[1], a) < 0) {
        perror("listdir");
        return -1;
    }

    print("%v", &a);

    foreach(i, a)
        print("%v", &a[i]);

    return 0;
}
Finally, it raises the question of whether this lib will be meaningfully used. Only time will tell; there are probably a zillion personal libs like this one out there, and I'm glad to have mine. Mission accomplished.

Sunday, July 15, 2012

Go channels in good old C

Last week I wrote about goroutines and channels, and said that you could do the same thing in C by using pthreads and pipes. That's not exactly true. Although a pipe is well-suited for sending data from one thread to another, like in a parent-and-child-process situation, it's not going to work with multiple readers and writers. Well, how about a socketpair? No (!), same problem here. To implement something like a Go channel in C, you have to put in some extra effort.

The Go channel is a queue of items. Multiple threads may access the queue, and it is guaranteed that the concurrent access is safe and free of race conditions. In C you would implement such a queue as an array and include a mutex lock for doing mutual exclusion.
typedef struct {
    size_t max_items, num_items, item_size;
    void *items;
    pthread_mutex_t mutex;
    pthread_cond_t cv;
} chan_t;

chan_t *make_chan(size_t item_size, size_t max_items);
void read_chan(chan_t *c, void *restrict item);
void write_chan(chan_t *c, const void *item);
void close_chan(chan_t *c);
A channel can be initialized to hold any type. In standard C, "any type" means using a void pointer. Note that because of this, the interface is not fully type-safe. It is up to the programmer to use the channel in the correct way. In C++ you would use a template and have type-safe code.

The items pointer points at an array of items for a buffered channel. For an unbuffered channel, set max_items to one. The mutex and the condition variable work together to take care of the locking. Reading from the channel will wait on the condition, while a write to the channel will signal the condition. Since C does not do automatic garbage collection, close_chan() will deallocate the channel. Of course, close_chan() should be called from one thread only.
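To make that interplay concrete, here is a sketch of just the read and write paths (with <pthread.h> and <string.h> included). It assumes two fields that the struct above doesn't show, head and tail indices for a circular buffer, and it leaves out close handling and error checking entirely.
void write_chan(chan_t *c, const void *item) {
    pthread_mutex_lock(&c->mutex);
    while (c->num_items == c->max_items)        /* buffer full: wait for a reader */
        pthread_cond_wait(&c->cv, &c->mutex);

    memcpy((char *)c->items + c->head * c->item_size, item, c->item_size);
    c->head = (c->head + 1) % c->max_items;
    c->num_items++;

    pthread_cond_broadcast(&c->cv);             /* wake up waiting readers */
    pthread_mutex_unlock(&c->mutex);
}

void read_chan(chan_t *c, void *restrict item) {
    pthread_mutex_lock(&c->mutex);
    while (c->num_items == 0)                   /* buffer empty: wait for a writer */
        pthread_cond_wait(&c->cv, &c->mutex);

    memcpy(item, (char *)c->items + c->tail * c->item_size, c->item_size);
    c->tail = (c->tail + 1) % c->max_items;
    c->num_items--;

    pthread_cond_broadcast(&c->cv);             /* wake up waiting writers */
    pthread_mutex_unlock(&c->mutex);
}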

The full code is only like a hundred lines. Not too big, but too big to include here. With this code you can have threads easily communicate just like goroutines communicate over channels in Go. Having channels in C is nice. The main program code, the code that really matters, is so much easier to understand with channels as there are no mutex locks cluttering the code anymore. Now it's also possible to use Go idioms like waiting for all threads to finish using a channel (by writing a value of true to a chan bool named done).

You can write anything you like in C. Still, channels in Go are more powerful than what I presented here. In Go, you can select on channels. Mimicking Go's select in C is not easy ... (grinds teeth). But why bother with C anyway...? It's time to Go.

Sunday, July 8, 2012

Golang: The Mach programming model in Go

Last week's post was only a prelude to what I really wanted to show is possible using Go: something that I call the Mach programming model.

In the Mach microkernel, core operating system functions such as memory management, disk management, file system handling, device management, and the like run as separate tasks. These tasks communicate with each other by sending messages to ports. The Go programming language uses a similar concept, and allows goroutines to communicate through channels. This allows you to write programs that work much alike the Mach microkernel.

The basic idea is that you break your program up into services. A service manages a resource. The service is being run by a manager. You can get service by issuing a request. The request is communicated through a port, or in the case of Go, a channel.
const (
    DoSomething = iota
    DoSomethingElse
    DoSomethingNice
    DoSomethingCool
)

type Message struct {
    Code  int
    Args  []interface{}
    Reply chan *Message
}

func DoRequest(manager chan *Message, code int, args ...interface{}) (reply *Message) {
    reqArgs := make([]interface{}, 0)
    for _, a := range args {
        reqArgs = append(reqArgs, a)
    }
    replyChan := make(chan *Message)
    manager <- &Message{code, reqArgs, replyChan}
    reply = <-replyChan // assign to the named return value
    return
}

func ManagerGoRoutine(in chan *Message) {
    for {
        req := <-in

        var answer Message // reply to be filled in below

        switch req.Code {
            // handle request
            // ...
        }

        req.Reply <- &answer
    }
}
So, what is going on here? A request is a message that has a request code, which is a constant (that is easily enumerated using iota). Additionally, the request may have a variable number of arguments. Arguments can be of any type you like. The request also includes a reply channel, on which the requestor will receive an answer. The request is written to the service manager's channel.
All the manager does is sit in an infinite loop in its main goroutine answering requests. It sends the answer to the Reply channel that is in the request. The reply is also of type Message, so that it can have all kinds of replies through the Args field.

Try doing this in plain C — it's hard. In Go, you practically get everything for free. In good old C you can do this using pthreads and pipes. Pipes aren't nearly as easy to use as channels. And that trick we pulled with the request's arguments, that's near impossible to do in C. Sure there is va_list but it's limited; you can't make an array with varying typed elements in C.
As a solution in C you might pass the address of the va_list through a pipe, which is scary because the va_list points into the stack memory of the requesting thread. It's a recipe for stack corruption. Now, because the requesting thread immediately blocks and waits for a reply, this just might work, but it's hackish. In Go however, the code is easy and clean, and note that all of this is possible even though it's a strictly typed language.

In the above code, the manager switches on a request code. You might as well include a function pointer in the request and call that. Now the question arises: why not call the function directly anyway? You would have to use resource locking, and things would work the ‘traditional’ way.
The answer is that the Mach programming model evades the traditional approach on purpose. It is another way of looking at large codes in which a variety of resources are managed. It's a microkernel design rather than a monolithic one. It models the program flow as an information flow.
This different way of thinking gives a level of abstraction that leads to easier to understand code, better code maintainability, and (hopefully) fewer bugs.

Ultimately, the Mach kernel for operating systems was considered a failure because it yields lower performance than its monolithic counterpart. Nevertheless it remains an interesting concept and you might as well use it in places where you don't need that ultra-high performance. You can couple this model with asynchronous networking code and then it really starts to shine.
What used to be hard in C, in Go you get for free.

Sunday, July 1, 2012

Golang: Master/Worker in Go

The master/worker pattern is used to get an amount of work done by a number of workers. Each worker grabs an item from a work queue and does the work on that item. When the worker is done, it will grab the next item from the work queue, and so on until all the work has been done. The cool thing is that all the work can be done in parallel (if the work items have no dependencies on each other). The speedup practically scales linearly with respect to the number of CPU cores used to run the workers. The master/worker pattern can be implemented on a distributed/cluster computer using message passing or on a shared memory computer using threads and mutexes.

Implementing master/worker for a shared memory system in Go is a doddle because of goroutines and channels. Yet I dedicate this post to it because it's easy to implement it in a suboptimal way. If you care about efficiency, take this to heart:
  • Spawn a worker per CPU core beforehand. If you spawn a worker per item, you are spawning too many threads. No matter how cheap spawning a thread may be, spawning fewer threads is cheaper.
  • It's a shared memory model. So pass pointers rather than full objects.
  • The workers never signal when they're done. They don't have to. Instead, the master signals that it is out of work when all work has been handed out.
Lastly, there is no point in making the master a goroutine by itself. The master does fine running from the main thread.

So, let's code. The function WorkParallel processes all the work in parallel. Capitalized Work is a struct that represents a single work item; lowercase work is a slice that holds all the work to be done. The work queue is implemented using a channel.

func WorkParallel(work []Work) {
    queue := make(chan *Work)

    ncpu := runtime.NumCPU()
    if len(work) < ncpu {
        ncpu = len(work)
    }
    runtime.GOMAXPROCS(ncpu)

    // spawn workers
    for i := 0; i < ncpu; i++ {
        go Worker(i, queue)
    }

    // master: give work
    for i, item := range(work) {
        fmt.Printf("master: give work %v\n", item)
        queue <- &work[i]  // be sure not to pass &item !!!
    }

    // all work is done
    // signal workers there is no more work
    for n := 0; n < ncpu; n++ {
        queue <- nil
    }
}

func Worker(id int, queue chan *Work) {
    var wp *Work
    for {
        // get work item (pointer) from the queue
        wp = <-queue
        if wp == nil {
            break
        }
        fmt.Printf("worker #%d: item %v\n", id, *wp)

        handleWorkItem(wp)
    }
}

There is more than one way to do it, and I can imagine you wanting to rewrite this code to not use any pointers in order to increase readability. Personally I like it with pointers though because of the higher performance. Whether you actually need this performance is another question. Often it is largely a matter of opinion, even taste. In fact, Go itself isn't all that high performing. But if you want to push it to the max, the pointer-based code, which avoids copying each work item, will outperform the code without pointers.

Sunday, June 3, 2012

OS X Game Development Using Cocoa NSOpenGLView

Every once in a year or so I get an uncontrollable urge to write a game. I like classic arcade games, the kind where you have a spaceship and zap alien monsters. Traditionally you could draw sprites by copying tiles of pixels to screen memory, nowadays we use OpenGL. If you want to do this on the Mac, use Cocoa and its NSOpenGLView class.

Coming from the Linux world, I became somewhat attached to the SDL library. SDL is great but under OS X, it doesn't feel native and the end product isn't as good. The differences are in little things like the application menu, app icon, and support for different keyboard layouts. So let's just use OS X's native Cocoa layer and make a great game.

First off, Cocoa is an API for the Objective-C programming language. Since Objective-C can be mixed with plain good old C, we can write the entire game in C and have the display be a front end written in Objective-C and using NSOpenGLView.
If you google around for "NSOpenGLView tutorial" or "mac opengl" you'll find a lot of old code and horror stories. Using OpenGL on the Mac used to be much harder than it is today. Let's get started.

Making an OpenGL capable view
Like always in Cocoa, create your own new view class derived from an already existing view class:
@interface GameView : NSOpenGLView

@end
In Interface Builder (IB) draw an NSOpenGLView into the main window and change its class to the GameView class that we just made. Be sure to enable double buffering in the Attributes Inspector.
In GameView.m, implement this stretch of code (see below for explanation):
@implementation GameView

- (void)prepareOpenGL {
    init_gl();

    // this sets swap interval for double buffering
    GLint swapInt = 1;
    [[self openGLContext] setValues:&swapInt forParameter:NSOpenGLCPSwapInterval];
   
    // this enables alpha in the frame buffer (commented now)
//  GLint opaque = 0;
//  [[self openGLContext] setValues:&opaque forParameter:NSOpenGLCPSurfaceOpacity];
}

- (void)drawRect:(NSRect)dirtyRect {
    glClear(GL_COLOR_BUFFER_BIT);   
    glLoadIdentity();
  
    draw_screen();
   
//  glFlush();
// the correct way to do double buffering on Mac is this:
    [[self openGLContext] flushBuffer];
   
    int err;
    if ((err = glGetError()) != 0)
        NSLog(@"glGetError(): %d", err);
}

- (void)reshape {
//  NSLog(@"view reshape {%.02f %.02f}", [self frame].size.width, [self frame].size.height);
   
    // window resize; width and height are in pixel coordinates
    // but they are floats
    float screen_w = [self frame].size.width;
    float screen_h = [self frame].size.height;

    // here I cast floats to ints; most systems use integer coordinate systems
    screen_resize((int)screen_w, (int)screen_h);
}

- (BOOL)acceptsFirstResponder {
    return YES;
}

- (void)keyDown:(NSEvent *)theEvent {
    if ([theEvent isARepeat])
        return;
   
    NSString *str = [theEvent charactersIgnoringModifiers];
    unichar c = [str characterAtIndex:0];
   
    if (c < ' ' || c > '~')     // only ASCII please
        c = 0;
   
    key_down([theEvent keyCode], c);
}

- (void)keyUp:(NSEvent *)theEvent {
    key_up([theEvent keyCode]);
}

@end
This code unites the Objective-C API with the standard C code. The pure C init_gl() function should set up the projection and modelview matrices and other OpenGL parameters just like before with SDL, GLUT, GLFW or any other library. Likewise, screen_resize() should call glViewport() to update OpenGL's viewport.

As you can see, some things are a little different on the Mac, like having to enable swapping for double buffering. If you don't do this, you won't see anything being displayed.
Also note the keyUp and keyDown event handlers. Remember the SDL event loop? This is hidden in Cocoa, already built-in. All you do is write the event handlers. You might also add mouse event handlers.

Frame rates and timings
Frankly, the way that screen redraws work under Cocoa was a little mind-bending for me in the beginning. With SDL you just set up a main loop, redraw the screen, do game mechanics, and call SDL_Delay() to sleep some milliseconds for getting the frame rate right. In Cocoa, you can not have a main loop because it would interfere with the invisible (it's hidden!) main event loop. So to get a frame rate going you have to set up a timer that periodically updates the screen. But rather than just that, the frame rate timer has to drive the entire game: do animations, do game mechanics, and finally tell Cocoa to redraw the screen.

To get a super consistent frame rate in SDL I would actually take into account the time passed since the last loop. In Cocoa, the timer fires just like you specified (its accuracy is good enough) but rather than having a frame rate, it's the rate at which you drive the game. The end result is practically the same. You can't really control the frame rate anyway because OS X decides what happens on the display — and it does a nice job too, no need to worry about flicker or tearing whatsoever.

Setting up a timer sounds easy enough, but there is more to it. When the window is minimized or goes out of focus, you will want to stop the timer to freeze the game. In Cocoa you can catch these window events by implementing the NSWindowDelegate protocol. So by adding the timer and the protocol to the GameView class, we get exactly what we want.

In GameView.h change the class declaration to:
@interface GameView : NSOpenGLView<NSWindowDelegate>
In GameView.m add this code:
static NSTimer *timer = nil;

- (void)windowDidResignMain:(NSNotification *)notification {
//    NSLog(@"window did resign main");
    [timer invalidate];
   
    game_deactivate();      // freeze, pause
    [self setNeedsDisplay:YES];
}

- (void)windowDidBecomeMain:(NSNotification *)notification {
//    NSLog(@"window did become main");
   
    game_activate();
    [self setNeedsDisplay:YES];
   
    timer = [NSTimer timerWithTimeInterval:FRAME_INTERVAL
               target:self
               selector:@selector(timerEvent:)
               userInfo:nil
               repeats:YES];
   
    [[NSRunLoop mainRunLoop] addTimer:timer forMode:NSDefaultRunLoopMode];
}

- (void)timerEvent:(NSTimer *)t {
    run_game();
    [self setNeedsDisplay:YES];
}
This code ties an NSTimer to a pure C function run_game() that will do a single run of the game "loop". Next we ask Cocoa to redisplay the view by issuing setNeedsDisplay:YES. Cocoa will pick this up and send a -drawRect: message, which will call draw_screen(). Realize that this code redraws the screen on every frame, which is good for arcade action games.
When the window is minimized or goes out of focus, the game is put into a paused state and the timer is stopped. We ask for one more redraw so that we can show a nice pause screen.

To be able to detect when the window goes out of focus, we need to tell Cocoa that the GameView should receive windowing events, otherwise it doesn't see them. Put this in AppDelegate.h:
#import "GameView.h"

@property (assign) IBOutlet GameView *gameview;
Put this in AppDelegate.m:
@synthesize gameview;

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    // Insert code here to initialize your application

    [[self window] setDelegate:[self gameview]];
}
In IB, connect the GameView to the gameview outlet we created. (Or use the Assistant Editor.) Now when the window goes out of focus, the application will send that event to the view, our timer will stop, and the game will enter its paused state.

One more thing
I like it when games can toggle between windowed and full screen mode. Especially for small, simple arcade games it adds to the experience. In code, toggling the screen mode is always a bit of a hassle. Well, it was until OS X Lion added support for full screen apps. In Xcode 4 you can select the application window in IB and set Full Screen to "Primary Window" in the Attributes Inspector — and that is all. If your screen_resize() function is working properly, it just works.

Concluding
As always, it takes some effort to get Cocoa going. But when it does, you get some really nice things in return. Having the game integrate with OS X adds so much value. It's simple things like having a decent About dialog, properly supporting the native keyboard layout, having the standard full screen apps button in the window title bar. Suddenly the game looks and feels like a native OS X app.
This concludes this tutorial on OpenGL under OS X. Read on for two more musings in case you haven't had enough yet.

Notes on portability
Cocoa is OS X-only so we break portability with other platforms by choosing NSOpenGLView. However, since the entire game is written in C and only the front end is "Mac native" you only have to port the front end to other platforms.
You might also choose to write the entire game in Objective-C. It's a nice language for developing games, but do note that today Apple's Objective-C 2.0 does not port nicely to other platforms — other than iOS (and even Cocoa apps do not port nicely to iOS). Sticking with C or C++ ensures your core game code is easily ported.

Why OpenGL?
In the earlier days of computing, there was no such thing as a graphics processor. There was a video chip that put a region of memory (the screen pixel buffer) onto the monitor display at the right refresh rate. Drawing sprites was done by copying tiles of pixels into the screen memory region. That was OK with screen resolutions like 320x200 or 320x240, but on today's screen resolutions you would run the CPU hot just by copying pixels. So nowadays we have hardware accelerated graphics and we use OpenGL to tell the video hardware what to display. The OpenGL library is like the standard C library for displaying graphics. Moreover, it's tailored to the graphics processing pipeline that is built into the hardware. So even if you're just doing 2D graphics, please use OpenGL. It's hardware accelerated and your games won't melt the CPU because it's the video hardware that is doing the hard work.

Monday, May 14, 2012

Hypothetical Machine Language

Last September, I wrote about virtual machines and CPU emulators, and described a virtual machine, or simulator for a non-existing CPU. It was a bit whacky, but then in April there suddenly was the Notch DCPU-16 specification. Many people jumped on it and the first implementations have already sprung up at GitHub. It inspired me to go back to my own virtual machine and completely redesign its instruction set.

Notch's DCPU-16 is a fascinatingly simple design (and therefore easily implemented in a simulator) and I'm somewhat surprised that it works at all. For example, there is not even a carry flag. A lot of instructions appear to be missing [but it turns out you can do without them ... with some trickery]. There are instructions for conditional execution rather than the usual compare and branch. Push and pop are not instructions, but some kind of operator.
The registers are named A, B, I, X, Y, which sounds a lot like an 8-bit Z80, so I guess that's where he got his inspiration. If you think about CISC or RISC, then DCPU-16 is a MISC (minimal instruction set computer). Anyway, Notch is writing a game, which could be a reason for keeping things simple.

The master, the teacher
Purely coincidentally, Donald Knuth updated his webpage last September with a message that after about 12 years of work his hypothetical machine language MMIX had finally reached the frozen state. MMIX defines a 64-bit RISC machine with a massive 256 general purpose registers. His register naming $0-$255 looks a bit odd, but he was inspired by MIPS and SPARC.
Knuth goes far enough that this machine could probably actually be built, I mean in hardware. There is floating point support, and even register banking, virtual memory, and page tables are described. Virtual memory is important if you want to be able to run any modern operating system on the CPU. Think booting Linux in the simulator. MMIX is fascinating stuff although still hugely simplified for educational reasons.
A real-world CPU does not have 256 general purpose registers like MMIX has, it's more likely to be something like 16 or 32. Modern CPUs may have hundreds of registers but they are not general purpose registers in the sense that a programmer can access them freely. The registers are dynamically renamed (mapped) to other registers to aid parallelism that arises from out-of-order execution.

Designing your own
When you design your own system, you will find that a lot of time has to be put into designing the instruction set and its instruction encoding. In theory, you can think up whatever set of instructions you like. But then comes the hard part; all these instructions have to be encoded in a sensible way. Simply numbering down opcodes (like in MMIX and DCPU-16) is extremely wasteful and you'll likely run out of available bits in the opcode space when you do. If you want to have a lot of addressing modes and a lot of registers, you will again run out of bits to store all this information in [unless, of course, you are using really long instruction codes]. The game is to come up with an encoding scheme that works nicely and yet provides a versatile instruction set for the programmer to work with.

The three machines
My own design uses a happy mix of the x86, 68000 and ARM instruction sets. Actually, I made several designs, aptly named machine #1, #2 and #3 (just working titles). Machine #1 is a 32-bit computer with 16-bit instruction words. It has a load/store architecture but arguably it is not a true RISC system because immediate moves are encoded in three words. Its successor, machine #2, is a 64-bit computer with instructions still encoded in only 16-bit words, but immediate moves may take up to five words. And finally, machine #3 is a 64-bit RISC system in which all instructions have an equal length of 32 bits.

The first two machines each have 16 general-purpose registers (named r0-r15) plus a program counter (pc), stack pointer (sp), and status register (sr). The instructions are encoded in 16-bit words. Most instructions have an encoding that looks like this: oooooaaarrrrRRRR (o = opcode bit, a = addressing mode bit, r = source register bit, R = destination register bit, with the MSB on the left).
The addressing mode says whether to address a single byte, a word, or a 32-bit long word. The 64-bit machine has an additional mode for addressing 64 bits. There are no indirect addressing modes like there are on CISC systems; it's all register-to-register or register-immediate. In immediate mode the source register bits are unused and the value is stored in additional words.
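To make that concrete, here is a minimal sketch in Go of how a simulator could pull the fields out of such a 16-bit word. The opcode and mode numbers in the example are made up for illustration; the real values are in the design document.

package main

import "fmt"

// decode splits a 16-bit instruction word of the form oooooaaarrrrRRRR
// into its fields: 5 opcode bits, 3 addressing-mode bits, 4 source
// register bits and 4 destination register bits, MSB on the left.
func decode(w uint16) (op, mode, src, dst uint16) {
    op = w >> 11          // top 5 bits
    mode = (w >> 8) & 0x7 // next 3 bits
    src = (w >> 4) & 0xF  // next 4 bits
    dst = w & 0xF         // bottom 4 bits
    return
}

func main() {
    // 0x1231 = 00010 010 0011 0001: opcode 2, mode 2, source r3,
    // destination r1 (the numbers are hypothetical).
    op, mode, src, dst := decode(0x1231)
    fmt.Printf("op=%d mode=%d src=r%d dst=r%d\n", op, mode, src, dst)
}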

Since sp is not a general-purpose register in this design, I added special opcodes for instructions that manipulate sp. This way, you can still work with sp like you normally would, but the sp register cannot be used in every operation. For example, it generally makes no sense to do bitwise operations on sp or to multiply sp by some value. On the other hand, you do want to be able to add to and subtract from sp, so there are opcodes that provide this functionality.

Machine #3, the 64-bit RISC system, is totally different from the other two. It has 32 registers, where pc is r31 and sp is r30. All instructions are encoded in 32-bit words. Here, most instructions can have three operands (three registers, or two registers and an immediate). The instruction encoding uses this property to group many different operations together under a single base opcode, which leaves lots of room for future extensions. This encoding scheme is much like how PowerPC and MIPS work. On MIPS, the instruction encodings have cool names like R-type (three registers), I-type (with immediate) and J-type (jump). I must say, the MIPS encoding scheme is near perfect: simple yet elegant, and everything fits.
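For comparison, this is roughly how a MIPS R-type word is laid out: a 6-bit opcode, three 5-bit register fields, a 5-bit shift amount and a 6-bit function code that selects the actual operation under the base opcode. A small decode sketch, using a textbook add instruction as input:

package main

import "fmt"

// decodeRType splits a MIPS R-type word into opcode, rs, rt, rd,
// shift amount and function code.
func decodeRType(w uint32) (op, rs, rt, rd, shamt, funct uint32) {
    return w >> 26, (w >> 21) & 0x1F, (w >> 16) & 0x1F,
        (w >> 11) & 0x1F, (w >> 6) & 0x1F, w & 0x3F
}

func main() {
    // 0x012A4020 encodes "add $t0, $t1, $t2" on MIPS.
    fmt.Println(decodeRType(0x012A4020)) // prints: 0 9 10 8 0 32
}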

If you want to know more details on the instruction sets of these three machines, see the references below for a link to the design document.

From completion to perfection
Unless you are designing a minimalistic computer, you have to define an instruction set that is complete, something you can work with.
Operations that are easily forgotten are add-with-carry, sign extension, signed and unsigned multiplication and division, and enabling and disabling interrupts. Maybe you want something like rdtsc to do high-resolution timings, or a cpuid instruction that tells what kind of CPU it is. MMIX has a register that holds the serial number.
Modern CPUs all do floating-point arithmetic. Then there are things like SIMD instructions, processor virtualization support, enhanced security features like trusted execution, and possibly GPU integration. But these are advanced topics that I haven't bothered with.

CPU simulators are fun. Although DCPU-16 and my own work are just toys, it's educational to tinker with what goes on inside a CPU. And besides emulating existing processors, you can even simulate the CPU of the next decade today.

Refs
Donald Knuth's MMIX: a RISC computer for the third millennium
Notch's DCPU-16 specification
My three designs with working title SYSTEM
ARM and Thumb-2 Instruction Set Quick Reference Card
PowerPC User Instruction Set Architecture Book I
MIPS Instruction Coding

Sunday, April 1, 2012

Golang: The Way To Go

Last week the Go programming language reached version 1.0. I toyed with Go for the first time over two years ago. Initially I didn't stick with it. This week I had a second look at Go, and this time around I'm pretty convinced they are on to something. Go is a fascinating language and worth checking out.

For the longest time, I have been searching for a programming language that fits like a glove. It had to be as easy as Python and have the low-level abilities and performance of C. I created multiple implementations of libobjects to mimic Python's and Objective-C's abilities in C and C++, but since you can't change the language itself by adding a library, it was never as great as I wanted it to be. In the end I realized that to get what I wanted, I would have to create a new language rather than trying to bend an existing one. The creators of Go came to the same conclusion, and thus Go was born.

C is an amazingly powerful and fine language with really only a few problems: tedious memory management, and buffer overflows (a source of massive security holes) that originate from the fact that strings are character arrays. C99 brought good old-fashioned C up to date by adding wide-character strings, but the original problems stayed in place.
C is a low-level language that doesn't do a lot for you, the programmer. C has no standard types for stacks, linked lists, binary trees, hash maps or the like, nor does the standard C library include them. You can implement them yourself (which is a nice academic exercise, and also a lot of work) or you can use an external library like gnulib, Glib or GDSL, to name a few.
C is a pointer-centric language, which a lot of people find hard to work with. Pointer mistakes are often the cause of hard-to-track-down bugs like random program crashes. Experienced programmers see pointers as a powerful asset, though. And power, C has. But you do not get it for free; a typical C program is a thousand lines of code, meaning you will need to put in a lot of time and effort to get what you want.

Python is a byte-code-interpreted language. Python executes slowly, but it's so easy to write that I love it anyway. It has garbage collection, big-number arithmetic, unicode strings, safe arrays, even safe system calls, and you can write object-oriented code if you want to, but this is not forced upon you. Python has introspection and reflection, and it can evaluate dynamically loaded Python code in the same context as the code that is already running (confusing, but powerful and quite advanced).
Python is perfectly suited for small to intermediate-size programming tasks. Over time, I started using it for larger projects and found that it's less well suited for those. One of the reasons is that Python is an interpreted language. The interpreter won't report an error like a simple typo in a variable or function name if it doesn't go down that code path. I've seen Python code suddenly crash with a stack trace after being in production for months, only because of a typo in the code. Another thing is that if you make a typo in a variable name when assigning to it, the interpreter will see it as a new variable, and your program will misbehave. I wish there were something like a strict mode in which these errors would be caught at compile time. Thirdly, the scoping of variables in Python is annoying. Local variables just exist, but global variables must be named explicitly with the global keyword in every function that assigns to them. It should have been the other way around if you ask me, or Python should have recognized all-caps variables as global. Things become really confusing when using multiple modules containing globals. The best thing you can do is make a single, separate module containing all the global variables in your project, and import that module in every other module.
Exceptions in Python look beautiful, but at some point it gets old having to catch common exceptions all the time. A Python program typically has a try/except block on every page of code. It isn't so bad, but it does interfere with the program flow and hurts readability. Moreover, exceptions make code execution slower, because the try/except block has to be set up and handled whether the action raised an exception or not. A language like C, which does not have exceptions, simply uses conditional branching based on an error value, and that performs much better.
Python doesn't play nice when you do want to allocate memory, use byte arrays, or need pointers. The language isn't meant for doing low level stuff.

Can we have the best of both worlds? All the good from C and Python, while discarding all the bad from C and fixing the minor issues of Python? For a long time, the answer was no. I started to believe that only an interpreter could do garbage collection, because an interpreter can easily see whether a variable is still being referenced or not.
Well, now there is Go. At first sight I mistakenly took it for just another Python clone, but now I see that Go is probably exactly the crossover language I was looking for: the goodness of C and Python, molded into one.
Variables can be created "on the fly" with a special assignment operator. It has automatic garbage collection. It compiles to binary code, and hence it executes fast. The compiler catches syntax errors during the compilation step. The language has strict typing and no implicit type conversions, to keep you from making stupid mistakes. There is as little syntactic sugar as possible: there are no useless semicolons, no do-while loop, no header files, no extern declarations. The runtime guarantees that arrays are safely bounded. Strings are a proper string type and not byte arrays. Unicode characters are called runes. There is a standard map type. Pointers may be (and should be) used whenever you need them. You cannot use pointers to loop over strings, because strings are not byte arrays. There is an init() function that initializes the package (module) before main() is called. Functions can return multiple values, which is especially nice since you can return a value and an error condition at the same time. It has a bool type; I've started liking languages that feature a boolean type.
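A tiny sketch to show a few of these together: the := operator creating variables on the fly, the built-in map type, and a function returning both a value and an error condition. The function name and the data are made up for the example.

package main

import (
    "errors"
    "fmt"
)

// lookup returns a value from the map plus an error condition,
// instead of raising an exception.
func lookup(m map[string]int, key string) (int, error) {
    v, ok := m[key]
    if !ok {
        return 0, errors.New("no such key: " + key)
    }
    return v, nil
}

func main() {
    ages := map[string]int{"alice": 30, "bob": 25} // built-in map type
    age, err := lookup(ages, "carol")              // := creates both variables on the fly
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(age)
}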

Go goes further than just taking features from other languages. Go includes two special features for parallel programming: goroutines and communication channels. In other languages you typically use the pthread library directly, or a wrapper class around it, and thread management is a tedious task. In Go, there is no such thing; you simply launch a concurrent code path by invoking a function with the go keyword. Communication is done through a channel, which essentially acts like a UNIX pipe: a reader blocks on the channel until a writer writes something to it. Combined with a special syntax, channels are super easy to use. Writing multi-threaded code has never been easier.
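In a sketch (the worker function and the message are just an example), that looks like this:

package main

import "fmt"

// worker runs as a concurrent code path and reports back over a channel.
func worker(out chan<- string) {
    // ... do some work ...
    out <- "done" // write the result to the channel
}

func main() {
    out := make(chan string)
    go worker(out) // launch the goroutine
    msg := <-out   // block until the worker writes something
    fmt.Println(msg)
}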

Go is not an object-oriented language like C++ or Java. However, you can add methods to structs, and you can use embedding of structs to mimic inheritance and even multiple inheritance. Doing the same in C would be rather problematic; C structs are very 'static'.
Another special feature of Go is interfaces. This is Go's way of adding polymorphism to types. It is called duck typing: "if it quacks like a duck, it is a duck." The fun part is that a particular type can be used in a certain situation as long as it implements the interface. So (like it says in the documentation) if a type can do this, it can be used here. I suppose the idea of interfaces is much like protocols in Objective-C.
The cool part is that in Go there is no class declaration, so you never have to go back and change a header file. You just add the methods and the compiler sorts out the rest.
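A minimal sketch of duck typing, with made-up types; note that Duck never declares that it implements Quacker, it simply has the right method:

package main

import "fmt"

type Quacker interface {
    Quack() string
}

type Duck struct{}

// Duck satisfies Quacker simply by having this method; there is
// no "implements" declaration anywhere.
func (d Duck) Quack() string { return "quack" }

// makeNoise accepts anything that can quack.
func makeNoise(q Quacker) {
    fmt.Println(q.Quack())
}

func main() {
    makeNoise(Duck{})
}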

Go does have a few quirks that you'll just have to get used to. Semicolons (as statement terminators) do exist, but they are invisible: generally you don't type them in, and the compiler secretly inserts them automatically. As a consequence, you cannot put an opening curly brace on a line by itself, nor can you close off a static array declaration with a dangling curly brace unless you put a trailing comma after the last element. This is a fixed coding style, and while I never did the former, I'm very used to doing the latter.
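For example (the array contents are arbitrary), this is the style the compiler forces on you:

package main

import "fmt"

// The opening brace must stay on the same line as "main()"; putting it
// on a line of its own makes the compiler insert a semicolon after ")"
// and the program no longer compiles.
func main() {
    primes := [...]int{ // [...] lets the compiler count the elements
        2, 3, 5,
        7, 11, 13, // trailing comma required before the dangling brace
    }
    fmt.Println(primes)
}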
Arrays are weird in the sense that they are different from arrays in C. In C, an array expression basically decays to a pointer to its first element. In Go, when you pass an array to a function, the entire array is copied onto the stack. That's weird. If you just always use the slice syntax, there is no problem though. Also, append() is a built-in function rather than a method like in Python, and to extend an array you have to use the "[...]" syntax.
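A short sketch of the difference: an array argument is copied, while a slice shares its underlying array.

package main

import "fmt"

func zeroArray(a [3]int) { a[0] = 0 } // receives a copy of the whole array
func zeroSlice(s []int)  { s[0] = 0 } // shares the underlying array

func main() {
    a := [3]int{1, 2, 3}
    zeroArray(a)
    fmt.Println(a) // still [1 2 3]: only the copy was modified

    zeroSlice(a[:]) // pass a slice of a instead
    fmt.Println(a)  // now [0 2 3]
}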
Because Go is a garbage-collected language, it is possible to return the address of a local variable. This is actually a great feature, but it may be confusing to C programmers.
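For instance (a made-up constructor-style function), this is perfectly legal Go, while the same pattern would be a bug in C:

package main

import "fmt"

type point struct{ x, y int }

// newPoint returns the address of a local variable; the garbage
// collector keeps the value alive for as long as it is referenced.
func newPoint(x, y int) *point {
    p := point{x, y}
    return &p
}

func main() {
    fmt.Println(newPoint(1, 2)) // &{1 2}
}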
Channel syntax isn't all that pretty (but it is terse). Why couldn't we simply have methods chan.Read() and chan.Write()? [The answer is probably that Go's built-in types do not have methods].
And last but certainly not least, variable declarations are written in an odd, inverted way: first the name of the variable, then its type. I understand why Go's creators think it's better, and I'll certainly get used to it. But after decades of programming in dozens of languages that have it the other way around, this just screws with people's minds. It's like putting the steering wheel of a car on the right side when everyone is used to having it on the left side. Or the other way around, of course (!)

There is too much to write about Go to fit into a single blog post. I hope I've whetted your appetite and you now want to learn more. The best way to learn Go is to start using it. You will find that this language is easy to learn. As always, give it some time to familiarize yourself with the standard library, and read lots of documentation. Especially in the beginning, practice with short code snippets. My first few Go programs were simply rewrites of small existing Python scripts.

Two useful links:
Have fun!