Monday, July 30, 2012

oolib: link with -loo

Just when it started to look like July was becoming the ‘month of Go’, I made a dramatic shift to C++. I don't even like C++ very much, but in fact Go made me go back to defeat an old enemy. Once upon a time I decided that programming in C and C++ should be easier [than it is] and set out to develop a library that gives you safe strings, automatically growing, bounds-checked arrays, and dictionaries (associative arrays) in C. The holy grail of programming: the simplicity of Python but with the power of C. It turned out to be harder than I imagined, and resulted in thousands of lines of library code that were never meaningfully used. At some point I realized that such a beast is better implemented in C++ than in C, and continued the holy mission, only to fail. And then came the intermezzo with Go.

Go is a funny language. There is just something about it that makes you think differently about programming. Go doesn't care about object orientation; all that matters is whether a type (a struct) implements the right methods to satisfy an interface. Another thing Go does right is multithreading and communication using channels. It makes things easy. A lot of things in Go are easy, yet powerful.
It does have its quirks: I don't find Go code as easy to read as it should be. Go code typically has a lot of type conversions (casts) because of its strict typing. And, not entirely unimportant, my colleagues don't like Go because they don't know it and stubbornly don't want to know it. That may be their problem, but it's also a problem for Go.

Back to libobjects, which by now had three implementations in plain C and two more in C++. But something was wrong with it, and that something was the complexity of loose typing. The whole library leaned on the Objective-C-ish idea that everything is a reference-counted Object, and that arrays and dictionaries should be able to hold any type, just like they do in Python and in Objective-C. But looking at Go, it seems you can do perfectly well without loose typing. In Go, arrays, maps, and channels are all defined for a single, explicit element type. And that's okay; things only get better with strict typing.
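For illustration, every Go container is declared for one element type (a trivial but complete example):
package main

import "fmt"

func main() {
    a := []int{1, 2, 3}         // a slice of int, and nothing else
    m := map[string]int{"x": 1} // a map from string to int
    c := make(chan string, 1)   // a channel that carries strings

    c <- "hello"
    fmt.Println(a, m, <-c)
}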

I took the golden advice of a co-worker that the library should be implemented on top of the STL. I don't like writing large programs using the STL, but it's chock-full of convenient standard templates that are a perfect fit for the job.
So for oolib, the new incarnation of the object library, I settled on C++'s template syntax: Array<int> is an array of integers and Dict<String> is a dictionary of strings. Add the Python string interface, stir in a bit of goroutines and channels, and oolib looks pretty nice. Just build with a C++ compiler and link with -loo.

Behind the scenes, oolib leans heavily on shared_ptr. The shared_ptr is a shortcut for having reference-counted objects, and the strange thing is that it's hard to write efficient C++ code without it. Now consider that shared_ptr did not even exist when C++ first entered the scene.
A consequence is that when you assign one array variable to another, both end up pointing at the same backing store. That's exactly what happens in Python too, so for now I'll leave it that way.

Performance-wise, oolib delivers what you would expect: it's faster than Python, faster than Go, and a fraction slower than pure C code.

A snippet of oolib code (that shows only a few features):
#include "oolib.h"

using namespace oo;

int main(int argc, char *argv[]) {
    if (argc <= 1) {
        print("usage: listdir [directory]");
        return 1;
    }

    Array<String> a;

    // fill the array with the directory entries
    if (listdir(argv[1], a) < 0) {
        perror("listdir");
        return -1;
    }

    // print the array as a whole ...
    print("%v", &a);

    // ... and then each element separately
    foreach(i, a)
        print("%v", &a[i]);

    return 0;
}
Finally, the question remains whether this lib will ever be meaningfully used. Only time will tell; there are probably a zillion personal libs like this one out there, and I'm glad to have mine. Mission accomplished.

Sunday, July 15, 2012

Go channels in good old C

Last week I wrote about goroutines and channels, and said that you could do the same thing in C by using pthreads and pipes. That's not exactly true. Although a pipe is well suited for sending data from one thread to another, as in a parent-and-child-process situation, it's not going to work with multiple readers and writers. Well, how about a socketpair? No (!), same problem there. To implement something like a Go channel in C, you have to put in some extra effort.

The Go channel is a queue of items. Multiple threads may access the queue, and it is guaranteed that concurrent access is safe and free of race conditions. In C you would implement such a queue as an array, protected by a mutex lock for mutual exclusion.
#include <stddef.h>
#include <pthread.h>

typedef struct {
    size_t max_items, num_items, item_size;
    void *items;            /* array of at most max_items items */
    pthread_mutex_t mutex;  /* protects the queue */
    pthread_cond_t cv;      /* readers wait on this, writers signal it */
} chan_t;

chan_t *make_chan(size_t item_size, size_t max_items);
void read_chan(chan_t *c, void *restrict item);
void write_chan(chan_t *c, const void *item);
void close_chan(chan_t *c);
A channel can be initialized to hold any type. In standard C, "any type" means using a void pointer. Note that because of this, the interface is not fully type-safe; it is up to the programmer to use the channel in the correct way. In C++ you would use a template and have type-safe code.

The items pointer points at an array of items for a buffered channel. For an unbuffered channel, set max_items to one. The mutex and the condition variable work together to take care of the locking: reading from the channel waits on the condition, while writing to the channel signals it. Since C does not do automatic garbage collection, close_chan() deallocates the channel. Of course, close_chan() should be called from one thread only.

The full code is only about a hundred lines: not too big, but too big to include here. With this code you can have threads communicate easily, just like goroutines communicate over channels in Go. Having channels in C is nice. The main program code, the code that really matters, is so much easier to understand with channels, because there are no mutex locks cluttering the code anymore. It also becomes possible to use Go idioms like waiting for all threads to finish by means of a channel (by writing a value of true to a chan bool named done).
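For reference, the Go idiom that last sentence refers to looks like this (a minimal, self-contained sketch; the worker body is just a placeholder):
package main

import "fmt"

func worker(id int, done chan bool) {
    // ... do the actual work here ...
    fmt.Printf("worker %d finished\n", id)
    done <- true // signal that this worker is done
}

func main() {
    const nworkers = 4
    done := make(chan bool)

    for i := 0; i < nworkers; i++ {
        go worker(i, done)
    }
    // wait until every worker has written to the done channel
    for i := 0; i < nworkers; i++ {
        <-done
    }
}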

You can write anything you like in C. Still, channels in Go are more powerful than what I presented here. In Go, you can select on channels. Mimicking Go's select in C is not easy ... (grinds teeth). But why bother with C anyway...? It's time to Go.

Sunday, July 8, 2012

Golang: The Mach programming model in Go

Last week's post was only a prelude to what I really wanted to show is possible in Go: something that I call the Mach programming model.

In the Mach microkernel, core operating system functions such as memory management, disk management, file system handling, and device management run as separate tasks. These tasks communicate with each other by sending messages to ports. The Go programming language uses a similar concept and allows goroutines to communicate through channels. This allows you to write programs that work much like the Mach microkernel does.

The basic idea is that you break your program up into services. A service manages a resource, and is run by a manager. You get service by issuing a request. The request is communicated through a port, or in the case of Go, a channel.
const (
    DoSomething = iota
    DoSomethingElse
    DoSomethingNice
    DoSomethingCool
)

type Message struct {
    Code  int           // request code
    Args  []interface{} // arguments, of any type
    Reply chan *Message // channel on which the answer comes back
}

func DoRequest(manager chan *Message, code int, args ...interface{}) (reply *Message) {
    // copy the variadic arguments into the request
    reqArgs := make([]interface{}, 0, len(args))
    for _, a := range args {
        reqArgs = append(reqArgs, a)
    }
    replyChan := make(chan *Message)
    manager <- &Message{code, reqArgs, replyChan}
    reply = <-replyChan
    return
}

func ManagerGoRoutine(in chan *Message) {
    for {
        req := <-in

        var answer Message

        switch req.Code {
            // handle the request and fill in the answer
            // ...
        }

        req.Reply <- &answer
    }
}
So, what is going on here? A request is a message that carries a request code, which is a constant (easily enumerated using iota). Additionally, the request may have a variable number of arguments, and the arguments can be of any type you like. The request also includes a reply channel, on which the requestor will receive the answer. The request is written to the service manager's channel.
All the manager does is sit in an infinite loop in its own goroutine, answering requests. It sends the answer to the Reply channel that is included in the request. The reply is also of type Message, so that it can carry all kinds of replies through the Args field.
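Putting it together, a requestor uses the manager like this (a short sketch; the request code and the arguments are just placeholders, reusing the functions defined above):
manager := make(chan *Message)
go ManagerGoRoutine(manager)

// issue a request and block until the manager answers
reply := DoRequest(manager, DoSomething, "some-resource", 42)
fmt.Printf("reply: %v\n", reply.Args)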

Try doing this in plain C; it's hard. In Go, you practically get everything for free. In good old C you could do this using pthreads and pipes, but pipes aren't nearly as easy to use as channels. And that trick we pulled with the request's arguments is near impossible in C. Sure, there is va_list, but it's limited; you can't make an array of varying typed elements in C.
As a solution in C you might pass the address of the va_list through a pipe, which is scary because the va_list points into the stack memory of the requesting thread. It's a recipe for stack corruption. Now, because the requesting thread immediately blocks and waits for a reply, this just might work, but it's hackish. In Go, however, the code is easy and clean, and note that all of this is possible even though it's a strictly typed language.

In the above code, the manager switches on a request code. You might as well include a function pointer in the request and call that (a sketch of this follows below). Now the question arises: why not call the function directly anyway? Then you would have to use resource locking, and things would work the ‘traditional’ way.
The answer is that the Mach programming model evades the traditional approach on purpose. It is another way of looking at a large code base in which a variety of resources are managed. It's a microkernel design rather than a monolithic one. It models the program flow as an information flow.
This different way of thinking gives a level of abstraction that leads to code that is easier to understand, better maintainability, and (hopefully) fewer bugs.
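A possible shape of that function-pointer variant (just a sketch, not part of the code above; the Request type, the Handler field, and its signature are my own naming):
type Request struct {
    Handler func(args []interface{}) *Message // called by the manager
    Args    []interface{}
    Reply   chan *Message
}

func HandlerGoRoutine(in chan *Request) {
    for {
        req := <-in
        // the request itself says how it wants to be handled
        req.Reply <- req.Handler(req.Args)
    }
}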

Ultimately, the Mach kernel was considered a failure as an operating system kernel because it yields lower performance than its monolithic counterparts. Nevertheless it remains an interesting concept, and you might as well use it in places where you don't need that ultra-high performance. You can couple this model with asynchronous networking code, and then it really starts to shine.
What used to be hard in C, you get for free in Go.

Sunday, July 1, 2012

Golang: Master/Worker in Go

The master/worker pattern is used to get an amount of work done by a number of workers. Each worker grabs an item from a work queue and does the work on that item. When the worker is done, it grabs the next item from the work queue, and so on until all the work has been done. The cool thing is that all the work can be done in parallel (provided the work items have no dependencies on each other). The speedup scales practically linearly with the number of CPU cores running the workers. The master/worker pattern can be implemented on a distributed/cluster computer using message passing, or on a shared memory machine using threads and mutexes.

Implementing master/worker for a shared memory system in Go is a doddle because of goroutines and channels. Yet I dedicate this post to it because it's easy to implement it in a suboptimal way. If you care about efficiency, take this to heart:
  • Spawn a worker per CPU core beforehand. If you spawn a worker per work item, you are spawning too many threads. No matter how cheap spawning a thread may be, spawning fewer threads is cheaper.
  • It's a shared memory model, so pass pointers rather than full objects.
  • The workers never signal when they're done. They don't have to. Instead, the master signals that it is out of work once all the work has been handed out.
Lastly, there is no point in making the master a goroutine by itself. The master does fine running from the main thread.

So, let's code. The function WorkParallel processes all the work in parallel. Capital-W Work is a struct that represents a single work item; lowercase work is a slice that holds all the work to be done. The work queue is implemented using a channel.

func WorkParallel(work []Work) {
    queue := make(chan *Work)

    ncpu := runtime.NumCPU()
    if len(work) < ncpu {
        ncpu = len(work)
    }
    runtime.GOMAXPROCS(ncpu)

    // spawn workers
    for i := 0; i < ncpu; i++ {
        go Worker(i, queue)
    }

    // master: give work
    for i, item := range work {
        fmt.Printf("master: give work %v\n", item)
        queue <- &work[i]  // be sure not to pass &item !!!
    }

    // all work has been handed out;
    // signal the workers that there is no more work
    for n := 0; n < ncpu; n++ {
        queue <- nil
    }
}

func Worker(id int, queue chan *Work) {
    var wp *Work
    for {
        // get work item (pointer) from the queue
        wp = <-queue
        if wp == nil {
            break
        }
        fmt.Printf("worker #%d: item %v\n", id, *wp)

        handleWorkItem(wp)
    }
}

There is more than one way to do it, and I can imagine you wanting to rewrite this code without any pointers, to improve readability (a sketch of such a rewrite follows below). Personally I like it with pointers because of the higher performance. Whether you actually need that performance is another question; often it is largely a matter of opinion, even taste. In fact, Go itself isn't all that high-performing. But if you want to push it to the max, the pointer-based code will outperform the code without pointers.
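For comparison, the pointer-free rewrite would look roughly like this (only a sketch: it copies every Work item into the channel, uses close() instead of the nil sentinels, needs the sync and fmt packages, and leaves out the NumCPU/GOMAXPROCS setup from the original):

func WorkParallelByValue(work []Work, ncpu int) {
    queue := make(chan Work)
    var wg sync.WaitGroup

    // spawn the workers
    for i := 0; i < ncpu; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for item := range queue { // ends when the queue is closed
                fmt.Printf("worker #%d: item %v\n", id, item)
                handleWorkItem(&item) // pointer to the worker's local copy
            }
        }(i)
    }

    // master: give work (by value)
    for _, item := range work {
        queue <- item // copies the Work item into the channel
    }
    close(queue) // no more work
    wg.Wait()    // wait until every item has been handled
}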