Sunday, March 16, 2014

Python multiprocessing: I did it my way

Python has a wonderful multiprocessing module for parallel execution. It's well known that Python's threading module cannot run Python code on several cores at once, because in CPython the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time. Hence multiprocessing, which starts a pool of worker processes (forked, on UNIX) so you can efficiently run tasks in parallel.

In principle:
    p = multiprocessing.Process(target=func)
    p.start()
    p.join()
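
For the pool-of-workers case mentioned above, a minimal sketch could look like the following. The names square and run_pool are hypothetical, and asking for the "fork" start method through get_context() is an assumption: it needs Python 3.4 or newer and a UNIX system.

```python
import multiprocessing


def square(n):
    # Hypothetical work function; any picklable top-level function will do.
    return n * n


def run_pool(numbers, numproc=2):
    # Assumption: explicitly request the "fork" start method (UNIX only).
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(processes=numproc)
    try:
        # map() splits the input over the workers and keeps the input order.
        return pool.map(square, numbers)
    finally:
        # Stop accepting work, then wait for the workers to finish.
        pool.close()
        pool.join()
```

Calling run_pool(range(10), numproc=4) would hand the ten numbers to four worker processes and collect the results in the parent.
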
In practice, I have had weird issues with multiprocessing. Two notable issues:

  • program does not start any processes at all, and just exits
  • program raises an exception in QueueFeederThread at exit

Apparently the multiprocessing module has its problems, and although there are fixes in the most recent versions, I can't rely on that when the users of my program are running on operating systems that don't always include the latest greatest Python.

So, a decision had to be made. I ditched the multiprocessing module and replaced it with my own module, which calls os.fork() directly. The resulting code is much easier to handle than with multiprocessing, too.
Note that os.fork() is not available on Microsoft Windows. My target platform is UNIX anyway.

Pseudo-code:
    parallel(func, work_array):
        for i in 0 .. numproc-1:
            fork()
            if child:
                # child i takes its own share of the work
                work_items = part_of(work_array, i)
                for item in work_items:
                    func(item)

                # child exits
                exit(0)

        wait_for_child_procs()
So, a pool of numproc workers is spawned. Each child process does its part of the work given in work_array. No communication is needed here because fork() makes the child a copy of the parent process, so it gets its own copy of work_array.
This is the simplest kind of parallel programming in UNIX. Surprisingly, this bit of code works better for me than the multiprocessing module, which supposedly does the same thing under the hood.
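
The pseudo-code above could be turned into runnable Python roughly as follows. This is a sketch, not my actual module: slicing work_array with a stride stands in for part_of(), work_array is assumed to be a list, and returning the number of failed children is an addition for illustration.

```python
import os
import sys
import traceback


def parallel(func, work_array, numproc=4):
    """Run func(item) for every item, split over numproc forked children.

    Returns the number of children that did not exit cleanly.
    """
    # Flush buffered output so the children do not inherit and re-print it.
    sys.stdout.flush()
    sys.stderr.flush()

    pids = []
    for i in range(numproc):
        pid = os.fork()
        if pid == 0:
            # Child: handle every numproc-th item, starting at index i.
            status = 1
            try:
                for item in work_array[i::numproc]:
                    func(item)
                status = 0
            except Exception:
                traceback.print_exc()
            finally:
                try:
                    # os._exit() does not flush stdio buffers, so do it here.
                    sys.stdout.flush()
                    sys.stderr.flush()
                finally:
                    # Leave immediately; never return into the parent's code.
                    os._exit(status)
        # Parent: remember the child and continue forking.
        pids.append(pid)

    # Parent: wait for every child and count the ones that failed.
    failures = 0
    for pid in pids:
        _, wait_status = os.waitpid(pid, 0)
        if wait_status != 0:
            failures += 1
    return failures
```

Keep in mind that anything func changes in memory stays in the child. If the parent needs results back, the children have to write them somewhere the parent can read, such as a file or a pipe.
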

Having decent language support for parallel programming is of utmost importance in today's world, where having a quad-core CPU is no exception; practically all modern computers are multi-core machines. A modern programming language should offer some easy mechanisms to empower the programmer, enabling you to take advantage of the hardware at hand.
Proper multithreading is exceptionally hard (if not impossible) to do for interpreted languages. This is a fact of life. Using a forking model, you can still get good parallelism.