Monday, July 15, 2013

Programming mkdir -p

It's midsummer so let's do something light and easy for a change. In UNIX there is the mkdir -p command, meaning “create directory and all leading (parent) directories”. It is a convenient command with which you can quickly create a new directory tree. In programming, there is the mkdir() system call. This system call creates new filesystem directories, but it does not, however, automatically create any new leading parent directories for you. An attempt to do so is an error: ENOENT, or “No such file or directory”. How can we programmatically make deep directories?

A naive answer is to call system("mkdir -p my/deep/path"). While not incorrect, it comparatively uses lots of resources, but even if we don't care about that, it poses the interesting question: “Well, how does the mkdir shell command do it?”

A possible approach is to to break the path argument up into pieces, and for each piece create the new subdirectory. Something along these lines:
break up "my/deep/path" into "my", "deep", "path"
mkdir "my"
mkdir "my/deep"
mkdir "my/deep/path"
For example, in Python:
def mkdir_p(path):
    arr = path.split(os.sep)
    newpath = ''
    for elem in arr:
        if not newpath:
            newpath = elem
            if not elem:
                newpath = '/'
        else:
            newpath = os.path.join(newpath, elem)

        if os.path.exists(newpath):
            continue

        os.mkdir(newpath)
It has some checks for correctly handling relative and absolute paths, and it gets the job done. It performs poorly however for long paths that already exist for the most part. For example, for mkdir_p("/home/walter/src/python/ blog/2013/mkdir_p") it would do seven loops already before creating a new directory. Surely we can do better.

If we could work backwards, the performance could be improved. First see if the deep path exists, take a step back, see if that path exists, and if it does, rewind and create the deeper path. This is achieved by recursion.
def mkdir_p(path):
    if not path:
        return

    if os.path.exists(path):
        return
   
    mkdir_p(os.path.dirname(path))
    os.mkdir(path)
The recursive version is wonderfully terse and like often with recursive functions, it may warp your mind and take some effort to grasp.

I confess I used the non-recursive implementation for years. Ironically, Python already has a function to do this:
os.makedirs(path)
I had a peek in the source of Python's os module, and os.makedirs() indeed uses the recursive implementation. Mind that like os.mkdir(), os.makedirs() will throw an exception if the path already exists — unlike the mkdir -p shell command that doesn't complain if the directory is already there.