Monday, June 24, 2013

Object(ive)-oriented programming in plain C

Last time I wrote about a safe string type implementation in a C library, and mentioned liboco. It's a library that provides some fundamental building blocks like a safe string type, a safe array type, and a map (or dictionary). In liboco, everything is a reference counted object, and arrays and maps store objects. liboco takes concepts from Objective-C, and its basic functionality is comparable to the Foundation Kit. Like in Objective-C, reference counting is done manually using retain and release. It even has a memory pool with which you can deallocate objects in a deferred fashion using autorelease.

How does it work? Plain C is not object oriented: it has no classes, no constructors, no inheritance, etcetera. But we can implement object orientation in C. In fact, the first C++ compiler was a translator that produced plain C source code with mangled names as output.

As you might suspect, a safe string type would be implemented as a struct with member fields for a char* pointer to the data and an integer length. Now take a step back, and say that everything should be an object. This means a string would be derived from an object, and inherit the base object's properties. In plain C, we do this by including the struct of the base object in the struct of the derived object.
typedef struct {
    object_t obj;

    char *str;
    int length;
} string_t;
What does the base object type look like? As said, every object is reference counted, so it should include a reference count.
typedef struct {
    int refcount;
} object_t;
It doesn't seem like much, but we've already laid down a basis. We can create custom types and derive them from object. We could even create a new type and derive it from the string_t type, if we wanted to.
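Why embedding works can be shown in a few lines. This is a minimal sketch, not liboco's code: refcount_of() and embedding_example() are hypothetical names made up for the illustration. Because the base struct is the first member, a pointer to the derived struct is also a valid pointer to the base struct.

```c
#include <assert.h>
#include <stddef.h>

/* Base type: every object starts with a reference count. */
typedef struct {
    int refcount;
} object_t;

/* Derived type: the base struct is the first member. */
typedef struct {
    object_t obj;

    char *str;
    int length;
} string_t;

/* Hypothetical helper: accepts any type that embeds object_t first. */
int refcount_of(const void *p) {
    const object_t *o = p;
    assert(o != NULL);
    return o->refcount;
}

/* Hypothetical example: read a string's refcount through the base type. */
int embedding_example(void) {
    string_t s = { { 1 }, NULL, 0 };
    return refcount_of(&s);
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, which is what makes this kind of "inheritance" legal.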
There is no automatic constructor, but like in Objective-C when we allocate an object of a certain type, we should initialize it through an init function. This is its constructor.
string_t *s = init_string(calloc(1, sizeof(string_t)));
calloc() zeroes out the memory for us. Alloc and init are two distinct operations. If you combine them into a single function, you'll run into trouble later (when we look at inheritance) so keep them separate.
Since I don't like the syntax of what we now have, alloc and init are wrapped as:
string_t *s = new_string();
Note that every object instance is addressed by a pointer to the object. This isn't strange when you realize that in plain C, strings are char pointers and arrays are referenced by a pointer to the array.
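The two functions used above could look roughly like this. It is a sketch under assumptions: the post does not show the bodies of init_string() and new_string(), so the details here (passing NULL through, setting the count to 1 in init) are guesses at a reasonable implementation.

```c
#include <stdlib.h>

typedef struct {
    int refcount;
} object_t;

typedef struct {
    object_t obj;

    char *str;
    int length;
} string_t;

/* Initialize memory that was already allocated. A NULL pointer is
 * passed through, so a failed calloc() simply yields NULL. */
string_t *init_string(string_t *s) {
    if (s == NULL)
        return NULL;
    s->obj.refcount = 1;   /* a new object starts with one reference */
    s->str = NULL;
    s->length = 0;
    return s;
}

/* Convenience wrapper: alloc and init in one call. */
string_t *new_string(void) {
    return init_string(calloc(1, sizeof(string_t)));
}
```

Keeping init_string() free of any allocation is what later allows a derived type to call it on its own embedded string_t.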
The reference count of a newly allocated object is 1. In a reference counting system, you don't delete instances like you do in C++. Instead, you let go of them by calling release(). Releasing an object instead of bluntly deleting it will start to make sense once you put objects into containers.
release(s);
The release() function lowers the reference count, and when the reference count drops to zero it will destruct the object and free the memory. To keep an object around, increase its reference count by calling retain(). When you put an object into a container, like an array or a map, the container automatically retains the object, which is another way of saying that it holds a strong reference to the object. The container keeps the object around — until the container itself is released.
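The life cycle described above can be sketched with a one-slot container. Everything here is illustrative: box_t, box_put(), box_clear() and the objects_freed counter are hypothetical and only exist to make the effect of a strong reference visible.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int refcount;
} object_t;

static int objects_freed = 0;   /* hypothetical counter, only to observe frees */

object_t *new_object(void) {
    object_t *o = calloc(1, sizeof(object_t));
    if (o != NULL)
        o->refcount = 1;
    return o;
}

void retain(object_t *o) {
    assert(o != NULL);
    o->refcount++;
}

void release(object_t *o) {
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount == 0) {
        free(o);
        objects_freed++;
    }
}

/* Hypothetical container with a single slot; it holds a strong reference. */
typedef struct {
    object_t *item;
} box_t;

void box_clear(box_t *b) {
    if (b->item != NULL) {
        release(b->item);
        b->item = NULL;
    }
}

void box_put(box_t *b, object_t *o) {
    retain(o);       /* retain first, in case o is already in the box */
    box_clear(b);
    b->item = o;
}
```

After box_put() the caller can release its own reference, and the object stays alive until the box lets go of it.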

Functions like retain() and release() work on any type derived from object. But C is a statically typed language, so you'd need to cast to object_t* every time to satisfy the compiler, or else you'd get a ton of warnings. This issue is solved by declaring the functions with void* arguments:
void retain(void *);
void release(void *);
Now, if you don't supply a valid object, the compiler will not complain and these functions will happily trash memory. So to make it safe we need some kind of runtime type checking. Wouldn't that be incredibly slow? No, it's just a single if-statement. We actually already need to know the type of the object for another reason: when the reference count drops to zero, how would release() know what destructor function to call? It knows because every object holds run-time type information (RTTI).
typedef struct {
    objtype_t *objtype;
    int refcount;
} object_t;
Mind that in C++, the compiler holds type information at compile time. In C++, RTTI may or may not be available during run-time depending on whether the class is polymorphic and whether RTTI is enabled during compilation. What's fun is that RTTI also allows us to easily implement type introspection.

What information is in the object type structure? It holds a pointer to the destructor function. release() will call this function before freeing the allocated instance.
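Here is a sketch of how release() can find the destructor through the type pointer. The layout of objtype_t is an assumption, since the post does not show it: the name field, new_string_from(), is_string() and the strings_destroyed counter are hypothetical. The runtime check is also a guess; this version only rejects NULL and objects without a type.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;             /* hypothetical, handy for introspection */
    void (*destructor)(void *);   /* called by release() before free() */
} objtype_t;

typedef struct {
    objtype_t *objtype;
    int refcount;
} object_t;

typedef struct {
    object_t obj;

    char *str;
    int length;
} string_t;

static int strings_destroyed = 0;   /* hypothetical counter for the demo */

static void destroy_string(void *p) {
    string_t *s = p;
    free(s->str);
    s->str = NULL;
    strings_destroyed++;
}

static objtype_t string_type = { "string", destroy_string };

/* Hypothetical constructor that copies a C string into a new object. */
string_t *new_string_from(const char *text) {
    size_t n = strlen(text);
    string_t *s = calloc(1, sizeof(string_t));
    if (s == NULL)
        return NULL;
    s->str = malloc(n + 1);
    if (s->str == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->str, text, n + 1);
    s->length = (int)n;
    s->obj.objtype = &string_type;
    s->obj.refcount = 1;
    return s;
}

void retain(void *p) {
    object_t *o = p;
    if (o == NULL || o->objtype == NULL)   /* the cheap runtime check */
        return;
    o->refcount++;
}

void release(void *p) {
    object_t *o = p;
    if (o == NULL || o->objtype == NULL)
        return;
    if (--o->refcount > 0)
        return;
    if (o->objtype->destructor != NULL)
        o->objtype->destructor(o);
    free(o);
}

/* Type introspection comes down to comparing type pointers. */
int is_string(const void *p) {
    const object_t *o = p;
    return o != NULL && o->objtype == &string_type;
}
```

Note that release() itself knows nothing about strings; it only follows the pointer that every object carries.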

Other than just the destructor, we can also include a constructor. Note that in Objective-C, you have to manually call the superclass's init method to correctly initialize an instance of a derived class. This is also the case in Python. In C++, base class constructors are called automatically.
At first I actually made it work the automatic way, using the default constructor only. There is, however, a compelling case for keeping manual control and not including a constructor in the object's type definition.
It's because of parameters: constructors often take parameters, so it makes sense to write them as ordinary functions. A constructor can then have any number of parameters, and there is no single declaration that fits all types. Thus, automatic calling of base class constructors goes out the window.
As a slight added advantage, it becomes clearer how to do multiple inheritance.
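A sketch of what manual constructor chaining with parameters could look like. The token_t type and all function bodies are hypothetical; the point is only that each init function takes its own parameters and explicitly calls the init function of its base type.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int refcount;
} object_t;

typedef struct {
    object_t obj;

    char *str;
    int length;
} string_t;

/* Hypothetical type derived from string_t. */
typedef struct {
    string_t base;

    int line;
} token_t;

object_t *init_object(object_t *o) {
    if (o != NULL)
        o->refcount = 1;
    return o;
}

/* Takes one parameter and calls the base init by hand. */
string_t *init_string(string_t *s, const char *text) {
    size_t n;
    if (s == NULL || init_object(&s->obj) == NULL)
        return NULL;
    n = strlen(text);
    s->str = malloc(n + 1);
    if (s->str == NULL)
        return NULL;
    memcpy(s->str, text, n + 1);
    s->length = (int)n;
    return s;
}

/* Takes two parameters; the first is handed on to the base init. */
token_t *init_token(token_t *t, const char *text, int line) {
    if (t == NULL || init_string(&t->base, text) == NULL)
        return NULL;
    t->line = line;
    return t;
}

token_t *new_token(const char *text, int line) {
    token_t *t = calloc(1, sizeof(token_t));
    if (t != NULL && init_token(t, text, line) == NULL) {
        free(t);
        return NULL;
    }
    return t;
}
```

This is also where separating alloc from init pays off: init_string() can run on the string_t embedded in a token_t because it never allocates the object itself.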

As said, the whole thing is a reference counting system. So when we copy an array, the contents of the copy point at the same elements as the original, and each element has had its reference count increased by one. In order to make a deep copy, we need a copy constructor. The copy constructor is a common idiom in C++, but it's not as well known in Objective-C. Objective-C has the NSCopying protocol, and you implement a method -copyWithZone: that returns a newly allocated instance: the copied object.
liboco adds the copy constructor as a special kind of constructor (analogous to C++), and likewise it automatically calls any base class copy constructors.
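The difference between the two kinds of copy can be sketched as follows. This is not liboco's implementation: the copy slot in objtype_t, the number_t type and the fixed-size pair_t standing in for an array are all hypothetical, and the chaining to base class copy constructors is left out.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct objtype objtype_t;

typedef struct {
    objtype_t *objtype;
    int refcount;
} object_t;

struct objtype {
    void *(*copy)(const void *);   /* hypothetical copy constructor slot */
};

/* Hypothetical boxed integer. */
typedef struct {
    object_t obj;

    int value;
} number_t;

static void *copy_number(const void *p);
static objtype_t number_type = { copy_number };

number_t *new_number(int value) {
    number_t *n = calloc(1, sizeof(number_t));
    if (n == NULL)
        return NULL;
    n->obj.objtype = &number_type;
    n->obj.refcount = 1;
    n->value = value;
    return n;
}

static void *copy_number(const void *p) {
    const number_t *src = p;
    return new_number(src->value);
}

#define PAIR_SIZE 2

/* Hypothetical stand-in for an array: two object slots. */
typedef struct {
    object_t *items[PAIR_SIZE];
} pair_t;

/* Shallow copy: same elements, each retained once more. */
void shallow_copy(pair_t *dst, const pair_t *src) {
    for (int i = 0; i < PAIR_SIZE; i++) {
        assert(src->items[i] != NULL);
        dst->items[i] = src->items[i];
        dst->items[i]->refcount++;
    }
}

/* Deep copy: each element is duplicated by its own copy constructor.
 * Returns 0 on success, -1 if an element could not be copied. */
int deep_copy(pair_t *dst, const pair_t *src) {
    for (int i = 0; i < PAIR_SIZE; i++) {
        const object_t *o = src->items[i];
        dst->items[i] = o->objtype->copy(o);
        if (dst->items[i] == NULL) {
            while (i-- > 0)
                free(dst->items[i]);   /* undo the copies made so far */
            return -1;
        }
    }
    return 0;
}
```

After a shallow copy both containers share the elements; after a deep copy each container owns its own set with a reference count of 1.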

C++11 adds a move constructor, with which you can move values from one instance to another under the hood. This is typically needed for doing return by value efficiently.
We have no need for this feature. The main reason is that objects are pointers, and pointers already efficiently return by value. Another reason is that we do not automatically destroy instances when they go out of scope, like C++ does. It's a manual reference counting system, so the programmer decides when an object is released or not.

Now that we have our own type system, we can add a feature that C++ does not offer out of the box, but Python and Go do: printing an object. (Objective-C can do it too, through the -description method and the %@ format specifier.) For printing any object we need a method that converts an object to a string.
Additionally, we need a special printing function that calls the string conversion method when requested.
print("object type: %T\n", o);
print("object value: %v\n", o);
It's quite powerful; this allows us to print the value of any type of object with the format string specifier ‘%v’.
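A very small version of such a printing function might look like this. It is a sketch under assumptions: it formats into a buffer rather than printing, it understands only %T, %v and %%, and the names sprint(), number_t and the to_string signature are hypothetical rather than liboco's.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    /* Hypothetical signature: write the object's value into buf. */
    void (*to_string)(const void *self, char *buf, size_t size);
} objtype_t;

typedef struct {
    const objtype_t *objtype;
    int refcount;
} object_t;

/* Hypothetical boxed integer. */
typedef struct {
    object_t obj;

    int value;
} number_t;

static void number_to_string(const void *self, char *buf, size_t size) {
    const number_t *n = self;
    snprintf(buf, size, "%d", n->value);
}

static const objtype_t number_type = { "number", number_to_string };

/* Format into out (at most size bytes, always terminated). */
void sprint(char *out, size_t size, const char *fmt, ...) {
    char tmp[64];
    size_t used = 0;
    va_list ap;

    if (size == 0)
        return;
    va_start(ap, fmt);
    for (; *fmt != '\0' && used + 1 < size; fmt++) {
        if (fmt[0] == '%' && (fmt[1] == 'T' || fmt[1] == 'v')) {
            const object_t *o = va_arg(ap, void *);
            size_t n;
            if (fmt[1] == 'T')
                snprintf(tmp, sizeof tmp, "%s", o->objtype->name);
            else
                o->objtype->to_string(o, tmp, sizeof tmp);
            n = strlen(tmp);
            if (n > size - 1 - used)
                n = size - 1 - used;     /* truncate to the space left */
            memcpy(out + used, tmp, n);
            used += n;
            fmt++;                       /* skip the specifier letter */
        } else {
            if (fmt[0] == '%' && fmt[1] == '%')
                fmt++;                   /* "%%" yields a single '%' */
            out[used++] = *fmt;
        }
    }
    va_end(ap);
    out[used] = '\0';
}
```

A real print() would also have to pass the standard printf specifiers through, which this sketch leaves out.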

It's also possible to add operator functions. Although plain C does not allow redefinition of operators, they can be simulated using function calls. This would be pretty annoying in hand-written code, but could be useful for machine-generated code.

I haven't touched upon overriding yet. The system gets close, but it doesn't become real OOP until a derived type can override the methods of its base type, which is what enables polymorphism. Method overriding can be implemented by including a table of methods (function pointers) in the object's type definition. Calling a method would resemble this chain:
self->objtype->methods.method(self, parameters);
This is how virtual function calls work in C++. A derived type has a copy of the table of its base class. In Objective-C, it works a little differently. There, invoking a method means “sending a message”. What happens is, the table is searched for the method, and if the method is not found, the table of the superclass will be searched. This way, all methods of the superclass are inherited by the subclass.
Both ways can be simulated in plain C, but there is the tiresome work of writing out the virtual tables. Moreover, the first argument to the method would always have to be void* to prevent compiler warnings.
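The C++-style variant, with a method table stored in the type, can be sketched like this. The shape_t and triangle_t types and their methods are hypothetical examples; the derived type copies one entry from the base table and replaces the other.

```c
typedef struct objtype objtype_t;

typedef struct {
    const objtype_t *objtype;
    int refcount;
} object_t;

/* The table of methods; each takes self as a void pointer. */
typedef struct {
    const char *(*kind)(const void *self);
    int (*sides)(const void *self);
} methods_t;

struct objtype {
    methods_t methods;
};

typedef struct {
    object_t obj;
} shape_t;

/* Hypothetical type derived from shape_t. */
typedef struct {
    shape_t base;
} triangle_t;

static const char *shape_kind(const void *self) {
    (void)self;
    return "shape";
}

static int shape_sides(const void *self) {
    (void)self;
    return 0;
}

static int triangle_sides(const void *self) {
    (void)self;
    return 3;
}

/* The derived table inherits kind() and overrides sides(). */
static const objtype_t shape_type = { { shape_kind, shape_sides } };
static const objtype_t triangle_type = { { shape_kind, triangle_sides } };

/* Dispatch helpers: follow self->objtype->methods to the right function. */
int sides_of(const void *self) {
    const object_t *o = self;
    return o->objtype->methods.sides(self);
}

const char *kind_of(const void *self) {
    const object_t *o = self;
    return o->objtype->methods.kind(self);
}
```

The caller of sides_of() does not need to know which concrete type it is holding, which is the polymorphism the table buys.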

Finally, a word on using liboco. Programming in C with liboco is a lot like programming in Objective-C, and like in Objective-C, you practically leave plain C behind. You are now working with this new API, and the code is full of calls into this library, which is your foundation. Adding a new type for some kind of data structure means implementing it as an object, and writing the destructor, copy constructor, and string representation methods. This can be a drag, but it's part of what liboco is. Having array and map types at your disposal, in which any object can be placed, is a great win. You can quickly prototype something in Python and convert it to C rather easily. Wouldn't it be great if we could do automated machine translation from a scripting language to C and then compile it to native code? It's maybe something to work on at a later time.