Sunday, September 4, 2011

Bytecode interpreters, virtual machines, emulators

Lately I've been working on programming a utility that translates more or less natural text to more cryptic firewall statements. In the translation process, I use two passes: the first pass translates to bytecode, and the second pass translates from bytecode to the final output. A huge advantage of using the intermediate bytecode stage is that this makes it possible to easily support multiple kinds of firewalls while having only one syntax for the input. The code for the output generator is completely decoupled from the input parser code. Cross compilers work in the same way. Modern compilers like gcc also work in a different way; they support multiple frontends for compiling different programming languages.

When I was still a student, I once wrote an assembler/disassembler for a nonexistent architecture. While a nonexistent architecture may seem an odd choice, I was inspired by the Java bytecode that executes in a virtual machine rather than on a real CPU. The professor did not give me a straight A only because I had not actually written the virtual machine that would run the code. The instruction set was rather extensive and implementing the full virtual machine would have been too much work for the limited time of the assignment. The feeling of what if always stuck with me however.

An emulated CPU has the following properties:
  • a bunch of general purpose registers
  • a program counter (this is just an address; the instruction pointer)
  • a stack pointer
  • a flags register (with a zero, sign, carry, overflow flag)
  • optionally: interrupt flag or current interrupt level
  • optionally: supervisor mode flag or current privilege level
Emulated interrupts enable you to let the system interact with I/O devices much like it happens in real computers. Whenever an I/O device is ready, it will interrupt the CPU, which will then automatically jump to the Interrupt Service Routine (ISR). The address of the ISR is read from an interrupt vector table that typically resides at a low memory address.
The supervisor mode or privilege levels offer the possibilities of implementing a sturdy “operating system” in the virtual machine, adding a hypervisor, and ultimately having virtual machines running in virtual machines.

The instruction set needs to have load, store, arithmetic operations like add, subtract, multiply and divide, logical operations like and, or, exclusive or, and bitwise operations like bit shifting and possibly bit rotation. Furthermore you need a jump (goto) instruction and a set of instructions for conditional branching, and push/pop to work with the stack. Lastly you could add some privileged instructions like loading the flags register or resetting the machine. At the bare minimum, there are about 25 instructions to be implemented. By comparison, the 8086 CPU has about 75 different instructions.

For a memory model, 64K ought to be enough for anybody ... Or you could give the system very little real memory and implement paging and swapping. Implementation wise, the virtual memory management could take place in either the guest or the host ... (when implemented in the host, the guest OS would never know its memory was being swapped in and out!)
A related issue is protected memory: can the current process write to a given address or not? Early micro-computers did not have support for paging, but they did have read-only memory in the form of ROM. You probably want to have memory mapped I/O, an interrupt vector table and some display memory representing the screen.

One of the problems I've always had with this model is that programming for such an emulated system is hell as you would need to write everything in its assembly language. To overcome this problem, you would really need to make a high level compiler generate the code. Nowadays we have gcc and if you're handy it should be possible to have it generate code for platform X.

Now let us take a step back and get back to bytecode interpreters. Java, Perl, PHP, Python, various kinds of BASIC and even some MUD codes all use a stack machine at the heart of their bytecode interpreters. The stack machine is the simplest kind of processor: it has no registers and its instructions only manipulate a stack. The lack of registers make both compilation and execution (or emulation) easy. Interpreted languages benefit from bytecode because interpreting bytecode is faster than having to go through the language parsing procedures all the time.

A bytecode interpreter for a scripting language is quite different from a full blown system emulator. For a bytecode interpreter, you can make up your own opcodes for any common operation you like. There is no need for an opcode to perform only low level operations; they can be high level operations just the same. For example, Python has opcodes for slicing strings. Until Python 3, print was a statement with an opcode equivalent rather than being a function. It's not cheating, it increases performance.

It's good fun designing your own bytecode interpreting virtual machine. There are a lot design of choices to be made beforehand. Do you want to fully emulate a CPU or not? Maybe you want to emulate any other hardware components? Will you write anything like a system BIOS in bytecode? Are you going to develop a self-hosting system? How far are you willing to take it?