Monday, May 14, 2012

Hypothetical Machine Language

Last September, I wrote about virtual machines and CPU emulators, and described a virtual machine, or simulator for a non-existing CPU. It was a bit whacky, but then in April there suddenly was the Notch DCPU-16 specification. Many people jumped on it and the first implementations have already sprung up at GitHub. It inspired me to go back to my own virtual machine and completely redesign its instruction set.

Notch's DCPU-16 is a fascinatingly simple (and therefore easily implemented in a simulator) and I'm somewhat surprised that it works at all. For example, there is not even a carry flag. A lot of instructions appear to be missing [but it turns out you can do without them ... with some trickery]. There are instructions for conditional execution rather than the usual compare and branch. Push and pop are not instructions, but some kind of operator.
The registers are named A, B, I, X, Y, which sounds a lot like an 8-bit Z80 so I guess that's where he got his inspiration from. If you think about CISC or RISC then DCPU-16 is a MISC (minimal instruction set computer). Anyway, Notch is writing a game which could be a reason for keeping things simple.

The master, the teacher
Purely coincidentally, Donald Knuth updated his webpage last September with a message that after about 12 years of work his hypothetical machine language MMIX had finally reached the frozen state. MMIX defines a 64-bits RISC machine with a massive amount of 256 general purpose registers. His register naming $0-$255 looks a bit odd, but he was inspired by MIPS and SPARC.
Knuth goes far enough so that this machine could probably actually be built, I mean in hardware. There is floating point support and even register banking, virtual memory and page tables are described. Virtual memory is important if you want to be able to run any modern operating system on the CPU. Think booting Linux in the simulator. MMIX is fascinating stuff although still hugely simplified for educational reasons.
A real-world CPU does not have 256 general purpose registers like MMIX has, it's more likely to be something like 16 or 32. Modern CPUs may have hundreds of registers but they are not general purpose registers in the sense that a programmer can access them freely. The registers are dynamically renamed (mapped) to other registers to aid parallelism that arises from out-of-order execution.

Designing your own
When you design your own system, you will find that a lot of time has to be put into designing the instruction set and its instruction encoding. In theory, you can think up whatever set of instructions you like. But then comes the hard part; all these instruction have to be encoded in a sensible way. Simply numbering down opcodes (like in MMIX and DCPU-16) is extremely wasteful and you'll likely run out of available bits in the opcode space when you do. If you want to have a lot of addressing modes and a lot of registers, you will again run out of bits to store all this information in [unless, of course, you are using really long instruction codes]. The game is to come up with an encoding scheme that works nicely and yet provides a versatile instruction set for the programmer to work with.

The three machines
My own design uses a happy mix of the x86, 68000 and ARM instruction sets. Actually, I made several designs, aptly named machine #1, #2 and #3 (just working titles). Machine #1 is a 32-bit computer with 16-bit instruction words. It has load/store architecture but arguably it is not a true RISC system because immediate moves are encoded in three words. Its successor, machine #2, is a 64-bit computer with instructions encoded in still only 16 bit words, but immediate moves may take up to five words. And finally, machine #3 is a 64-bit RISC system in which all instructions have an equal length of 32 bits.

The first two machines each have 16 general purpose registers (named r0-r15) plus a program counter (pc), stack pointer (sp), and status register (sr).  The instructions are encoded in 16-bit words. Most instructions have an encoding that looks like this: oooooaaarrrrRRRR (o = opcode bit, a = addressing mode bit, r = source register bit, R = destination register bit, with MSB on the left side).
The addressing mode says whether to address a single byte, a word, or a 32-bit long word. The 64-bit machine has an additional mode for addressing 64 bits. There are no indirect addressing modes like exist on CISC systems, it's all register-to-register or register-immediate. In immediate mode the source register bits are unused and the value is stored in additional words.

Since sp is not a general purpose register in this design, I added a special code for instructions that manipulate sp. This way, you can still work with sp like you normally would, but the sp register can not be used for every operation. For example, it generally makes no sense to do bitwise operations on sp or to multiply sp with some value. On the other hand, you will want to be able to do adds and subtracts on sp. So there are opcodes that provide this functionality.

Machine #3, the 64-bit RISC system, is totally different from the other two. It has 32 registers where pc is equal to r31 and sp is r30. All instructions are encoded in 32-bit words. Here, most instructions can have three operands (three registers or two registers and an immediate). The instruction encoding uses this property to group many different operations together under a single base opcode. The result is that there is lots of room left for future extensions. This instruction encoding scheme is much like how PowerPC and MIPS work. On the MIPS, the types of instruction encoding have cool names like R-type (three registers), I-type (with immediate), J-type (jump). I must say, the MIPS encoding scheme is near perfect; simple, yet elegant and everything fits.

If you want to know more details on the instruction sets of these three machines, see the references below for a link to the design document.

From completion to perfection
Unless you are designing a minimalistic computer, you have to define an instruction set that is complete, something you can work with.
Operations that are easily forgotten are add-with-carry, sign extending, signed and unsigned multiply and division, and how to enable and disable interrupts. Maybe you want to have something like rdtsc to do high profile timings, or a cpuid instruction that tells what kind of CPU it is. MMIX has a register that holds the serial number.
Modern CPUs all do floating point arithmetic.Then there are things like SIMD instructions, processor virtualization support, enhanced security like trusted execution, and possibly GPU integration. But these are advanced topics that I haven't bothered with.

CPU simulators are fun. Although DCPU-16 and my own work are just toys, it's educational to tinker with what goes on inside a CPU. Other than emulating existing processors, you can even simulate the CPU of the next decade today.

Refs
Donald Knuth's MMIX: a RISC computer for the third millenium
Notch's DCPU-16 specification
My three designs with working title SYSTEM
ARM and Thumb-2 Instruction Set Quick Reference Card
PowerPC User Instruction Set Architecture Book I
MIPS Instruction Coding