The Instruction Set

CAD Project: Will Crusoe Choke on Apple?

Roughly ten years ago it was generally accepted that reducing the size of the instruction set was the way forward for processor design: a smaller instruction set simplifies the processor, reduces die size and frees space on-die for the register file and, later, cache. This idea was pioneered by David Ditzel, now Transmeta's CEO, and his colleague David Patterson in "The Case for the Reduced Instruction Set Computer". These days, however, processor designers are microprogramming very complex operations into processors in order to enable branch prediction and other techniques that increase throughput. In fact, David Ditzel has been paraphrased as saying that "today's RISC chips are more complex than today's CISC chips". Set in this context, it will be interesting to see which of these philosophies most closely matches the implementation of the Transmeta Crusoe processor.

At first sight, Transmeta's instruction set (IS) is a little confusing. The Crusoe is designed to execute x86 instructions, but internally it does not know how to handle them. This in itself is not a problem, because the same is true of many of today's CPUs, for example AMD's K7. What is different about the Crusoe is that it has no hardware to translate the x86 instructions into its native instructions. Instead it does this in software; in fact, the whole of its IS is implemented in software. This software is stored in roughly half a megabyte of Electrically Erasable Programmable Read-Only Memory (EEPROM) at a system suspend. Because the Crusoe does not have specialised hardware to translate these instructions and optimise code on the fly, it has to sacrifice CPU time, cache space and other such resources to do the job. This may have some performance drawbacks (which I will investigate later, in a more appropriate section of this analysis). Transmeta's decision to solve the problem this way is interesting because it is revolutionary: moving functional units out of silicon and into software is a huge break from standard practice in the CPU industry. Transmeta have acknowledged that their implementation may not give the best performance, but they believe the performance hit will be small enough for systems with a Transmeta processor to remain competitive in the mobile computing market, which is less demanding in terms of performance.

Transmeta's Crusoe processor uses an internal instruction format that is described as a Very Long Instruction Word (VLIW) format, because it has a word length of 128 bits. So how can a 128-bit processor execute 32-bit x86 instructions? The 32-bit instructions are "bundled" into "molecules" of either 64 or 128 bits. The parts of each molecule are then passed on to separate functional units simultaneously. This is illustrated in this diagram:


This diagram shows the make-up of one of the Crusoe's "molecules". As can be seen, it is made up of four "atoms" (the component parts of the Crusoe's instruction word), which are sent simultaneously to the four indicated functional units. By combining four instructions of different types into one molecule in this way, the Crusoe processor can execute the equivalent of four x86 instructions simultaneously, that is, four instructions per clock. Once the molecules arrive at the processor, there is no need for the processor to worry about branch prediction or code re-organisation, as these problems have already been taken care of by the resident software that translates the x86 instructions into blocks of Crusoe instructions. I like the sound of this system a lot. It is not as simple as RISC, but I think it is very elegant and still simpler than the processors that have been heavily driven by performance requirements (such as the current offerings from Intel and AMD). It is important to remember that one of the design goals of the Crusoe was to make a very efficient processor, and I think this technique is evidence of a degree of success at this goal.
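
To make the bundling idea concrete, here is a minimal sketch of packing atoms into molecules so that no functional unit is used twice in the same molecule. It is an illustration only, not Transmeta's algorithm: the unit names in `UNITS`, the `Atom` class and the greedy `pack_molecules` function are all assumptions made for this example, and the sketch ignores data dependencies between instructions, which a real translator would have to respect.

```python
from dataclasses import dataclass

# Hypothetical unit names, chosen for illustration; the diagram names the real ones.
UNITS = ("alu", "fpu", "load_store", "branch")


@dataclass(frozen=True)
class Atom:
    unit: str  # functional unit that would execute this atom
    text: str  # human-readable operation, for illustration only


def pack_molecules(atoms):
    """Greedily pack atoms, in order, into molecules with at most one atom per unit.

    Each molecule is a dict mapping a unit name to the atom issued to that unit.
    Data dependencies are deliberately ignored in this sketch.
    """
    molecules = []
    current = {}
    for atom in atoms:
        if atom.unit not in UNITS:
            raise ValueError(f"unknown unit: {atom.unit}")
        if atom.unit in current:
            # That unit is already busy in this molecule, so start a new one.
            molecules.append(current)
            current = {}
        current[atom.unit] = atom
    if current:
        molecules.append(current)
    return molecules
```

Feeding this function five atoms, of which the first four target different units and the fifth needs the ALU again, gives two molecules: a full one with four atoms and a second one holding only the extra ALU atom.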

The code morphing software that makes the Crusoe possible re-compiles an already compiled x86 program into native Transmeta instructions on the fly. I think this is an ambitious scheme. I would have been tempted to put this stage into a compiler, so that compiled programs would already be in the Crusoe's native language and would theoretically execute faster (because the code translation and optimisation would have been done at compile time). However, Transmeta's choice to do it on the fly has advantages: there is no need to re-compile x86 programs, and there is scope for Java-like portability, i.e. to run programs compiled for an Alpha you need only swap in the correct code morphing software (as with the Java virtual machine). It also makes the analysis of code easier, as the amount of information that can be stored about a piece of code is not limited by the physical cache size (since it is done in software). The code morphing software has a special cache at its disposal. It is called a translation cache and is used to store programs after they have been translated, so that they can be accessed again without another translation. The size of this cache can be adjusted by programs (through the operating system), so that they could theoretically be run with a customised, optimised cache size. This is an interesting feature that could conserve the computer's resources and also increase performance.
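
The translation cache can be sketched roughly as follows: translate a block the first time it is seen, store the result, and hand back the stored copy on later requests. This is a sketch under stated assumptions, not Transmeta's implementation: the `TranslationCache` class, keying entries by block address, the `translate` callback and the oldest-first eviction policy are all hypothetical choices made for the example. Only the idea of an adjustable capacity comes from the description above.

```python
class TranslationCache:
    """Toy translation cache: translated blocks are stored by x86 block address."""

    def __init__(self, translate, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._translate = translate  # hypothetical x86 -> native translator
        self._capacity = capacity    # adjustable size, as described in the text
        self._entries = {}           # block address -> translated code
        self.translations = 0        # how many real translations were performed

    def fetch(self, address, x86_block):
        """Return translated code for a block, translating only on a cache miss."""
        if address in self._entries:
            return self._entries[address]
        translated = self._translate(x86_block)
        self.translations += 1
        if len(self._entries) >= self._capacity:
            # Assumed policy: evict the oldest entry (dicts keep insertion order).
            oldest = next(iter(self._entries))
            del self._entries[oldest]
        self._entries[address] = translated
        return translated
```

Fetching the same address twice performs only one translation; the `translations` counter makes that visible. A smaller `capacity` saves memory at the cost of re-translating evicted blocks, which is the trade-off an adjustable cache size exposes.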

The code morphing software has some very clever features that enable it to increase the system's performance. It analyses the code it is translating as it translates it, and looks for code that is re-used. When it finds a block of code that is run very frequently, it decides to spend more time optimising it for the Crusoe architecture. The effect is that programs that are used regularly get faster each time they are run. In my opinion this is a huge advantage of implementing the code morphing software as it is. I think it is one of the major reasons why the Crusoe has not taken a huge performance hit from the software implementation of what is normally done in hardware: it gives the designers more freedom to develop sophisticated methods of increasing performance, which would not have been possible if they had been constrained by the physical limitations normally associated with hardware.
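
The "optimise what runs often" idea can be sketched with a simple execution counter. This is only an illustration of the general technique: the `HotBlockTracker` class and the `HOT_THRESHOLD` value are assumptions made for this example, and nothing here is taken from how Transmeta's software actually measures or defines a frequently run block.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: runs needed before a block counts as "hot"


class HotBlockTracker:
    """Counts how often each block runs and flags blocks worth re-optimising."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.run_counts = Counter()  # block address -> number of executions
        self.optimised = set()       # blocks already given extra optimisation

    def record_run(self, address):
        """Record one execution; return True only when the block first becomes hot."""
        self.run_counts[address] += 1
        is_hot = self.run_counts[address] >= self.threshold
        if is_hot and address not in self.optimised:
            self.optimised.add(address)
            return True
        return False
```

A caller would spend extra optimisation effort on a block only when `record_run` returns `True`, so code that runs once or twice never pays for optimisation it would not benefit from.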

The code morphing software also analyses which branches are taken and flags this in the code. When branch prediction takes place, the software therefore knows how likely it is that the branch will go one way or the other, and can tell the processor to execute down the most likely side. This is a major step forward in being able to predict a branch successfully and get some more performance from the CPU. The branch prediction part of the software can also tell the processor to execute down both sides of the branch if they are equally likely to be executed. I have heard of the idea of executing both sides of a branch, holding the results and supplying only the one that is needed, but never in the context of a single-CPU system. I know that Symmetric Multiprocessor (SMP) systems have had one CPU scanning through code, executing both sides of the branches and storing the results in cache to be used when needed. This seems like a similar concept contained within a single CPU, and it should give another performance increase. Again, this is only really possible because of the approach Transmeta has taken of implementing in software some functional units commonly found in hardware.
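
A rough sketch of the branch profiling described above: count how often a branch is taken, then suggest the likelier side, or both sides when the outcomes are close to evenly split. The `BranchProfile` class, the `margin` parameter and the hint names are assumptions made for this illustration. How Transmeta's software actually records and uses this information is not described here.

```python
class BranchProfile:
    """Records the outcomes of one branch and suggests how to execute it."""

    def __init__(self, margin=0.1):
        # Hypothetical tolerance: ratios within 0.5 +/- margin count as "equally likely".
        self.margin = margin
        self.taken = 0
        self.not_taken = 0

    def record(self, was_taken):
        """Record a single observed outcome of the branch."""
        if was_taken:
            self.taken += 1
        else:
            self.not_taken += 1

    def hint(self):
        """Return 'taken', 'not_taken', or 'both' based on the recorded history."""
        total = self.taken + self.not_taken
        if total == 0:
            return "both"  # assumption: with no history, neither side is favoured
        taken_ratio = self.taken / total
        if abs(taken_ratio - 0.5) <= self.margin:
            return "both"
        return "taken" if taken_ratio > 0.5 else "not_taken"
```

A branch seen taken nine times out of ten gets the hint `"taken"`, while a branch with an even split gets `"both"`, which corresponds to executing down both sides and keeping only the needed result.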

One of my original goals for this section was to decide whether the Crusoe conforms to RISC or CISC. The answer is probably: "it's neither!" But I do think that the elegance it displays is more in line with RISC than CISC, because if the processor is looked at out of context it appears to be RISC. The Crusoe has no branch prediction unit and no code re-organisation unit, and it lacks the sheer size that is evident in many of today's CISC processors. One of Transmeta's goals for the Crusoe was low power consumption, and the IS features have contributed greatly to this. The Crusoe has moved away from the approaches pioneered by Intel for increasing sheer performance. By doing this, Transmeta has been able to create a very efficient CPU that uses only 1 watt of electrical power when fully running.

Thus, to conclude this section: the Crusoe is, in my opinion, a RISC CPU underneath the complex transform required to run x86 instructions, and one with a very elegant, modern instruction set.

Website Design Copyright 2000, Iain Gibson.
Last updated: January 30, 2001.