The Architecture
The architecture behind the Transmeta Crusoe has some interesting issues to deal with. The Crusoe is a VLIW (very long instruction word) processor, which issues several operations to its different functional units simultaneously. It also has to deal with the complications of LongRun technology (dynamically scaling the CPU's clock frequency and core voltage). While researching this topic I found very little information about the architecture of the CPU. I think that this is usual for a new processor: there is a lot of information about the wonderful things that the processor is capable of doing, but not a lot about how it does them.
Some information is available on the Crusoe's Alias Detection Hardware (ADH), which is circuitry on the chip that helps to deal with speculation (specifically, speculative loads). The Crusoe uses speculative loading to give greater throughput: it loads data before it is needed, so that the data is already available inside the CPU when required. This creates a problem that needs to be overcome, namely that the data could be changed in memory between the load and its use. Transmeta have designed the Crusoe to handle this case. The Crusoe uses a "load-and-protect" instruction to do a speculative load, and a "store-under-alias-mask" instruction to check stores against the speculated data and make sure that it has not been invalidated. The ADH is used in combination with these instructions to identify when a mistake has been made. When a mistake is found, the ADH stalls execution and reports the error to the Code Morphing Software, which re-translates the code so that the same mistake is not made again. The Crusoe is the first CPU that has been designed with this amount of speculative loading in mind. This should improve performance compared to other processors, which do not support speculative loads to the same extent.
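The interaction between the two instructions and the re-translation step can be sketched as a toy model. This is only an illustration of the idea described above, not Transmeta's implementation: the class and function names (`ToyAliasHardware`, `speculative_translation`, `conservative_translation`, `run`) are hypothetical, memory is a plain dictionary, and "re-translation" is modelled simply as falling back to a version of the code that keeps the load after the store.

```python
class AliasFault(Exception):
    """Raised when a store hits an address protected by a speculative load."""


class ToyAliasHardware:
    """Hypothetical stand-in for the ADH: tracks protected addresses."""

    def __init__(self, memory):
        self.memory = memory      # dict mapping address -> value
        self.protected = set()    # addresses covered by speculative loads

    def load_and_protect(self, addr):
        # Speculative load: read early and remember the address.
        self.protected.add(addr)
        return self.memory[addr]

    def store_under_alias_mask(self, addr, value):
        # A store to a protected address means the early load was a mistake.
        if addr in self.protected:
            raise AliasFault(addr)
        self.memory[addr] = value

    def clear(self):
        self.protected.clear()


def speculative_translation(hw, src, dst):
    # Original order is "store 99 to dst, then load src"; here the load
    # has been hoisted above the store, which is only safe if src != dst.
    early = hw.load_and_protect(src)
    hw.store_under_alias_mask(dst, 99)
    return early + 1


def conservative_translation(hw, src, dst):
    # Same program with the load kept after the store (no speculation).
    hw.memory[dst] = 99
    return hw.memory[src] + 1


def run(memory, src, dst):
    """Try the speculative code; on an alias fault, fall back (assumed policy)."""
    hw = ToyAliasHardware(memory)
    try:
        return speculative_translation(hw, src, dst), "speculative"
    except AliasFault:
        hw.clear()
        return conservative_translation(hw, src, dst), "retranslated"
```

In this sketch the fault is raised before the offending store writes anything, so no state has to be undone before the fallback runs. Real hardware would also need a way to discard work already done.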
A major issue when investigating the architecture of the Crusoe CPU is its pipeline. Pipelining is a technique used to improve the throughput of a CPU, and I expect the Crusoe processor to use a pipelined implementation. There are issues associated with the use of a pipeline: for example, variable instruction lengths and/or variable instruction run-times can adversely affect the performance of a pipelined CPU. The Crusoe processor might be expected to have some issues in this area, because its native instructions have a length of 128 bits, whereas the software it will be targeted at (Windows 9x / Millennium in particular) is written for the 32-bit x86 instruction set, whose instructions vary in length.
Very little information has been published about the pipelining used in the Crusoe. Transmeta have really only said that it is pipelined, and left it at that. One small piece of information that I was able to find comes directly from a Transmeta engineer:
"about six pipe stages for integer operations" - Malcolm Wing. This is interesting because it is a relatively small number of stages. It is generally recognised that about eight or nine pipeline stages is optimal for performance. Malcolm Wing, who was involved in developing the Crusoe's microarchitecture, also described one of his design goals as not putting out too aggressive a product initially. This would seem to suggest that the pipeline might be altered to give a performance increase in a future processor geared more towards the server market.
Because of the use of Code Morphing Software (CMS), the Crusoe does not have any issues with varying instruction lengths. Any problems which might have occurred are not relevant, since the hardware only ever executes native instructions with a uniform length of 128 bits (each made up of multiple translated x86 instructions). The complications that could have been associated with implementing the LongRun technology are dealt with in a similar manner: the chip was designed from the ground up to have LongRun implemented on it, and as such there have not been any problems with it. At this time it is not known whether any problems arose or how they were worked around: Transmeta could simply have altered the CMS to accommodate any shortcomings in the silicon, as they did for earlier test cases.
The Crusoe processor has a Gated Store Buffer inside the chip. This is the part of the VLIW engine that buffers writes to memory until the writes are committed or discarded (via the appropriate atom). The term "gated" is used because the writes can be thought of as being held behind a fence until they are ready to be released to memory. It is foreseeable that the buffer could fill up, and Transmeta have accounted for this: the CMS monitors and controls the state of the Gated Store Buffer, and if the buffer becomes full the CMS forces a commit to clear it into memory. I am not sure how this compares to what is done in other processors, but I think that it is another aid to speculation. I think that it will also allow the CMS to analyse what needs to be written out to memory and so economise on the number of memory writes that it needs to do. Again, I think this will raise the performance of the Crusoe and help it compete with other, more CISC-like, processors.
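The commit, discard and forced-commit behaviour described above can be illustrated with a small model. This is a sketch under stated assumptions rather than a description of the real hardware: the class name `ToyGatedStoreBuffer`, its method names and the `capacity` parameter are all hypothetical, and memory is again a plain dictionary.

```python
class ToyGatedStoreBuffer:
    """Hypothetical model: writes wait behind the gate until committed."""

    def __init__(self, memory, capacity=4):
        self.memory = memory      # dict mapping address -> value
        self.capacity = capacity  # assumed buffer size, for illustration only
        self.pending = []         # (address, value) pairs in program order

    def store(self, addr, value):
        # If the buffer is full, force a commit first (the role the text
        # above assigns to the CMS), then queue the new write.
        if len(self.pending) >= self.capacity:
            self.commit()
        self.pending.append((addr, value))

    def commit(self):
        # Open the gate: release every pending write to memory, in order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()

    def rollback(self):
        # Discard pending writes; memory never sees them.
        self.pending.clear()
```

A short usage example: after `buf = ToyGatedStoreBuffer({}, capacity=2)` and `buf.store(0x10, 1)`, the dictionary passed in is still empty. Calling `buf.rollback()` leaves it empty, while calling `buf.commit()` instead would write the value through.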