Features —

Introducing the IBM/Sony/Toshiba Cell Processor — Part II: The Cell Architecture

Part 2 of Hannibal's coverage of the new Cell processor from IBM, Sony, and …

In today's session, IBM introduced the overall architecture of the Cell processor. Unfortunately, they didn't include many more microarchitectural details in today's session than they did in yesterday's. Most of the session covered issues like power management, clocking, the design process, and so on. So today's article is going to be more along the lines of a follow-up to yesterday's piece. I'll fill in the new information that I've picked up, as well as clarifying leftover questions from yesterday.

The Cell's basic architecture

The basic architecture of the Cell is described by IBM as a "system on a chip" (SoC) design. This is a perfectly good characterization, but I'd take it even further and call Cell a "network on a chip." As I described yesterday, the Cell's eight SPUs are essentially full-blown vector "computers," insofar as they are fairly simple CPUs with their own local storage.

These small vector computers are connected to each other and to the 512KB L2 cache via a element interface bus (EIB) that consists of four sixteen-byte data rings with 64-bit tags. This bus can transfer 96 bytes/cycle, and can handle over 100 outstanding requests.

The individual SPEs can use this bus to communicate with each other, and this includes the transfer of data in between SPEs acting as peers on the network. The SPEs also communicate with the L2 cache, with main memory (via the MIC), and with the rest of the system (via the BIC). The onboard memory interface controller (MIC) supports the new Rambus XDR memory standard, and the BIC (which I think stands for "bus interface controller" but I'm not 100% sure) has a coherent interface for SMP and a non-coherent interface for I/O.

Unfortunately, today's session was severly lacking in information on the 64-bit PPC core that handles the Cell's general-purpose computing chores. We do know that this core has a VMX/Altivec unit, at least one FPU, and supports simultaneous multithreading (SMT). It's also in-order issue, like the SPUs. So it appears that this core also lacks an instruction window, presumably for the same reasons that the SPUs do (i.e. to save on die space and cut down on control logic.) I have in my notes that the core is two-issue, like the SPUs, but I can't find this corroborated anywhere else. So it's possible that the core only issues two instructions per cycle peak, i.e. one from each currently-running thread. I'd imagine that if this is the case, this core's pipeline is very short. This would fit with the SPUs, in which the pipeline was also kept short and simple.

The entire Cell is produced on a 90nm SOI process with 8 layers of copper interconnect. The Cell sports 234 million transistors, and its die size is 221mm2. (This is roughly the size of the Emotion Engine at its introduction.) The PPC core's 32KB L1 cache is connected to the system L2 cache via a bus that can transfer 32 bytes/cycle between the two caches.

The Cell and Apple

Finally, before signing off, I should clarify my earlier remarks to the effect that I don't think that Apple will use this CPU. I originally based this assessment on the fact that I knew that the SPUs would not use VMX/Altivec. However, the PPC core does have a VMX unit. Nonetheless, I expect this VMX to be very simple, and roughly comparable to the Altivec unit o the first G4. Everything on this processor is stripped down to the bare minimum, so don't expect a ton of VMX performance out of it, and definitely not anything comparable to the G5. Furthermore, any Altivec code written for the new G4 or G5 would have to be completely reoptimized due to inorder nature of the PPC core's issue.

So the short answer is, Apple's use of this chip is within the realm of concievability, but it's extremely unlikely in the short- and medium-term. Apple is just too heavily invested in Altivec, and this processor is going to be a relative weakling in that department. Sure, it'll pack a major SIMD punch, but that will not be a double-precision Alitvec-type punch.

Channel Ars Technica