Two rivals take different paths to the next generation of computing
Extending their 32-bit rivalry into the next generation of computing, Intel and AMD plan to release new 64-bit x86-compatible processors in 2001. Like the jump in processing power we saw when the PC platform evolved from the 16-bit 286 to the 32-bit 386 processor, the jump to 64 bits promises to take PC technology to new heights on the enterprise ladder. Although 64-bit processors won't make your Microsoft Excel spreadsheets recalculate faster or speed up most other desktop applications, the new processors will address the perpetual need for more processing power in computing's upper tiers. High-end graphics workstations and large database systems, such as Microsoft SQL Server and Oracle, will benefit most directly from the new processors. Dot-com stores and the increasing number of decision support and data warehousing applications are typically the driving forces behind massive database growth. Such applications will derive less benefit from increases in raw processor speed than they will from increased memory addressability. Database applications in particular are infamous for being RAM-hungry; the more memory those applications have, the better they perform.
Today's crop of 32-bit processors can natively address up to 4GB (2^32 bytes) of data. Windows 2000 Server reserves 2GB of that 32-bit address space for its own use, leaving 2GB for applications. Enterprise Memory Architecture (EMA), which Win2K Advanced Server and Win2K Datacenter Server support, provides two methods of extending the amount of RAM available for applications: 4GB RAM Tuning (4GT) and Physical Address Extension (PAE). 4GT adds the /3GB switch to the Advanced RISC Computing (ARC) path in the boot.ini file to let applications address as much as 3GB of RAM. PAE uses a window to map chunks of physical memory to an application's virtual address space and extend physical memory addressability to 8GB on Win2K AS and to 64GB on Datacenter. (For more information about Datacenter's EMA support, see Greg Todd, "Win2K Datacenter Server," page 49.)
The upcoming 64-bit processors will dramatically extend the amount of addressable physical memory available to high-end systems. Intel's and AMD's 64-bit processors will raise the bar to a staggering 16 exabytes (EB), or roughly 18 billion gigabytes (2^64 bytes), more than enough headroom for even the most massive of today's applications.
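The figures above follow directly from the width of the address. The short Python sketch below works through the arithmetic; the helper name `addressable_bytes` is made up for illustration, and the 36-bit width used for PAE is an assumption brought in from outside this article to account for the 64GB ceiling.

```python
# Address-space arithmetic for the address widths discussed above.
GB = 2**30  # binary gigabyte (1,073,741,824 bytes)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2**address_bits


flat_32 = addressable_bytes(32)   # today's 32-bit processors
pae_36 = addressable_bytes(36)    # assumption: PAE's 36-bit physical addresses
flat_64 = addressable_bytes(64)   # the upcoming 64-bit processors

print(flat_32 // GB)              # 4 (GB)
print(pae_36 // GB)               # 64 (GB)
print(flat_64 // GB)              # 17179869184 binary GB, i.e. 16EB
print(flat_64 / 10**9)            # about 1.8e10 decimal GB, the "18 billion" figure
```

Counted in binary gigabytes, the 64-bit limit is closer to 17 billion; the 18 billion figure comes from dividing by a decimal billion bytes.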
Sixty-four-bit processors actually have two important capabilities. In addition to being able to use 64 bits to define a memory address, these processors can manipulate 64 bits of data simultaneously. Because the ability to manipulate 64 bits of data at once is as much a function of the bus structure as the processor, significant advances in system bus technology go hand in hand with the move to 64-bit processing.
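To see what manipulating 64 bits at once buys, consider adding two 64-bit integers. The Python sketch below is only an illustration (the function names are hypothetical, and Python integers stand in for hardware registers): a 64-bit processor can do the addition in one step, whereas a 32-bit processor has to add the low halves, carry, and then add the high halves.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_native(a: int, b: int) -> int:
    """One step: a single 64-bit add, wrapping around at 2**64."""
    return (a + b) & MASK64


def add64_on_32bit(a: int, b: int) -> int:
    """The same sum built from 32-bit pieces, as a 32-bit processor must do it."""
    lo = (a & MASK32) + (b & MASK32)                        # add the low halves
    carry = lo >> 32                                        # did the low add overflow?
    hi = ((a >> 32) & MASK32) + ((b >> 32) & MASK32) + carry  # high halves plus carry
    return ((hi & MASK32) << 32) | (lo & MASK32)


a, b = 0x00000001FFFFFFFF, 0x0000000000000001
print(hex(add64_native(a, b)))    # 0x200000000
print(hex(add64_on_32bit(a, b)))  # 0x200000000, but it took two adds and a carry
```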
At the Crossroads
Although they share a clear goal, AMD and Intel have chosen quite different paths to the destination. In a move that might seem surprising, Intel plans to abandon its flagship x86 architecture in favor of the new and radically different IA-64 architecture that Intel codeveloped with Hewlett-Packard (HP). The IA-64 architecture introduces a different instruction set and is based on a much more sophisticated and complex design whose effectiveness ultimately depends on new compiler technology.
In contrast, AMD plans to extend the x86 architecture into a new design known as x86-64. AMD's new architecture is a logical and simple extension of the current x86-32 instruction set architecture that all x86-based processors use.
These competing approaches will have a tremendous impact on the transition of Win2K and the upcoming Windows.NET platforms to 64-bit technology. Let's look in more detail at the different routes Intel and AMD have chosen.
Intel Takes the High Road
Intel has officially dubbed its 64-bit processor the Itanium (formerly code-named Merced). Intel expects to release the Itanium in the first half of 2001 and will target it as a replacement for the Pentium III Xeon processor in systems that are used primarily as servers and occasionally as very-high-end workstations. Industry observers expect the initial Itanium version to run at 733MHz, which might seem a bit disappointing considering that current Pentium systems already run at speeds well in excess of 1GHz. However, the Itanium's radically different architecture makes traditional speed comparisons a bit like comparing apples and oranges.
Although the Itanium's initial speeds will be more modest, Intel designed the Itanium to be capable of 6GFLOPS (i.e., 6 billion floating-point operations per second). The Itanium will have four integer units and two floating-point units. The processor package, which will be about the size of a 3" x 5" index card, will have 32KB of L1 cache and 96KB of L2 cache on the chip and will be able to access up to 4MB of outboard L3 cache. OEMs also will be able to add L4 cache. The Itanium will have as many as 128 registers to store numbers and instructions, and Intel will use a 0.18-micron process to build the processor. The processor will use a new Slot M motherboard interface, and the front-side bus will run at 266MHz.
The Itanium is a Very Long Instruction Word (VLIW) processor. VLIW processors read instruction strings (aka words) that consist of a combination of multiple instructions. Manufacturers use the VLIW architecture in several specialized single-purpose CPUs, but VLIW has never before been used in a general-purpose microprocessor.
Moving the Itanium away from the x86 architecture eliminates the floating-point weaknesses that plague the x86 family. In its design, the Itanium more closely resembles a high-end RISC processor than it does the x86. However, unlike modern RISC processors, the Itanium uses enhanced parallel-processing techniques. Don't confuse the Itanium's type of parallel processing with the parallel processing that multiprocessor SMP systems such as the Xeon use. The Itanium's type of parallelism refers to the CPU's ability to process more than one instruction at a time, a task most RISC systems do poorly. Intel's name for this ability is Explicitly Parallel Instruction Computing (EPIC).
The EPIC architecture will be able to process in parallel up to six instructions per clock cycle. The ability to execute multiple instructions per cycle makes traditional speed measurements, which are based solely on clock speed, misleading for the Itanium processor. The Itanium will probably herald a new CPU performance measurement based on instructions per cycle. EPIC eliminates the need to implement complex Pentium-style out-of-order processing to optimize speed. Instead, the Itanium hands to the compiler the job of parallelizing machine instructions. The compiler reads in the program source code and creates executable instructions, which the processor performs. The compiler must determine the dependencies of each instruction as well as which instructions the Itanium should run in parallel. This architecture promises to make the new processor simpler without requiring it to have an instruction scheduler or hidden registers. However, EPIC also depends largely on the compiler's ability to optimize code for parallel processing.
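A toy example can make the compiler's job more concrete. The Python sketch below is not Intel's algorithm and doesn't model real IA-64 instruction bundles; it is a simplified illustration, with made-up instruction and register names, of how a compiler might walk a program in order and group instructions that don't depend on one another so that as many as six can issue in the same cycle.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str     # label for the instruction (hypothetical)
    dest: str     # register the instruction writes
    srcs: tuple   # registers the instruction reads


MAX_PER_CYCLE = 6  # the article's figure for EPIC


def bundle(program):
    """Greedy in-order grouping. Start a new group when an instruction reads
    or rewrites a register written earlier in the current group, or when the
    group is already full."""
    groups, current, written = [], [], set()
    for ins in program:
        depends = any(s in written for s in ins.srcs) or ins.dest in written
        if depends or len(current) == MAX_PER_CYCLE:
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("ld_a", "r1", ()),
    Instr("ld_b", "r2", ()),
    Instr("add", "r3", ("r1", "r2")),   # needs both loads to finish first
    Instr("ld_c", "r4", ()),            # independent of the add
    Instr("mul", "r5", ("r3", "r4")),   # needs the add and the third load
]

groups = bundle(program)
for cycle, group in enumerate(groups):
    print(cycle, [ins.name for ins in group])
# 0 ['ld_a', 'ld_b']
# 1 ['add', 'ld_c']
# 2 ['mul']
```

Five instructions finish in three issue cycles here rather than five. A production compiler would also reorder instructions to fill each group, which this sketch deliberately does not attempt.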
EPIC is closely related to another Itanium feature called prediction. Prediction is a compiler-based technique of looking ahead in the code to predict which code branches will actually be used. In modern processors such as the Pentium, the processor spends a portion of its time calculating which code branches the program is likely to perform next. Compiler-based prediction more accurately predicts which branches will be used than does processor-based prediction, thus reducing unneeded calculations and letting the processor operate more efficiently.
Speculation is another new capability that lets the Itanium load instructions and data into the CPU before they're actually needed, a technique that in effect uses the processor as a cache. By letting the processor load data before it's needed, speculation limits the effects of memory latency. Proactive loading also lets the processor execute instructions as soon as it needs them, without waiting on memory.
With an eye toward the high-end supercomputing platform, Intel designed Itanium to support up to 512-way SMP servers. The Itanium's system bus, which implements a technology that Intel terms a Multidrop system bus, runs at 2.1Gbps to speed interprocessor communication.
Itanium's 32-Bit Penalty Box
To let the Itanium run existing 32-bit applications, Intel will provide the new processor with x86 hardware emulation for full compatibility with existing 32-bit instruction sets. This emulation will let existing 32-bit programs run without changes on Itanium-powered systems.
However, don't assume that your existing 32-bit applications will run faster on the Itanium. On the contrary, the Itanium's emulation imposes significant overhead by converting x86 instructions into equivalent IA-64 instructions. Obviously, the emulation process will also forgo Itanium's EPIC processing capabilities. An Itanium system will almost certainly run 32-bit applications more slowly than will a comparable Pentium or AMD Athlon system. Ultimately, all 32-bit applications will need to be recompiled on a 64-bit Itanium-compatible compiler to be able to take advantage of the Itanium's sophisticated new features. This trade-off clearly shows that Intel has designed the Itanium for the high-end server market, where compatibility with existing desktop software isn't a high priority.