Prozac for AMD?
AMD’s been beating the 64-bit drum for several years now, long before the company actually shipped its 64-bit processors. Since the initial launch of the Opteron server and workstation processor in April of 2003, AMD has worked with Microsoft to bring a 64-bit version of Windows to market. The x86-64 version of Windows Server 2003 also slipped, though the IA-64 version, supporting Intel’s Itanium architecture, shipped in spring of 2003. The delay was somewhat mitigated by 64-bit Linux distros supporting AMD64 processors, which also shipped in 2003.
Still, AMD shipped substantial numbers of Opterons into the 64-bit server space. When AMD shipped the Athlon 64, it was expected that a 64-bit desktop version of Windows would be quickly forthcoming, but 2004 passed with nothing but beta releases in sight. Recently, Redmond hinted strongly that
The delays must be frustrating for AMD. As late to the table as Intel is with x86-64, Microsoft lagged even more in shipping its 64-bit desktop operating system. Now the company that has been promoting 64-bit desktop computing for several years won’t reap the benefit of being the only kid on the block. Conspiracy theorists have already begun questioning the delays in Windows XP 64-bit Edition. But if you look at it from Microsoft’s point of view, delaying made great sense. Large strides have been made in the last few months in the area of 64-bit driver support. Plus, Intel still ships significantly more x86 CPUs than AMD, due to its dominance among large OEMs like Dell, Gateway, and HP.
Had the 64-bit version of Windows XP shipped last fall, AMD would likely have reaped a bonanza of publicity and sales. As it stands, the company can rightfully note that it was first into the x86-64 market, and that Intel is responding to pressure from its customers and the competition.
With these thoughts in mind, let’s take a look at the new Intel CPU lineup.
Intel is announcing the Pentium 4 600 series and a new member of the Pentium 4 Extreme Edition family. Unlike the earlier Extreme Edition processors, which were based on Intel’s older 130nm technology, the new Extreme Edition is built on the company’s 90nm manufacturing process. In fact, the only differences between the new Extreme Edition and the mainstream 600 series are the initial clock speed and the front-side bus speed.
Intel has added a few new features to its latest desktop CPU line:
- Support for EM64T (x86-64) instructions
- Enhanced SpeedStep power management technology (similar to the Pentium-M)
- Cache increased to 2MB from the previous 1MB
- NX bit (execute disable) in hardware
At the instruction set level, differences between AMD64 and EM64T are minor at this stage. Most of the differences seem to revolve around how the two processors handle context switching. Programmers have gotten a head start on AMD’s implementation, but it’s not really known yet if code optimized for the Athlon 64 will run without issues on EM64T processors. Part of the reason is a paucity of 64-bit applications. We used one for testing, the freeware 3D renderer POV-Ray, which is available in both 32-bit and 64-bit versions. By the time 64-bit desktop Windows actually ships, we should see more applications arriving on the scene, including a few 64-bit games, such as Far Cry.
Intel is also beefing up power management, adding the enhanced SpeedStep functionality that originally appeared on the mobile Pentium-M processor. Some of the enhancements include:
- Dynamic voltage identification: The CPU can change voltage and frequency at any time, on the fly, depending on the processing load and thermal environment. The clock rate is reduced before the voltage change is made. Clock rate shifts are made by changing the multiplier, not the front-side bus clock.
- Processor halt during idle is enabled, which can occur even between keystrokes. The CPU comes out of halt any time there’s an interrupt from any source. Intel has implemented a “C1E” halt that differs from the older type of CPU halt in that the voltage and frequency are changed on the fly (using dynamic voltage identification).
- A new on-die thermal monitor dubbed “TM2” has been implemented. The older version would decrease power—and hence, performance—by about half when the die heated up beyond rated levels. TM2 can cool the processor by about 40% using clock rate and voltage shifts, but the adverse performance impact is reduced.
The new power states are fully supported in Windows XP service pack 2.
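The ordering rule in the dynamic voltage identification bullet can be sketched in a few lines of Python. This is a toy model, not Intel’s control logic: the `OperatingPoint` type, the step names, and the voltage values are hypothetical, and the upward path (voltage first, then multiplier) is our assumption, since the description above only covers the case where the clock is reduced.

```python
from dataclasses import dataclass

# Actual (not effective) bus clock of the non-Extreme Edition parts, in MHz.
FSB_CLOCK_MHZ = 200


@dataclass(frozen=True)
class OperatingPoint:
    multiplier: int   # core clock = multiplier x bus clock
    voltage: float    # volts; the values used below are illustrative only


def core_clock_mhz(point):
    """Core frequency is set by the multiplier; the bus clock never moves."""
    return point.multiplier * FSB_CLOCK_MHZ


def transition_steps(current, target):
    """Return the ordered steps needed to move between operating points."""
    steps = []
    if target.multiplier < current.multiplier:
        # Slowing down: reduce the clock first, then lower the voltage.
        steps.append(("set_multiplier", target.multiplier))
        if target.voltage != current.voltage:
            steps.append(("set_voltage", target.voltage))
    elif target.multiplier > current.multiplier:
        # Speeding up (assumption, not from the article): raise the voltage
        # first so the core is never clocked faster than its supply allows.
        if target.voltage != current.voltage:
            steps.append(("set_voltage", target.voltage))
        steps.append(("set_multiplier", target.multiplier))
    return steps


full_speed = OperatingPoint(multiplier=18, voltage=1.40)  # 18 x 200MHz = 3.6GHz
idle = OperatingPoint(multiplier=14, voltage=1.20)        # hypothetical low state
print(transition_steps(full_speed, idle))
```

Keeping the bus clock as a constant and varying only the multiplier mirrors the point that clock shifts never touch the front-side bus.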
The beefed-up L2 cache has increased the die size from the original 112 mm² to 135 mm², though the chip still uses the LGA775 package that originated with the Intel 900 series chipsets. Transistor count has also increased, to 169M from 125M.
Finally, the implementation of the NX bit should improve performance of certain security features built into Windows XP Service Pack 2.
Let’s take a look at pricing and see how the new Prescotts stack up against the older models and against the competition. Note that these are official, quantity 1,000 list prices.
Processor Model | Price |
Pentium 4 Extreme Edition, 3.73GHz | $999 |
Pentium 4 660, 3.6GHz | $605 |
Pentium 4 650, 3.4GHz | $401 |
Pentium 4 640, 3.2GHz | $273 |
Pentium 4 630, 3.0GHz | $224 |
Pentium 4 Extreme Edition, 3.46GHz | $999 |
Pentium 4 570J, 3.8GHz | $637 |
Pentium 4 560J, 3.6GHz | $417 |
Pentium 4 550J, 3.4GHz | $278 |
Pentium 4 540J, 3.2GHz | $218 |
Pentium 4 530J, 3.0GHz | $178 |
Athlon 64 FX-55 (2.6GHz, 1MB L2 cache) | $827 |
Athlon 64 4000+ (2.4GHz, 1MB L2 cache) | $643 |
Athlon 64 3800+ (2.4GHz, 512KB L2 cache) | $424 |
Athlon 64 3500+ (2.2GHz, 512KB L2 cache) | $272 |
Note that we’ve only supplied pricing for AMD’s socket 939 processors in this table. Intel is clearly positioning the new processors at a slight, but not hefty, price premium over the 500 series, due to the larger L2 cache and 64-bit support. Later this year, Intel will deliver 500 series CPUs with EM64T support, so it’s likely that pricing will shift again.
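For readers who want to put numbers on the premium, here is a small Python sketch that compares the 600 series against the 500J series at matching clock speeds. The prices are copied from the table above; the dictionary layout and function name are just our own scaffolding for the arithmetic.

```python
# Quantity-1,000 list prices from the table above, keyed by clock speed in GHz.
PRICES_600_SERIES = {3.6: 605, 3.4: 401, 3.2: 273, 3.0: 224}
PRICES_500J_SERIES = {3.6: 417, 3.4: 278, 3.2: 218, 3.0: 178}


def premium_at_same_clock(new_prices, old_prices):
    """Map each shared clock speed to (dollar premium, percent premium)."""
    result = {}
    for clock in sorted(new_prices.keys() & old_prices.keys()):
        dollars = new_prices[clock] - old_prices[clock]
        percent = 100.0 * dollars / old_prices[clock]
        result[clock] = (dollars, percent)
    return result


for clock, (dollars, percent) in premium_at_same_clock(
        PRICES_600_SERIES, PRICES_500J_SERIES).items():
    print(f"{clock:.1f}GHz: ${dollars} more ({percent:.0f}%)")
```

With the list prices shown, the dollar gap works out to $46 at 3.0GHz, $55 at 3.2GHz, $123 at 3.4GHz, and $188 at 3.6GHz.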
In some ways, though, it feels like a holding pattern or rear-guard action. The aircraft carrier Intel has made its big turn and is bringing vast resources to bear on developing and shipping its dual core line of CPUs. The first dual core processors, code-named Smithfield, may arrive before the end of spring. This time around, AMD, who has been talking about dual core for over a year, may actually lag behind Intel, but that’s a story for another time.
We first performed an extensive set of benchmarks using good old 32-bit Windows XP Professional. Here’s how the two testbeds were configured:
Component: | Athlon 64 system | Pentium 4 systems |
Processor: | Athlon 64 FX-55 at 2.6GHz, Athlon 64 4000+ at 2.4GHz, Athlon 64 3800+ at 2.4GHz | Pentium 4 model 560, Pentium 4 model 570J, P4EE at 3.46GHz, Pentium 4 model 660, P4EE at 3.73GHz |
Motherboard and chipset: | ASUS A8V-E Deluxe, Via K8T890 | Intel D925XECV2 |
Memory: | 2 x 512MB Corsair DDR400 at CAS 2-2-2-5 | 2 x 512MB Corsair Twin2X 5300C4Pro DDR2/533 at CAS3-3-3-8 |
Graphics: | nVidia GeForce 6800GT PCI Express (66.93 drivers) | nVidia GeForce 6800GT PCIe (66.93 Drivers) |
Hard drive: | Seagate Barracuda 7200.7 160GB, 7200RPM SATA | Seagate 7200.7 160GB SATA Drive with support for Native Command Queuing |
Optical drive: | ATAPI DVD-ROM | ATAPI DVD-ROM |
Audio: | Sound Blaster Audigy 2 | Sound Blaster Audigy 2 |
Operating system: | Windows XP Professional with SP2 | Windows XP Professional with SP2 |
The Athlon 64 system was set up with dual channel DDR400 memory running at low latencies (CAS 2-2-2-5). The DDR2/533 memory on the P4 systems can now run at relatively low latencies for DDR2: CAS 3-3-3-8, using the Corsair XMS2 Pro modules. Note that both Pentium 4 Extreme Edition CPUs run with a 1066MHz effective front-side bus (266MHz actual clock), while the rest of the P4’s run with an 800MHz effective FSB (200MHz clock).
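The relationship between the actual and effective bus clocks is simple arithmetic, sketched below in Python. The four-transfers-per-clock factor follows from the figures quoted above (200MHz actual, 800MHz effective). The 8-byte bus width used for the peak bandwidth estimate is our assumption and isn’t stated in this article, and the 266MHz figure is treated as a rounded 266.67MHz.

```python
TRANSFERS_PER_CLOCK = 4  # the Pentium 4 bus moves data four times per clock
BUS_WIDTH_BYTES = 8      # assumption: 64-bit data bus (not stated in the article)


def effective_fsb(actual_clock_mhz):
    """Effective bus speed in MHz for a given actual bus clock."""
    return actual_clock_mhz * TRANSFERS_PER_CLOCK


def peak_bandwidth_mb_per_s(actual_clock_mhz):
    """Theoretical peak bus bandwidth in MB/s (never reached in practice)."""
    return effective_fsb(actual_clock_mhz) * BUS_WIDTH_BYTES


# 800.0 / 3.0 is the unrounded form of the "266MHz" actual clock.
for label, clock in (("800MHz FSB", 200.0), ("1066MHz FSB", 800.0 / 3.0)):
    print(label, int(effective_fsb(clock)), int(peak_bandwidth_mb_per_s(clock)))
```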
The hard drives were defragged prior to each major benchmark run. Also, we used the rundll32.exe advapi32.dll,ProcessIdleTasks command (note that there is no space after the comma) to execute and shut down tasks that would normally run during idle cycles.
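If you want to script this preparation step before each run, a minimal Python wrapper might look like the following. It is a sketch rather than our actual test harness: the function name and the injectable `run` parameter are hypothetical conveniences, and the call is skipped on anything other than Windows.

```python
import subprocess
import sys

# The entry point is passed as "dll,function" with no space after the comma.
IDLE_TASKS_COMMAND = ["rundll32.exe", "advapi32.dll,ProcessIdleTasks"]


def flush_idle_tasks(run=subprocess.run):
    """Ask Windows to process pending idle tasks; return True if invoked."""
    if sys.platform != "win32":
        # The command only exists on Windows, so do nothing elsewhere.
        return False
    run(IDLE_TASKS_COMMAND, check=True)
    return True


if __name__ == "__main__":
    print("idle tasks requested:", flush_idle_tasks())
```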
We ran an extended suite of 32-bit benchmarks:
- SYSmark 2004, patch 2, a general applications benchmark suite from BAPCO.
- Content creation, including 3D Studio Max R6, POV-Ray 3.6, After Effects 6.0, Windows Media Encoder 9, DivX 5.2.1 and LightWave 7.5.
- Synthetic benchmarks: 3DMark05 (version 120), PCMark04 (version 130), and SPEC Viewperf 8.01.
- Game benchmarks: Doom 3, Painkiller, Microsoft Flight Simulator 2004, and Unreal Tournament 2004.
- Multitasking tests, including results from PCMark04, simultaneously running an Adobe Photoshop Elements script and Norton Antivirus, and also a test in which we ran Flight Simulator 2004 and a Windows Media encode simultaneously.
64-bit Test Setup
We used the latest evaluation build (build 1289) of the beta version of Windows XP Professional 64-bit Edition. We also used the available 64-bit drivers, including nVidia’s 66.96 driver set and Creative’s 64-bit Audigy 2 drivers, as well as 64-bit motherboard and networking drivers.
We ran the benchmark portion of the 64-bit version of POV-Ray 3.6, a freeware 3D rendering application. We tried to get other 64-bit Windows applications, but most weren’t ready for prime time. We also ran a few 32-bit benchmarks just to see how much of a performance hit 32-bit apps might take running on the 64-bit version of Windows. These included 3DMark05, PCMark04, and Windows Media Encoder.
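When 32-bit and 64-bit builds of the same application sit side by side on a 64-bit OS, it’s easy to launch the wrong one. The snippet below is one way to confirm which flavor of a program is running, shown here for a Python interpreter. It wasn’t part of our test procedure; it simply illustrates the pointer-width check.

```python
import platform
import struct


def process_bitness():
    """Pointer width of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


print(f"{process_bitness()}-bit process, machine type {platform.machine()}")
```

The pointer size reflects how the process itself was built, which is why it is checked here rather than the machine type alone.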
First up are the results from SYSmark 2004. SYSmark simulates real-life workloads for both Internet Content Creation and Office Productivity. The content creation portion uses apps like Photoshop, 3D Studio, Dreamweaver, and others, while the office productivity tests use typical office apps, like PowerPoint, Word, and Excel.
The existence of the Pentium 4 570J puts a bit of a damper on things, posting results close to the new Extreme Edition, but costing several hundred dollars less. The Athlon 64’s have always been at a bit of a disadvantage in SYSmark tests, but the FX-55 lags by a relatively small margin in the Internet Content Creation score.
PCMark04 consists of a series of synthetic benchmark suites, each designed to test individual subsystems, such as memory, processor, and hard drive. Note that the test detects the CPU automatically and loads dynamic libraries for each function that have been optimized for the processor under the test. So an Athlon 64 would run code that’s been tweaked to run best on its architecture, while a P4 running the same test would run different code optimized for that processor. As such, it’s an idealized view of performance. In the real world, application optimizations can vary widely.
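The detect-and-dispatch approach described above can be sketched as a lookup table keyed on the CPU vendor string. This is only an illustration of the general technique and not PCMark04’s actual code: the function names are hypothetical, and the "tuned" routines are placeholders that return the same result as the generic one.

```python
def checksum_generic(data):
    """Baseline path used when no tuned build matches the CPU."""
    return sum(data) % 65536


def checksum_amd_tuned(data):
    # Placeholder for a build tuned for the Athlon 64: same result,
    # but in a real benchmark this would be a different code path.
    return sum(data) % 65536


def checksum_intel_tuned(data):
    # Placeholder for a build tuned for the Pentium 4.
    return sum(data) % 65536


# Vendor strings as reported by the CPUID instruction.
TUNED_KERNELS = {
    "AuthenticAMD": checksum_amd_tuned,
    "GenuineIntel": checksum_intel_tuned,
}


def select_kernel(vendor_id):
    """Pick the tuned routine for this CPU, falling back to the generic one."""
    return TUNED_KERNELS.get(vendor_id, checksum_generic)


kernel = select_kernel("AuthenticAMD")
print(kernel.__name__, kernel([1, 2, 3]))
```

Because each processor runs its own tuned path, the scores show what each chip can do with friendly code, which is the sense in which the results are idealized.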
Intel has always performed well with these tests. What’s most startling, though, is how well the Intel CPUs did in overall memory score. You’d think that the Athlon 64’s integrated memory controller would give it some additional efficiency not available to the Pentium 4.
When we look more closely at memory, we can see that the increased front-side bus bodes well for the 3.73GHz Extreme Edition in the 4MB block tests. These tests run with data blocks larger than the cache. The new 660 also performs well. Note that the Athlon 64s move large chunks of data about as well as the older P4 line did.
Next, we look at the 192KB block tests. These tests fit inside the L2 cache of all the CPUs. Note that the P4’s do well in block reads and writes, but the Athlon 64’s are slightly more efficient in random accesses within the cache.
Finally, we take a look at memory latency using the PCMark04 latency inspection test. Here, the smaller numbers are better. Surprisingly, despite the lower latency of the DDR400 memory running on the Athlon 64 systems, the large block latencies seem a bit higher on the Athlon 64. On the other hand, once we get inside the cache, the Athlon 64s fare noticeably better. This is particularly true of the L2 cache test (192KB), which seems to indicate that the Athlon 64’s L2 cache is roughly 20% more efficient in this particular set of tests.
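To see why the 192KB and 4MB block sizes separate cache behavior from main memory behavior, here is a short Python check against the L2 sizes quoted in this article. It is a deliberate simplification: it ignores the L1 cache and anything else competing for cache space, and it leaves out the older 3.46GHz Extreme Edition because its cache configuration isn’t listed in the text.

```python
# L2 cache sizes in KB, taken from the specifications quoted in this article.
L2_CACHE_KB = {
    "Athlon 64 3800+": 512,
    "Athlon 64 4000+": 1024,
    "Athlon 64 FX-55": 1024,
    "Pentium 4 500 series": 1024,
    "Pentium 4 600 series": 2048,
}


def fits_in_l2(block_kb, l2_kb):
    """True when the whole test block can reside in the L2 cache."""
    return block_kb <= l2_kb


def cpus_where_block_fits(block_kb):
    """Names of the CPUs whose L2 cache can hold a block of this size."""
    return [name for name, l2_kb in L2_CACHE_KB.items()
            if fits_in_l2(block_kb, l2_kb)]


print("192KB block fits on:", cpus_where_block_fits(192))
print("4MB block fits on:", cpus_where_block_fits(4 * 1024))
```

The 192KB block fits on every processor listed, while the 4MB block fits on none of them, so the large block tests end up measuring the memory subsystem.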
Now we turn to actual performance using real applications. We’ll take a look at a pair of popular 3D modeling and rendering tools: 3D Studio Max R6 and LightWave 7.5. 3D Studio performs double duty here, as we run the SPEC APC 3D Studio test, which tests performance of 3D Studio by running model creation, modification, and rendering scripts. Note that we’re stuck with R6 for the moment, as the SPEC benchmark hasn’t been updated to work with the latest 3D Studio release, version 7. We also perform several pure rendering tests with 3D Studio and run the latest POV-Ray 3.6 benchmark.
It’s interesting to note that the Athlon 64 simply overwhelms the P4 in the SPEC APC interactive usage test script. Much of the test is spent tweaking models and running preview animations, while rendering takes up a relatively small part of the benchmark. On the other hand, the Pentium 4 generally outperforms the Athlon 64 in 3D Studio rendering. That suggests that a studio might benefit by having its render farm built from Pentium 4 or Xeon processors while the artists could be equipped with Athlon 64 or Opteron workstations.
The LightWave 7.5 render test paints a different picture, as the Athlon 64’s carry the day here. Only the older P4, based on the 130nm Northwood micro-architecture, hangs in there, but it’s still trounced by the FX-55. POV-Ray flips the coin, with the Pentium 4’s outperforming the Athlon 64 using the built-in POV-Ray benchmark.
We used Adobe After Effects 6.0, Windows Media Encoder 9, and the latest 5.2.1 release of the DivX codec to perform our tests here. We’ve shifted our WME tests to use the Windows Media 9 Advanced Profile codec, which was included as part of Windows Media Player 10. The Advanced Profile adds more functionality for encoding WMV files, including denoise, interlaced, and progressive encoding options.
The Pentium 4 has always played well with After Effects, though the FX-55 does close the gap a bit. Note that the differences between the various P4s are relatively small, however, while the Athlon 64 results scale well. What’s interesting is the apparent cache dependency of After Effects. After Effects seems to like more than 512KB of L2 cache, but more than 1MB seems to have a minimal impact. The Athlon 64 3800+ and 4000+ processors run at the same 2.4GHz frequency and differ only in cache size: 512KB for the 3800+ and 1MB for the 4000+.
Using the WMV Advanced Profile test (set to 282kbps streaming bit rate), the P4 again outpaces the Athlon 64. Note here that clock rate seems to matter more than cache size on the Intel processor line. With the DivX encode, the FX-55 outpaces all the other processors, but the Intel CPUs out-encode the other Athlon 64’s.
We use four games, plus 3DMark05, to check out gaming performance. The games include Doom 3, Painkiller (1.6.1 update), Flight Simulator 2004, and Unreal Tournament 2004. All of these games make heavy use of the processor and memory subsystem.
The POV-Ray 3.6 benchmark results are puzzling, to say the least. The 64-bit tests flip the 32-bit outcome, with a huge disparity between AMD and Intel on 64-bit POV-Ray. The 64-bit version was written before EM64T processors were available, so it may simply be that it’s not using Intel-specific optimizations. It would therefore be premature to suggest that the Athlon 64 is faster in 64-bit mode than an EM64T-capable P4. We simply need more data, but we present these tests as one early data point, running on a beta version of the OS.
The other benchmarks are all 32-bit apps. One set of results was run on Windows XP 32-bit, the other on the 64-bit version. The performance remains very close for both processors, which suggests that you’ll only take a small performance hit going to Windows XP 64-bit Edition when that ships.
As is normal with Intel, we’ll likely see the new processors in off-the-shelf systems from major OEMs first, with boxed retail CPUs arriving several weeks down the road. When those are available, should you upgrade?
First, if you have a recently built system, the answer is: probably not. However, if you’re in the market for a new PC, or will be building a new one, then it really depends on your application mix. If you’re doing a lot of content creation, then the new 600 series P4’s are a good option, though the Athlon 64s do fare well in typical 3D modeling chores.
For most office users, the Pentium 4 may be a better option. The better multitasking and multithreading performance can pay dividends in today’s world, where you have a multitude of small threads running in the background.
For gamers, though, it’s no contest. While a 600 series or Pentium 4 Extreme Edition system can certainly be the core of a fine gaming system, cutting-edge gamers will want an Athlon 64. We were somewhat disappointed with the new Extreme Edition. The higher front-side bus clock and higher core frequency are well and good, but those are the only real differences. At roughly $360 less (going by the list prices above), you could get an older model 570J, although it should be noted that the new P4EE will be more thermally efficient due to the new power management capabilities.
Note that the price premium for the new 600 series Prescotts is pretty slight, overall. While we suspect that online retailers may charge a higher price premium for a few weeks, once that settles down, we’d certainly recommend a 600 series over a 500 series CPU currently. If nothing else, support for upcoming 64-bit operating systems is a strong factor, though both the 500 series (with 1MB of L2 cache) and 90nm Celerons will be updated in the future to support 64-bit.
So 64-bit processing arrives at last for the Pentium 4. It’s been a long, tortuous road for Intel, but we can say that it works, and generally works pretty well. Perhaps by the end of the year, we can finally say goodbye to 32-bit environments and operating systems.
Product Name: | Intel Pentium 4 Model 660 |
Company: | Intel |
Price: | $605 (qty. 1,000) |
Pros: | 2MB L2 cache; EM64T 64-bit support; better power management |
Cons: | Pricey |
Summary: | The 600 series finally gives Prescott a little respect. The new power management, hardware NX and 64-bit support give Intel the final features checkbox it needed. But performance still lags a bit in some applications. |
Rating: | 8/10 |
Product Name: | Intel Pentium 4 Extreme Edition 3.73GHz |
Company: | Intel |
Price: | $999 (qty. 1,000) |
Pros: | Updates P4EE to Prescott; 2MB L2 cache; 64-bit support; new power management |
Cons: | Expensive for what you get; slightly slower than the old P4EE or P4 570J in some applications |
Summary: | We think charging a $300+ premium for boosting the cache and FSB over the existing 570J is a bit much to ask. But it does bring EM64T to the high end of Intel’s desktop platform. |
Rating: | 6/10 |