Itanium: Intel’s Great Successor
In June 1994, Intel and Hewlett-Packard - two of Silicon Valley's largest and
most powerful companies - announced an alliance.
From the union of these two giants would spring forth the next generation of CPUs.
The Great Successor. Chosen to unify two architectures under one umbrella.
It was named Itanium and by 2002 Intel had spent $5 billion on it. In today’s video,
we trace one of Intel's most ambitious products.
## Intel and 64-Bits
The x86 instruction set helped turn Intel into a giant.
A massive ecosystem had built up around it. In the 1980s, four out of every five
PCs shipped with an Intel CPU. These huge volumes helped them afford to build big,
advanced semiconductor fabs and produce at the lowest cost.
Why leave it all behind? Well, after shipping the famous Pentium CPU,
powerful voices inside Intel began to assert that the time had come for something
new. The foremost reason was something called 64-bit computing.
The "64-bit" part of that phrase refers to the size of a CPU's "register". At the time,
Intel's CPUs were 32-bit processors, and that limited them in several ways.
The most prominent limit being that a 32-bit computer can only use up to
about 4 gigabytes of working memory: 2 to the power of 32. Less in practice,
because some of that is taken up by the operating system.
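To make the 4-gigabyte figure concrete, here is a minimal C sketch of the arithmetic - purely illustrative, not anything from Intel:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* A 32-bit register used as an address can name 2^32 distinct bytes. */
    uint64_t addressable_32 = 1ULL << 32;   /* 4,294,967,296 bytes */

    printf("32-bit address space: %llu bytes (~%llu GiB)\n",
           (unsigned long long)addressable_32,
           (unsigned long long)(addressable_32 >> 30));   /* prints ~4 GiB */

    /* A 64-bit register raises the ceiling to 2^64 bytes - roughly
       16 exbibytes - far beyond anything installable at the time. */
    return 0;
}
```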
In the early 1990s, this 4-gigabyte wall was not a big deal for the consumer market,
because PC memories topped out at about 128 megabytes. Who could imagine ordinary
folks ever needing much more than that, at least in the near future?
But it was a big deal for graphics workstations, scientific computers
handling precise calculations, and web servers delivering content over the Internet.
These were powerful, very expensive machines that at the time ran UNIX.
Intel then dominated the PC, but had no presence in that high-end space.
That space was populated by RISC chips like Sun Microsystems' SPARC,
Hewlett-Packard's PA-RISC, or DEC's Alpha. Intel wanted to get into that game.
## Extension versus Blank Sheet
So, a question. Why not just extend the existing 32-bit x86 instruction set so that it can handle 64-bit registers?
After all, that is what Intel did with the prior major transition from 16-bit
to 32-bit. It wasn't easy, but the resulting 386 Intel CPU was a massive
success - powering a generation of PC clones like those from Compaq.
Intel had even tried a similar, very ambitious blank-sheet approach for that 32-bit transition.
The iAPX 432 was Intel's first 32-bit architecture. And to skip a lot of
words - feel free to read the very long Wikipedia article if you care - that product failed.
But kind of like invading Russia, history rhymes. Intel felt that the 64-bit transition
would be different. And that this time, x86's years-old legacy CISC components would hold it
back. A lot of extra tooling and rules had to be followed to preserve that old world.
AMD and the other x86 cloners were a factor too. A history of 64-bit
computing by Matthew Kerner and Neil Padgett interviewed Richard Russell,
who pointed out that AMD's cross-licensing agreements gave them access to Intel's x86 work.
So the way it went was: Intel releases a new x86 chip, then six or twelve months later
AMD releases their version at a cheaper price. This devalued
Intel's R&D and burned away a huge amount of profit for everyone involved.
The ghost of IBM and the PC loomed too. There was no guarantee that Intel would forever control x86.
The day might come when AMD, Cyrix, and the other x86 cloners somehow
pried control of the standard away, like what the PC cloners did to IBM.
So in Intel's eyes, yeah sure they can always extend x86. But the reboot-with-a-clean-sheet
approach could potentially let Intel surge ahead of the competition with an architecture
that it fully owned. And proponents argued that Intel now had enough influence to pull it off.
The debate raged until the late, great Albert Yu - Intel's general manager of microprocessors,
who oversaw development of the 386, 486, and the Pentium - bought in.
But how to achieve it? Kerner and Padgett also interviewed Dileep Bhandarkar,
who was then an Intel director. Bhandarkar recalled the company
doing a small internal 64-bit effort in 1992 while investigating outside opportunities.
The computer company DEC tried to get Intel to take on their RISC chip Alpha - a very
fast chip - which Intel declined. Intel then suggested DEC make the Alpha in Intel's
fabs, which DEC declined because they had just spent half a billion dollars on a new fab.
Then in late 1993, HP came knocking on Intel's door with an exciting new technology.
## A Post-RISC Technology
In 1990, Hewlett-Packard rehired the brilliant Bill Worley to flesh out the future of their proprietary line of chips.
Worley used to work at IBM alongside John Cocke on the legendary IBM 801 project.
801 is widely acknowledged as the driving force that kicked off the RISC revolution.
He then joined HP, where he helped produce one of the earliest RISC instruction sets,
Precision Architecture RISC, or PA-RISC. The architecture became a
growth engine for Hewlett-Packard through the turbulent RISC wars of the 1980s.
Worley then briefly left HP to lead a graphics processor
startup but rejoined in 1990 for a special project. The PA-RISC team
recognized that RISC was on the verge of hitting serious performance limits.
So a new project, initially called "Super Workstation", was formed
to explore new architectures for the post-RISC era. Over time,
Super Workstation's work began to intertwine with that of another team inside Hewlett-Packard:
Fine-grained Architecture and Software Technologies, or FAST - an HP internal
project exploring and evolving a radical concept known as Very Long Instruction Word, or VLIW.
## Meet VLIW
Very Long Instruction Word is a term coined by the brilliant Josh Fisher while he was at Yale.
As he puts it, VLIW describes a design philosophy - a concept or idea
more like RISC, rather than a specific instruction set like ARM or x86.
Its goal is for a CPU to achieve as much Instruction-Level Parallelism, or
ILP, as possible without making the hardware do the work of finding it.
What is ILP? It is a way for a single microprocessor to speed up work by
initiating and executing multiple machine instructions in parallel
so that we can try to get more than one useful operation done per clock cycle.
Traditionally, high levels of ILP were seen as infeasible because programs have so many branching
conditions: If/else statements, loops and annoying dependencies that change the path of the code.
VLIW tries to get around those limits
by scheduling "traces" of the program code. Using heuristics and user-provided data,
the compiler tries to guess how the user's program will progress.
The compiler then aggressively schedules the trace's instructions
for maximum parallelism, moving work across branch boundaries. To handle mistaken guesses,
the compiler adds compensation code to "backtrack" or fix things up.
These scheduled instructions are then packed together and sent to
the hardware in "very long instruction words". Ergo the name.
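As a rough illustration of the kind of work a VLIW compiler hunts for - this is my own toy example in C, not Multiflow or HP code - consider a loop whose operations are independent of one another:

```c
/* Toy example: the four multiply-adds in each pass have no data
   dependencies on one another, so a trace-scheduling VLIW compiler can
   pack them into one very long instruction word and a wide machine can
   execute them in a single cycle. (Assumes n is a multiple of 4.) */
void scale_and_add(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}
```

Branch-heavy business code offers far fewer of these clean, independent bundles - a point that will matter later.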
People initially thought VLIW computers were impossible, because they
require a compiler that can somehow predict a program's future.
The difficulty of producing such compilers is a recurring theme with this technology.
## Fisher and Rau
Wanting to prove the skeptics wrong, Josh Fisher left Yale to found a startup called Multiflow.
In 1987, they produced a line of powerful mini-supercomputers called TRACE. Over the
next two years, they sold and shipped about 100 units to scientific and commercial users.
Multiflow was not the only startup exploring VLIW at the time. There was
another founded by a brilliant Indian-American named Bob Rau.
Rau had led a team at the computer company TRW
studying similar Instruction-Level Parallelism techniques. In the same
year Fisher founded Multiflow, Bob Rau and several colleagues left to found Cydrome.
Cydrome worked on a VLIW-based "departmental supercomputer" called the Cydra 5. And while they
got it to work, it never shipped as a commercial product. The company eventually disbanded.
Multiflow also disbanded. In 1989, the mini-supercomputer market crashed from
over-competition in the category as well as cannibalization by powerful
single-chip RISC workstations called "Killer Micros". Circumstances trumped technology.
## A Radical Idea
After their startups closed down, both Bob Rau and Josh Fisher joined Hewlett
Packard and the FAST project with the goal of evolving the VLIW technology.
At the time, the big thing in the microprocessor world
was an ILP approach called out-of-order superscalar. This approach was arguably
pioneered by the aforementioned John Cocke and Tilak Agerwala.
Roughly speaking, superscalar involves adding multiple independent execution units to the CPU, plus extra
hardware to fetch a lot of instructions, figure out their various dependencies, and dispatch them to
the right units for simultaneous execution. This is all done as the program is running.
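To see what that hardware is doing, here is a small hand-wavy C example - mine, not IBM's or Intel's - of the run-time reordering a superscalar core performs:

```c
/* The load into 'a' may stall on a cache miss. An out-of-order superscalar
   core notices, at run time, that 'b' and 'c' do not depend on it and
   executes them while the miss is outstanding; 'd' consumes the loaded
   value, so it has to wait. All of this scheduling happens in hardware. */
int reorder_demo(const int *p, int x, int y)
{
    int a = p[0];      /* long-latency load                      */
    int b = x * 7;     /* independent - can issue out of order   */
    int c = y + 3;     /* independent - can issue out of order   */
    int d = a + 1;     /* depends on the load - must wait        */
    return b + c + d;
}
```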
Superscalar worked. IBM utilized it then for their high-performing RS/6000 workstation.
Intel would later use it for their Pentium processors. But Rau and Fisher
came to believe - quite controversially - that superscalar was an anchor. An anchor that would
blunt the lift that microprocessors were then getting from Moore's Law.
Superscalar leans heavily on hardware to analyze instructions, figure out their
various dependencies, and sort them into the ideal order as the program runs. Such hardware
is incredibly complex and power-hungry. Rau and Fisher bet that it would not scale.
With their contributions, Super Workstation produced a new architecture called PA-Wide
Word or PA-WW. It performed quite well compared to what existed inside HP.
The next step was to design and produce a chip that implemented this architecture. But here,
there were challenges. Worley realized that PA-WW chips would have to be made
in a leading-edge fab. In a 2001 interview for HP Labs, he explained the ramifications:
> The costs of such a fab implied that the chip volumes would have to be extremely high. High
volumes, as well as the need to attract software from many providers, implied that the architecture
would have to be an industry standard. An industry standard implied that HP could not do it alone.
Thus in July 1992, Worley recommended that HP bring in a manufacturing partner
with both prowess and scale. The obvious partner was Intel.
On Thanksgiving 1993, HP's CEO Lew Platt made a call to Andy Grove, asking whether
Intel might be interested in working with HP to make PA-WW the successor to x86.
Grove said no. HP tried again later, emphasizing that PA-WW would be fully
backwards compatible with both x86 and PA-RISC. This time it worked.
## Intel and HP Team Up
So what did Intel see that got them so interested?
The HP design team included well-respected folks like Josh Fisher, Bob Rau,
and Bill Worley. And that team had already made much progress. In a widely circulated quote,
Intel's John Crawford told the Wall Street Journal:
> When we saw WideWord, we saw a lot of things we had only been looking at doing,
already in their full glory
A PA-Wide Word architect named Rajiv Gupta had this second golden quote - also widely circulated:
> I looked Albert Yu in the eyes and showed him we could run circles around PowerPC [a
competing IBM processor], that we could kill PowerPC, that we could kill the x86. Albert,
he's like a big Buddha. He just smiles and nods.
Intel would be blind if they didn't also notice the competitive dynamics. They can
convert one of their significant RISC rivals onto a technology platform that they control.
And if HP gets on board, then maybe others like Sun and Silicon Graphics will too.
Grove was intrigued and ordered a bake-off between PA-WW and Intel's own internal 64-bit
architecture effort. PA-WW won. So they hammered out a deal, announced in June 1994.
Hewlett-Packard would transfer the PA-WW IP over to Intel. Intel would then design and produce
the first CPUs. HP could then get said CPUs at a discount to produce enterprise system products.
There were no solid products, only a statement of direction towards a future
computer architecture. The first processors were not anticipated to arrive before 1998,
but once delivered, they would carry both companies into the 21st century.
This was going to be a massive project. Albert Yu anticipated it costing between $400 million and
$500 million over its whole life. An underestimate, as it turns out. But Intel
could afford it, and the results were going to be amazing. Albert Yu told the press at the time:
> By combining our skills ... we will offer the marketplace chips and systems
with absolutely unparalleled performance for the future
## Taking Names
Now. I want to pause a bit and talk names. Part of what makes this all so confusing are the names.
There are more names here than you can shake a stick at. And unfortunately
they all come out at different times. I am going to step out of the flow of
time and gather them all together here so that we can keep track.
So we start off with HP’s Super Workstation,
which produces PA-Wide Word. The announced 1994 collaboration with Intel would eventually evolve
PA-WW into a new thing called "Explicitly Parallel Instruction Computing", or EPIC.
EPIC is an architectural philosophy, kind of like how CISC or RISC are philosophies. So
think of it like the philosophy of French cuisine - a style with recommendations on
how to achieve a desired result. EPIC likes parallelism. French cuisine likes sauces.
EPIC is a direct descendant of VLIW. So it still transfers complexity from the hardware
to the software compiler. The compiler still aggressively analyzes the program code for
parallelism opportunities and groups instructions together in big bundles.
But EPIC strikes a more moderate tone, admitting that sometimes
the hardware is in a better position to do certain things at runtime because
it can see live program state. So EPIC accommodates hardware in the CPU for
that - but not so much as to make it as complex as a superscalar chip.
Multiflow and Cydrome's VLIW compilers were also too tightly bound to their
microarchitectures' hardware. EPIC addresses this rigidity with something
called "templates" - which help define which instructions can be bundled together.
Now that is EPIC. The next term to introduce is the IA-64 instruction
set architecture. EPIC is to IA-64 as RISC is to PA-RISC or SPARC: IA-64 is a specific
instruction set implementation of EPIC, defined and owned jointly by Intel and HP.
So to continue the cooking metaphor, you can think of it as like a French cuisine
cookbook - demonstrating various techniques and recipes for cooks to make French dishes.
After that, we go to the individual chips. The French dishes themselves,
as served by the restaurant. Intel expected its first IA-64 chip to hit the market in 1998.
Internally, this first IA-64 chip had the codename Merced after a river in California.
In October 1999, Intel would announce that the chip would be officially named Itanium. Intel said
at the time that the name conveys the processor's unique strengths and power while retaining the
"-ium" word endings for brand consistency. Netizens almost instantly dubbed it the Itanic.
## Reactions
Anyway. Back to 1994 and the flow of time. Outside analysts saw the
collaboration's potential - citing the two companies' talents and capabilities.
Hewlett-Packard was top two in the workstation and server markets,
where Intel was then weak. And of course, Intel was the juggernaut of the PC industry,
trying mightily to get into the workstation and server industries.
Analysts also considered what effect the collaboration might have on IBM, which backed its own PowerPC line
of RISC chips. Andrew Allison of the "Inside the Computer Industry" newsletter told ComputerWorld:
> I would imagine that IBM is not terribly thrilled with it ... It’s probably the only
combination that is virtually guaranteed to have the horsepower to stand up to PowerPC.
Intel didn't outright say it - and they would later deny ever having
implied such a thing - but they also positioned this new family
as the future successor to x86. One VP at a Boston consultancy said:
> "Intel is smart enough to know when it’s time to be at the end of the x86 line."
The Microprocessor Report echoed the notion that the end was now in sight for x86.
This new architecture would supersede both it and
PA-RISC before trickling down to the mass market. They wrote:
> We expect that, in about 10 years, Intel will stop making pure x86 chips in favor
of [the new chips]. Intel will continue to milk the x86 cash cow as long as it can ...
> Intel’s P6, due in late 1995, probably will be the last pure x86 core that Intel develops
## Disagreements
Not everyone agreed with that. Shortly after the announcement,
Nick Tredennick wrote up a dissenting view.
He argued that the two companies had shot themselves in the foot by
transitioning architectures and pursuing the VLIW "technofad".
He pointed out that big architectural shifts require developers to recompile
their software - which they hate doing, because it's never smooth.
And that the complicated hardware would also need extremely complicated
compilers. Neither of which has a good history of on-time delivery.
And that switching away from x86 would be walking the same mistaken and failed path
that IBM did when Big Blue tried to lock down the PC ecosystem with the Micro Channel Architecture.
Add to this boiling bone broth the collaboration's high expectations, which towered over K2.
Robert Colwell is a legendary CPU designer who previously worked at
Multiflow. He then went to Intel in 1990. In his memoirs, he wrote:
> In essence, [the Intel design team in charge of IA-64] were told that their mission was to
jointly conceive the world’s greatest instruction set architecture with HP,
and then realize that architecture in a chip called Merced by 1997,
with performance second to no other processor, for any benchmark you like.
Merced would also do all these things while being fully compatible with
legacy software for both x86 and PA-RISC. This sounds ambitious.
Colwell was not alone in his doubts. Intel's chief of corporate strategy
at the time was David House. While he approved the project,
he would later say that its sheer scale - and I quote - "scared the everloving bejesus out of me".
## Merced
Intel sold chips to HP, but the two had never worked together on this level.
HP is famous for its consensus-based management
style. Intel on the other hand is just as famous for "constructive confrontations",
where people are expected to challenge each other bluntly, promptly and with data.
So the two arm-wrestled over what functions should
be handled by the software or hardware while simultaneously ramping up their teams with new,
relatively inexperienced people. There was tension.
The experience was either so traumatic or so constructive that HP took the sole lead
for the second generation of IA-64 chips. This particular chip project was code-named McKinley.
The original plan was to release Merced in 1998 and fab it on
Intel's 250-nanometer node. But then the chip design was found
to be spilling beyond the limits of what could be fabbed. Like a muffin top.
So the designers took out transistors allocated for memory cache and x86
compatibility. Removing the latter became easier to justify after the much faster
Pentium Pro was released, since Merced's x86 performance was going to look weak relative to that beast anyway.
Even so, there was still spillover. So it was decided to go to the 180-nanometer node
instead. The transistor shrink would let them put the whole design onto a single
die. The cost, however, was a six-month delay, pushing the ship date to 1999.
Things progressed. In October 1997, the two companies introduced EPIC
and IA-64 to 1,500 computer designers at the Microprocessor Forum. They talked about EPIC's
key architectural choices and emphasized its speed relative to existing RISC chips.
Intel also shared a release date for Merced:
1999. They said it would have industry-leading performance, full compatibility
with the old 32-bit architecture, and a complete solution stack at launch.
Several big software developers announced their participation in the IA-64 ecosystem.
Microsoft agreed to have a 64-bit version of its Windows NT operating system available at
release. Sun said it would make their Solaris OS available on Merced chips.
And to raise the hype even more, presenter and Intel Fellow Fred Pollack teased the
second-generation McKinley chip, saying that it was going to "knock your socks off".
## P7
When Colwell arrived at Intel back in 1990,
he helped found the company's second design team in Oregon.
That team - working in friendly competition with a team in Santa Clara - began on a product
code-named P6. It would be released in 1995 as the 32-bit Pentium Pro.
The Pentium Pro was a remarkable chip. Despite being fabbed on the same process
as its predecessor (P5), P6 ran twice as fast thanks to the inclusion of ideas like out-of-order
superscalar - which, to remind you, searches more aggressively for instructions to parallelize.
The Pentium Pro brought Intel's x86 architecture neck and neck with some of the fastest RISC chips.
It also opened the door to the workstation market by enabling the "personal workstation".
Such personal workstations - running Microsoft's Windows NT or Linux - cost
half that of the old-school UNIX-powered workstations. They grew rapidly in 1995,
eating into the low end of the market.
Unfortunately, internal politics interfered with the Oregon team's pursuit of this opportunity.
Colwell remembers being told that IA-64 would eventually replace the 32-bit lines,
so why keep working on the old legacy stuff?
To Colwell however, the Pentium Pro showed that the 32-bit architecture
still had plenty of juice. With no 64-bit killer application on the immediate horizon,
a premature switch might leave the market to AMD and other competitors.
He also argued that Merced had so many new things going on that there was no
chance that it would all work right on the first try. He felt Intel should have
returned the chip to the lab as a long-term research project to iron out its kinks.
In the end, management could not decide on a coherent strategy on how to resolve the conflicts
between the Oregon team working on 32-bit and the Santa Clara team working on 64-bit Merced.
At first, they were content to just stand aside and let the best one rise to the top.
However this backfired, because Merced had to be compatible with the 32-bit stuff. With
Colwell and the Oregon team still working on it, that goal became an ever-moving target. So
the Santa Clara team tried to "freeze" the specification, which Oregon hated.
In the end, management separated the children: 64-bit for the more powerful server chips.
32-bit for everything else including workstations. That’s the strategy Intel would follow henceforth.
By the way, I highly recommend Colwell's book, "The Pentium Chronicles", where he
talks about these worsening dynamics between Santa Clara and Oregon. It is a strong read.
## A Second Delay
Soon after the October 1997 presentation at the Microprocessor Forum, a new problem emerged.
A source told CNET at the time that Intel severely underestimated the
chip's complexity. The Wall Street Journal later reported Intel struggling with various
signals arriving at parts of the CPU at the wrong time, creating speed bottlenecks.
This was amplified by Intel targeting an exceptionally high 800 megahertz clock rate.
Tweaks made to fix bottlenecks in one module caused ripple effects in
other modules, making debugging endlessly tricky.
There are rumors of other things, but I won't go into them. Whatever the cause was,
it was serious. By mid-1998, the company had to announce that it was
pushing Merced's release from late 1999 to mid-2000. Which meant servers would not
reach actual customers until Q4 2000. New CEO Craig Barrett told the press:
> Our best assessment is that the project is a bit bigger and complicated than we assumed
it would be ... we are pleased with progress. There's not a basic problem with the technology.
This second delay meant that Merced was scraping up against the second-generation IA-64 chip - the
one HP was designing, code-named McKinley. It was scheduled to enter mass production in 2001.
Intel finally taped out Merced in the summer of 1999 and
demonstrated it that fall at the 1999 Intel Developer Forum. Shortly afterwards,
the fabs started learning how to produce the new chip, with early versions seeded to developers.
## Transition Plans
Both Intel and Hewlett-Packard - perhaps expecting this might happen - went to their backups.
At the 1998 Microprocessor Forum, Hewlett-Packard unveiled a "transition plan" towards IA-64.
They would continue releasing additional PA-RISC chips for the next five years, until
2003. Customers could choose which chip they wanted in their server.
This was not ideal. A former HP executive remarked that they had to do all sorts of tricks
to extend PA-RISC. The delays and distractions associated with getting out IA-64 allowed rival
Sun Microsystems to leap ahead in the web server market during the wild late 90s internet boom era.
And as for Intel, the chip giant revitalized market revenues of its 32-bit architecture in
1998 with the introduction of the Celeron and Xeon lines. Market segmentation.
The former targeted value-minded consumers who otherwise bought cheaper chips from AMD,
Cyrix and other cloners. The first Celeron flopped
because it basically had no cache but later iterations performed very well.
The latter chip, the Xeon, targeted the medium to high-end server market
with faster clock speeds, larger caches and higher cache bandwidth.
So when Merced was announced to be delayed,
analysts noted that it was not a huge deal and that the Xeon could hold on as
a "placeholder". As we will later see, that turned out to be an understatement.
The delay did give OS-makers like Microsoft and the UNIX vendors time
to port for Merced/Itanium. But even as something like a "race" developed, actual
application developer interest remained tepid. One Wells Fargo system architect said in 1998:
> We have a few applications that could benefit from Merced,
but probably not anytime soon ... first we’ve got to take
care of Year 2000 compliance issues. Maybe in 2001 we can look at Merced
## Itanium in 2001: The Revolution is Here
After 7 years and $5 billion spent, Intel finally launched Itanium in the summer of 2001.
Recognizing that their 32-bit products were still going strong,
Intel tried to position Itanium as a powerful, revolutionary product for the "most demanding
enterprise and high-performance computing applications", as their press release said.
So yes, it might take some additional work at the start, but those who put it in would be
rewarded. Intel commissioned a white paper to identify "sweet spots for early adopters",
which included technical computing, large databases, and complex analytics.
To the press, Intel worked hard to emphasize that this was just the first step of a long
journey, and that ecosystem adoption at this early stage was already pretty impressive.
On the hardware side, they highlighted buy-in from a spectrum of computer manufacturers.
Some 35 Itanium-based models were said to be released by 25 companies like Dell,
Compaq and Silicon Graphics throughout 2001.
Intel also highlighted that Itanium systems can run four compatible operating systems:
Two 64-bit versions of Windows, HP's proprietary UNIX variant HP-UX,
IBM's proprietary UNIX variant, and certain commercial Linux distributions.
With all this backing from the big companies,
people presumed that Itanium would take the market. A 2000 market report from MicroDesign
Resources had predicted that IA-64 chips would have 60% of the server market by 2003.
Unfortunately, Itanium took too long to get to market. Soon after its debut,
it was outshone by several new 64-bit RISC chips like Sun's UltraSPARC III and IBM's Power4.
Microprocessor Report nominated the Itanium for
its Best Workstation/Server Processor award, but wrote:
> But while other high-end server processor designs are moving to glueless multiprocessing,
simultaneous multithreading, chip-level multiprocessing,
and integrated memory controllers, the Itanium system architecture is beginning
to show its age. Perhaps the design has been in development too long and has had too many cooks.
Another major issue was that there was not a lot of Itanium-native software. And while
Itanium can run 32-bit x86 software, it unfortunately did not do it that well.
IA-64 is so different from x86 that emulation means recreating the whole
thing from scratch. The more you try to force the former to act like the latter,
the more you are giving up its own inherent advantages.
Considering the chip's high price, disappointing performance,
and the looming arrival of a faster chip the following year,
it is surprising that the first iteration of the Itanium sold even the few units that it did.
There are a few who say that Itanium did (allegedly) kill one of its big RISC rivals,
the DEC Alpha - once heralded as the world's fastest chip.
After Compaq bought DEC, they wanted to consolidate to a single 64-bit
platform - which was Itanium because that was all that Intel had at the time - and
sold the Alpha IP to Intel. Does that mean Itanium killed Alpha? Not sure.
A few people are nostalgic about the Alpha, but chip R&D is expensive, DEC was not doing
well then, and it was not like the chip was doing all that great before Compaq nixed it.
## AMD
Befitting the fast follower, AMD too wanted to get into the server
business. They had 18% of the 32-bit market but zero in servers. That meant going 64-bit.
Hearing that Intel was doing something brand new for 64-bit, AMD approached Intel for an
early look at the Itanium architecture. But as I mentioned in passing, Intel had
intentionally carved out Itanium from AMD's cross-licensing agreements. They were rebuffed.
So what to do next? Atiq Raza - who had joined AMD as COO through its acquisition of the CPU
company NexGen - explains in his oral history for the Computer History Museum:
> Everybody said we're going to get screwed. Itanium is going to
take over the world. So I said, "Okay. I find it very weird that basically they
have abandoned the x86 and are doing a different instruction set for 64-bit.
We should also consider doing a different instruction set if that's the case."
So AMD investigated various partners - SPARC, MIPS, PowerPC and DEC - to see whether they
could do something together. While those ecosystems had existing user bases that AMD could leverage,
32-bit x86 software did not run well on them in emulation mode.
Eventually, AMD came to believe that Intel had made a mistake.
Developers hate recompiling software, and users hate being forced to adopt
something new unless there is some compelling reason. IA-64 didn't seem to be it.
VLIW's roots were in academia. And while Multiflow did sell well to corporates,
it showed its best colors on numerical and scientific workloads as those programs tend to
have more repeated loops, ILP opportunities and predictable control flows. For more
generalized work like in the business space, VLIW’s gains were not as obvious.
If Intel really did err in going down the route to Itanium, then AMD suddenly had an opportunity.
So Raza went to a brilliant chip designer who joined AMD from DEC named Jim Keller, and said:
> "Jim, life and death for AMD. We do an x86 extension to 64-bit. You
have to write the spec and you have to do it with very little time."
Keller - who cranked on this day and night - is thus one of the major authors of the x86-64
spec, later known as AMD64. Which I would say is by itself a killer legacy, but Keller
has since gone on to do a bunch of legendary stuff at Apple, Tesla and more. Living legend.
In October 1999, AMD announced x86-64 to the world. This was a major divergence from Intel.
AMD assured the market that their 64-bit transition would
be a "simple change", fully compatible with 32-bit x86.
AMD - never one to miss a snarky comment at their rivals - criticized Intel for
"forcing" the Itanium design onto the OEMs.
Ron Curry, Intel's director of marketing for IA-64 products,
responded by insisting that Itanium too would have x86 compatibility.
He then went on to compare AMD's strategy to trying to soup up a Volkswagen with wider
tires and a faster engine. I get where he is driving with this, but still find it amusing.
The looming threat of AMD's entry into the 64-bit space pushed Intel
to double down on its second-generation IA-64 chip, the one codenamed McKinley.
## Itanium 2: This is Ready
In 2002, Intel CEO Craig Barrett rebooted the project.
McKinley was officially announced as the Itanium 2,
made an official member of the Intel lineup, and released in mid-2002.
Intel's Paul Otellini predicted sales of 100,000 units, telling securities analysts:
> At the risk of getting myself in a lot of trouble,
I'm going to declare this the year of Itanium
The Itanium 2 was indeed an improved product. It performed better on benchmarks,
largely because of a larger bus with three times
the data bandwidth of the first Itanium, plus a bigger L3 cache.
For the redux, Intel sharpened the messaging to aim squarely at Sun and their high-end UltraSPARC
III-based server systems - then the market leader in UNIX systems with nearly 30% market share.
Intel's product news release repeatedly compared
it favorably to the UltraSPARC III. Sun could not have been happy about that.
Second, Intel highlighted growth in the software ecosystem. Applications were now
available from Microsoft, Oracle, Reuters, and BEA. They also claimed that the chip
was compatible with more operating systems than any other high-end enterprise server platform.
And Hewlett-Packard was giving it their all. They developed an Itanium 2-based high-end server
called the Superdome that could run a variety of operating systems - not just HP-UX but
also Windows and Linux - to help transition their customers over to this Intel stuff.
But even as early as December 2002, outside analysts were concerned that
HP might be the only server vendor to go so far in adopting the Itanium.
In November 2003, Intel's director of business-critical systems marketing
told CNET that Itanium was on the brink of broader
use and that 2004 was going to be a "very strong watershed year".
## Rise of the Compute Cluster
In an earlier time, I think it might have worked.
I think Intel really did have the market power then to drag people to Itanium. The
growing internet tech giants might have bought plenty of expensive
Itanium-powered computers for web servers, and Itanium could have been on its way.
But times were changing. Tech giants like Google were turning away from big mainframes towards what
are called "compute clusters". Such clusters were powered by cheap commodity hardware and
the open-source Linux OS and then networked with software to act like one chonk computer.
Such compute clusters can be cheaply scaled up to serve billions of people.
Buyers don’t have to pay a tax to a proprietary systems vendor.
Clusters also tend to be more resilient against physical hardware failures.
Big mainframes still have their place but clusters were the future.
That means cheap commodity hardware, which ironically enough meant x86 Pentium - later
Xeon - CPUs. In the end, Itanium's biggest competitor was another Intel product.
## AMD's Victory
In April 2003, AMD released the K8 Sledgehammer core in the Opteron and the Athlon 64.
AMD counted on the 130-nanometer Opteron to help it gain traction in the corporate market.
They held it out as the "evolutionary" path for
companies with legacy code to get to 64-bit computing.
Despite its groundbreaking features and Sun's support, the chip did not
sell as well as anticipated. System sales by mid-2004 did beat Itanium, but lagged
Xeon's by a country mile. Basically none of the big box vendors made an Opteron server.
This sus behavior is part of what led AMD CEO Hector Ruiz to eventually
sue Intel again for alleged anti-competitive practices.
But Opteron and Athlon 64 did force Intel's hand. In early 2004,
Craig Barrett told developers that they were adding 64-bit address
extensions for their server Xeons. Desktop chips, the following year.
Intel ended up adopting AMD's 64-bit extension, with some minor differences. It was the first
time that a company other than Intel made a major addition to the x86 spec.
Isn't it ironic? Intel produced Itanium in part because they were afraid that AMD and the cloners
might pry away control of x86. But making Itanium eventually led to that very thing they feared
happening! Fortunately for them, they still had Xeons (and their special discounts to vendors).
Several analysts had projected Itanium server sales to reach $14 billion by mid-2004. The
actual sales number then was about $600 million. Outside of Hewlett-Packard,
no major systems vendor was selling these Itanium systems at volume.
## Conclusion
Intel leadership long insisted that Itanium was a long-term play. In a 2005 oral history for Stanford University, Albert Yu said:
> I think Itanium has not failed. I think the Itanium chapter has not been written yet.
Later on, he says:
> Itanium was never intended to be a replacement for Intel Architecture. It was never thought of
that way. It's always to be for high end servers. And I think there probably some confusion.
Some people might think that we're going to take Itanium to replace Intel architecture.
That's never been the intention. It will be an additional architecture, and that was the intent.
Chairman Craig Barrett echoed the sentiment when he retired in 2009. He said in an interview:
> You guys have had a lot of fun with it in the past ... The
ultimate verdict is probably going to be 10 years from now
In 2000, the company posted a roadmap document that said that the IA-64 architecture would
last for 25 years. It didn't quite get there, but it got darn close.
A big reason why was Hewlett-Packard, which quickly standardized on the
Itanium. They were its dominant user - buying 85%
of production - and depended on it quite a bit. Their HP-UX OS ran only on PA-RISC and Itanium, not x86.
This became a problem as the Itanium ecosystem declined. So HP went to great lengths to
keep this thing huffing and puffing even as software developers started dropping support
for Itanium systems: Red Hat in 2009, Microsoft in 2010, and more since then.
The biggest drop-out occurred in 2011, when Oracle halted Itanium development and called the product
"end of life". This led to Intel getting huffy and, three months later, a lawsuit from Hewlett-Packard against Oracle.
It is thanks to this lawsuit that we learned that HP had agreed to pay Intel
$690 million across two deals to keep producing Itanium chips until 2017.
HP won the lawsuit, by the way.
Development on Itanium continued - we got an Itanium 9100, 9300, 9500, and 9700 series.
But it was clear to everyone that the bulk of Intel's resources would be spent on the
x86-based Xeon. And today, the Xeon is indeed one of the company's biggest product lines.
But the day had to come. And in 2017 it finally did: Intel announced that the Itanium 9700,
codenamed Kittson, would be the end, with the last chips to be shipped in 2021. And so it was.
With all that being said, I do applaud Intel for having the ambition, balls, and billions
to throw at something like this. For years, CPUs got faster from improvements in both clock speed
and architecture. After going superscalar, what was left in the latter? Itanium's failure locked x86
into the market - and in part paved the way for the stasis to come later in the decade.