The Remarkable Computers Built Not to Fail
In the late 1970s, Tandem Computers exploded onto the scene with a remarkable product.
Computers designed and built not to fail. The Tandem NonStop.
With their legendary reliability, Tandem's computers ran in banks,
market exchanges, and critical industries.
In this video, the rise and fall of Tandem Computers. And yes, they are still around today.
## Beginnings
In the early 1970s, Jim Treybig was working as a
marketing manager at the American technology giant Hewlett-Packard.
Born in Texas, Treybig joined Hewlett-Packard after some time
at Texas Instruments to help sell their new commercial minicomputer, the HP3000.
His manager was a guy named Tom Perkins. Hmm. That
last name sounds familiar. I wonder what venture capital firm he will found later.
Anyway. At HP, Treybig - the last name appears to be of German origin in case
you are wondering - competed with the commercial computing colossus, IBM.
It is hard to convey just how difficult that was back then. With top-to-bottom
vertical integration, seemingly unlimited resources, powerful software lock-in,
titanic economies of scale, a massive sales force, and a sterling reputation, IBM felt unassailable.
Add the fact that the early 1970s were an economically challenging time.
High energy prices. Inflation. Rising interest rates. Who could afford to develop a new computer?
By 1974, mainframe pretenders like GE and RCA had already
evacuated the dance floor. IBM was eating the computer world.
Survival meant finding a niche that IBM either could not compete in or did not
care to. Like Digital Equipment, or DEC, which had built a strong position
in smaller minicomputers like the PDP-11. But what could that niche be?
## The Computer Steps Up
The late 1960s and early 1970s changed the role of the computer in the back office.
Computers had existed, of course. But up until then, their role in the back office
was a supporting one. Humans did the actual processing work during the day.
And the computer handled background processes in batch jobs that ran overnight or during
off-hours. Typical tasks included bookkeeping of the day's transactions,
consolidating customer accounts, and so on.
However, using the computer this way led to several nagging problems. Treybig in
an oral history for Stanford recalls seeing the major implications at the
Holiday Inn. Chingy and Snoop would approve.
He and his team at HP (Treybig, not Chingy) had sealed a contract to sell 50 computer systems
to the Holiday Inn hotel. The hotel needed the computers to deal with a dine-and-dash problem,
where customers would eat breakfast at the hotel and then immediately check out.
Because the breakfast order had not yet been posted to the
customer's account - that would have happened overnight via the
batch processing job - the hotel did not know to charge the customer for the meal.
A similar issue was occurring with ATMs. In the first half of the decade, ATMs
were installed in or just outside of a bank branch and operated with limited hours.
The banks did this essentially to differentiate themselves and get customers
through the door so they could be upsold other products like loans. Half of these ATMs were
"off-line", meaning that they ran locally and were not connected to the central ledger.
This made them vulnerable to fraud. Treybig recalls how thieves in Phoenix
would go to multiple ATMs around the city overnight and withdraw $10 from each of them.
Since each $10 transaction was only stored locally at the ATM and would not be consolidated into the
central ledger until later, a thief could drain all the money in a person's account and more.
The underlying issue was not being able to process and post transactions to the
central ledger in real time. This pushed companies to build
large systems capable of doing this without human intervention.
## Fail Whale
The problem however is that once you place your business in a computer system's metaphorical
hands, then it - as someone says in Lord of the Rings - cannot fail.
When the computer only ran batch jobs, a failure was not all that big of
a deal. The general response to such failures has been and still is to reboot the system.
To quote an iconic television show, have you tried turning it off and on again?
Yes, it almost always works, but look at the time you wasted. From the time the operator
first notices the failure, snapshots the error code for future review, reboots,
and gets back to work - you have lost about an hour, hour and a half.
Most systems had about 99.6% availability. That sounds high,
but with recovery times like that, it works out to roughly one failure every two weeks - which
nobody will accept from a computer on the front lines.
Imagine the NYSE stopping for an hour and a half every two weeks.
(Funny enough, they once shut down the stock exchanges every Wednesday
in late 1968 just to handle all the paperwork)
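If you want to sanity-check that back-of-the-envelope claim, here is the arithmetic in a few lines of Python. The numbers are mine, not Tandem's marketing:

```python
# Back-of-the-envelope math: what 99.6% availability means when every
# outage costs roughly an hour and a half of recovery time.
availability = 0.996
outage_hours = 1.5                       # notice, log the error, reboot, recover

hours_per_two_weeks = 14 * 24            # 336 hours
downtime_hours = (1 - availability) * hours_per_two_weeks
print(f"Expected downtime per two weeks: {downtime_hours:.2f} hours")
print(f"Roughly {downtime_hours / outage_hours:.1f} outage(s) every two weeks")
```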
Moreover, even if the computer faults could be quickly fixed
and recovered from, you needed to ensure that no data had been corrupted.
## Fault Tolerance
Back in the 1960s, IBM and others achieved reliability with redundancy. Treybig said:
> In 1973, there would be two IBM systems at a newspaper and people
would punch the paper tape that had the articles for the newspaper and
then they’d feed it into a computer and then it would set the type.
> If the computer failed, then they would take their paper tape ... And they’d walk
over to the other computer and put it in. That made it a Fault-Tolerant system
Buy two big IBM mainframes instead of one. If one fails,
use the other. Fault tolerance as designed by IBM salesmen. And it
works! Except for the part where you pay for two systems to get the performance of one.
Other companies tried customization by adding multiple processors to a
single-processor system. So if one breaks down,
then the system can hot-switch to the backup processor, waiting on standby.
The issue however was that these systems were still originally designed
to be single-processor. So even once customized,
there remained single hardware points of failure that could collapse the system.
Like for instance the I/O bus switch, which connected the processors to peripherals. Or
the I/O controller, which translated the processors' commands. Or the common
memory modules, the RAM. A failure in any of those could still bring the whole system down.
Larger companies started building even more custom systems. Bank of America for
instance commissioned a company to produce a massive "cluster" of minicomputers to handle
ATMs. But this meant a lot of custom software which was extremely expensive.
What people needed was a system designed from the ground up for fault tolerance,
one that also protected users' data. So how about if someone just sold them something turnkey?
## Founding Tandem
And that was what Jimmy Treybig, then just 33 years old, pitched to
Hewlett-Packard's leadership. A computer built from the ground-up not to fail.
Unfortunately, HP was not interested in the idea. Their bread and butter
was in scientific instruments, which was then going really well.
But in 1973, Treybig's old manager Tom Perkins had left to start a venture capital firm with
Fairchild Semiconductor cofounder Eugene Kleiner. They called it Kleiner Perkins.
Tom Perkins not only initially invested $50,000 into Treybig's idea but also offered to make him
a temporary partner at the firm. There, Perkins helped Treybig craft his business plan and hone
the elevator pitch. Even took Treybig to Brooks Brothers and paid for his suit.
After its founding in 1974, Tandem Computers eventually
raised $3 million of additional financing from Kleiner Perkins and others,
and hired away several major names from Hewlett-Packard to work on the idea.
## Tandem's Approach: Hardware
In early December 1975, Tandem announced the Tandem/16 NonStop, later just the NonStop I.
Jimmy Treybig likened the system's philosophy to that of an airplane, saying:
> An airplane could have one engine and go 600 miles an hour. Or it could have two engines and
go 600 miles an hour, but if one failed it could go three hundred miles an hour.
> So the idea is to have modules that work together to do the workload, and you have to
buy a computer for the peak workload, so if it fails most of the time it doesn’t even slow down
From the very beginning, the Tandem 16 was built to be a
multi-computer system. It was made up of 2 to 16 "processor modules".
Each module came equipped with its own processor, memory, I/O, and copy of the
operating system. They all work independently of each other. They even all have their own power
supply and battery backup, so that power can be cut to one module without affecting the others.
This makes it easy to replace a broken module. Just take it
out and replace it with a new one. The other modules work without a problem.
Tandem designed it this way because historically,
maintenance has been a source of outages. But it also happened to make the system linearly
expandable. You can scale its capabilities just by adding new modules. A nifty perk.
These processor modules are connected via two independent Inter-Processor
Buses called the DynaBuses. Each is designed so that if one fails,
the other can smoothly handle all of the traffic on its own.
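To give a feel for how dual buses buy you fault tolerance, here is a toy Python sketch of sending a message with automatic failover. The class and function names are invented for illustration - this is not how the actual DynaBus hardware or firmware worked:

```python
# Toy illustration of dual, redundant interconnects: try one bus, fall back
# to the other if it fails. The names and structure here are hypothetical,
# not Tandem's actual DynaBus design.
class BusError(Exception):
    pass

class Bus:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def send(self, dest: int, message: bytes) -> None:
        if not self.healthy:
            raise BusError(f"{self.name} is down")
        print(f"delivered {message!r} to module {dest} via {self.name}")

def send_over_buses(buses, dest: int, message: bytes) -> None:
    """Deliver a message over whichever bus is still working."""
    for bus in buses:
        try:
            bus.send(dest, message)
            return
        except BusError:
            continue                     # that bus failed; try the other one
    raise BusError("both buses are down")

# The X bus has failed, but traffic still gets through on the Y bus.
send_over_buses([Bus("X bus", healthy=False), Bus("Y bus")], dest=3, message=b"checkpoint")
```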
## Tandem's Approach: Software
Hardware mattered to Tandem's approach, but the company's real "secret sauce" was its software.
Tandem used software to orchestrate fault-tolerance, built on top of a custom
operating system called the Tandem Transactional Operating System. Later renamed to Guardian.
Its approach centered on a few core concepts:
Message-passing, Failing Fast, and "Process Pairs".
First, the message-passing. Since shared memory would let one module's fault spread to others,
modules only communicate with one another using messages sent 16 bits at a time through the DynaBuses.
Second is Fail Fast. If a module realizes that something is wrong with itself, it does what
well-behaved human cells do when they mutate: it shuts down and stays that way until reloaded.
Third are the Process Pairs. Every critical program runs as a pair of
processes on two different processor modules - a primary and a backup.
As it works, the primary frequently checkpoints its critical
state by sending messages to the backup before certain events.
The programmer has to tell the system when to send these checkpoints.
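Here is a minimal Python sketch of the process-pair-and-checkpoint idea. Everything in it - the class names, the "transaction" - is hypothetical; Guardian's real interface looked nothing like this, but the shape of the idea is the same:

```python
# Toy process-pair sketch: the primary checkpoints critical state to the
# backup before important events, so the backup can resume mid-stream if
# the primary's module dies. Illustrative only, not Guardian's real API.
class BackupProcess:
    def __init__(self):
        self.state = {}

    def receive_checkpoint(self, state: dict) -> None:
        self.state = dict(state)         # latest known-good state

    def take_over(self) -> None:
        print(f"backup resuming from checkpointed state: {self.state}")

class PrimaryProcess:
    def __init__(self, backup: BackupProcess):
        self.backup = backup
        self.state = {"transactions_committed": 0}

    def handle_transaction(self, txn_id: int) -> None:
        # The programmer decides where the checkpoints go. Here: right
        # before committing each transaction.
        self.backup.receive_checkpoint(self.state)
        self.state["transactions_committed"] += 1
        print(f"primary committed transaction {txn_id}")

backup = BackupProcess()
primary = PrimaryProcess(backup)
for txn_id in range(3):
    primary.handle_transaction(txn_id)
backup.take_over()                       # what a failover would look like
```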
Meanwhile, modules scan their current status every second. If everything is good,
they send out a special message through both DynaBuses to indicate that they are "alive".
They even send it to themselves to check whether their buses are still working.
Each processor module checks for an "I'm alive" message from its neighbor.
If no such message is received from a module for two cycles,
then that module is presumed to have failed and shut itself down.
If the primary module fails, then the system reassigns its work,
disks, and network lines to the backup. The system might slow down slightly as a result,
but it does not crash. Which was what mattered.
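The "I'm alive" rule is simple enough to sketch, too. Again, a toy Python illustration of the two-missed-cycles heuristic, not Tandem's actual protocol:

```python
# Toy "I'm alive" failure detector: every module should report in once per
# cycle. Miss two cycles in a row and you are presumed failed. This is an
# illustration of the rule, not Tandem's actual implementation.
MISSED_CYCLE_LIMIT = 2

class FailureDetector:
    def __init__(self, module_ids):
        self.missed = {m: 0 for m in module_ids}
        self.heard_this_cycle = set()

    def heard_from(self, module_id):
        self.heard_this_cycle.add(module_id)

    def end_of_cycle(self):
        """Close out one cycle and return any modules now presumed failed."""
        newly_failed = []
        for module_id in self.missed:
            if module_id in self.heard_this_cycle:
                self.missed[module_id] = 0
            else:
                self.missed[module_id] += 1
                if self.missed[module_id] == MISSED_CYCLE_LIMIT:
                    newly_failed.append(module_id)
        self.heard_this_cycle.clear()
        return newly_failed

detector = FailureDetector([0, 1, 2])
for cycle in range(3):
    detector.heard_from(0)
    detector.heard_from(1)
    if cycle == 0:
        detector.heard_from(2)           # module 2 goes silent after cycle 0
    print(cycle, detector.end_of_cycle())
# Prints: 0 []   1 []   2 [2] -- module 2 declared down after two silent cycles
```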
Occasionally an "I'm alive" message comes late for whatever reason - an
error or power interruption. The risk then is that some processor modules assume failure
while others do not. This can lead to a split-brain situation and corrupted data.
So Tandem created this cool "regroup" algorithm. It is kind of like in that video game "Among Us"
when someone calls an emergency meeting. I don't know if anyone watching will get that reference.
Anyway, during a regroup, the processor modules all get together and vote on
whether the module really has "returned". After two rounds, they come to a consensus.
This software-centric approach posed some issues for application providers.
One particularly tricky thing was learning how to checkpoint applications.
Making sure that the primary node updated the backup node frequently
enough that if it did crash, the backup could pick up the pieces.
Even Tandem people acknowledged this was not easy.
Application programmers also had to initially learn Tandem-proprietary languages like Tandem
Application Language for system-level stuff and Screen COBOL for application-level stuff.
## Tandem and the ATM
Tandem's first sale actually happened because of a Business Week article.
The December 1975 article discussed the significant and growing market opportunity
in "failsafe" computer systems. At the time, the Tandem/16 was not yet ready so the article
focused on Treybig himself, his investors, and the team of talents poached from HP and others.
Nothing all that special, but it was enough to get a team at Citibank to submit Tandem's first
order - shipped in May 1976. Funny enough, Treybig says that they never really used that machine.
They bought it just so that they could say they were up to date on tech trends.
But the company did find its true niche servicing the meteoric rise
of ATMs - the major market opportunity that Treybig first saw in the 1970s.
In the second half of the 1970s, ATMs turned from a simple carnival show curiosity to a
genuine cost-savings tool. In the United States and abroad, ATMs helped banks provide banking
services without also growing labor costs - which they welcomed during the inflationary 1970s.
And over time, ordinary folks overcame their initial fears and embraced the ATM. Thanks in part
to aggressive marketing from forward-thinking banks like Citibank and Chase Manhattan.
And significant events like snowstorms that shut down the
bank branches and gave people no choice but to use the accursed machines.
The Tandem NonStop systems helped too. Their reliability and 24/7 availability
helped people gradually trust that the machine would not "eat" their cards mid-transaction.
In 1978, America had fewer than 10,000 ATMs. By 1990, there were over 80,000,
facilitating 450 million debit transactions each month and driving Tandem forward.
In 1977, three years after launch, Tandem did $7.7 million in revenue
and IPO'ed in December. They were one of two major wins in Kleiner
Perkins' first fund. The other being the ground-breaking biotech firm Genentech.
## From ATM to EFTs
After winning in ATMs, Tandem expanded into electronic fund transfers.
Once connected to a central ledger, ATMs no longer had to be in or near the bank
branches. Banks started putting them at train stations, grocery stores,
airports, and anywhere else people needed quick cash.
Banks then realized that they could set up and share networks of ATMs,
spreading out the upfront costs of installation. In the early 1980s,
you started to see regional ATM networks like Yankee24 in New England or NYCE in New York.
This accelerated in 1985 when the US courts ruled that ATMs were not bank branches - allowing
networks to expand across state lines and become national networks like STAR, PULSE, or Cirrus.
We call these Electronic Fund Transfer networks. And because they serve millions of people across
different banks 24/7, reliability and availability are of the utmost importance.
Behind the scenes, networks used "switches" to process, route, and authorize transaction
requests. As the network's transaction hub, the switch cannot fail. Otherwise,
millions of people can't get their money or, worse, their cards get eaten.
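Conceptually, the switch's job boils down to something like this toy Python sketch. The routing table, card prefixes, and bank host names are all invented for illustration:

```python
# Highly simplified sketch of an EFT switch: look at the card's issuer
# prefix and route the withdrawal to that bank's host for authorization.
# The routing table, prefixes, and host names are all invented.
ROUTING_TABLE = {
    "4111": "bank_a_host",               # card prefix -> issuing bank's computer
    "5500": "bank_b_host",
}

def route_withdrawal(card_number: str, amount_cents: int) -> str:
    """Return which issuing bank host should authorize this withdrawal."""
    host = ROUTING_TABLE.get(card_number[:4])
    if host is None:
        raise ValueError("unknown issuer - decline the transaction")
    print(f"forwarding ${amount_cents / 100:.2f} withdrawal to {host}")
    return host

route_withdrawal("4111222233334444", amount_cents=40_00)
```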
Tandem's NonStop systems became the go-to default.
Cirrus for instance used a NonStop II system with four processor modules.
The Visa and Mastercard card networks used Tandem too. As these credit card networks
gained global prominence, they brought Tandem's systems to Asia, Latin America, and Europe.
The US Treasury even used a NonStop system in 1981 to build an EFT network capable of handling
$100 billion of electronic fund transfers. Hard to imagine a sturdier certification.
The company's banking and finance successes gave them a reputation. If people went to IBM
because no one ever got fired for choosing IBM,
people chose Tandem because they needed computers that did not die.
Industrial companies across North America, Europe, and the rest of the
world adopted Tandem to help run their mission-critical systems.
There was even a dedicated Tandem distributor in Malaysia,
serving banks and financial institutions in the ASEAN region.
As they went from win to win to win, Tandem's sales doubled each
year over its first six years - hitting $312 million in 1982.
Such growth was virtually unheard of in the computer hardware industry.
## Working at Tandem
Tandem was known for its unique corporate culture.
Jimmy Treybig came from Hewlett-Packard - famous for its "HP Way", which valued
integrity and trust. Treybig brought it with him to Tandem, and he was very proud of it.
Tandem wanted workers to all act like owners. So they issued stock options to every employee,
not just the execs. It made them all rich. By mid-1982, two dozen
employees were millionaires. Another hundred employees owned stock worth at least $500,000.
Tandem employees also got other things like flexible work hours,
open door policies, and sabbaticals every four years.
To encourage cross-department communication,
each company location held a beer and popcorn get-together on Friday
afternoon. And the company's Cupertino headquarters had a pool. That's nice.
None of this feels very special nowadays, but for the late 1970s and early 1980s it
was pretty radical. There were murmurs at the time that maybe things were a
little too insular. Cultish even. And that there were maybe too few meetings.
But it worked for a while. One thing people pointed to was the employee turnover:
just 8%, a quarter of the rate at other North American tech companies.
## Stratus
Unfortunately, this culture tightened up as
the company's growth slowed and new competition emerged.
Challengers immediately emerged out of the dirt like sprouts after the rain.
But the one that took root was the VC-funded startup Stratus in Boston.
Founded in 1980 by a former Tandem employee, Stratus had a hardware-only solution called "Lockstep".
In lockstep, pairs of ordinary, off-the-shelf Motorola processors run the same task.
Their outputs are compared, and if they differ,
then that pair shuts itself down and a second pair takes over.
Stratus's systems cost less than Tandem's - about $200,000 compared to several million
dollars. And they could supposedly achieve the same fault tolerance without needing to
adopt Tandem's complex software concepts like checkpointing.
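To see what lockstep buys you, here is a toy Python sketch. Stratus did this comparison in hardware with duplicated boards; the simulated "glitch" below exists only to make the illustration interesting:

```python
# Toy lockstep sketch: run the same work on both halves of a pair and
# compare the answers. Disagreement means a hardware fault, so the pair
# takes itself offline and a spare pair takes over. Stratus did this in
# hardware; this is only a conceptual sketch.
import random

def cpu(x: int) -> int:
    """One half of a pair. Occasionally flips a bit to simulate a fault."""
    result = 2 * x
    if random.random() < 0.05:           # simulated transient hardware glitch
        result ^= 1
    return result

def lockstep_pair(x: int):
    """Run both halves on the same input; return a result only if they agree."""
    a, b = cpu(x), cpu(x)
    return a if a == b else None         # disagreement -> pair shuts itself down

def run_with_spares(x: int, pairs: int = 3) -> int:
    for _ in range(pairs):
        result = lockstep_pair(x)
        if result is not None:
            return result                # a healthy pair produced the answer
    raise RuntimeError("no healthy pair left")

print(run_with_spares(21))               # 42, barring spectacularly bad luck
```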
Nicholas J. Bologna, Stratus' director of product engineering, told the New York Times in 1982:
> "Tandem has claimed for years that it has a six-year jump on the field because
of its software ... but we leapfrogged that distance by not needing software."
Tandem hit back. Tandem software architect Jim Gray argued that
Lockstep handles hardware failures fine but does
nothing for software failures. Both processors run the bug the same way.
Nevertheless, Tandem adjusted its technical approach. Gray distanced the company from
manual checkpointing - recognizing that it was hard for programmers to get right.
## Slowing Down and Bouncing Back
Stratus' competition and growth led to Tandem’s first major slowdown.
After doubling sales for so many years, Tandem's revenue growth in 1982 fell
to 50% (still good). Then fell again to 34% in 1983 (still okay). And then 27% in 1984. Uh oh.
Such a slowdown is always difficult. In May 1984, the company announced that
earnings would decline 70% year over year and the stock price clanked 25%.
To top it all off, in October 1984, the US securities regulator accused the company
and three executives of inflating earnings by improperly recognizing
certain orders. They settled with a small slap on the wrist.
After this, Treybig held more meetings to keep executives accountable and issued more direct
orders rather than coming to consensus. A new audit team was spun up, salaries were frozen,
manufacturing processes were revamped, and costs were cut.
They also struck partnerships with companies like Motorola to share
R&D costs on new products like the Tandem VLX chip. That chip powered
a new lower-end mainframe called the NonStop EXT that sold well in 1986.
That along with new customer wins like Texaco, GTE Corporation, and the US Air Force Logistics
Command helped turn things around. In 1986, gross margins increased to 68% and profits rose 148%.
## Opening Up
But a major problem of “good enough” hardware remained.
As Stratus' chairman William Foster explained, in the late 1970s computer hardware cost more than
software. So it made sense for Tandem to sell a software-centric approach to fault tolerance.
But then the hardware got cheaper, better, and more reliable. And the software got
more expensive thanks to rising programmer salaries. As a result,
businesses now wanted their applications to be portable - developed on top of
open platforms like UNIX so they could be easily moved to new hardware.
The problem was that Tandem's solution was all proprietary. Once hardware got good enough for
most companies, who would want to buy such expensive proprietary hardware and develop
in a proprietary Guardian OS ecosystem unless they had the most extreme demands?
A perfect example of this was in 1987, when Tandem lost a massive $1 billion contract to help build
the US Veterans' Administration's decentralized IT system: the Composite Health Care System.
Tandem was expected to win the contract,
but lost to a company called Science Applications International Corporation
or SAIC. In part because the VA wanted to build on open standards like UNIX and MUMPS.
With the ground under them shifting, Tandem realized it had to expand out of its
fault-tolerant niche, open up its proprietary stack, and do it all without going bankrupt.
## New Verticals & Products
In the second half of the 1980s, Tandem spread out. Treybig said in a 1987 interview:
> "I used to not like that word, niche ... but I don't think it's negative. We're choosing
niches. By choosing where we fight and win we can be quite a big, profitable company"
They struck partnerships to enter several verticals and diversify away from banking
and finance. Targets included government, telecommunications, retail and transportation.
There was some success with market exchanges. In the late 1980s,
over two dozen market exchanges - ranging from the Chicago Mercantile Exchange to the Nasdaq to
exchanges in Hong Kong and New Zealand - adopted Tandem to handle ever-rising transaction volumes.
In 1988, they spent $280 million to buy a company called Ungermann-Bass,
hoping to leverage the latter's expertise in Local Area Networks to expand their breadth.
In that same 1987 interview, Treybig admitted that
they had gotten a bit too insular and over-indexed on high-end customers.
> We didn't do a good job on third parties. We didn't build
products in the lower price range and that was a strategic mistake.
So they experimented with lower-end versions of their products. Some did
better than others. A failure was the line of MS-DOS workstations called the Dynamite.
Tandem was trying to grab a piece of the growing PC market. Unfortunately
Dynamite was incompatible with the IBM ecosystem and failed on arrival.
They replaced it a year later with a fully PC AT-compatible workstation.
Two products that did a little better were the Tandem NonStop CLX and LXN.
The CLX was a low-end line of computers that could cost as little as
$57,000 - making it the cheapest thing in Tandem's lineup.
Tandem made the product after noticing that while big banks bought NonStop mainframes
for their headquarters, they hesitated to do the same for their smaller, regional offices.
It was also kind of cool because it had their first CMOS chip, self-designed and
fabbed at VLSI Technology. If you know this channel you know why I care about that.
The LXN is particularly interesting because it was their tentative first step into UNIX. However,
the rest of their lineup still ran their proprietary Guardian OS.
This era also saw one of the company's most iconic technical legacies: Tandem NonStop SQL.
NonStop SQL is a relational database management system that supports the
SQL query language. Tandem introduced it in 1987
as a ground-up replacement for an older database called Encompass.
It gained renown for its groundbreaking high availability as well as its linear
scalability. It is still used today in certain financial applications.
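That linear scalability largely came from partitioning: a table could be spread across processor modules, with each module scanning only its own slice. Here is a toy Python sketch of that partition-and-aggregate idea - illustrative only, not NonStop SQL's actual internals:

```python
# Toy sketch of partitioned query execution: the table is split across
# "processor modules", each scans only its own slice, and the partial
# results are combined. Adding modules adds capacity roughly linearly.
from concurrent.futures import ThreadPoolExecutor

# Pretend each partition lives on its own processor module's disks.
partitions = [
    [("acct-1", 120_00), ("acct-2", 55_00)],
    [("acct-3", 980_00), ("acct-4", 10_00)],
    [("acct-5", 7_50)],
]

def scan_partition(rows, min_cents: int) -> int:
    """Each 'module' counts qualifying rows in its own partition."""
    return sum(1 for _, balance_cents in rows if balance_cents >= min_cents)

def count_accounts_with_at_least(min_cents: int) -> int:
    # Fan the query out to every partition in parallel, then aggregate.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = pool.map(scan_partition, partitions,
                                  [min_cents] * len(partitions))
    return sum(partial_counts)

print(count_accounts_with_at_least(50_00))   # 3 accounts hold at least $50
```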
## Cyclone
And in May 1989, NonStop SQL helped Tandem beat IBM in a
head-to-head bake-off for a large contract with the California DMV.
NonStop SQL beat IBM's iconic DB2 database in five
of the seven technical criteria - and Tandem bid less than half of IBM's price. The
big win convinced Tandem that they were finally ready to take the Beast head-on.
Thus in October 1989, Tandem brought forth the NonStop Cyclone. One of
the finest devices that the company would ever produce,
and a direct broadside against IBM's dominance in "big boy" mainframe computers.
The Cyclone's big differentiator was the adoption of superscalar processing - where
the processor issues and executes multiple instructions at once. Thus multiplying
the potential throughput per processor without needing to ramp up the clock speed.
At the same time, the Cyclone retained the NonStop
line's famed fault tolerance - even adding new advanced technologies for it.
Huge RAM redundancies were built into the boards.
The DynaBus interconnects were updated from copper to fiber optics.
The cabinets themselves had an extra, backup power supply just in case.
The cabinets were also thoughtfully engineered with minimum cabling to
make replacement via hot-swapping much easier.
It also looked rad. Witnesses recall the Cyclones being physically large and imposing,
with big black cabinets and loud air-fans.
Previously, Tandem's systems just helped collect real-time data. Now,
they wanted everyone to know that they had the "Big Iron" hardware and efficient relational
database software to run jobs on all that data. Sold at a third the price of an IBM system.
Jeffrey Beeler, an industry analyst at Dataquest said:
> Tandem is entering the most significant phase in their corporate evolution.
Instead of asking users to give them transactions they
have previously run on an IBM system, they are saying 'Give us everything'.
To promote the Cyclone, Tandem crowed about its DMV win and performance benchmarks
from the mountaintops. But inside, a debate raged over the company's future.
Oh by the way. That DMV project ended up a $49 million, 7-year debacle. It wasn't Tandem's
fault. The DMV caught most of the blame, with four employees disciplined for literal fraud.
But $18 million of Cyclone mainframes ended up being sold as surplus for pennies on the dollar.
## Open is Death
In 1989, a major telephone customer with $100 million in cumulative
sales - probably AT&T - told Tandem that they were switching to UNIX.
And if Tandem did not get on board, then they would cut the cord and leave. There
was no convincing them. It was either get UNIX or get out. This sparked a massive internal debate.
Tandem had released one UNIX product before, but only a small one. Executives
recognized that it was their proprietary ecosystem of Guardian OS applications
that kept customers on board and buying their expensive new hardware.
Treybig said it best when confronted with the notion: "Open is death". Meaning that
if they switched to UNIX and ported their applications to it, then customers could bolt
to any UNIX-compatible vendor offering a wink and a cheaper price. UNIX commoditizes Tandem.
But the customer insisted, and they were just too large to ignore. So
Tandem decided to split the baby and add a new lower-end SKU
to the product lineup rather than making a full-throated transition.
In March 1989, Tandem announced what they hailed as the first fault-tolerant UNIX mainframe,
powered by MIPS Technologies' RISC chips and System V UNIX. They called it the Integrity S2.
Tandem worked hard to produce a UNIX-compliant system with the
same hardware modularity and software redundancy that Tandem was known for.
It was not easy, since they were using off-the-shelf chips. Their technical
paper lists innovations like the idea of "Virtual Time", which synchronized
the three non-lockstep CPUs based on instruction counts rather than clock cycles.
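The details are in the paper, but the gist of Virtual Time can be sketched: instead of comparing the CPUs at a wall-clock instant (which means little when they run asynchronously), you compare them once each has executed the same number of instructions. A toy Python illustration with invented names, not the Integrity S2's real mechanism:

```python
# Toy sketch of the "virtual time" idea: three free-running replicas are
# compared not at a wall-clock moment but once each has executed the same
# number of instructions. Names and structure are invented for illustration.
from collections import Counter

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.instructions_retired = 0
        self.state = 0

    def step(self) -> None:
        """Execute one 'instruction' (here, a trivial state update)."""
        self.state += 1
        self.instructions_retired += 1

def run_to_virtual_time(replica: Replica, target: int) -> int:
    # Each replica advances at its own pace; all that matters is that it
    # reaches the agreed instruction count before its state is compared.
    while replica.instructions_retired < target:
        replica.step()
    return replica.state

replicas = [Replica("cpu0"), Replica("cpu1"), Replica("cpu2")]
SYNC_POINT = 1_000                        # compare after 1,000 instructions each

states = [run_to_virtual_time(r, SYNC_POINT) for r in replicas]
majority_state, votes = Counter(states).most_common(1)[0]
print(f"majority state at virtual time {SYNC_POINT}: "
      f"{majority_state} ({votes} of {len(replicas)} replicas agree)")
```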
Treybig said shortly before the release:
> "Lots of companies have tried to build good nonstop
systems with Unix and none succeeded. We have".
Finally released in January 1990, the S2 sold very well with telecommunications companies.
AT&T partnered with Tandem and started marketing it to their various government and tech partners.
Entering the 1990s, things seemed to be on the right track. In late October 1989,
Tandem announced that revenues grew 24% to $1.6 billion. Income also grew to $118
million. The stock rose 40% from March 1989 to January 1990. Treybig told the SF Chronicle:
> I probably feel better than I have for eight years.
## End of the Mainframe
Tandem and the rest of the industry began 1990 with
high hopes for their beautiful Cyclone mainframe.
Then in February, the big investment bank Drexel Burnham Lambert - home to Mike Milken,
the junk bond king - unexpectedly collapsed.
The bankruptcy ended an era of "easy money" on Wall Street. Rumors of other
financial turbulence roiled the markets throughout the first half of the year.
The downturn smashed Tandem's core market, which could no longer justify spending $2
million on a big iron mainframe. At the same time, competitors from DEC to Stratus to a
revitalized IBM started cutting prices in an attempt to peel away Tandem's top customers.
In April 1990, Tandem admitted that profits would decline from
the previous quarter, and the stock yeet'ed 17%.
Things only got worse in August 1990, when Iraq invaded Kuwait, kicking off the First
Gulf War - spiking oil prices and spreading the downturn to the whole world economy.
Ahead of the company's fiscal year close in September 1990, executives warned that
things were as bad as they had ever seen. They predicted that the year would see
zero revenue growth, lower profits and maybe even layoffs for the first time since 1974.
Treybig said in early September:
> "You could probably say it's a disaster in the UK, and to some degree in the Southeastern US"
While economic growth in the US returned after less than a year,
the jobs did not return as quickly. Perhaps due to sluggish construction activity,
a weak real estate market, and layoffs in the defense industry.
In 1990 and 1991, Tandem's revenue was essentially stagnant at around $1.9 billion.
They blamed the economic downturn in the wider computer market, but the real reason
was that customers preferred to buy cheaper, increasingly capable commodity UNIX boxes from HP and DEC.
Maybe even clustering them together for high availability. Not as fault-tolerant
as what Tandem could do, but good enough for most. And that was bad for Tandem.
## Himalaya
As Tandem's sales languished in stasis, the executive debate roared back up again.
The Integrity S2 did not quell demand for a UNIX-ified Guardian OS. Rather,
it only showed customers that it was technically possible,
and that the only reason they were not getting it was deliberate lock-in.
By 1991, it was clear that Tandem's successor to the Cyclone - called the Himalaya -
would not be ready until late 1993.
But the customers were already angling to leave - including one of the largest,
San Francisco-based Wells Fargo.
Wells Fargo ... oh, you mean the guys who defrauded their customers with empty accounts?
Anyway. The Himalaya was not a bad computer.
Its performance was measured at twice that of the Cyclone.
And the Himalaya's "NonStop Kernel" software could
support both Guardian OS and POSIX-flavor UNIX applications. It was not a full port,
but Tandem felt that being able to dual-run apps could calm customer chatter.
So yeah, fine computer. But rather than announcing it ASAP while also cutting prices on the older
Cyclone, Treybig decided to milk the old cow and keep things quiet until the new cow was ready.
This made it look like the company was doing nothing on the product front between 1990 and 1993.
They did lay off about 600 people, but essentially fiddled while Rome burned.
During a mid-1993 meeting with analyst firm META Group - no
relation to Facebook - Treybig was told in no uncertain terms that they had made
a terrible mistake. And that two major London banks were preparing to defect.
So he went and announced it in mid-1993, several months before it was to be released.
He tried to frame the launch as evolving the company towards a future of "client-server"
architectures, where their NonStop mainframes could power online services like e-mail and file
sharing - then seen as just as "business-critical" as the banking and finance stuff.
What customers heard, however, was that the Himalaya had twice the performance for
a third to a sixth of the price of the Cyclones. But it wasn't ready yet, so sales collapsed.
In July 1993, Treybig announced the company's worst quarterly
loss in history, a jaw-dropping $550 million.
About $450 million of that was write-offs and costs associated with plant closures,
consolidations, and an 1,800-person layoff, about 15% of the company.
Treybig said later that he had stalled it as long as he could, but:
> However, we are aware that becoming a low price provider requires we adopt a new business model if
we are going to achieve our profitability goals ... This requires more than just cutting costs;
it requires re-engineering the business, changing our culture,
and continuously reducing our cost structure
The Himalaya computers did sell 500 units in the third quarter - double that of the prior two
quarters. And it went on to sell well in finance and banking thanks to its strong
clustering abilities. However, it could not pull the company out of its rut and long-term decline.
## Treybig is Out
Treybig founded the company in his early thirties and had led it for over two decades.
He sparked their early success and crafted their unique,
people-centric culture. But he also failed to position the company for success in the
new world of commodity hardware, open systems, and distributed clusters.
In the end, the board had to do something. In October 1995,
after yet another disappointing financial quarter, the company announced that it was
reshuffling its senior management and that Treybig would step down.
A month later, the company named Roel Pieper as the new president and CEO.
A hard-charging sales executive, Pieper had headed Tandem's subsidiary UB Networks - the
former Ungermann-Bass - and turned it around.
Pieper's turnaround strategy was to make Tandem into a software company by getting the company's
previously proprietary technology off Guardian and into as many hands as possible. He said in 1996:
> The company is in a position where it is technology rich and maybe image poor.
Such technologies included ServerNet, a fabric communications link that replaced
their old DynaBus interprocessor buses. It was well received - one of Himalaya's
strengths had been clustering computers - but exclusive to their Guardian OS systems.
Pieper shared the technology with the desktop PC maker Compaq as part of a
larger strategy to enter the market for Windows NT servers and
turn ServerNet into a commercial standard. Compaq would build servers using ServerNet.
In May 1996, Tandem announced a deal with Microsoft, which made the then-ascending NT
server OS. Microsoft paid $30 million to bring Tandem's fault tolerance to NT. And Tandem
would port NonStop SQL to NT and sell plain-vanilla Windows NT servers using Intel chips.
I can see why Microsoft did it. They got to say they had brought
mission-critical fault tolerance and a nice database to their
OS - a boon at a time when NT was still fighting for dominance in the enterprise.
Tandem on the other hand seemed to be taking a major risk. 70% of revenue still came from
its proprietary systems, and now they were porting those to a rival platform.
It felt like they were abandoning the Himalayas, though Tandem assured old customers the line would
still be supported - with migration "paths" opened up to NT. Why do this?
One Tandem SVP of marketing explained that the old way had put the company in gridlock:
> Our strategy before was to take the Himalaya and
make it cheaper ... and there's only so cheap you can make a platform ...
> Now [Pieper has] got Tandem out of this gridlock. We're saying
it's OK to compete and cooperate with a competitor
But outsiders were cautious. One analyst said that they basically "gave their Crown Jewels away".
Others said that though it was a necessary shock to get Tandem to think like a software company,
it remained unclear how much money the strategy would actually make.
We never really figured out whether this breakneck plan to turn the mainframe-centric
Tandem into a software company would have worked because about a year later, they got bought.
## Acquisition
In June 1997, the company was acquired by the PC giant Compaq for about $3 billion in stock.
I covered this briefly in the video about Compaq, though I got the price wrong. My bad.
Compaq CEO Eckhard Pfeiffer wanted to expand into the enterprise. He wanted
to add Tandem's 4,000-person sales staff to the 2,000 they
already had so they could sell more of their computers to enterprises.
The fast-growing company offered a 50% premium on Tandem's latest stock price.
Considering how the stock price - and the business - had been stagnant for years,
I reckon management felt they had to take it.
Asked about the deal, Treybig said that Tandem could help Compaq navigate that world of enterprise
sales. He also discussed Tandem's leadership in clustered NT systems and database software.
Compaq didn't get much chance to leverage any of that, because it
then swallowed up the ancient - and flailing - computer giant DEC for nearly $10 billion.
And then Compaq was itself swallowed up by Hewlett-Packard in the early 2000s.
There is something poetic about that. Created as an idea inside HP. Ended up inside HP.
## Conclusion
After the acquisitions, HP positioned Tandem as a
high-end fault-tolerant brand within its product lineup.
Meanwhile, the NonStop line completed its standardization onto Intel hardware,
the Itanium. But the Itanium itself was a bit of an odd duck within the Intel roadmap.
So a few years later they switched again with the NonStop X in 2014,
which used industry-standard Intel Xeon processors.
HP held it up as a sign of their commitment to the brand. Tandem fans hailed the decision.
A year later, HP split into two companies:
HP and HPE. Tandem went with HPE where it remains today. Their hardware continues to
get new Intel Xeons and power the world's mission-critical IT infrastructure.
Google and Amazon with their big compute clusters might be bigger and more scalable.
But those seem to crash all the time nowadays. The Tandems powering Visa,
Mastercard, and our ATMs on the other hand are still chugging along.