
The Remarkable Computers Built Not to Fail


0:02

In the late 1970s, Tandem Computers exploded  onto the scene with a remarkable product.

0:09

Computers designed and built  not to fail. The Tandem NonStop.

0:15

With their legendary reliability,  Tandem's computers ran in banks,  

0:19

market exchanges, and critical industries.

0:23

In this video, the rise and fall of Tandem  Computers. And yes they are still around today.

0:29

## Beginnings

0:33

In the early 1970s, Jim Treybig was working as a  

0:37

marketing manager at the American  technology giant Hewlett-Packard.

0:41

Born in Texas, Treybig joined  Hewlett-Packard after some time  

0:45

at Texas Instruments to help sell their  new commercial minicomputer, the HP3000.

0:52

His manager was a guy named Tom Perkins. Hmm. That  

0:56

last name sounds familiar. I wonder what  venture capital firm he will found later.

1:02

Anyway. At HP, Treybig - the last name  appears to be of German origin in case  

1:08

you are wondering - competed with the  commercial computing colossus, IBM.

1:13

It is hard to convey just how difficult  that was back then. With top-to-bottom  

1:18

vertical integration, seemingly unlimited  resources, powerful software lock-in,  

1:24

titanic economies of scale, massive sales force,  and sterling reputation, IBM felt unassailable.

1:33

Add the fact that the early 1970s  were an economically challenging time.  

1:38

High energy prices. Inflation. And interest rates  at 21%! Who can afford to develop a new computer?

1:47

By 1974, mainframe pretenders like GE, NCR, RCA,  

1:51

and Siemens had all evacuated the dance  floor. IBM was eating the computer world.

1:57

Survival meant finding a niche that IBM  either could not compete in or did not  

2:03

care to. Like Digital Equipment or  DEC, which built a strong position  

2:08

in smaller minicomputers like the  VAX. But what can that niche be?

2:15

## The Computer Steps Up

2:15

The late 1960s and early 1970s changed the  role of the computer in the back office.

2:22

Computers had existed, of course. But up until then, their role in the back office

2:26

was a supporting one. Humans did the  actual processing work during the day.

2:32

And the computer handled background processes  in batch jobs that run overnight or during  

2:38

off-hours. Typical tasks included bookkeeping of the day's transactions,

2:45

consolidating customer accounts and what not.

2:49

However, using the computer this way led  to several nagging problems. Treybig in  

2:54

an oral history for Stanford recalls  seeing the major implications at the  

2:59

Holiday Inn. Chingy and Snoop would approve.

3:03

He and his team at HP (Treybig, not Chingy) had  sealed a contract to sell 50 computer systems  

3:10

to the Holiday Inn hotel. The hotel needed the  computers to deal with a dine-and-dash problem,  

3:16

where customers would eat at the hotel  breakfast, and then immediately check out.

3:23

Because the breakfast order had  not yet been processed into the  

3:26

customer's account - it would  have happened overnight via the  

3:30

batch processing job - the hotel did not  know to charge the customer for the meal.

3:36

A similar issue was occurring in ATMs.  In the first half of the decade, ATMs  

3:42

were installed in or outside of a bank branch  and operated with limited hours of operation.

3:48

And the banks did this essentially to  differentiate themselves and get customers  

3:52

through the door so they can be upsold other  products like loans. Half of these ATMs were  

3:59

"off-line", meaning that they ran locally  and were not connected to the central ledger.

4:05

This made them vulnerable to fraud. Treybig  recalls in the city of Phoenix how thieves  

4:11

would go to multiple ATMs around the city  overnight and withdraw $10 from each of them.

4:18

Since the $10 transaction was only stored locally  at the ATM and would not be consolidated into the  

4:26

central ledger until later, the thief could drain all the money in a person's account and more.

4:32

The underlying issue was not being able  to process and post transactions to the  

4:37

central ledger in real time.  This pushed companies to build  

4:41

large systems capable of doing  this without human intervention.

4:47

## Fail Whale

4:47

The problem however is that once you place your  business in a computer system's metaphorical  

4:53

hands, then it - as someone says in Lord of the Rings - cannot fail.

4:58

When the computer does batch jobs,  a failure was not all that big of  

5:03

a deal. The general response to such has  been and still is to reboot the system.

5:09

To quote an iconic television show, have  you tried turning it off and on again?

5:14

Yes, it almost always works, but look at the time you wasted. From the time the operator

5:20

first notices the failure, snapshots the  error code for future review, reboots,  

5:25

and gets back to work - you have lost about an hour, hour and a half.

5:30

Most systems had about 99.6%  availability. Which sounds high,  

5:36

but that means a failure once every two weeks,  which nobody will accept from a computer on the  

5:42

front lines. Imagine the NYSE stopping  for an hour and a half every two weeks.
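To see how those two figures line up, here is a rough back-of-the-envelope check. It assumes exactly one outage of about an hour and a half every two weeks, which is my reading of the numbers above; the function names are made up for illustration.

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24


def availability(outage_hours: float, interval_hours: float) -> float:
    """Fraction of time the system is up, given one outage per interval."""
    return 1.0 - outage_hours / interval_hours


def yearly_downtime_hours(outage_hours: float, interval_hours: float) -> float:
    """Total hours lost per year at the same outage rate."""
    return HOURS_PER_YEAR / interval_hours * outage_hours


two_weeks = 2 * HOURS_PER_WEEK  # 336 hours

# Assumed inputs: one 1.5-hour outage every two weeks.
print(f"availability: {availability(1.5, two_weeks):.2%}")
print(f"downtime per year: {yearly_downtime_hours(1.5, two_weeks):.1f} hours")
```

Under those assumptions the availability rounds to 99.6%, and the lost time adds up to roughly 39 hours a year.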

5:48

(Funny enough, they once shut down  the stock exchanges every Wednesday  

5:51

in late 1968 just to handle all the paperwork)

5:56

Moreover, you had to deal with the fact  that even if the computer faults can be  

6:00

quickly fixed and recovered from, you needed  to ensure that no data had been corrupted.

6:07

## Fault Tolerance

6:07

Back in the 1960s, IBM and others achieved reliability with redundancy. Treybig said:

6:15

> In 1973, there would be two IBM  systems at a newspaper and people  

6:20

would punch the paper tape that had  the articles for the newspaper and  

6:24

then they’d feed it into a computer  and then it would set the type.

6:28

> If the computer failed, then they would  take their paper tape ... And they’d walk  

6:33

over to the other computer and put it  in. That made it a Fault-Tolerant system

6:39

Buy two big IBM mainframes  instead of one. If one fails,  

6:44

use the other. Fault tolerance as  designed by IBM salesmen. And it  

6:49

works! Except for the part where you pay for  two systems to get the performance of one.

6:55

Other companies tried customization  by adding multiple processors to a  

7:00

single-processor system. So if one breaks down,  

7:03

then the system can hot-switch to the  backup processor, waiting on standby.

7:09

The issue, however, was that these systems were still originally designed

7:13

to be single-processor. So even once customized,  

7:17

there remained single hardware points  of failure that can collapse the system.

7:22

Like for instance the I/O bus switch - which  connected microprocessors to peripherals. Or  

7:29

the I/O controller, which translates the  microprocessors' commands. Or the common  

7:36

memory modules, the RAM. Failures there  can still happen, and collapse the system.

7:43

Larger companies started building even  more custom systems. Bank of America for  

7:48

instance commissioned a company to produce a  massive "cluster" of minicomputers to handle  

7:55

ATMs. But this meant a lot of custom  software which was extremely expensive.

8:02

What people needed was a system built and  designed from the ground up for fault-tolerance,  

8:08

one that also protected users' data. So how about if someone just sold them something turnkey?

8:15

## Founding Tandem

8:15

And that was what Jimmy Treybig,  then just 33 years old, pitched to  

8:20

Hewlett-Packard's leadership. A computer  built from the ground-up not to fail.

8:26

Unfortunately, HP was not interested  in the idea. Their bread and butter  

8:30

was in scientific instruments,  which was then going really well.

8:36

But in 1973, Treybig's old manager Tom Perkins had  left to start a venture capital firm with former  

8:44

Fairchild Semiconductor cofounder Eugene  Kleiner. They called it, Kleiner Perkins.

8:50

Tom Perkins not only initially invested $50,000  into Treybig's idea but also offered to make him  

8:57

a temporary partner at the firm. There, Perkins  helped Treybig craft his business plan and hone  

9:04

the elevator pitch. Even took Treybig to  Brooks Brothers and paid for his suit.

9:09

After its start in 1974,  Tandem Computer eventually  

9:13

raised $3 million of additional  financing from Kleiner and others,  

9:18

and hired away several major names from  Hewlett-Packard to work on the idea.

9:23

## Tandem's Approach: Hardware

9:23

In early December 1975, Tandem announced the  Tandem/16 NonStop, later just the NonStop I.

9:32

Jimmy Treybig likened the system's  philosophy to that of an airplane, saying:

9:37

> An airplane could have one engine and go 600  miles an hour. Or it could have two engines and  

9:43

go 600 miles an hour, but if one failed  it could go three hundred miles an hour.

9:49

> So the idea is to have modules that work  together to do the workload, and you have to  

9:55

buy a computer for the peak workload, so if it  fails most of the time it doesn’t even slow down

10:02

From the very beginning, the  Tandem 16 was built to be a  

10:05

multi-computer system. It was made  up of 2 to 16 "processor modules".

10:11

Each module is equipped with its own silicon processor, memory, IO, and copy of the

10:18

operating system. They all work independently of  each other. They even all have their own power  

10:24

supply and battery backup so that power can be  cut to one module without affecting the others.

10:30

This makes it easy to replace  a broken module. Just take it  

10:34

out and replace it with a new one. The  other modules work without a problem.

10:39

Tandem designed it this way because historically,  

10:42

maintenance has been a source of outages. But  it also happened to make the system linearly  

10:48

expandable. You can scale its capabilities  just by adding new modules. A nifty perk.

10:55

These processor modules are connected  via two independent Inter-Processor  

11:00

Buses called the DynaBuses. Each is designed such that if one fails,

11:06

the other can handle all traffic and  smoothly step up if something happens.

11:12

## Tandem's Approach: Software

11:12

Hardware mattered to Tandem's approach, but the  company's real "secret sauce" was its software.

11:19

Tandem used software to orchestrate  fault-tolerance, built on top of a custom  

11:24

operating system called the Tandem Transactional  Operating System. Later renamed to Guardian.

11:32

Its approach centered on a few core concepts:  

11:36

Message-passing, Failing  Fast, and "Process Pairs".

11:40

First, the message-passing. Since  sharing memory can cause faults,  

11:45

modules only communicate with one another using  16-bit messages sent through the DynaBuses.

11:52

Second is Fail Fast. If a module realizes that  something is wrong with itself, it is instructed  

11:59

to do what normal human cells do when they mutate.  They shut down and remain so until reloaded.

12:06

Third are the Process Pairs.  Every program is run by two  

12:11

processor modules - a primary and backup -  working independently and asynchronously.

12:18

As it works, the primary module  frequently checkpoints its critical  

12:22

state information by sending messages  to the backup before certain events.  

12:28

The programmer has to tell the module  when to send the checkpoint status.
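Here is a minimal sketch of the process-pair idea, not Tandem's actual Guardian API. The `Primary` and `Backup` classes, the `withdraw` method, and the account state are all hypothetical. What it tries to show is the programmer choosing the moment to checkpoint, right before an externally visible event.

```python
import copy


class Backup:
    """Holds the most recent checkpoint received from the primary."""

    def __init__(self):
        self.checkpoint = None

    def receive(self, state: dict) -> None:
        # Copy the state so later changes on the primary do not leak in.
        self.checkpoint = copy.deepcopy(state)

    def take_over(self) -> dict:
        # Resume from the last checkpointed state.
        return copy.deepcopy(self.checkpoint)


class Primary:
    """Does the work and tells the backup what it would need to resume."""

    def __init__(self, backup: Backup, balance: int):
        self.backup = backup
        self.state = {"balance": balance, "last_txn": None}
        self.send_checkpoint()

    def send_checkpoint(self) -> None:
        self.backup.receive(self.state)

    def withdraw(self, txn_id: str, amount: int) -> str:
        self.state["balance"] -= amount
        self.state["last_txn"] = txn_id
        # Programmer-chosen checkpoint: sent before the cash leaves the machine.
        self.send_checkpoint()
        return f"dispense {amount}"


backup = Backup()
primary = Primary(backup, balance=100)
primary.withdraw("t1", 10)

# Simulate the primary module failing: the backup resumes from the checkpoint.
print(backup.take_over())
```

If the programmer forgets the `send_checkpoint()` call, the backup would resume with a stale balance, which is the kind of mistake that made checkpointing hard to get right.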

12:33

Simultaneously, modules scan their current  status every second. If everything is good,  

12:39

they send out a special message through both  DynaBuses to indicate that they are "alive".  

12:45

They even send it to themselves to check  whether their buses are still working.

12:50

Each processor module checks for an  "I'm alive" message from its neighbor.  

12:56

If no such message is received  from a module for two cycles,  

13:00

then that module is presumed to  have failed and shut itself down.
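The "I'm alive" rule can be sketched roughly like this. It is a simplified illustration with hypothetical names (`Monitor`, `end_of_cycle`), and it assumes a peer is presumed failed once it has been silent for two consecutive cycles. It leaves out the two buses and the self-check.

```python
MISSED_CYCLES_LIMIT = 2  # assumed from "no message for two cycles"


class Monitor:
    """Tracks how many consecutive cycles each peer has been silent."""

    def __init__(self, peers):
        self.missed = {peer: 0 for peer in peers}

    def end_of_cycle(self, heard_from: set) -> set:
        """Update the counters and return the peers presumed failed."""
        failed = set()
        for peer in self.missed:
            if peer in heard_from:
                self.missed[peer] = 0
            else:
                self.missed[peer] += 1
            if self.missed[peer] >= MISSED_CYCLES_LIMIT:
                failed.add(peer)
        return failed


monitor = Monitor(["cpu1", "cpu2"])
print(monitor.end_of_cycle({"cpu1", "cpu2"}))  # everyone checked in
print(monitor.end_of_cycle({"cpu1"}))          # cpu2 silent for one cycle
print(monitor.end_of_cycle({"cpu1"}))          # cpu2 silent for two cycles
```

Only the third call reports `cpu2` as failed. A single late message resets nothing here, which hints at why a delayed "I'm alive" can leave modules disagreeing about who is down.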

13:04

If the primary module fails, then  the system reassigns its work,  

13:09

disks, and network lines to the backup. The  system might slow down slightly as a result,  

13:15

but it does not crash. Which was what mattered.

13:19

Occasionally an "I'm alive" message  comes late for whatever reason - an  

13:23

error or power interruption. The risk then  is that some processor modules assume failure  

13:29

while others do not. This can lead to a  split-brain situation and corrupted data.

13:37

So Tandem created this cool "regroup" algorithm.  It is kind of like in that video game "Among Us"  

13:44

when someone calls an emergency meeting. I don't  know if anyone watching will get that reference.

13:50

Anyway, during a regroup, the processor  modules all get together and vote on  

13:55

whether the module really has "returned".  After two rounds, they come to a consensus.

14:02

This software-centric approach posed  some issues for application providers.  

14:08

One particularly tricky thing was  learning how to checkpoint applications.

14:12

Making sure that the primary node  updated the backup node frequently  

14:16

enough that if it did crash then  the backup can pick up the pieces.

14:21

Even Tandem people acknowledged this was not easy.  

14:25

Application programmers also had to initially  learn Tandem-proprietary languages like Tandem  

14:31

Application Language for system-level stuff  and Screen COBOL for application-level stuff.

14:38

## Tandem and the ATM

14:39

Tandem's first sale actually happened  because of a Business Week article.

14:43

The December 1975 article discussed the  significant and growing market opportunity  

14:49

in "failsafe" computer systems. At the time,  the Tandem/16 was not yet ready so the article  

14:57

focused on Treybig himself, his investors, and  the team of talents poached from HP and others.

15:04

Nothing all that special, but enough to interest  a team at Citibank to submit Tandem’s first  

15:10

order - shipped in May 1976. Funny enough, Treybig  says that they never really used that machine.  

15:19

They bought it just so that they can  say they were up to date on tech trends.

15:25

But the company did find its true  niche servicing the meteoric rise  

15:29

of ATMs - the major market opportunity  that Treybig first saw in the 1970s.

15:35

In the second half of the 1970s, ATMs turned  from a simple carnival show curiosity to a  

15:42

genuine cost-savings tool. In the United States  and abroad, ATMs helped banks provide banking  

15:49

services without also growing labor costs - which  they welcomed during the inflationary 1970s.

15:56

And over time, ordinary folks overcame their  initial fears and embraced the ATM. Thanks in part  

16:03

to aggressive marketing from forward-thinking  banks like Citibank and Chase Manhattan.

16:09

And significant events like  snowstorms that snowed out  

16:12

the banks and gave people no choice  but to use the accursed machines.

16:18

The Tandem NonStop systems helped too.  Their reliability and 24/7 availability  

16:24

helped people gradually trust that the machine  would not "eat" the cards mid-transaction.

16:32

In 1978, America had less than 10,000  ATMs. By 1990, there were over 80,000 ATMs,  

16:39

facilitating 450 million debit transactions  each month and driving Tandem forward.

16:46

In 1977, three years after launch,  Tandem did $7.7 million in revenue  

16:52

and IPO'ed in December. They were  one of two major wins in Kleiner  

16:57

Perkins' first fund. The other being the  ground-breaking biotech firm Genentech.

17:04

## From ATM to EFTs

17:04

After winning in ATMs, Tandem expanded  into electronic fund transfers.

17:10

Once connected to a central ledger, ATMs  no longer had to be in or near the bank  

17:15

branches anymore. Banks started putting  them at train stations, grocery stores,  

17:21

airports, and anywhere else  people needed quick cash.

17:25

Banks then realized that they can  set up and share networks of ATMs,  

17:29

spreading out the upfront costs of  installation. In the early 1980s,  

17:34

you started to see regional ATM networks like  Yankee24 in New England or NYCE in New York.

17:44

This accelerated in 1985 when the US courts  ruled that ATMs were not bank branches - allowing  

17:50

networks to expand across state lines and become  national networks like STAR, PULSE, or Cirrus.

17:58

We call these Electronic Fund Transfer networks.  And because they serve millions of people across  

18:04

different banks 24/7, reliability and availability are of the utmost importance.

18:12

Behind the scenes, networks used "switches"  to process, route and authorize transaction  

18:18

requests. As the network's transaction  hub, the switch cannot fail. Otherwise,  

18:25

millions of people can't get their  money or worse, their cards get eaten.

18:30

Tandem's NonStop systems became the go-to default.

18:34

Cirrus for instance used a NonStop II  system with four processor modules.

18:39

The Visa and Mastercard card networks used  Tandem too. As these credit card networks  

18:44

gained global prominence, they brought Tandem's  systems to Asia, Latin America, and Europe.

18:51

The US Treasury even used a NonStop system in  1981 to build an EFT network capable of handling  

18:58

$100 billion of electronic fund transfers.  Hard to imagine a sturdier certification.

19:05

The company's banking and finance successes  gave them a reputation. If people went to IBM  

19:10

because no one ever got fired for choosing IBM,  

19:14

people chose Tandem because they  needed computers that did not die.

19:18

Industrial companies across North America and

19:21

Europe adopted Tandem to help run  their mission-critical systems.

19:26

There was even a dedicated  Tandem distributor in Malaysia,  

19:30

serving banks and financial  institutions in the ASEAN region.

19:35

As they went from win to win to  win, Tandem's sales doubled each  

19:40

year over its first six years  - hitting $312 million in 1982.  

19:46

Such growth is virtually unheard of  in the computer hardware industry.

19:52

## Working at Tandem

Tandem was known for its unique corporate culture.

19:55

Jimmy Treybig came from Hewlett-Packard  - famous for its "HP Way", which valued  

20:01

integrity and trust. Treybig brought it with  him to Tandem, and he was very proud of it.

20:07

Tandem wanted workers to all act like owners.  So they issued stock options to every employee,  

20:13

not just the execs. It made them  all rich. By mid-1982, two dozen  

20:19

employees were millionaires. Another hundred  employees owned stock worth at least $500,000.

20:26

Tandem employees also got other  things like flexible work hours,  

20:30

open door policies, and  sabbaticals every four years.

20:34

To encourage cross-department communication,  

20:37

each company location held a beer  and popcorn get-together on Friday  

20:41

afternoon. And the company's Cupertino  headquarters had a pool. That's nice.

20:47

None of this feels very special nowadays,  but for the late 1970s and early 1980s it  

20:53

was pretty radical. There were murmurs  at the time that maybe things were a  

20:56

little too insular. Cultish even. And  that there were maybe too few meetings.

21:02

But it worked for a while. One thing people  pointed to was the employee turnover:  

21:07

Just 8%, a fourth that of other  North American tech companies.

21:14

## Stratus

Unfortunately, this culture tightened up as

21:17

the company's growth slowed  and new competition emerged.

21:21

Challengers immediately emerged out of  the dirt like sprouts after the rain.  

21:25

But the one that took root was the  VC-funded startup Stratus in Boston.  

21:30

Founded in 1980 by a former Tandem employee, they  had a hardware-only solution called "Lockstep".

21:37

In lockstep, pairs of ordinary, off-the-shelf  Motorola processors run the same task.

21:44

The output is evaluated and if differences exist,  

21:47

then that pair is shut down  and a second takes over.
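As a rough software analogy for the lockstep idea (the real thing happens in hardware, and both pairs run at the same time rather than one after the other), here is a sketch with made-up names. Each "CPU" is just a Python function.

```python
from typing import Callable, Sequence, Tuple

Cpu = Callable[[int], int]


def run_pair(cpu_a: Cpu, cpu_b: Cpu, value: int) -> Tuple[bool, int]:
    """Run the same task on both CPUs and report whether the outputs agree."""
    out_a = cpu_a(value)
    out_b = cpu_b(value)
    return out_a == out_b, out_a


def run_lockstep(pairs: Sequence[Tuple[Cpu, Cpu]], value: int) -> int:
    """Use the first pair whose outputs match; a mismatched pair is skipped."""
    for cpu_a, cpu_b in pairs:
        agreed, output = run_pair(cpu_a, cpu_b, value)
        if agreed:
            return output
        # Mismatch: this pair is treated as shut down, so try the next one.
    raise RuntimeError("no healthy pair left")


def healthy(x: int) -> int:
    return x * 2


def faulty(x: int) -> int:
    # Stands in for a CPU with a hardware fault.
    return x * 2 + 1


print(run_lockstep([(healthy, faulty), (healthy, healthy)], 21))
```

The first pair disagrees and is skipped, so the answer comes from the second pair. Note that the application code itself never has to know any of this happened.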

21:52

Stratus's systems cost less than Tandem's -  about $200,000 compared to several million  

21:58

dollars. And can supposedly achieve the  same fault-tolerance without needing to  

22:03

adopt Tandem's complex software  concepts like checkpointing.

22:08

Nicholas J. Bologna, Stratus' director of product  engineering told the New York Times in 1982:

22:14

> "Tandem has claimed for years that it  has a six-year jump on the field because  

22:19

of its software ... but we leapfrogged  that distance by not needing software."

22:25

Tandem hit back. Tandem software  architect Jim Gray argued that  

22:29

Lockstep handles hardware failures fine but does  

22:32

nothing for software failures. Both  processors run the bug the same way.

22:38

Nevertheless, Tandem adjusted its technical  approach. Gray distanced the company from  

22:44

manual checkpointing practices - recognizing  that it was hard for programmers to do.

22:50

## Slowing Down and Bouncing Back

22:50

Stratus' competition and growth led  to Tandem’s first major slowdown.

22:56

After doubling sales so many years,  Tandem's revenue growth in 1982 fell  

23:02

to 50% (still good). Then fell again to 34% in  1983 (still okay). And then 27% in 1984. Uh oh.

23:14

Such a slowdown is always difficult.  In May 1984, the company announced that  

23:19

earnings would decline 70% year over  year and the stock price clanked 25%.

23:25

To top it all off, in October 1984, the US  securities regulator accused the company  

23:31

and three executives of inflating  earnings by improperly recognizing  

23:36

certain orders. They settled  with a small slap on the wrist.

23:40

After this, Treybig held more meetings to keep  executives accountable and issued more direct  

23:45

orders rather than coming to consensus. A new  audit team was spun up, salaries were frozen,  

23:53

manufacturing processes were  revamped, and costs were cut.

23:58

They also struck partnerships with  companies like Motorola to share  

24:01

R&D costs on new products like the  Tandem VLX chip. That chip powered  

24:06

a new lower-end mainframe called the  NonStop EXT that sold well in 1986.

24:13

That along with new customer wins like Texaco,  GTE Corporation, and the US Air Force Logistics  

24:20

Command helped turn things around. In 1986, gross  margins increased to 68% and profits rose 148%.

24:31

## Opening Up

24:31

But a major problem of “good  enough” hardware remained.

24:35

As Stratus' chairman William Foster explains, in  the late 1970s computer hardware cost more than  

24:42

software. So it made sense for Tandem to sell  a software-centric approach to fault-tolerance.

24:50

But then the hardware got cheaper, better  and more reliable. And the software got  

24:55

more expensive thanks to rising programmer  salaries. As a result of software inflation,  

25:01

businesses now wanted their applications to be portable - developed on top of

25:06

open platforms like UNIX so they could be easily moved to new hardware.

25:12

The problem was that Tandem's solution was all  proprietary. Once hardware got good enough for  

25:18

most companies, who would want to buy such  expensive proprietary hardware and develop  

25:23

in a proprietary Guardian OS ecosystem  unless they had the most extreme demands?

25:30

A perfect example of this was in 1987, when Tandem  lost a massive $1 billion contract to help build  

25:37

the US Veterans' Administration's decentralized  IT system: the Composite Health Care System.

25:45

Tandem was expected to win the contract,  

25:48

but lost to a company called Science  Applications International Corporation  

25:53

or SAIC. In part because the VA wanted to  build on open standards like UNIX and MUMPS.

26:01

With the ground under them shifting,  Tandem realized it had to expand out of its  

26:06

fault-tolerant niche, open up its proprietary  stack, and do it all without going bankrupt.

26:14

## New Verticals & Products

26:14

In the second half of the 1980s, Tandem  spread out. Treybig said in a 1987 interview:

26:20

> "I used to not like that word, niche ... but  I don't think it's negative. We're choosing  

26:26

niches. By choosing where we fight and win  we can be quite a big, profitable company"

26:32

They struck partnerships to enter several  verticals and diversify away from banking  

26:37

and finance. Targets included government,  telecommunications, retail and transportation.

26:44

There was some success with market  exchanges. In the late 1980s,  

26:48

over two dozen market exchanges - ranging from  the Chicago Mercantile Exchange to the Nasdaq to  

26:54

exchanges in Hong Kong and New Zealand - adopted  Tandem to handle ever-rising transaction volumes.

27:01

In 1988, they spent $280 million to  buy a company called Ungermann-Bass,  

27:06

hoping to leverage the latter's expertise in  Local Area Networks to expand their breadth.

27:14

In that same 1987 interview, Treybig admitted that  

27:17

they had gotten a bit too insular and  over-indexed on high-end customers.

27:22

> We didn't do a good job on  third parties. We didn't build  

27:25

products in the lower price range  and that was a strategic mistake.

27:30

So they experimented with lower-end  versions of their products. Some did  

27:34

better than others. A failure was the line  of MS-DOS workstations called the Dynamite.

27:40

Tandem was trying to grab a piece of  the growing PC market. Unfortunately  

27:45

Dynamite was incompatible with the  IBM ecosystem and failed on arrival.  

27:51

They replaced it a year later with a  fully PC AT-compatible workstation.

27:56

Two products that did a little better  were the Tandem NonStop CLX and LXN.

28:03

The CLX was a low-end line of  computers that can cost as little as  

28:08

$57,000 - making it the cheapest  thing Tandem had in their lineup.

28:14

Tandem made the product after noticing that  while big banks bought NonStop mainframes  

28:19

for their headquarters, they hesitated to do  the same for their smaller, regional offices.

28:24

It was also kind of cool because it had  their first CMOS chip, self-designed and  

28:29

fabbed at VLSI Technology. If you know this  channel you know why I care about that.

28:35

The LXN is particularly interesting because it  was their tentative first step into UNIX. However,  

28:42

the rest of their lineup still  ran their proprietary Guardian OS.

28:47

This era also saw one of the company's most  iconic technical legacies: Tandem NonStop SQL.

28:53

NonStop SQL is a relational database  management software that supports the  

28:58

SQL query language. Tandem introduced it in 1987  

29:02

as a ground-up replacement for an  older database called Encompass.

29:07

It gained some renown for its groundbreaking  high availability as well as its linear  

29:12

scalability. They are still using it  today in certain financial applications.

29:18

## Cyclone

And in May 1989, it helped Tandem beat IBM in a

29:22

head-to-head bake-off for a large  contract for the California DMV.

29:27

NonStop SQL beat IBM's iconic DB2 database in five  

29:32

of the seven technical criteria - and Tandem's bid was less than half of IBM's. The

29:37

big win convinced Tandem that they were  finally ready to take the Beast head-on.

29:43

Thus in October 1989, Tandem brought  forth the NonStop Cyclone. One of  

29:48

the finest devices that the  company would ever produce,  

29:51

and a direct broadside against IBM's  dominance in "big boy" mainframe computers.

29:57

The Cyclone's big differentiator was the  adoption of superscalar processing - where  

30:02

the chip initiates several instructions  and does them all at once. Thus multiplying  

30:07

the potential throughput per processor  without needing to ramp up clock cycles.

30:13

At the same time, the Cyclone  retained many of the NonStop  

30:16

line's reputed fault tolerance - even  adding new advanced technologies for it.

30:22

Huge RAM redundancies were built into the boards.  

30:26

The DynaBus interconnects were updated from copper to fiber optics using light

30:31

signals. The cabinets themselves had an  extra, backup power supply just in case.

30:37

The cabinets were also thoughtfully  engineered with minimum cabling to  

30:41

make replacement via hot-swapping much easier.

30:46

It also looked rad. Witnesses recall the  Cyclones being physically large and imposing,  

30:52

with big black cabinets and loud air-fans.

30:55

Previously, Tandem's systems just  helped collect real time data. Now,  

30:59

they wanted everyone to know that they had the  "Big Iron" hardware and efficient relational  

31:04

database software to run jobs on all that data.  Sold at a third the price of an IBM system.

31:12

Jeffrey Beeler, an industry  analyst at Dataquest said:

31:15

> Tandem is entering the most significant  phase in their corporate evolution.  

31:20

Instead of asking users to  give them transactions they  

31:23

have previously run on an IBM system,  they are saying 'Give us everything'.

31:29

To promote the Cyclone, Tandem crowed about their successful DMV win and performance benchmarks

31:35

from the highest mountains. But inside,  a debate raged over the company's future.

31:41

Oh by the way. That DMV project ended up a $49  million, 7-year debacle. It wasn't Tandem's  

31:50

fault. The DMV caught most of the blame, with  four employees disciplined for literal fraud.  

31:55

But $18 million of Cyclone mainframes ended up  being sold as surplus for pennies on the dollar.

32:03

## Open is Death

32:03

In 1989, a major telephone customer  with $100 million in cumulative  

32:09

sales - probably AT&T - told Tandem  that they were switching to UNIX.

32:15

And if Tandem did not get on board, then  they would cut the cord and leave. There  

32:19

was no convincing them. It was either get UNIX or  get out. This sparked a massive internal debate.

32:27

Tandem had released one UNIX product before. But a small one. Executives

32:32

recognized that it was their proprietary  ecosystem of Guardian OS applications  

32:36

that kept customers on board and  buying their expensive new hardware.

32:42

Treybig said it best when confronted with  the notion: "Open is death". Meaning that  

32:48

if they switched to UNIX and ported their  applications to it, then customers can bolt  

32:53

to any UNIX-compatible vendor offering a wink  and cheaper price. UNIX commoditizes Tandem.

33:01

But the customer insisted and they  were just too large to ignore. So  

33:06

Tandem decided to saw the baby in  half and add a new lower-end SKU  

33:11

to the product lineup rather than  doing a full-throated transition.

33:15

In March 1989, Tandem announced what they hailed  as the first fault-tolerant UNIX mainframe,  

33:21

powered by MIPS Technologies' RISC chips and  System V UNIX. They called it the Integrity S2.

33:29

Tandem worked hard to produce a  UNIX-compliant system with the  

33:33

same hardware modularity and software  redundancy that Tandem was known for.

33:39

It was not easy, since they were using  off-the-shelf chips. Their technical  

33:43

paper lists innovations like the idea  of "Virtual Time", which synchronized  

33:48

the three non-lockstep CPUs based on  instruction counts rather than clock cycles.

33:55

Treybig said shortly before the release:

33:57

> "Lots of companies have  tried to build good nonstop  

34:00

systems with Unix and none succeeded. We have".

34:05

Finally released in January 1990, the S2 sold  very well with telecommunications companies.  

34:11

AT&T partnered with Tandem and started marketing  it to their various government and tech partners.

34:18

Entering the 1990s, things seemed to be  on the right track. In late October 1989,  

34:24

Tandem announced that revenues grew 24%  to $1.6 billion. Income also grew to $118  

34:32

million. The stock rose 40% from March 1989 to  January 1990. Treybig told the SF Chronicle:

34:41

> I probably feel better  than I have for eight years.

34:46

## End of the Mainframe

34:47

Tandem and the rest of the industry began 1990 with  

34:49

high hopes for their beautiful Cyclone mainframe.

34:52

Then in February, the big investment bank  Drexel Burnham Lambert - home to Mike Milken,  

34:58

the junk bond king - unexpectedly collapsed.

35:02

The bankruptcy ended an era of "easy money" on Wall Street. Rumors of other  

35:07

financial turbulence roiled the markets throughout the first half of the year.

35:12

The downturn smashed Tandem's core market, which could no longer justify spending $2  

35:17

million on a big iron mainframe. At the same  time, competitors from DEC to Stratus to a  

35:25

revitalized IBM started cutting prices in an  attempt to peel away Tandem's top customers.

35:31

In April 1990, Tandem admitted that profits would decline from  

35:36

the previous quarter, and the stock yeet'ed 17%.

35:40

Things only got worse in August 1990, when  Iraq invaded Kuwait, kicking off the First  

35:46

Gulf War - spiking oil prices and spreading  the downturn to the whole world economy.

35:52

Ahead of the company's fiscal year close  in September 1990, executives warned that  

35:58

things were as bad as they had ever seen.  They predicted that the year would see  

36:02

zero revenue growth, lower profits and maybe  even layoffs for the first time since 1974.

36:10

Treybig said in early September:

36:12

> "You could probably say it's a disaster in the  UK, and to some degree in the Southeastern US"

36:18

While economic growth in the US returned after less than a year,  

36:22

the jobs did not return as quickly, perhaps due to sluggish construction activity,  

36:27

a weak real estate market, and  layoffs in the defense industry.

36:31

In 1990 and 1991, Tandem's revenue was essentially flat at around $1.9 billion.  

36:39

They blamed the economic downturn in the  wider computer market, but the real reason  

36:44

was that customers preferred to buy cheap, high-end commodity UNIX boxes from HP and DEC.

36:51

Maybe even clustering them together for  high availability. Not as fault-tolerant  

36:56

as a Tandem system, but good enough for most. And that was bad for Tandem.

37:02

## Himalaya

37:02

As Tandem's sales wriggled in stasis,  executive debate roared back up again.

37:08

The Integrity S2 did not quell demand  for a UNIX-ified Guardian OS. Rather,  

37:13

it only showed customers that  it was technically possible,  

37:17

and that the only reason they were  not getting it was deliberate lock-in.

37:21

As of 1991, Tandem's successor to the Cyclone - called Himalaya -  

37:26

would not be ready until late 1993.

37:29

But the customers were already angling  to leave - including one of the largest,  

37:34

San Francisco-based Wells Fargo.

37:37

Wells Fargo ... oh, you mean the guys who  defrauded their customers with empty accounts?

37:42

Anyway. The Himalaya was not a bad computer.  

37:45

Its performance was measured at  two times faster than Cyclone.

37:49

And Himalaya's "NonStop Kernel" software could  

37:52

support both Guardian OS and POSIX-flavor  UNIX applications. It was not a full port,  

37:59

but Tandem felt that being able to dual-run apps would calm customer chatter.

38:05

So yeah, fine computer. But rather than announcing  it ASAP while also cutting prices on the older  

38:11

Cyclone, Treybig decided to milk the old cow and  keep things quiet until the new cow was ready.  

38:19

This gave the impression of the company doing nothing on the product front between 1990 and 1993.

38:26

They did lay off about 600 people, but  essentially fiddled while Rome burned.

38:33

During a mid-1993 meeting with  analyst firm META Group - no  

38:37

relation to Facebook - Treybig was told  in no uncertain terms that they had made  

38:41

a terrible mistake. And that two major  London banks were preparing to defect.

38:47

So he went and announced the Himalaya in mid-1993, several months before it was to be released.

38:53

He tried to frame the launch as evolving the  company towards a future of "client-server"  

38:57

architectures, where their NonStop mainframes could power online services like e-mail and file  

39:03

sharing - then seen to be as "business-critical" as the banking and finance stuff.

39:11

What customers heard, however, was that  the Himalaya had twice the performance for  

39:14

a third to a sixth of the price of the Cyclones. But it wasn't ready yet, so sales collapsed.

39:21

In July 1993, Treybig announced  the company's worst quarterly  

39:25

loss in history, a jaw-dropping $550 million.

39:29

About $450 million of that were write-offs  and costs associated with plant closures,  

39:35

consolidations, and an 1,800-person layoff, about 15% of the company.

39:42

Treybig said later that he had  stalled it as long as he could, but:

39:46

> However, we are aware that becoming a low price  provider requires we adopt a new business model if  

39:52

we are going to achieve our profitability goals  ... This requires more than just cutting costs;  

39:58

it requires re-engineering the  business, changing our culture,  

40:02

and continuously reducing our cost structure

40:06

The Himalaya computers did sell 500 units in the third quarter - double the number sold in the prior two  

40:11

quarters. And it went on to sell well in  finance and banking thanks to its strong  

40:16

clustering abilities. However, it could not pull  the company out of its rut and long-term decline.

40:23

## Treybig is Out

40:25

Treybig founded the company in his early  thirties and had led it for over two decades.

40:30

He sparked their early success  and crafted their unique,  

40:34

people-centric culture. But he also failed  to position the company for success in the  

40:39

new world of commodity hardware, open  systems, and distributed clusters.

40:45

In the end, the board had to  do something. In October 1995,  

40:50

after yet another disappointing financial  quarter, the company announced that it was  

40:54

reshuffling its senior management  and that Treybig would step down.

40:59

A month later, the company named Roel  Pieper as the new president and CEO.  

41:04

A hard-charging sales executive, Pieper had  headed Tandem's subsidiary UB Networks - the  

41:10

former Ungermann–Bass - and turned them around.

41:14

Pieper's turnaround strategy was to make Tandem  into a software company by getting the company's  

41:20

previously proprietary technology off Guardian and  into as many hands as possible. He said in 1996:

41:27

> The company is in a position where it  is technology rich and maybe image poor.

41:33

Such technologies included ServerNet, a  fabric communications link that replaced  

41:38

their old DynaBus interprocessor buses.  It was well received - one of Himalaya's  

41:43

strengths had been clustering computers -  but exclusive to their Guardian OS systems.

41:48

Pieper shared the technology with the desktop PC maker Compaq as part of a  

41:53

larger strategy to enter the market  of Windows NT-enabled servers and  

41:58

turn ServerNet into a commercial standard.  Compaq would build servers using ServerNet.

42:04

In May 1996, Tandem announced a deal with  Microsoft, which made the then-ascending NT  

42:11

server OS. Microsoft paid $30 million to bring  Tandem's fault-tolerance to NT. And Tandem  

42:18

would port NonStop SQL to NT and sell Windows NT  NonStop plain-vanilla servers using Intel chips.

42:26

I can see why Microsoft did it. They got to say they had brought  

42:30

mission-critical level fault-tolerance and nice database software to their  

42:35

OS - a boon at a time when NT was still  fighting for dominance in the enterprise.

42:41

Tandem on the other hand seemed to be taking  a major risk. 70% of revenue still came from  

42:47

its proprietary systems, and now they  were porting those to a rival platform.

42:52

It felt like they were abandoning the Himalaya, though Tandem assured old customers it would  

42:57

still be supported - with "paths" of  migration opened up to NT. Why do this?  

43:04

One Tandem SVP of marketing explained that  the old way had put the company in gridlock:

43:09

> Our strategy before was to take the Himalaya and  

43:13

make it cheaper ... and there's only  so cheap you can make a platform ...

43:16

> Now [Pieper has] got Tandem out  of this gridlock. We're saying  

43:21

it's OK to compete and cooperate with a competitor

43:25

But outsiders were cautious. One analyst said that  they basically "gave their Crown Jewels away".  

43:32

Others said that though it was a necessary shock  to get Tandem to think like a software company,  

43:38

it remained unclear how much money  the strategy would actually make.

43:43

We never really found out whether this breakneck plan to turn the mainframe-centric  

43:47

Tandem into a software company would have worked  because about a year later, they got bought.

43:53

## Acquisition

43:53

In June 1997, the company was acquired by the  PC giant Compaq for about $3 billion in stock.

44:00

I covered this briefly in the video about  Compaq, though I got the price wrong. My bad.

44:05

Compaq CEO Eckhard Pfeiffer wanted to  expand into the enterprise. He wanted  

44:10

to add Tandem's 4,000-person  sales staff to the 2,000 they  

44:15

already had so they could sell more of their computers to enterprises.

44:20

The fast-growing company offered a 50%  premium on Tandem's latest stock price.  

44:25

Considering how the stock price - and the  business - had been stagnant for years,  

44:30

I reckon management felt they had to take it.

44:33

Asked about the deal, Treybig said that Tandem could help Compaq deal with that world of enterprise  

44:39

sales. He also discussed Tandem's leadership  in cluster NT systems and database software.

44:46

Compaq didn't get much chance to  leverage any of that because they  

44:50

then swallowed up the ancient - and flailing  - computer giant DEC for nearly $10 billion.

44:56

And then Compaq was itself swallowed up by Hewlett-Packard in the early 2000s.  

45:02

There is something poetic about that. Created  as an idea inside HP. Ended up inside HP.

45:10

## Conclusion

45:10

After the acquisitions, HP positioned Tandem as a

45:13

high-end fault-tolerant brand  within its product lineup.

45:18

Meanwhile, the company completed its  standardization onto Intel hardware,  

45:22

the Itanium. But the Itanium itself was a bit  of a strange duck within the Intel roadmap.

45:29

So a few years later they switched  again with the NonStop X in 2014,  

45:34

which used industry-standard  Intel Xeon processors.  

45:38

HP held it up as a sign of their commitment  to the brand. Tandem fans hailed the decision.

45:45

A year later, HP split into two companies:  

45:48

HP and HPE. Tandem went with HPE where it  remains today. Their hardware continues to  

45:54

get new Intel Xeons and power the world's  mission-critical IT infrastructure.

46:01

Google and Amazon with their big compute  clusters might be bigger and more scalable.  

46:06

But those seem to crash all the time  nowadays. The Tandems powering Visa,  

46:10

Mastercard, and our ATMs on the  other hand are still chugging along.
