
Star Trek, Compiler History, and AI "Negative Transfer" Bugs


Transcript


0:00

Hi, I'm Carl from the Internet of Bugs.

0:02

I've been making a living from writing

0:03

software

0:03

for 35 plus years now,

0:05

and I'm trying to do my part

0:07

to make the Internet a less buggy place.

0:09

So, today's books are "The World of Star Trek"

0:13

by David Gerrold and "Programming Language Pragmatics"

0:16

by Michael Scott.

0:17

In February of 1967, a Star Trek episode first

0:23

aired.

0:23

That episode was called "Court Martial,"

0:26

and it was the source of confusion for me

0:27

for most of my life until very recently.

0:29

Spoilers for a 58-year-old Star Trek episode,

0:31

by the way.

0:32

Don't say I didn't warn you.

0:34

The episode is kind of a whodunit mystery.

0:37

The breakthrough that the characters have

0:38

to figure out the mystery revolves around the

0:40

way

0:40

that the computer works on the show.

0:42

In the episode, Mr. Spock discovers

0:45

that there's an error

0:46

in the central computer's chess program,

0:48

and because of that, he concludes

0:50

that the computer's memory has been tampered

0:52

with.

0:52

In this case, someone has altered the ship's

0:54

log,

0:54

effectively replacing a section of surveillance

0:56

video

0:56

with a fake.

0:58

When I was a kid, I loved computers,

0:59

reading sci-fi books, including Star Trek,

1:02

and also watching and rewatching a lot of Star

1:04

Trek reruns,

1:04

because we only had three TV channels back then,

1:06

and we didn't have internet,

1:07

and TikTok wouldn't be invented for decades,

1:09

and okay, I'm old, just leave me alone.

1:10

Anyway, this episode confused me as a teenager

1:13

because it did not make any sense

1:14

with respect to what I knew about how computers

1:16

worked.

1:17

Computers had chess programs, and computers had

1:19

logs,

1:20

and there was no world I understood

1:21

in which forging a log over here

1:23

would affect the chess program over there.

1:25

As I got older, and I learned about how TV

1:27

shows were written,

1:28

I decided that it must just be

1:29

that the people that worked on the episode

1:30

had no clue how computers worked,

1:32

so they just wrote the wrong thing.

1:34

As I got even older,

1:35

and I understood the timeline of how computers

1:37

were developed,

1:38

I realized that a lot of the things that I knew

1:40

to be true about how computers work

1:41

wouldn't necessarily have been true in the mid-1960s,

1:44

because a lot had changed in the decades

1:45

since that episode was written.

1:47

And then, as I got even older, much older,

1:50

things came full circle,

1:51

and here in the mid-2020s,

1:53

that episode finally makes sense.

1:54

In fact, it seems almost prophetic. Let me

1:57

explain.

1:58

There have been a lot of advances in

2:00

programming

2:00

for the last 60 years.

2:02

First off, there are a number

2:04

of formerly common programming techniques

2:07

that these days are Considered Harmful,

2:09

like GOTO and Global Variables.

2:12

And in order to avoid those harmful things,

2:14

and write better and safer programs,

2:16

we've had lots of advances in the science

2:18

and the art of programming,

2:19

and better programming languages and tools.

2:22

Let me give you a simplified case

2:23

for three of the biggest, most important ones.

2:26

One is variable scope,

2:27

when a variable can only affect things

2:30

within a certain area.

2:32

Two is types, more than memory types,

2:34

but like a "name" can't be assigned to an "address",

2:36

even though they're both strings.

2:38

And the third is error and exception handling

2:40

and checking.

2:41

So think about it.

2:42

BASIC and assembly are really simple,

2:44

with a flat variable address space,

2:45

not a lot of structure,

2:46

basically no way of handling errors built in.

2:48

With Pascal and C, we got functions,

2:50

return values, some simple types,

2:52

some simple local variables.

2:54

C++, Java, Objective-C gave us objects,

2:57

where we could have variables private to

2:58

instances.

2:59

We got public and private methods, and user-defined types.

3:01

For example,

3:02

if you accidentally confused an integer zip

3:04

code

3:04

with an integer phone number,

3:06

you could get a compiler error

3:07

if you tried to pass one to a function

3:08

expecting the other one.
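As an illustrative sketch of that idea (the type and function names here are hypothetical, not from the video), wrapping each integer in a distinct type lets a static checker such as mypy catch a zip-code/phone-number mix-up before the program ever runs:

```python
from dataclasses import dataclass

# Hypothetical wrapper types: each wraps what would otherwise be a bare int.
@dataclass(frozen=True)
class ZipCode:
    value: int

@dataclass(frozen=True)
class PhoneNumber:
    value: int

def shipping_label(zip_code: ZipCode) -> str:
    # A type checker rejects a PhoneNumber argument here, even though
    # both types wrap plain integers.
    return f"Ship to ZIP {zip_code.value:05d}"

label = shipping_label(ZipCode(78701))       # fine
# shipping_label(PhoneNumber(5551234567))    # flagged by the type checker
```

The same trick works in any language with nominal types; the point is that the compiler, not the programmer's memory, enforces the distinction.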

3:10

Swift and Kotlin gave us optional or nullable

3:12

types

3:12

that cut down on a whole class of errors.
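Python's `Optional` annotation plays a similar role to those nullable types; here is a minimal sketch (the lookup function and its data are made up for illustration):

```python
from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    # Hypothetical lookup: returns None when the user is missing, and the
    # Optional return type tells every caller that case must be handled.
    users = {1: "Kirk", 2: "Spock"}
    return users.get(user_id)

name = find_user(2)
if name is None:
    greeting = "unknown user"   # the "missing" case is handled explicitly
else:
    greeting = f"Hello, {name}"
```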

3:14

Go gave us conventions for error values

3:16

being returned alongside a function's result

3:18

values.
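That Go convention can be sketched in Python as a `(value, error)` pair, purely for illustration (`parse_port` is a hypothetical example, not anything from the video):

```python
from typing import Optional, Tuple

def parse_port(text: str) -> Tuple[int, Optional[str]]:
    # Go-style return: error is None on success, a message otherwise,
    # so the caller has to make an explicit decision about failure.
    try:
        port = int(text)
    except ValueError:
        return 0, f"not a number: {text!r}"
    if not (0 < port < 65536):
        return 0, f"out of range: {port}"
    return port, None

port, err = parse_port("8080")
if err is not None:
    port = 80  # fall back to a default rather than crash
```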

3:19

But the algorithms in the heart of LLMs,

3:21

and neural nets in general,

3:23

don't use these new advances.

3:24

It's conceivable that going and manually

3:27

changing

3:27

some chatbot's weights to get it to misreport

3:29

some past event

3:30

might end up affecting how well the chatbot

3:32

plays chess.

3:33

And according to some of the @GothamChess

3:35

videos I've seen,

3:35

they definitely don't need anything

3:36

making chess harder for them.

3:38

I'll put a link to a couple of those videos

3:39

below,

3:39

if you're curious.

3:40

They're pretty hilarious.

3:41

An LLM has potentially trillions of parameters,

3:44

which are effectively just global variables,

3:46

all with the same scope, all with the same type.

3:49

We have no way of understanding

3:50

how changing any of them might impact any

3:52

others,

3:53

not to mention how,

3:54

and there's no inherent or internal error

3:56

handling

3:56

in the system.

3:57

Welcome back to the 1960s.

3:59

And this matters.

4:01

Like I said in my companion video on my main

4:03

channel,

4:03

LLMs are just software,

4:05

and they are subject to all the rules

4:06

and disadvantages of software.

4:08

And software gets complicated

4:09

and the bigger the software project,

4:10

the more complicated it gets.

4:12

Think about it.

4:13

Imagine you have 100,000 lines of code

4:14

and hundreds of global variables

4:16

with GOTO statements everywhere,

4:17

and you need to make a change.

4:19

You need to do two things.

4:19

First, you need to make the change you were

4:21

asked to make,

4:22

add the feature, fix the bug, whatever.

4:23

Second, you need to avoid causing any other

4:25

changes,

4:26

breakage, side effects, anything else in the

4:28

project.

4:28

That first part is relatively straightforward.

4:30

The second part... isn't.

4:32

The larger the project gets,

4:33

the more potential interactions

4:35

the rest of the project has

4:36

with the part that you're changing.

4:37

And the more difficult it is

4:38

to make sure you don't cause any new problems.

4:40

How many different things do you need to check

4:42

to make sure you didn't break anything?

4:43

Well, with 100 or so global variables,

4:46

you need to check 100 or so.

4:47

And what's worse is, assuming you do a perfect

4:49

job with this task, and that's a big assumption,

4:51

then you just added one more thing you need to

4:53

check.

4:53

So when you need to add your next change,

4:54

you need to verify 100 or so plus one.

4:57

And then the one after that, 100 or so plus two,

4:59

and the longer you work on this project,

5:01

the harder you make things for yourself.
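That arithmetic can be made concrete with a simplified model (assuming, as the video does, that every change requires re-checking every global variable plus every change made so far):

```python
def total_checks(globals_count: int, changes: int) -> int:
    # Change i (counting from 0) requires checking all globals plus the
    # i things added by earlier changes, so total work grows quadratically.
    return sum(globals_count + i for i in range(changes))

total_checks(100, 1)    # 100 checks for the first change
total_checks(100, 50)   # 6225 checks over fifty changes
```

Fifty changes cost not 50 × 100 = 5000 checks but 6225, and the gap keeps widening the longer the project lives.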

5:03

I lived through this before, and it was a

5:05

nightmare.

5:05

I did a couple of projects in the early 1990s

5:07

for a now-defunct retailer

5:08

whose point of sale system had been written

5:10

internally

5:11

using a version of the BASIC programming

5:13

language

5:13

with ISAM database extensions.

5:14

And it was more than 100,000 lines of BASIC

5:16

code,

5:17

complete with line numbers and GOTO statements

5:19

everywhere.

5:20

Changing anything was incredibly risky and took

5:23

forever.

5:24

Now, I can't prove the system contributed

5:26

to the company's inability to adapt to online

5:28

sales,

5:28

causing them to fail when the Internet took off,

5:31

but I wouldn't be surprised.

5:32

This situation is often referred to as "whack-a-mole,"

5:35

where every time you fix a bug,

5:36

you run the risk of creating a bug somewhere

5:38

else,

5:38

because of some interaction that you didn't

5:40

know about

5:40

or didn't fully understand.

5:41

It's named after the old arcade game

5:43

with the targets that pop up

5:44

that you try to hit with a hammer.

5:45

I talk about whack-a-mole a lot,

5:47

and I've seen it kill a bunch of projects.

5:49

And I even had a section on it

5:50

in the mobile app development book I wrote

5:51

more than 10 years ago.

5:53

This isn't a plug, by the way, although it is an

5:54

FAQ.

5:55

The book is called "App Accomplished."

5:56

It's permanently out of print,

5:58

if for no other reason than because mobile

6:00

development

6:00

isn't a hot topic anymore.

6:01

So one way that we as professional programmers

6:04

deal with this

6:04

is to try to limit the number of things

6:05

that can interact with each other.

6:07

Imagine now that same 100,000-line project,

6:10

but now it's partitioned into well-designed

6:12

modules

6:12

with mostly private variables

6:14

and a small number of well-defined public

6:16

methods.

6:16

Now you only have to worry about

6:18

what's in the module you're actually changing

6:20

and then double-checking some interactions

6:21

between the module you change

6:23

and others that touch the public methods that

6:25

it calls.

6:25

Still requires work,

6:26

but it's SO much easier than the "everything

6:28

is a global" case.
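A rough sketch of that partitioning in Python (the module's data and function names are hypothetical): state is kept private by convention, and callers can only reach it through a small public surface, so a change inside can only leak out through that narrow interface:

```python
# "Private" module state: the leading underscore signals that nothing
# outside this module should touch it directly.
_inventory: dict = {}

def add_stock(sku: str, count: int) -> None:
    # One of only two public entry points into the module's state.
    _inventory[sku] = _inventory.get(sku, 0) + count

def stock_level(sku: str) -> int:
    # The other public entry point; reads never mutate the state.
    return _inventory.get(sku, 0)

add_stock("tribble", 500)
stock_level("tribble")
```

Reviewing a change now means checking one dictionary and two functions, not every line in the project that might have touched a shared global.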

6:29

But that's not really an option for neural

6:30

networks.

6:30

And knowing the problems that hundreds

6:32

of global variables cause, much less trillions,

6:34

my professional experience would lead me

6:36

to expect this to cause a problem.

6:37

And lo and behold, there turns out to be a

6:39

problem

6:39

called "negative transfer" that happens

6:41

when the training that you're doing now

6:42

has an adverse effect on training

6:44

you did previously on an unrelated task.

6:46

And when you try to retrain to fix the thing

6:48

that just broke,

6:49

you run the risk of a "negative transfer"

6:50

breaking something else in the network.

6:52

And that's pretty much the same symptoms as "whack-a-mole."

6:54

So my educated guess is that this "negative

6:56

transfer" problem

6:57

is at least the equivalent of the standard

6:58

software engineering

6:59

problem called "whack-a-mole,"

7:00

and that the two things are connected.

7:02

And it sounds like this is a training problem,

7:04

and it is.

7:04

And that sounds bad, and it is.

7:06

But it's probably even worse than it sounds.

7:08

And not just because it can lead to the problem

7:10

called "catastrophic forgetting,"

7:12

which is not a good thing, although that

7:14

happens too.

7:15

There's a thing that defines human intelligence,

7:17

that the AI companies just haven't cracked yet.

7:19

And that's learning.

7:21

Humans have a very complicated system of short-term

7:23

memory

7:24

that gets saved into long-term memory as we

7:26

sleep,

7:26

or at least we're pretty sure that sleep is

7:28

involved.

7:28

There's a lot about how that works

7:29

that we still don't completely understand.

7:30

The current crop of AIs are all frozen in time

7:32

at the end of their training.

7:34

I've made a video likening it to the character

7:35

from the Memento movie, and I've also heard it

7:37

compared to the character in a movie called "50 First

7:39

Dates,"

7:40

although I've never seen that one.

7:41

The AI companies seem to mostly avoid

7:44

discussing this topic,

7:44

but I think it's important.

7:46

In fact, I'm gonna go out on a limb here

7:48

and say that one of the requirements

7:49

for artificial general intelligence

7:51

is not to be as brain damaged as the Memento guy,

7:53

but feel free to disagree in the comments.

7:55

It seems to me that having some kind of working,

7:57

long-term memory is a minimum requirement

7:59

for anything claiming human-like intelligence,

8:01

much less super intelligence.

8:03

And that makes this problem,

8:04

whether you call it a "global variable,"

8:05

or "whack-a-mole", or "negative transfer,"

8:07

or "catastrophic forgetting," or whatever,

8:09

to be something that I expect will have to be

8:11

solved

8:11

before the AI companies' ambitions can be

8:13

realized.

8:14

And I haven't seen any real advancements on

8:16

that front

8:16

from any of the AI companies, or much of the

8:18

research.

8:19

Few AIs that have been released

8:21

have any kind of learning mechanisms,

8:23

and a lot of those have had to be quickly shut

8:25

down

8:25

after they learned the wrong thing.

8:26

Here's what I mean when I say these AIs are

8:28

just software.

8:30

The situations that cause things to go wrong

8:31

in business software or web software

8:33

also cause things to go wrong with these AIs.

8:35

It might not sound like the same kind of

8:37

problem,

8:37

but only because the industry goes to great

8:40

lengths

8:40

to use vocabulary associated with brains,

8:42

instead of with computers.

8:43

But they're the same problems

8:44

because it's the same mechanisms.

8:46

Software is software is software,

8:48

even if they try to call it something else.

8:50

This is one of the reasons that I've never

8:52

believed

8:52

the AI hype, because I've learned over the

8:54

decades

8:54

and many sleepless nights that software is

8:57

complicated,

8:57

and there are so many ways that software can go

8:59

wrong.

9:00

And getting it right is not as easy

9:02

as the AI companies are trying to make it seem.

9:04

And if you are one of the people that watch

9:06

this channel

9:06

because you are or want to be a part of the

9:08

software industry,

9:09

I hope that helps you see reasons

9:11

that your skills are more relevant in the

9:13

current era

9:13

than AI companies would like you

9:15

or the rest of the world to believe.

9:16

Because despite what the companies want you to

9:18

believe,

9:18

the skills that you've developed to understand

9:21

software

9:22

still apply in the AI and machine learning

9:24

space.

9:24

There's some extra stuff you're gonna need to

9:27

learn too,

9:27

but it's not a whole different science,

9:29

the way the AI industry wants you to believe

9:30

that they are.

9:31

Thanks for watching, let's be careful out there.

Interactive Summary

In this video, Carl from 'Internet of Bugs' discusses how a 1967 Star Trek episode, 'Court Martial', accidentally predicted the challenges of modern Artificial Intelligence. He explains that while software engineering evolved to use structured techniques like variable scope and types to prevent side effects, Large Language Models (LLMs) rely on trillions of parameters that function like primitive global variables. This lack of structure leads to 'negative transfer', an AI phenomenon similar to the software 'whack-a-mole' problem, where changing one part of the system unexpectedly breaks another. Carl concludes that AI is ultimately software, and traditional engineering skills remain essential for understanding its complexities.
