Michael Nielsen – Why aliens will have a different tech stack than us

Transcript

0:00

Today, I'm speaking with Michael Nielsen. You have  done many things. You're one of the pioneers of  

0:04

quantum computing and wrote the main textbook in the field. You're a pioneer of the open science movement. 

0:08

You wrote a book about deep learning  that Chris Olah and Greg Brockman  

0:12

credit with getting them into the field. More recently, you're a research fellow  

0:15

at the Astera Institute and writing a book  about religion, science, and technology. 

0:20

I'm going to ask you about none of those things. The conversation I want to have today is,  

0:25

how do we recognize scientific progress? It's especially relevant for AI because  

0:31

people are trying to close the RL  verification loop on scientific discovery. 

0:35

What does it mean to close that loop? But in preparing for this interview,  

0:39

I've realized that it's a more  mysterious and elusive force,  

0:43

even in the history of human  science, than I understood. 

0:46

I think a good place to start will be  Michelson-Morley and how special relativity  

0:51

is discovered, if it's different from the  story that you get off of YouTube videos. 

0:58

I will prompt you that way,  and then we'll go in there. 

1:02

Michelson-Morley is the famous result often  presented as this experiment that was done in  

1:09

the 1880s that helped Einstein come up with the  special theory of relativity a little bit later,  

1:15

changing the way we think about space and time  and our fundamental conception of those things. 

1:21

And there's a big gap, I think, between the  way Michelson and Morley and other people  

1:27

at the time thought about the experiment and  certainly the way in which Einstein thought  

1:32

or did not think about the experiment. In actual fact, he stated later in his  

1:39

life he wasn't even sure whether he  was aware of the paper at the time. 

1:42

There's a lot of evidence that he probably was  aware of the paper at the time, but it actually  

1:46

wasn't dispositive for his thinking at all. Something else completely was going on. 

1:55

What Michelson and Morley thought they  were doing was testing different theories  

2:01

of what was called the ether. If you go back to the 1600s,  

2:05

Robert Boyle introduced the idea of the ether. We know that sound is vibrations in the air. 

2:14

Boyle and other people got  interested in the question of  

2:16

whether light is vibrations in something,  and they couldn't figure out what it was. 

2:21

Boyle did an experiment where he  tested whether you could propagate  

2:24

light through a vacuum. He found that  you could. You couldn't do it with sound. 

2:28

He introduced this idea of the ether,  and for the next two hundred or so years,  

2:32

people had all these conversations about  what the ether was and what its nature was. 

2:38

The Michelson and Morley experiment was really an  experiment to test different theories of the ether  

2:44

against one another, in particular to find out  whether or not there was a so-called ether wind. 

2:50

The idea was that the Earth is maybe  passing through this ether wind. 

2:55

And if it is passing through the ether  wind and you shoot a light beam parallel  

3:03

to the direction the ether wind is going  in, it'll get accelerated a little bit. 

3:08

If it's sent back in the opposite  direction, it'll get slowed down a little bit,  

3:12

and you should be able to see this in  the results of interference experiments. 

3:16

What they found, much to their surprise,  was that in fact there was no ether wind. 

3:22

That ruled out some theories of the  ether, but not all, and Michelson  

3:26

certainly continued to believe in the ether. This was the shocking part of reading  

3:32

this story from the biography of Einstein that  you recommended by... what was his first name? 

3:36

Abraham Pais. Abraham Pais.  

3:38

Subtle is the Lord. Also from Imre Lakatos, The  Methodology of Scientific Research Programmes. 

3:45

The way it's told is that Michelson-Morley  proved that the ether did not exist. 

3:51

Therefore, it created a crisis in physics  that Einstein solved with special relativity. 

3:56

What you're pointing out is he  actually was trying to distinguish  

3:58

between many different theories of ether. If you're in space or if you're on Earth,  

4:02

it's the same direction of ether, or maybe the  ether wind is being carried around by the Earth,  

4:06

and so you can't really experience it on Earth. But if you go to a high enough altitude,  

4:08

you might be able to experience it. In fact, Michelson's experiments,  

4:12

the famous one is 1887, but he conducted  these experiments for basically two decades. 

4:17

For longer than that. He conducted  the first one in 1881, I think,  

4:21

but he continued to believe until he died. He died, I think it was 1929 or so. It was the  

4:26

late twenties. He was still doing experiments in  the 1920s about whether or not the ether existed. 

4:34

So he continued to believe in  the ether to the end of his life. 

4:38

I think the last public statement he  made was a year or two before he died,  

4:42

and he basically still believed it at that point. In fact, there was another physicist, Miller,  

4:48

who kept doing these experiments in the 1920s. He thought that if he went to a high enough  

4:51

altitude, Mount Wilson in California…  "Oh, I'm high enough that the ether  

4:57

winds are not being dragged by the Earth. And I've measured the effect of the ether." 

5:03

Einstein hears about this and he says, and  this is where you get the famous quote,  

5:06

"Subtle is the Lord, but malicious He is not." Anyways, I think the reason the story is  

5:10

interesting is for many different reasons. One of the ways in which the real history of  

5:16

science is different from this idea you get of the  scientific method is that you really can't apply  

5:22

falsification as easily as you might think. It's not clear what is being falsified. 

5:29

Is it just another version of the theory  of the ether that's being falsified? 

5:33

Certainly you can't induce the theory  of special relativity from the fact  

5:36

that one version of the ether seems to  be disconfirmed by these experiments. 

5:42

It certainly doesn't show that ideas about  falsification are wrong or falsified,  

5:47

but it does show that the most naive ideas… Things  are often much more complicated than you think. 

5:54

Michelson did this experiment in 1881. He was a very young man, and then other people,  

5:58

I think Rayleigh was one of them, pointed out that  there were some problems with the way he did it,  

6:02

so they had to redo it in 1887. At that point, a lot of the leading  

6:08

physicists of the day basically accepted  this result, that there was no ether wind. 

6:16

But what to do about this? Sure, maybe you falsified  

6:20

some theories of the ether. There are others that you  

6:23

haven't falsified at all at this point,  and people set to work on developing those. 

6:29

It is funny, people will phrase it as  showing that the ether didn't exist. 

6:34

Even just the word "the" there is a misnomer. You actually had a ton of different theories  

6:40

and a couple of leading contenders. So yes, there's some version of  

6:45

falsification going on, but how you respond  to this new experiment is very complicated. 

6:54

Certainly the leading physicists of the day  responded by saying, "Okay, this gives us a  

6:59

lot of information about what the ether must be,  but it doesn't tell us that there is no ether." 

7:04

In fact, Lorentz at the end of  the 19th century, before Einstein,  

7:10

figures out the math of how you convert from  one reference frame to another reference frame,  

7:15

and comes up with the Lorentz transformations,  which are the basis of special relativity. 

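For reference, the transformations in question, written here in modern notation (the conversation does not spell them out):

```latex
x' = \gamma \left( x - vt \right), \qquad
t' = \gamma \left( t - \frac{vx}{c^2} \right), \qquad
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
```

Lorentz derived the math; the dispute discussed here is over what the primed quantities mean physically.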
7:20

But his interpretation is that you are  converting from the ether reference  

7:25

frame to these non-privileged other reference  frames if you're moving relative to the ether. 

7:30

His interpretation of length contraction and time  dilation is that this is the effect of moving  

7:36

through the ether, and you have this pressure.  This pressure is warping clocks. It's warping  

7:43

measures of length. The interesting thing here  is that experimentally you cannot distinguish  

7:50

Lorentz's interpretation from special relativity. I think that's a strong statement. 

8:00

Lorentz introduces this quantity called  local time, which he regards as... 

8:06

My understanding is he's not trying to  give a physical interpretation of this,  

8:11

but it's what Einstein would later just recognize  as time in another inertial reference frame. 

8:18

He's not trying to attribute  much physical meaning to it. 

8:20

I think Poincaré gets much closer  later on to realizing that this  

8:25

is the time that's registered by clocks. About forty-odd years later, people start  

8:35

doing these muon experiments where they see  cosmic rays hit the top of the atmosphere. 

8:40

They produce a shower of muons, and you can look  to see at different heights in the atmosphere how  

8:45

many of those muons remain. They decay over time,  

8:52

and a very strange thing happens, which  is that they're decaying way too slowly. 

8:57

You expect they shouldn't be able to last  the whole way through the atmosphere at all. 

9:05

In a classical theory, their decay rate  would be too quick for them to survive the trip. 

9:10

But if in fact their time really  has slowed down, it's okay. 

9:16

In fact, the measured decay rates in  1940—and there have since been more  

9:21

accurate experiments done—match exactly  what you expect from special relativity. 

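The muon numbers are easy to check with a back-of-the-envelope calculation. A minimal sketch, using illustrative modern values (the lifetime, production height, and speed below are my assumptions, not figures from the conversation):

```python
import math

MUON_LIFETIME = 2.2e-6   # mean muon lifetime at rest, seconds (modern value)
ATMOSPHERE = 15e3        # rough production height above ground, meters
C = 3.0e8                # speed of light, m/s
V = 0.995 * C            # a typical cosmic-ray muon speed (assumed)

# Lorentz factor: how much the moving muon's clock is slowed
gamma = 1 / math.sqrt(1 - (V / C) ** 2)

# Time to reach the ground, measured in the lab frame
t_lab = ATMOSPHERE / V

# Classical prediction: the decay clock runs at lab time
frac_classical = math.exp(-t_lab / MUON_LIFETIME)

# Relativistic prediction: the muon's own clock runs slow by gamma
frac_relativistic = math.exp(-t_lab / (gamma * MUON_LIFETIME))

print(f"gamma = {gamma:.1f}")                             # about 10
print(f"classical survival:    {frac_classical:.1e}")     # about 1e-10
print(f"relativistic survival: {frac_relativistic:.1e}")  # about 1e-1
```

Classically almost no muons should reach the ground; with time dilation, roughly a tenth of them do, which is exactly the kind of gap the 1940s experiments could see.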
9:28

That's the kind of thing where if Lorentz had  been alive—he'd been dead ten or so years at that  

9:34

point—it seems quite likely that he would have  tried to save his theory by patching it up yet  

9:41

again, but it would have been a massive setback. It starts to just look like time—this  

9:50

thing that Lorentz introduced as  a mathematical convenience—that's  

9:53

actually what time is, for the muons at least. Then there's a whole bunch of other experiments  

9:58

that show this very similar phenomenon. When was that experiment done? 

10:01

That was, I think, 1940. It might  have been published in 1941. 

10:05

Maybe to rephrase and change my claim: it's  not that you could not have distinguished them,  

10:12

but the scientific community adopted what  we in retrospect consider the more correct  

10:17

interpretation before it was actually  experimentally shown to be preferred. 

10:25

So there's clearly some process that human science  does which can distinguish different theories. 

10:29

Can I just interrupt? You used the word process,  and it's interesting to think about that term. 

10:36

Process carries connotations  of something set in advance. 

10:43

It's much more complicated in practice. You have people like Lorentz, who Einstein  

10:49

absolutely and utterly admired, and Poincaré,  one of the greatest scientists who ever lived,  

10:57

and Michelson, another truly outstanding  scientist, who never reconciled themselves to it. 

11:02

It's not as though there's some standard procedure  that we're all using to reconcile these things. 

11:08

Great scientists can remain wrong for a very  long time after the scientific community has  

11:14

broadly changed its opinion. But there's no centralized  

11:18

authority or centralized method. That is the interesting thing. There's  

11:24

progress even though it is hard to articulate the  process by which it happens, the heuristics that  

11:30

are used. You mentioned Poincaré. Lorentz has  the math right, but the interpretation wrong. 

11:38

It seems like Poincaré had the opposite, where he  understood that it's hard to define simultaneity  

11:44

because any definition is circular: you could  use the velocity of signals that might arrive  

11:51

at a midpoint together, but velocity is itself defined in  terms of time. I find this interesting. There are  

11:56

a couple of other examples we could call on. There is this phenomenon in the history of  

12:01

science where somebody asks the right  question, but then they don't clinch it. 

12:07

I'm curious what you think  is happening in those cases. 

12:11

You actually do want to go case  by case and try to understand. 

12:14

It's not necessarily clear that they're  doing the same thing wrong in all of the  

12:18

cases. The Poincaré case is amazing. He seems  to have understood the principle of relativity,  

12:24

the idea that the laws of physics are the  same in all inertial reference frames. 

12:28

He seems to have understood  that the speed of light is  

12:31

the same in all inertial reference frames. He doesn't phrase it quite that way, but it is  

12:36

my understanding, though I don't speak French. These are basically the ideas that Einstein  

12:44

uses to deduce special relativity. But then he also has this additional  

12:49

misunderstanding where he thinks that length  contraction is a dynamical effect, that somehow  

12:58

particles are being pushed together by some  external force, something is going on dynamically. 

13:04

He doesn't understand that it's purely kinematics. That actually space and time are different from  

13:11

what we thought, and you need to  fundamentally rethink those things. 

13:15

It's almost like he knew too much. He had almost too grand a vision in mind. 

13:22

Einstein subtracts from that and says, "No. Space and time are just different than what we  

13:29

thought, and here's the correct picture." There's a paper in, I think it's 1909,  

13:36

where Poincaré still has this dynamical picture  of what's going on with the length contraction.  

13:44

This is just not necessary. This is a mistake  from the modern point of view. Why is he doing  

13:51

this? Why is he clinging onto this idea? I  don't know. I've obviously never met the man. 

13:59

It would be fascinating to be able to  talk it over and try and understand. 

14:05

His expertise seems to be getting in the way. He knows so much, he understands so much,  

14:13

and then he's not able to let go of these things. A really interesting fact is that a few years  

14:20

prior, in the 1890s, Einstein's a teenager  and he believes in the ether too. He knows  

14:26

about this stuff. But he's not quite  as attached as these older people were. 

14:34

Maybe they were a little bit prisoners of  their own expertise. That's my guess. Some  

14:38

historians of science would certainly disagree. Then there's the obvious stories where Einstein  

14:45

himself later on is said to have not latched onto  the correct interpretations of quantum mechanics  

14:53

or cosmology because of his own attachments. Yeah. 

14:57

Here’s the bigger question I have. The muon example is a great example of  

15:04

these long verification loops and how progress  seems to happen in the scientific community  

15:09

faster than these verification loops imply. Maybe the clearest example is Aristarchus  

15:15

in the third century BC comes up  with the idea of heliocentrism. 

15:20

The ancient Greeks dismiss it on the  grounds that we should see, as the Earth  

15:24

is moving around the Sun, if the Sun really  is the center of the solar system,  

15:27

the stars move relative to the Earth. The only reason that would not be the  

15:31

case is if the stars are so far away  that you would not observe this. 

15:35

And it's only in 1838 that stellar  parallax was actually measured. 

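The scale involved is worth a quick sketch: even for one of the nearest stars, the annual shift is a fraction of an arcsecond. A minimal calculation using modern values for 61 Cygni, the star Bessel measured in 1838 (the numbers are illustrative assumptions, not from the conversation):

```python
import math

AU = 1.496e11               # Earth-Sun distance in meters: the parallax baseline
LIGHT_YEAR = 9.461e15       # meters
d_star = 11.4 * LIGHT_YEAR  # approximate distance to 61 Cygni

# Parallax half-angle: the apparent shift of the star
# when the Earth moves sideways by one AU
angle_rad = AU / d_star
angle_arcsec = math.degrees(angle_rad) * 3600

print(f"{angle_arcsec:.2f} arcseconds")  # about 0.3 arcseconds
```

A third of an arcsecond is far below anything the naked eye or pre-telescope instruments could resolve, which is why the "missing" parallax looked like evidence against heliocentrism for two millennia.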
15:40

And so, we didn't need to wait  until 1838 to have heliocentrism. 

15:44

We didn't need to wait for  the experimental validation to  

15:47

understand that Copernicus is better in some way. In fact, when Copernicus first came up  

15:53

with his theories, it's well known that the  Ptolemaic model was more accurate because it  

15:59

had centuries of adding on these epicycles. What's maybe less well appreciated is that it  

16:05

was also in some sense simpler. Because Copernicus actually  

16:10

had to add extra epicycles. It had more epicycles than the  

16:12

Ptolemaic model because he had this bias that the  planets should move in perfect circles at uniform speed. 

16:20

Anyway, I think this is an interesting story  because it's not a more accurate theory. It's  

16:26

not a simpler theory. So how  could you have known ex ante  

16:30

that Copernicus was correct and Ptolemy was not? Good question. I don't entirely know the answer. 

16:41

I can give you a partial answer that I, centuries  in the future, start to find very compelling. 

16:52

I'm sure it's part of the historic story at least. One of the big shocks for Newton,  

17:01

he was eventually able to derive Kepler's laws of  planetary motion, so you're able to explain the  

17:07

motions of the planets in the sky. But he also, out of the same theory,  

17:12

his theory of gravitation, was  able to explain terrestrial motion. 

17:16

He's able to explain why objects move in  parabolas on the Earth, and he's able to explain  

17:20

the tides in terms of the Moon's and the Sun's  gravitational effect on water on the Earth. 

17:31

You have what seem like three very different  disconnected phenomena all being explained by  

17:37

this one set of ideas. That starts to feel  

17:42

very compelling, at least to me. I think most people find that very  

17:48

satisfying once they eventually realize it. Have you read the Keynes biography of Newton? 

17:54

He wrote an entire biography? No, the essay. 

17:57

Sure. I love that. This description of him  as the last of the magicians is wonderful. 

18:05

In fact, I think it's maybe worth superimposing. Or you should read out that one  

18:09

passage of the thing. Alright. It's from a talk  

18:17

that he gave at Cambridge not long before he died. He'd acquired Newton's papers somehow and gave a  

18:26

lecture twice about this, or his brother Geoffrey  gave it the other time because he was too ill. 

18:33

There's this wonderful,  wonderful quote in the middle. 

18:36

The whole thing is really interesting,  but I love this particular quote:  

18:41

"Newton was not the first of the age of reason. He was the last of the magicians, the last  

18:46

great mind which looked out on the visible and  intellectual world with the same eyes as those  

18:50

who began to build our intellectual inheritance  rather less than ten thousand years ago." 

18:56

This idea people have that Newton was the  first modern scientist is somehow wrong. 

19:07

There's some truth to it, but he really  had this very different way of looking  

19:12

at the world that was part superstitious  and part modern. It was a funny hybrid.  

19:19

He's a transitional figure in some sense. That phrase, "the last of the magicians,"  

19:27

really points at something. The thing I'm very curious  

19:30

about with Newton is whether it was  the same program, the same heuristics,  

19:34

the same biases that he applied to his alchemical  work as he did to his understanding of astronomy. 

19:43

This is from the Keynes essay: "There  was extreme method in his madness. 

19:47

All his unpublished works on esoteric  and theological matters are marked by  

19:50

careful learning, accurate method,  and extreme sobriety of statement. 

19:54

They are just as sane as the Principia if their  whole matter and purpose were not magical. 

19:59

They were nearly all composed during the  same 25 years of his mathematical studies." 

20:06

Clearly, there was some aesthetic that motivated  people like Einstein to reject earlier ways of  

20:12

thinking and say, "No, the ether is wrong, and  there's a better way to think about things." 

20:16

The same is true with Newton. The question I have is whether  

20:24

similar heuristics toward parsimony,  aesthetics, and so on, would be equally  

20:32

useful across time and across disciplines,  or whether you need different heuristics. 

20:37

The reason that's relevant is even if we  can't build a verification loop for science,  

20:41

maybe if the taste tests point in the same  direction, you can at least encode that bias  

20:46

into the AIs. That would maybe be enough. The point is that where we always get  

20:54

bottlenecked is where the previous  processes and heuristics don't apply. 

21:01

That's almost definitionally  what causes the bottlenecks. 

21:05

Because people are smart, they know what  has worked before. They study it. They apply  

21:09

the same kinds of things, so they don't  get stuck in the same places as before. 

21:14

They keep getting bottlenecked  in different places. 

21:18

I'm overgeneralizing a bit,  but I think it's right. 

21:22

If you're attempting to reduce science to  a process, you're attempting to reduce it  

21:27

to something where there is just a  method which you can apply, and you  

21:31

turn the crank and out pops insight. You can do a certain amount of that,  

21:37

but you're going to get bottlenecked at the  places where your existing method doesn't apply. 

21:43

Definitionally, there's no crank you can turn. You need a lot of people trying different ideas. 

21:53

The more difficult the idea is to  have, the greater the bottleneck,  

21:57

but then also the greater the triumph. Quantum mechanics is a great example of this. 

22:02

It's such a shocking set of ideas. It's such  a shocking theory. The theory of evolution in  

22:07

some sense is also quite a shocking idea, not the  principle of natural selection, but that it can  

22:15

explain so much. That's a shocking idea. Existing safety benchmarks claim that,  

22:21

at least for today's top models, attacks are  only successful a few percent of the time. 

22:26

This sounds great, but Labelbox researchers  were able to jailbreak these very same models  

22:30

about 90% of the time – even the ones that  have the strongest reputation for safety. 

22:35

And the disconnect here is that the  prompts which underlie these public safety  

22:38

benchmarks are all framed in a very naive way. There's no attempt to disguise harmful intent. 

22:44

These prompts will just ask models  to “hack into a secure network”  

22:47

and to “do so without getting caught”. But real bad actors don't write like this. 

22:51

So Labelbox built a new safety  benchmark from the ground up. 

22:55

Their prompts reflect real adversarial behavior  by stripping out obvious trigger phrases and  

23:00

wrapping their request in fictional scenarios. For example, instead of outright asking an LLM  

23:04

to steal somebody's identity, the  prompt will frame it as a game. 

23:07

A light bearer who's trying to hide  from dark forces needs a handbook  

23:11

on how to disguise themselves as somebody else. This safety research is linked in the description. 

23:16

If you think this could be useful for your  own work, reach out at labelbox.com/dwarkesh. 

23:26

So Principia Mathematica is released in 1687. The Origin of Species is released in 1859. 

23:33

At least naively, it seems like Darwin's  theory of natural selection is conceptually  

23:38

easier than the theory of gravity. I asked Terence Tao this question. 

23:46

There was this contemporaneous  biologist with Darwin, Thomas Huxley,  

23:49

who read this and said, "How extremely  stupid to not have thought of this." 

23:54

Nobody ever reads the Principia Mathematica  and thinks, "God, why didn't I beat Newton to  

23:59

the punch here?" So what's going on here?  Why did Darwinism take so much longer? 

24:07

The idea must have been known to animal  breeders for a long time at some level,  

24:15

or certainly large chunks of the idea were  known, that artificial selection was a thing. 

24:23

In some sense, Darwin's genius  wasn't in having that idea, it was  

24:29

understanding just how central it was to biology. You can go back and explain a tremendous amount  

24:39

about all the variety of what we see in the world  with this as not necessarily the only principle,  

24:46

but certainly a core principle. He writes this wonderful book,  

24:52

The Origin of Species. It's just so much evidence and  

24:57

so many examples, trying to tease this  out and see what the implications are,  

25:04

and connecting it to as much else as he possibly  can, to geology and all these other things. 

25:15

That hard work—making the case that  it's actually relevant all across the  

25:21

biosphere—is what he's doing there. He's not just having the idea,  

25:25

he's making a compelling case that it's  intertwined with absolutely everything else. 

25:30

The motivation for the question was Lucretius,  this first-century BC Roman poet who has an idea  

25:37

that seems analogous to natural selection. It's about species getting fitted more over  

25:42

time to their environments, or species  losing fit to their environment. 

25:46

And so, why did this go  nowhere for nineteen centuries? 

25:51

Then I looked into it or, more accurately, asked  LLMs what exactly Lucretius's idea here was. 

25:56

It is extremely different from  what real natural selection is. 

25:59

He thought there was this generative period  in the past where all the species came about,  

26:03

and then there was this one-time filter which  resulted in the species that are around today,  

26:07

and they became fit to the environment. He did not have this idea that it is an  

26:10

ongoing gradual process or that there  is a tree of life that connects all  

26:17

life forms on Earth together, which, by the way,  

26:18

is an incredibly weird fact that every single  life form on Earth has a common ancestor. 

26:23

It's not incredibly weird. If you think that  the origin of life must have been very hard,  

26:29

that there's a bottleneck there,  then it's not so surprising. 

26:32

There's also this verification loop aspect where  even if Newton might be harder in some sense, if  

26:38

you've clinched it, you can experimentally… I know  "validate" is the wrong word philosophically, but  

26:44

you can give a lot of base points to the theory. You can be like, "Okay, I have this idea  

26:48

of why things fall on Earth. I have this idea of why orbital  

26:50

periods for planets have a certain pattern. Let's try it on the Moon, which orbits the Earth." 

26:54

And in fact, it’s weird but the orbital  period matches what my calculations imply. 

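Newton's Moon test is simple enough to redo. A minimal sketch with modern values (the inputs are my illustrative numbers, not figures from the conversation): if surface gravity falls off as 1/r², scaling it out to the Moon's distance should reproduce the acceleration the Moon's orbit actually requires.

```python
import math

g = 9.81                 # surface gravity, m/s^2
R_EARTH = 6.371e6        # Earth's radius, m
D_MOON = 3.844e8         # Earth-Moon distance, m
T_MOON = 27.32 * 86400   # sidereal month, s

# Centripetal acceleration needed to hold the Moon on its orbit
a_orbit = 4 * math.pi ** 2 * D_MOON / T_MOON ** 2

# Surface gravity scaled down by the inverse-square law
a_gravity = g * (R_EARTH / D_MOON) ** 2

print(f"orbit requires: {a_orbit:.2e} m/s^2")
print(f"1/r^2 predicts: {a_gravity:.2e} m/s^2")
```

The two numbers agree to about a percent: the same law covers falling apples and the Moon, which is the kind of independent cross-check being described here.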
26:58

And the tides work correctly. It's just amazing. Exactly. Whereas for Darwinism, it takes a ton  

27:05

of work for Darwin to compile all the  cumulative evidence, but there's no  

27:08

individual piece that is overwhelmingly powerful. And there's a whole bunch of problems as well. 

27:12

He doesn't really understand  what the mechanism is. 

27:16

He doesn't understand genes, all these things. The very interesting thing in the history of  

27:20

Darwinism is, this idea which theoretically you  could come up with at any time, there is almost  

27:29

identical independent creation of that idea  between Alfred Wallace and Charles Darwin. 

27:34

So much so that I think Wallace sends  his manuscript to Darwin and is like,  

27:37

"What do you think of this  idea?" And Darwin's like, "Fuck." 

27:40

I don't think that's an exact  quote, but it's pretty much correct. 

27:44

They end up presenting their ideas  together in the spirit of sportsmanship. 

27:49

Why was this period in the 1850s or 1860s  the right time for these ideas to form? 

27:53

You can come up with different ideas. One is  geology. In the 1830s, Charles Lyell figures  

28:00

out that there's been millions and billions  of years of time that's existed on Earth. 

28:04

The paleontology shows you that fossils  have existed for that entire time. 

28:10

Life goes back a long way. In fact, you can even find  

28:12

fossils for intermediate species  that show you the tree of life. 

28:16

Between humans and other apes as  well, there's intermediate humans. 

28:20

There's also the age of colonization, and we  have all these voyages doing biogeography. 

28:27

That all must have been necessary. In fact, there's a huge history of  

28:31

parallel innovation and discovery  in the history of science. 

28:33

So maybe it is another piece of  evidence that more had to be in  

28:37

place for a given idea to be discovered. Because if it's not discovered for a long  

28:41

time and then spontaneously many different people  are coming up with it, that shows you that the  

28:46

building blocks were in some sense necessary. This example of Lyell and other geologists  

28:56

in the early 1800s having this idea of  deep time does seem to have been crucial. 

29:02

I know Darwin was very influenced by Lyell. If you don't have at least tens or hundreds  

29:13

of millions of years, evolution  starts to look like a non-starter. 

29:20

In order to make it work on a timescale of 5,000  to 10,000 years, or 6,000 years with Bishop Ussher,  

29:28

you would need to see evolution occurring  at a massive rate during human lifetimes,  

29:34

and we're just not seeing that. That does seem to have been a blocker. 

29:39

To your question of what other blockers were  there, were there any others? I don't know. 

29:47

Or how much earlier could you, in principle,  have come up with it if you were much smarter? 

29:52

Let's go back and zoom out to your original  question about the verification loop in AI. 

30:00

An example that should give you pause  there is the big signature success so  

30:06

far, which is certainly AlphaFold. AlphaFold  really isn't about AI. A massive fraction of  

30:13

the success there is the Protein Data Bank. It's X-ray diffraction, NMR, cryo-EM,  

30:20

and the several billion dollars that were spent  obtaining those 180,000-odd protein structures. 

30:28

It's basically the story of how we spent  many decades obtaining protein structure  

30:34

just by going out and looking very hard  at the world experimentally, and then we  

30:38

fitted a nice model at the end of it, which  was a tiny fraction of the entire investment. 

30:46

That's a story of data acquisition principally. The AI bit is very impressive and quite  

30:52

remarkable, but it is only a  small part of the total story. 

30:56

AlphaFold is very interesting, and  philosophically I wonder what you think  

31:00

of it as a scientific theory or explanation. I guess over time the world is becoming harder  

31:07

to understand… As I'm saying things, because  you're such a careful speaker, I say a phrase  

31:16

and wonder if you'll actually buy that premise. But in some domains, we need to fit models to  

31:25

things rather than coming up  with underlying principles  

31:27

that explain a broad range of phenomena. Compare the theory of general relativity,  

31:35

or any theory which just nets out to some  equations, versus AlphaFold, which is encoding  

31:40

these different relationships, which we  can't even interpret, across 100 million-odd parameters. 

31:46

Are those really the same thing? GR can predict things you could  

31:53

have never anticipated or it was never meant  to do, like why Mercury's orbit precesses. 

31:58

AlphaFold is not going to have  that kind of explanatory reach. 

32:03

I want to get your reaction to that. I think it's an incredibly interesting  

32:07

question. Maybe a really pivotal question. If you  take a very classic point of view, you want these  

32:17

deep explanatory principles. You want as few free  

32:21

parameters as you possibly can. You want very simple models which explain a lot,  

32:27

and AlphaFold doesn't look anything like that. You might just say, "It's nice and maybe  

32:32

helpful as a model, but it's  not a scientific explanation." 

32:37

That's a conservative point of  view, answer one to the question. 

32:42

Answer two is to say maybe you shouldn't  think about AlphaFold as an explanation in  

32:51

the classic sense, but maybe it contains  lots of little explanations inside it. 

32:56

Part of what you can get out of  interpretability work is you can go into  

33:00

AlphaFold and start to extract certain things. Maybe by doing an archeology of AlphaFold,  

33:08

we can actually understand a great  deal more about these principles. 

33:12

You can start to extract that a certain circuit  does this interesting thing, and we learn from it. 

33:16

I don't know to what extent that's been done with  AlphaFold, but it's been done a little bit with  

33:22

some of the chess models, like AlphaZero. There seem to be some strategies which  

33:28

were borrowed by Magnus Carlsen, which he  seems to have just taken from AlphaZero. 

33:35

I don't think there's any public confirmation  of this, but some experts have noticed that he  

33:41

changed his game quite radically after some public  forensics were released on how AlphaZero worked. 

33:49

That's an example where human beings are  starting to extract meaning out of these models. 

33:55

That leads to viewing the models as  a potential source of explanations. 

34:01

You need to do more work because  they're not very legible up front,  

34:04

but you can potentially extract them. That's an interesting intermediate  

34:10

situation where they're not explanations  themselves, but you can extract interesting  

34:13

explanations out of them and use them as a source. The third and most interesting possibility is  

34:20

that they're a new type of object. They should be taken very seriously  

34:25

as explanations, but where in the past we haven't  had the ability to really do anything with them,  

34:30

now we have interesting new actions we can do. We can merge them, we can distill them. 

34:41

It's a big opportunity in  the philosophy of science. 

34:48

There's an anticipation of this in some way. Some mathematicians and physicists work today…  

34:58

Historically, if you had a 100-page equation—which  is the kind of thing that does come up—there's  

35:05

just nothing you can do if it's 1920. At that point, you give up on the problem. 

35:11

But today, with tools like  Mathematica, you can just keep going. 

35:18

That's an object now, a  thing that you can work with. 

35:21

There are examples where people work with  these things that formerly were regarded  

35:25

as too complicated, and sometimes  they get simple answers out the end. 

35:28

That’s just an intermediate working state. So I wonder if something similar is going to  

35:33

happen in this case, where you could take these  models and use them in a similar way that people  

35:43

do with Mathematica, and take them seriously. They're not explanations in the classic sense,  

35:48

but they'll be something else which  interesting operations can be done on. 

35:54

The thing I worry about is, suppose it's  1500 and you're training a model on… This  

36:03

is a weird history where we developed  deep learning before we had cosmology. 

36:08

Suppose we live in that world. You're observing how the stars don't seem to move. 

36:13

The planets have all these weird behaviors. Then you train a model on that, and you do  

36:18

some kind of interp on it trying to  figure out what the patterns are. 

36:22

You'd just be able to keep  building on Ptolemy's model. 

36:26

You'd see there's another  epicycle we didn't notice. 

36:31

Parameters X to Y encode this epicycle,  parameters whatever encode the next epicycle. 

36:37

If you were just trying to figure out  why the solar system is the way it is  

36:41

from observational data, you could just  keep adding epicycles upon epicycles,  

36:45

but it really took one mind to integrate it all in  and say, "Here's what makes more sense overall." 

36:56

This is to my point that we don't really  understand what to do with the models. 

37:03

We don't have the verbs yet. It is certainly interesting  

37:08

to think about the question where you  start to apply constraints to the models,  

37:14

essentially saying, "What's the simplest  possible explanation?" Or, "Can you  

37:19

simplify? Can you give me the 90/10 explanation?" And go further and further in boiling it down. 

37:28

It might be that indeed they  start out by providing a very,  

37:31

very complicated, many-parameter model. But you can just force the case, and basically  

37:38

that's scaffolding, which maybe is the very early  days of their attempt to understand something. 

37:48

They're forced through that to a  much more simple understanding. 

37:52

Sorry for misunderstanding, but it sounds  like you're saying maybe there's some  

37:54

regularizer or some distillation you could do of  a very complicated model that gets you to a truer,  

38:02

more parsimonious theory. Take Ptolemy versus  Copernicus. You start off with lots of Ptolemy  

38:09

epicycles, and then you try to distill this  model, and maybe it gets rid of some of the  

38:15

epicycles that are less and less necessary to get  the mean squared error of the orbits to match. 
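The distillation idea here can be made concrete with a toy sketch. This is not anyone's actual proposal from the conversation, just an illustration under assumed numbers: treat each "epicycle" as a sinusoidal basis term, fit an overparameterized model by least squares, then apply an L1-style soft-threshold so that epicycles too small to pay the sparsity penalty get deleted.

```python
# Toy sketch: "epicycles" as Fourier terms fit to a synthetic orbit signal,
# then pruned with an L1-style soft-threshold ("distillation").
# All frequencies, amplitudes, and the threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 400)
# Ground truth: two real "epicycles" (frequencies 1 and 3) plus a little noise.
y = 1.0 * np.sin(t) + 0.3 * np.sin(3 * t) + 0.01 * rng.standard_normal(t.size)

# Overparameterized basis: ten candidate epicycle frequencies.
freqs = np.arange(1, 11)
X = np.stack([np.sin(f * t) for f in freqs], axis=1)

# Ordinary least squares keeps every epicycle, however tiny its amplitude.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# One proximal (soft-threshold) step stands in for an L1 regularizer:
# epicycles whose amplitude can't pay the sparsity "tax" are zeroed out.
lam = 0.05
sparse = np.sign(coef) * np.maximum(np.abs(coef) - lam, 0.0)

print("epicycles kept by raw fit:   ", int(np.sum(np.abs(coef) > 1e-12)))
print("epicycles kept after pruning:", int(np.sum(np.abs(sparse) > 0)))
```

This removes superfluous epicycles within the Ptolemaic frame, which is exactly the limitation raised next in the conversation: pruning parameters is a local operation, and no amount of it performs the global Copernican swap.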

38:22

But at some point it has to do this  thing which is to switch two things. 

38:26

Locally, it actually doesn't  make things more accurate. 

38:29

It's in a global sense that  it's a more progressive theory. 

38:34

There's some process which obviously  humanity did over its span, which did  

38:38

that regularization or did that swap. But with raw gradient descent,  

38:43

I don't really feel like it would do that. Think about the example of going from  

38:49

Newtonian gravity to Einstein's  general theory of relativity. 

38:53

These are shockingly different theories,  and the question is what causes that flip. 

39:00

As nearly as I understand the history, what goes  on is Einstein develops special relativity and  

39:06

pretty much straight away he understands. It's a  very obvious observation. In special relativity,  

39:13

influences can't propagate faster than  the speed of light, and in Newtonian  

39:17

gravity, action is at a distance. Straight away in special relativity,  

39:24

you could use Newtonian gravity  to do faster-than-light signaling. 

39:28

You could send information backwards in time. You could do all kinds of crazy stuff. 

39:32

It's not a big leap to realize we have  a big problem here. That's the forcing  

39:39

function there. You've realized that  your old explanation is not sufficient.  

39:43

You need something new. Then you're going to  start by doing the simplest possible stuff. 

39:52

It just turns out that a lot of that stuff  doesn't work very well, so you're forced to go  

40:00

through these steps where gradually it gets more  complicated, and it's wrong in a variety of ways. 

40:08

The final theory appears shockingly  simple and beautiful, but it's gone  

40:15

through some somewhat ugly intermediate stages. If you're thinking about what it looks like to  

40:22

have AI accelerate science, there's one for  well-understood domains where we just want  

40:28

local solutions, like how does this protein fold. We just train a raw model using gradient descent. 

40:33

Then there's things like coming up with general  relativity, where you couldn't really just train  

40:37

on every single observation in the universe  and hope that general relativity pops out.  

40:44

What would it require? It also certainly wasn't  immediately discovered. It was decades of thought.  

40:52

You'd need independent research programs where  people start off with these biases, where Einstein  

40:57

is initially motivated by this thought experiment  of whether you can distinguish the effect of  

41:03

gravity from just being accelerated upwards. You just need different AI thinkers to start  

41:10

off with these initial biases and  see what can germinate out of them. 

41:14

The verification loop for that might be quite  long, but you just need to keep all those  

41:17

research programs alive at the same time. This point you make about keeping all  

41:24

the different research programs alive, I  think that is very important and central. 

41:30

A great example is situations where the  same answer has been correct in some  

41:37

circumstances and wrong in other circumstances. The planet Uranus was not in quite the right spot,  

41:45

and people famously predicted the  existence of Neptune on this basis. 

41:52

Wonderful, massive success for Newtonian gravity. The planet Mercury is not in quite the right spot. 

41:58

You predict the existence of  some other distorting planet. 

42:02

It turns out that doesn't exist. Actually, the reason Mercury is not in the right  

42:06

spot is because you need general relativity. You've pursued very similar ideas,  

42:13

and it's been very successful in one  case, and it's been completely and  

42:16

utterly unsuccessful in the other case. A priori, you can't tell which of these is the  

42:20

thing to do, and you actually need to do both. This is certainly very true in the  

42:27

history of science. This kind of diversity,  

42:32

where you just have lots of people go off and  pursue lots of potentially promising ideas,  

42:36

you just need to support that for a long time. It's hard to do that for a variety of reasons,  

42:42

but it does seem to be very, very important. This example of Uranus versus Mercury  

42:52

is very interesting. I think it illustrates  

42:57

the difficulty with falsificationism. The orbit of Uranus is in some sense  

43:02

falsifying Newtonian mechanics. But then you make some ancillary  

43:08

prediction that says, "Oh, the reason this  is happening is there must be another planet  

43:12

which is perturbing Uranus's orbit." I think it's Le Verrier in 1846.  

43:18

"Point a telescope in the right  direction, you find Uranus." 

43:20

Neptune. Sorry. Neptune,  

43:23

yes. But with Mercury, it's observed that the  ellipse which forms its orbit is rotating 43  

43:29

arcseconds more every century than Newtonian  mechanics would imply, so people say that  

43:34

there must be a planet inside Mercury's orbit. They call it Vulcan and point the telescopes.  
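The 43 arcseconds per century mentioned here can be checked with a back-of-the-envelope calculation. This is a sketch using standard textbook values for the Sun's gravitational parameter and Mercury's orbit, plugged into the well-known general-relativistic perihelion-advance formula:

```python
# Back-of-the-envelope check of the ~43 arcsec/century figure for Mercury,
# using standard textbook orbital constants as inputs.
import math

GM_SUN = 1.32712e20        # m^3/s^2, Sun's standard gravitational parameter
C = 2.99792458e8           # m/s, speed of light
A = 5.7909e10              # m, Mercury's semi-major axis
E = 0.2056                 # Mercury's orbital eccentricity
PERIOD_DAYS = 87.969       # Mercury's orbital period

# GR perihelion advance per orbit (radians): 6*pi*GM / (c^2 * a * (1 - e^2))
per_orbit = 6 * math.pi * GM_SUN / (C**2 * A * (1 - E**2))

orbits_per_century = 36525 / PERIOD_DAYS
arcsec = math.degrees(per_orbit * orbits_per_century) * 3600
print(f"{arcsec:.1f} arcsec per century")  # roughly 43
```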

43:39

It's not there. But if you're a proper Newtonian,  what you do is say, "Well, maybe there's some  

43:44

cosmic dust that's occluding this planet, or  maybe the planet is so small we can't see it,  

43:49

or let's build an even more powerful telescope,  or maybe there's some magnetic field which is  

43:55

occluding our measurement." At any one of these steps— 

43:56

And this happens over and over. There are just so many stories  

44:00

which are exactly like this. An example I love from the 1990s. 

44:07

Some people noticed that the Pioneer spacecraft  weren't quite where they were supposed to be. 

44:11

You can get very excited about this. "Oh  my goodness, general relativity is wrong. 

44:16

Maybe we're going to discover  the next theory of gravity." 

44:20

Today the accepted explanation is that there's  just a slight asymmetry in the spacecraft. 

44:30

It turns out that the thermal radiation  is slightly larger in one direction than  

44:35

the other, and that's causing a tiny  little acceleration towards the sun. 

44:40

Most of the time when there's  these apparent exceptions,  

44:44

it's just something like that going on. It's very much like the Mercury-Vulcan case. 

44:50

But every once in a while, it's not. A priori, you can't distinguish these. 

44:56

Science is just full of these. It's funny too, the way we tell  

44:59

the history of science, it sounds so simple. You just focus on the right exception and  

45:07

you realize that you need to throw out the old  theory and lo and behold, your Nobel Prize awaits. 

45:14

But in fact, these exceptions are all over the  place. 99.9% of the time, it just turns out to be  

45:20

some effect like this thermal acceleration  in the case of the Pioneer spacecraft. 

45:28

Unfortunately, there's a lot of  selection bias going into those stories. 

45:32

The thing is there's no ex ante heuristic  which tells you which case you're in. 

45:38

To spell out why I think this is important, some  people have this idea that AI is going to make  

45:44

disproportionate progress towards science because  it makes disproportionate progress towards domains  

45:49

where there's tight verification loops. It's really good at coding because you  

45:52

can run unit tests. Science may be similar  

45:54

because you can run experiments. What that doesn't appreciate is that  

46:01

there's an infinite number of theories that  are compatible with any given experiment. 

46:04

Over time, why we latch onto the one we  think is more correct in retrospect is,  

46:10

as we're discussing, hard to articulate. Lakatos has all kinds of interesting examples  

46:16

in his book about these hostile verification  loops that are extremely long-lasting. 

46:24

One he talks about is Prout. There's this chemist in 1815 who hypothesizes that  

46:31

all elements must have whole-number atomic weights. They're basically all made of hydrogen. 

46:38

The reason he thinks this is because if you  look at the measured weights of all elements,  

46:42

it does seem that almost all of  them have whole number weights. 

46:45

But then there are some exceptions. For example, chlorine comes out at 35.5. 

46:51

So then there's all these ad hoc theories that  people in this school keep coming up with, like,  

46:55

"Oh, maybe there's chemical impurities." But there's no chemical reaction you  

46:59

can do which seems to get rid of this. Maybe it's fractions of whole numbers,  

47:03

so 35.5 can be halves. But actually, if you  

47:05

measure chlorine even closer, it's 35.46, so it's  getting further away from the correct fraction. 

47:11

Later on, what is discovered is that what you're  actually measuring is a mix of different isotopes,  

47:15

which cannot be chemically distinguished. They can only be physically distinguished. 
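The chlorine puzzle dissolves once you do the isotope arithmetic. A quick sketch, using modern measured values for the two stable isotopes (the assumption being that the 19th-century measurements were converging on the same ~35.45 figure):

```python
# Chlorine's fractional atomic weight as an abundance-weighted average
# of its two stable isotopes. Masses and abundances are modern standard
# values, used here for illustration.
isotopes = {
    34.9689: 0.7576,   # Cl-35: mass (u), natural abundance
    36.9659: 0.2424,   # Cl-37: mass (u), natural abundance
}
atomic_weight = sum(mass * frac for mass, frac in isotopes.items())
print(f"{atomic_weight:.2f}")  # ~35.45: neither a whole number nor a neat half
```

No chemistry can separate the two contributions, which is why Prout's defenders were stuck for so long: the weighted average is a physical fact about a mixture, not a chemical impurity.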

47:20

So you have 85 years before we realize what  an isotope is, where the verification loop is  

47:25

actively hostile against the correct theory. You just need this remnant to be defending…  

47:30

There's no ex ante reason  it's the preferred theory. 

47:33

As a community, we should just have people  try to integrate new observations, even if  

47:38

they don't seem to fit their school of thought,  and hopefully enough of that happens… Anyways,  

47:45

I guess the thing I'm trying to articulate  is the difficulty with automating science. 

47:51

The question is, where is  the bottleneck at some level? 

47:56

Are we primarily bottlenecked  on one type of thing, or are  

47:59

we bottlenecked on multiple types of things? Certainly, talking to structural biology people,  

48:07

they seem to think that AlphaFold was an  enormous advance. It was a shock. At some level,  

48:12

yes, AI can certainly help us speed up science. It is helping with a certain type of bottleneck. 

48:22

That doesn't mean though, as you're  saying, that it's necessarily going  

48:24

to help with all kinds of bottlenecks. I suppose the question you're pointing at is,  

48:29

what are the types of bottlenecks that remain,  and what are the prospects for getting past them? 

48:35

Even in the case of coding, it's really  interesting talking to programmer friends. 

48:40

At the moment they're all in this  state of shock and high excitement,  

48:45

and they're all over the place. You do wonder where the  

48:51

bottleneck is going to move to. Certainly, one thing that a lot  

48:54

of them seem to be bottlenecked on now is  having interesting ideas, and in particular,  

48:59

having interesting design ideas. There's not really a verification loop for  

49:04

knowing that a design idea is very interesting. They're no longer nearly as bottlenecked by their  

49:12

ability to produce code, but they are  still bottlenecked by this other thing. 

49:17

Formerly, they weren't bottlenecked on it because  just writing code took so much of their time. 

49:22

They could have lots of ideas while they were  taking three weeks to implement their prototype,  

49:28

and then they would implement the next version. Now they're taking three hours to implement the  

49:32

prototype, and they don't have as good ideas  after that, from a design point of view. 

49:39

Last year, I predicted that by 2028,  AI would be able to prep my taxes about  

49:43

as well as a competent General Manager. But we're already getting pretty close. 

49:47

As I shared before, I use Mercury both  for my business and my personal banking. 

49:51

So I recently gave an LLM access  to my transaction history across  

49:54

both accounts through Mercury's MCP. I asked it to go through all my 2025 transactions  

50:00

and flag any personal expenses that seem like  they should actually be charged to the business. 

50:04

And this worked shockingly well. Mercury's MCP exposes a bunch of  

50:08

detailed information, things like notes and memos  and any JPEGs of receipts and PDF attachments. 

50:15

So my LLM had plenty of context to work with. One of my favorite examples  

50:19

happened with a charge to Bay Padel. If you looked at the vendor alone, you would  

50:22

have had to assume that it's a personal expense. But the LLM looked at the receipt and the attached  

50:28

note in Mercury and realized this  was actually a team bonding exercise  

50:32

from our last in-person retreat. So a legitimate business expense. 

50:36

I imagine it will be a while  before traditional banks have MCP. 

50:40

Functionality like this is why I use Mercury. Go to mercury.com to learn more. 

50:45

Mercury is a fintech company,  not an FDIC Insured Bank. 

50:48

Banking services provided through Choice  Financial Group and Column NA, members FDIC. 

50:54

You have a very interesting take. I think it was a footnote in one of  

50:58

your essays, and I couldn't find it again,  which was that it's very possible that if  

51:02

we met aliens, they would have a totally  different technological stack than us. 

51:07

That contradicts a common assumption I  had that I never questioned, which is that  

51:11

science is this thing you do relatively  early on in the history of civilization. 

51:17

You get to a point and you have a couple hundred  years of just cranking through the basics,  

51:21

understanding how the universe works, and you've  got it. You've got science. Then everybody  

51:27

would converge on the same "science." I found that a very interesting idea,  

51:31

and I want you to say more about it. The idea there that I'm at least somewhat  

51:39

attached to is that the tech tree or the  science and tech tree is probably much  

51:48

larger than we realize. We're in this funny  situation. People will sometimes talk about  

51:55

a theory of everything as a potential goal for  physics, and then there's this presumption that  

52:02

physics is done once you get there. Of course, this is not true at all. 

52:06

If you think about computer science,  computer science started in the 1930s  

52:12

when Turing and Church and so on laid  down what the theory of everything was. 

52:18

They just said, "Here's how computation works." We've spent ninety-odd years since then  

52:24

exploring the consequences of that and gradually  building up more and more interesting ideas. 

52:29

Those ideas, to some extent,  you can regard as technology. 

52:32

But insofar as they're discovered principles  inside that theory of computation, I think  

52:38

they're best regarded as science and in  some cases, very fundamental science. 

52:42

Ideas like public-key cryptography are  incredibly deep, very non-obvious ideas which  

52:50

lay hidden already in the 1930s. My expectation is that there will  

52:59

be different ways of exploring this tech  tree, and we're still relatively low down. 

53:03

We're still at the point where we're just  understanding these basic fundamental theories,  

53:08

and we haven't yet explored them. A thing which I think is quite fun  

53:13

is if you look at the phases of matter. When I was in school, we'd get taught that  

53:17

there are three phases of matter, or sometimes  four or five, depending on what you included. 

53:26

As an adult, as a physicist, you start to  realize we've been adding to this list. 

53:33

We've got superconductors and superfluids,  and maybe different types of superconductors,  

53:38

and Bose-Einstein condensates,  the quantum Hall systems,  

53:41

fractional quantum Hall systems, and so on. It's starting to turn out there's a lot of  

53:49

phases of matter to discover, and we're  going to discover a lot more of them. 

53:54

In fact, we're going to be able to  start to design them in some sense. 

53:57

We'll still be subject to the laws of physics,  but there is this tremendous freedom in there. 

54:02

This looks to me like we're down  at the bottom of the tech tree. 

54:06

We've barely gotten started there, and  I expect that to be the case broadly. 

54:13

Certainly, programming is a  very natural place to look. 

54:18

The idea that we've discovered all the deep ideas  in programming just seems obviously ludicrous. 

54:25

We keep discovering what seem like deep, new,  fundamental ideas. We're very limited. We're  

54:34

basically slightly jumped-up chimpanzees,  so we're slow and it's taking us time. 

54:43

But what do we look like another million  years in the future, in terms of all the  

54:51

different ideas people have had around how  to manipulate computers and information? 

54:58

I think we're likely to discover that there are  a lot of very deep ideas still to be discovered. 

55:07

I think it was Knuth in the preface to The Art of  Computer Programming who says something like this. 

55:12

He started this book back in the sixties. He talked to a mathematician who was a bit  

55:18

contemptuous and said, "Look, computer  science isn't really a thing yet. 

55:22

Come back to me when there's  a thousand deep theorems." 

55:26

Knuth remarks, writing the preface decades later,  "There clearly are a thousand deep theorems now." 

55:36

It's really interesting to think what the  long-term future is as you get higher and  

55:41

higher up in the tech tree, choices about which  direction we go and how we choose to explore. 

55:51

It's potentially the case that different  civilizations or different choices mean  

55:56

we end up in different parts of that tree. In particular, there are just very basic things  

56:03

about how we're very visual creatures, while  certain other animals are much more aurally based. 

56:10

Does that bias the types  of thoughts that you have? 

56:15

Then you extend it to much more exotic kinds  of civilizations where maybe their biases in  

56:23

terms of how they perceive and manipulate  the world are quite different than ours. 

56:29

That might make some significant  changes in terms of how they do  

56:35

that exploration of the tech tree.  It's all speculation, obviously. 

56:39

This is such an interesting take. I want to better understand it. 

56:43

One way to understand it is that  there might be some things which are  

56:47

so fundamental and have such a wide collision  area against reality that they're inevitably  

56:51

going to be discovered, like general relativity. Numbers. Numbers. Of all the intelligences in  

56:59

the Milky Way galaxy… Maybe that number is one. Well, actually, arguably we've already  

57:04

increased the number. But of all of those,  

57:09

what fraction have the concept of counting?  It does seem very natural. What fraction have  

57:17

discovered the idea of some kind of decimal place  system? Interesting question. Maybe we're missing  

57:25

something really simple and obvious that's  actually way better than that. What fraction  

57:30

got there immediately? What fraction had to  go through some other intermediate state? 

57:34

What fraction uses linear representations  versus a two-dimensional or a  

57:39

three-dimensional representation? I think the answers to these questions  

57:42

are just not at all obvious. It's a lot of design freedom. 

57:45

On theoretical computer science, this is going to  be extremely naive and arrogant, but I took Scott  

57:54

Aaronson's class on complexity theory, and I  was by far the worst student he's ever had. 

58:02

What I remember is there was this period, in  which you were one of the pioneers, where we  

58:08

figured out the class of problems that  quantum computers can solve and how it  

58:13

relates to problems that classical computers  can solve. It was groundbreaking. It's  

58:16

crazy that this works. Since then… There's  literally this website called Complexity  

58:22

Zoo which lists out all the complexity classes. If you have this complexity class with this kind  

58:27

of oracle, it's equivalent to this other class. It feels like we're building out that taxonomy. 

58:34

There are a couple ways to  understand what you're saying. 

58:35

One, maybe you disagree with me that this  is actually what's happened with this field. 

58:39

Another is that while that might happen to  any one field, who would've thought in 1880  

58:44

that computer science, other than Babbage,  was going to be a thing in the first place? 

58:49

We're underestimating how many  more fields there could be. 

58:52

Or maybe you think both, or maybe a  third secret thing. I'd be curious. 

58:59

A very common argument here is  the low-hanging fruit argument. 

59:03

The argument that says there  should be diminishing returns. 

59:06

In fact, empirically we see this. The number of scientists in the  

59:09

world has exponentially increased. I think it's worth thinking about why  

59:16

you expect diminishing returns and how well  that argument actually applies in practice. 

59:24

An analogy I like is thinking about  going to an event, like a wedding,  

59:30

and you go to the dessert buffet. They've  put out thirty desserts. Naturally,  

59:37

what people do is the best desserts go first. We don't quite have a well-ordered preference  

59:43

there, so maybe there's some difference,  but human beings are fairly similar,  

59:47

so the best desserts will go first. This is an argument for why you expect  

59:53

diminishing returns in a lot of different fields. If it's relatively easy to see what's available  

59:58

and people have similar preferences,  then the best stuff goes first and it  

60:03

just gets worse and worse after that. If you look at a very static snapshot  

60:10

in time of scientific progress,  maybe there's some truth to that. 

60:16

But if somebody is standing behind the dessert  table and is replenishing and restocking the  

60:21

desserts and keeps adding new ones in,  it may turn out that a little bit later,  

60:27

much better desserts appear, and you're  going to go and eat those instead. 

60:33

Scientific progress has a  little bit of that flavor. 

60:36

We go through these funny time periods. Computer science is a great example,  

60:40

where computer science basically arose as a  side effect of some pretty abstruse questions  

60:48

in the philosophy of mathematics and logic. You've got these people trying to attack  

60:57

these rather esoteric questions that  seem quite high up in exploration,  

61:04

and they discover this fundamental new field,  and all of a sudden there's an explosion there. 

61:09

The diminishing returns argument  just didn't apply there. 

61:12

We just weren't able to see what was there. This has been the case over and over again. 

61:19

New fields arrive and all of a sudden,  boom, it's easy to make progress again. 

61:24

Young people flood in because you can be  twenty-one and make major breakthroughs  

61:28

rather than having to spend twenty-five  years mastering everything that's been  

61:32

done before. It's obviously very attractive.  I'm not sure anybody understands very well  

61:40

the dynamics of that, or how to think about  why the structure of knowledge is that way,  

61:47

where these new fields keep opening up. But it does seem empirically to be the case. 

61:54

Despite the fact that that is  the case… Take deep learning. 

61:58

Obviously, this is an example of a new field  where twenty-one-year-olds can make progress and  

62:05

it's relatively new. Fifteen years or so  

62:08

since it got back into high gear. But already we're in a stage where  

62:16

you need billions, tens of billions,  or hundreds of billions of dollars  

62:20

to keep making progress at the frontier. There are a couple ways to understand that. 

62:24

One is that it actually is harder than the  kinds of things the ancients had to do,  

62:28

or is more intensive at least. Second is it might not have been,  

62:33

but because our civilizational resources are  so large, the amount of people is so large,  

62:37

the amount of money is so large, we can basically  make the kind of progress it would have taken  

62:41

the ancients forever to make almost immediately. We notice something is productive and immediately  

62:47

dump in all the resources. But it's also weird that  

62:51

there's not that many of them. I feel like deep learning is notable  

62:55

because it is one big exception to the fact  that it's hard to think of other examples. 

62:59

I think that's a consequence of  the architecture of attention. 

63:03

At any given time, there's  always a most successful thing. 

63:09

If deep learning wasn't a thing,  maybe you'd be talking about CRISPR. 

63:12

Maybe we wouldn't think about solving the protein  structure prediction problem as a success of AI. 

63:23

Maybe we would have figured out how to do it  with curve fitting, more broadly construed,  

63:28

and we'd just be like, "Wow, that  took a lot of computing resources." 

63:31

But protein structure prediction might  be an enormously important thing. 

63:36

There is always our biggest thing. What you're pointing at is more a consequence  

63:43

of the way in which attention gets centralized. It's basically fashion, is what I'm saying. 

63:49

It's not just fashion, but  there is some dynamic there. 

63:54

There's a very interesting and  important implication of this idea. 

63:59

That the branching is so wide and so  contingent and so path-dependent that  

64:05

different civilizations would stumble  on entirely different technology stacks. 

64:08

There's a very interesting implication that  there will be gains from trade into the far,  

64:13

far future, which might actually be one of the  most important facts about the far future in  

64:17

terms of how civilizations are set up, how  they coordinate, and how they interface. 

64:22

There's not this "go forth and exploit." There are humongous gains to trade from  

64:28

adjacent colonies or whatever. Sort of. There's a question of  

64:35

what's actually hard. If it's just the ideas,  

64:40

well, those spread relatively quickly. It's relatively easy to share ideas. 

64:44

If it's something more, it's  almost a Dan Wang kind of idea  

64:48

where there's some notion of capacity. You need all the right techs, you need all  

64:53

the right manufacturing capacity, and so on. So civilization A has a very different kind  

64:59

of manufacturing capacity, and it's just  not so easy to build in civilization B. 

65:03

Even if civilization B is ahead,  I think that becomes true. 

65:08

There is a comparative advantage which  is going to provide massive benefits  

65:16

to trade in both directions. Eventually, you expect some  

65:19

diffusion of innovation. It is funny to think about  

65:23

what the barriers are there. A fun thought experiment I like  

65:26

to think about is GitHub but for aliens. Somebody presents you with all of the code  

65:36

from some alien civilization. I don't even know what code means  

65:40

there, but their specification of algorithms. It would have many interesting new ideas in there,  

65:50

and it would take forever for human beings to  dig through and try and extract all of those. 

65:56

The origin of this for me was  thinking about proteins in nature. 

66:05

We've been gifted this incredible variety of  machines which we don't really understand at all. 

66:12

We just have to go and try and  understand them on a one-by-one basis. 

66:17

We're still understanding hemoglobin  and insulin and things like this. 

66:23

There are hundreds of millions of proteins known. So it is a little bit like that. 

66:28

We've been gifted by biology this immense library  of machines, no doubt containing an enormous  

66:36

number of very interesting ideas, and we're just  at the very, very beginning of understanding it. 

66:43

I suppose your point—I need to relabel  your argument slightly—but you think of  

66:51

that as a gift from an alien civilization, which  obviously it isn't, but you think of it that way. 

66:56

And oh my goodness, there's so much  in there and we're going to study it. 

67:02

Goodness knows how long we  could continue to study it. 

67:04

There are tens of thousands of papers  about hemoglobin and things like that,  

67:09

and we still don't understand them, and yet  we're getting so much out of it. Just think about  

67:13

insulin alone. It's such an important thing. That's an incredibly useful intuition pump,  

67:23

that you have on Earth… I had Nick Lane on  where he had this theory about how life emerged,  

67:27

but whatever theory you have, something  like DNA has had four billion years. 

67:34

You have an alien civilization come here  and be like, "There's all these interesting  

67:37

things to learn about material science." Think about kinesin walking along. We know  

67:47

almost nothing about these proteins, and yet the  tiny few facts we do know are just incredible. 

67:52

The ribosome is another example, this  miraculous sort of device, a little factory. 

68:01

All seeded by this particular chemistry on Earth  with nucleic acids and carbon-based life forms. 

68:09

That chemistry gives rise to all of  these interesting things which an  

68:13

alien civilization would find very interesting. That very seed, which must be one among trillions  

68:19

of possible seeds of general intellectual  ideas, leads to all this fecundity. 

68:25

That's a very interesting intuition pump. I want to meditate on this "gains from trade"  

68:29

thing because I feel like there's something very  interesting about this idea that if you have this  

68:34

vision of how technology progresses and how it  may be different in different civilizations,  

68:39

it actually has important implications  about how different civilizations  

68:42

might interact with each other. The fact that there are going  

68:45

to be these huge gains from trade. It makes friendliness much more rewarding? 

68:48

Yes. That's a very important observation. I hadn't thought about that at all. 

68:54

That is a very interesting observation. It  is funny. Comparative advantage is something  

69:02

that people love to invoke and it's a very  beautiful idea obviously. There are limits to it.  

69:16

It's a special limited model. Chimpanzees can do  interesting things, but we don't trade with them. 

69:26

I think it's interesting to  think about the reasons why. 

69:31

Part of it is just power, I think. Once there's a sufficiently large power imbalance,  

69:38

very often—not always, but very often—groups  of people seem to shift into this other  

69:44

mode where they just seek to dominate. Maybe there's something special about human  

69:49

beings, but maybe it's also a more general thing. You need all these special things to be  

69:57

true before groups will trade.  It's not necessarily obvious. 

70:05

I think the big thing going on  here is one, transaction costs. 

70:09

Two, comparative advantage does not tell you  that the terms on which the trade happens  

70:17

are above subsistence for any given producer. People often bring this up in the context of,  

70:22

"Well, humans will be employed even in a  post-AGI world because of comparative advantage." 

70:26

There are five different ways that argument breaks  down, but the easiest way to understand it is:  

70:32

why don't we have horses all around on the roads? Because there's some comparative advantage  

70:36

between cars and horses. One, there are huge  

70:40

transaction costs to building roads that are  compatible with horses and cars at the same time. 

70:45

In a similar way, AIs thinking at 1,000 times human speed, able to shoot their latent states at each  

70:52

other, are going to find it far more costly than the benefit to interact  

70:57

with a human being in the supply chain. Second, just because horses have a comparative  

71:07

advantage mathematically does not mean that it  is worth paying $100,000 a year, or whatever  

71:14

it costs to sustain a horse in San Francisco. That subsistence isn't going to be worth the  

71:19

benefit you get out of the horse. I do think it's interesting,  

71:23

the sheer fact… My expectation and my intuition obviously differ a great deal from yours on this. 
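The horse argument above can be put in toy numbers. Every figure here is invented for illustration (only the $100,000 San Francisco upkeep echoes the conversation): comparative advantage formally holds, but the wage it implies sits below subsistence, so the trade never happens.

```python
# Toy Ricardian numbers, all invented for illustration. The horse has a
# comparative advantage in hauling, yet the trade still never happens.
# Hours of effort needed per unit of output:
human = {"hauling": 4.0, "bookkeeping": 1.0}
horse = {"hauling": 8.0, "bookkeeping": 1000.0}

# Opportunity cost of one unit of hauling, measured in bookkeeping:
human_oc = human["hauling"] / human["bookkeeping"]   # 4.0
horse_oc = horse["hauling"] / horse["bookkeeping"]   # 0.008

# Lower opportunity cost: comparative advantage says the horse should
# specialize in hauling.
assert horse_oc < human_oc

# But comparative advantage says nothing about the wage *level*. If a
# year of horse-hauling fetches $2,000 while upkeep runs $100,000,
# the horse is employed at a loss and the trade is off.
hauling_income = 2_000
upkeep = 100_000
print(hauling_income - upkeep < 0)  # True: below subsistence
```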

71:32

Most parts of the tech tree  are never going to be explored. 

71:35

There are just too many interesting  ways of combining things. 

71:38

There are too many deep ideas waiting to be discovered, and nobody, not just us, is ever  

71:45

going to discover most of them. So choices about how to do the  

71:50

exploration actually matter quite a bit. It's something I really dislike about  

71:55

technological determinist arguments. I'm willing to buy it low enough  

71:59

down when progress is relatively simple. But higher up, you start to get to shape  

72:06

the way in which you do the exploration. And it's interesting, we are starting to  

72:12

shape it in interesting ways. There are various technologies  

72:17

that have been essentially banned. You think about DDT, chlorofluorocarbons,  

72:22

restrictions on the use of nuclear weapons,  the Nuclear Non-Proliferation Treaty. 

72:27

Those kinds of things weren't done before the  fact, but they're starting to get pretty close  

72:36

in some cases, where we just preemptively decide,  "Oh, we're not going to go down that path." 

72:43

So that starts to look like a set of  institutions where we are actually  

72:47

influencing how we explore the tech tree. On where you would see these gains from trade,  

72:54

obviously you'd see the most where it's pure  information that could be sent back and forth,  

72:59

because the information has this quality  where it is expensive to produce,  

73:02

but cheap to verify and cheap to send. It'll be interesting how much of future  

73:09

productivity can be distilled down to information. Right now, it's hard to do. 

73:14

If China's really good at manufacturing  something, there's this process knowledge  

73:18

that's in the heads of 100 million people  involved in the manufacturing sector in China. 

73:23

But in the future, it might  be easier if AIs are doing it. 

73:26

The question is to what extent our  fabrication gets very uniform and  

73:32

gets really commoditized. 3D printers have been  the next big thing for at least 20 years now. 

73:39

Why do they still not work all that well? Why are they still not at the center of  

73:44

manufacturing, and what comes after that? It is funny to look at the ribosome by contrast,  

73:50

which really is at the center of biology  in a whole lot of really interesting ways. 

73:55

Whether or not that's the future of  manufacturing is something very simple,  

74:00

where everything is run through a bioreactor or something like that. 

74:07

You send the information, and then you grow stuff,  or you have some 3D printer that actually works. 

74:15

If they're good enough, then it does become much  more a pure information problem, and some of this  

74:20

process knowledge becomes much less important. Jane Street has a lot of compute, but GPUs are  

74:27

very expensive, and so even optimizations  that have a relatively small effect on GPU  

74:32

utilization are still extremely valuable. Two of Jane Street's ML engineers,  

74:36

Corwin and Sylvain, walked through some  of their optimization workflows at GTC. 

74:40

You're not bottlenecked on the network  being too slow, you're bottlenecked on  

74:42

waiting for a different rank in your  training not having completed the work. 

74:47

They talked about how Jane Street profiles  traces and diagnoses bottlenecks, and then  

74:51

how they solve them using techniques like CUDA  graphs and CUDA streams and custom kernels. 

74:56

With these sorts of optimizations,  Corwin and Sylvain were able to  

74:59

get their training steps down from 400  milliseconds to 375 milliseconds each. 

75:04

This 25 millisecond difference might sound  small, but given the size of Jane Street's fleet,  

75:08

that improvement could free up thousands of B200s. Jane Street open sourced all the relevant code. 
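The back-of-envelope behind "thousands of B200s" looks like this. Only the 400 ms and 375 ms step times come from the talk; the fleet size is a hypothetical placeholder, not a Jane Street figure.

```python
# Back-of-envelope for the claim above. Step times are from the talk;
# the fleet size is a made-up illustration.
baseline_ms = 400
optimized_ms = 375

fraction_freed = 1 - optimized_ms / baseline_ms  # 0.0625 of step time saved
fleet = 50_000                                   # hypothetical GPU count

# The saved step time is equivalent to this many GPUs' worth of compute:
freed_gpus = int(fleet * fraction_freed)
print(freed_gpus)  # 3125 at the assumed fleet size: "thousands of B200s"
```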

75:13

If you want to check it out, I've linked the  GitHub repo and the talk in the description below. 

75:17

And if you find this stuff exciting, Jane  Street is hiring researchers and engineers. 

75:21

Go to janestreet.com/dwarkesh to learn more. Can I ask a very clumsily phrased question? 

75:29

There are these deep principles, of which we've discovered a couple. 

75:35

One is this idea that if there's a symmetry  across a dimension, it corresponds to a conserved  

75:39

quantity. It's a very deep idea. There's  another—which you've written a lot about,  

75:43

written a textbook about in fact—about ways to  understand what kinds of things you can compute,  

75:52

what kinds of physical systems you can  understand with other physical systems,  

75:56

what a universal computer looks like, et cetera. Is your view that if you go down to this level of  

76:01

idea of Noether's theorem or the Church-Turing  principle, that there's an infinite number  

76:07

of extremely deep such principles? Because I feel what makes them special  

76:11

is that they themselves encompass so many  different possible ways the world could be. 

76:17

But no, the world has to be compatible with  a couple of these very deep principles. 
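For reference, the symmetry-conservation idea mentioned above is Noether's theorem. Its simplest classical-mechanics instance (a standard textbook statement, not something derived in the conversation) is that time-translation invariance of the Lagrangian implies energy conservation:

```latex
% Simplest instance of Noether's theorem: if the Lagrangian
% L(q_i, \dot{q}_i) has no explicit time dependence (time-translation
% symmetry), then along solutions of the Euler--Lagrange equations
% the energy is conserved:
\frac{dE}{dt}
  \;=\; \frac{d}{dt}\Bigl(\,\sum_i \dot{q}_i\,
        \frac{\partial L}{\partial \dot{q}_i} \;-\; L\Bigr)
  \;=\; -\,\frac{\partial L}{\partial t} \;=\; 0.
```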

76:23

I don't know. All I have here  is speculation and instinct. 

76:29

My instinct is that we keep finding  very fundamental new things. 

76:33

It was quite formative for me to understand,  as I gave the example before, these wonderful  

76:40

ideas of Church and Turing and these other  people about universal programmable devices. 

76:46

Then you understand later, this also contains  within it the ideas of public-key cryptography. 

76:51

Then you understand later,  that also contains within it  

76:55

the ideas people refer to as cryptocurrency. There's a very deep set of ideas there about  

77:00

the ability to collectively maintain an  agreed-upon ledger, which is built upon this. 

77:11

It's taken many years to figure out  the right canonical form of those. 

77:18

Just this fact that you keep finding what  seem like deep new fundamental primitives  

77:28

has been a very important intuition pump for me. I've given that particular example, but I think  

77:35

you see that same pattern  in a lot of different areas. 

77:37

What is your interpretation then of this empirical  phenomenon where whatever input you consider into  

77:44

the scientific process or technological progress…  Economists have studied this a million ways. 

77:50

It just seems to require a very consistent  rate of X percent more researchers per year. 

77:55

There's this famous paper from a couple years  ago by Nicholas Bloom and others where they say,  

78:00

"How many people are working in the semiconductor  industry, and how has it increased over time  

78:05

through the history of Moore's law?" I think they find that Moore's law  

78:08

means transistor density increases  40% a year, but to keep that going  

78:14

the number of scientists has increased 9% a year in the semiconductor industry. 

78:19

They go through industry after  industry with this observation. 
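The arithmetic implicit in those figures (using the numbers as quoted here, which are a rough paraphrase of Bloom et al.): if output growth stays constant while researcher headcount grows 9% a year, ideas-per-researcher must shrink by the same ratio.

```python
# Rough arithmetic on the quoted figures: constant ~40%/yr density
# growth sustained by a headcount growing ~9%/yr means per-researcher
# productivity falls ~9%/yr.
researcher_growth = 0.09
years = 20

# After 20 years, sustaining the same annual density improvement takes
# this many times more researchers:
headcount_factor = (1 + researcher_growth) ** years
print(round(headcount_factor, 1))  # ~5.6x the researchers for the same growth
```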

78:23

Is your view that there are these deep  ideas, but they keep getting harder to find? 

78:25

Or is there another way to think about what's  happening with these empirical observations? 

78:32

First of all, all of their examples are narrow. They pick a particular thing, and then they  

78:37

look at a particular metric. GPUs don't show up  there. All of a sudden you get this ability to  

78:49

parallelize, and that's really interesting. There are a lot of external consequences. 

79:02

Basically they have these  simple quantitative measures. 

79:04

They look at it in agricultural productivity. 

79:06

They look at it in a whole lot of different  ways, but you do have to focus narrowly. 

79:14

I'm certainly interested in the fact that  new types of progress keep becoming possible. 

79:21

But I think even there, there does still seem  to be some phenomenon of diminishing returns.  

79:33

Is that intrinsic? Is that something about the  structure of the world? What is it? One thing  

79:38

which hasn't changed that much is the individual  minds which are doing this kind of work. 

79:44

Maybe those should be improved as well,  or some feedback process going on there. 

79:54

Maybe that changes the nature of things. I look at scientific progress up until,  

80:02

let's say, 1700, and it was very  slow, and also very irregular. 

80:08

You had the Ionians back five centuries before  Christ doing these quite remarkable things, and so  

80:16

much knowledge would get lost, and then it would  be rediscovered, and then it would be lost again. 

80:22

You'd have to say that progress was very slow. It's partially just bound up with the fact  

80:28

that there were some very good  ideas that we just didn't have. 

80:31

Even once you've had the ideas, you  need to build institutions around them. 

80:35

You actually need to solve a whole lot of  different problems about training, allocation  

80:39

of capital, and all these kinds of things. Even just basic security for researchers,  

80:44

so they're not worried about the  Inquisition or things like that. 

80:48

There are all these complicated problems. You solve all those complicated problems,  

80:51

and then all of a sudden, boom, there's  a massive burst of scientific progress. 

80:56

If there's some kind of stagnation, if you're  not changing those external circumstances, yes,  

81:02

you may start to get diminishing returns again. But that doesn't mean there's anything  

81:07

intrinsic about the situation. Maybe something external needs to change again. 

81:14

Obviously, a lot of people think AI  is potentially going to be a driver. 

81:19

It certainly will at some level. To that extent, you can think of a lot  

81:24

of modern scientific instrumentation  as really, at some level, robots. 

81:31

What is the James Webb Space Telescope? It's unconventional maybe to describe  

81:37

it as a robot, but it's not  completely unreasonable either. 

81:42

It is an example of a highly automated, very  sophisticated system with electronically  

81:47

mediated sensors and actuators, where machine  learning is being used to process the data. 

81:55

In that sense, we're already  starting to see that transition. 

81:58

We've been seeing it for decades. I have this "smoke a joint and take  

82:03

a puff" thought, which— I think we've had a few. 

82:06

I think we're getting to that part of the  conversation, and then you can help me get  

82:08

my foot out of my mouth and figure out  a more concrete way to think about it. 

82:14

To your point that there was the Industrial  Revolution, the Enlightenment, and now there's AI,  

82:21

and each might be a different pace or a  different way in which science happens. 

82:26

If you think about the pace of how fast  such transitions have been happening,  

82:32

you can draw over the long span of human  history this hyperbolic rate of growth that is  

82:39

increasing over time as well. A hundred thousand years ago,  

82:41

you had the Stone Age. 

82:43

You go back even much further, how  long have primates been around? 

82:46

It would be millions of years. A hundred thousand years ago, the Stone Age,  

82:49

then ten thousand years ago, the Agricultural  Revolution, then three hundred years ago,  

82:54

the Industrial Revolution, each marked by this  increase in the rate of exponential growth. 

83:00

Then people think it's going  to happen again with AI. 

83:04

But that would happen potentially even faster. It would not have occurred to somebody  

83:09

at the beginning of the Industrial  Revolution that the next demarcation  

83:12

in this trend will be artificial intelligence. So if things are getting faster, and it's hard  

83:20

to anticipate what the next transition will be. I guess we just think of this singularity between  

83:25

now and AI as what distinguishes  the past from the future. 

83:29

But applying the same heuristic that  many people in the past should have had,  

83:36

maybe the "Intelligence Age" is also  quite short and the next thing after that,  

83:40

we don't even have the ontology to describe what it is. The future will not think of the  

83:46

past as simply pre-AI and post-AI. No, obviously we can't prove this,  

83:55

but it certainly seems quite plausible. Part of the issue is just that the substrate  

84:01

we have available to conceive of it seems all wrong. You can't speculate with a bunch of chimpanzees  

84:09

about what it would be to have language. Just to pick a major transition in the past,  

84:20

the transition itself is the thing. It  seems likely. If we're talking about  

84:27

"taking a puff" kind of thoughts, I'm certainly  amused by the idea that there's going to be some  

84:34

transition involving artificial general  intelligence using classical computers. 

84:42

But actually, there'll be an interesting  transition with quantum computers as well. 

84:45

They're probably capable of a strictly larger  class of potentially interesting computations. 

84:52

So maybe the character of AQGI,  or whatever it should be called,  

84:59

is actually qualitatively different. So maybe there's a brief period  

85:04

between those two things. As I say, this is just speculation,  

85:09

but it's certainly amusing. Is there a reason to think that? 

85:12

From what I understand, for decades people  like you have put pretty tight bounds on  

85:17

the kinds of things quantum computers are going to do. They'll speed up search somewhat.  

85:24

The kinds of things it speeds up extremely,  like Shor's algorithm, it seems like… Again,  

85:28

maybe this is to your point that we can't predict  in advance what's down the tech tree, but at least  

85:32

from here, it seems like you break encryption, but  what else are you using Shor's algorithm to do? 
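For context on what Shor's algorithm actually does, here is a sketch of its classical skeleton: factoring N reduces to finding the multiplicative order of a mod N. The quantum speedup lives entirely in the order-finding step; the brute-force version below only works for tiny N.

```python
from math import gcd

# Classical skeleton of Shor's algorithm: factoring N reduces to
# finding the multiplicative order r of a mod N. A quantum computer
# finds r fast; here we brute-force it, fine for tiny N like 15.
def find_order(a: int, N: int) -> int:
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_classical(N: int, a: int):
    assert gcd(a, N) == 1
    r = find_order(a, N)       # the step the quantum computer speeds up
    if r % 2:                  # need an even order; otherwise retry
        return None
    y = pow(a, r // 2, N)
    if y == N - 1:             # trivial square root; retry with another a
        return None
    return gcd(y - 1, N), gcd(y + 1, N)

print(shor_classical(15, 7))   # 7 has order 4 mod 15, giving factors (3, 5)
```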

85:36

We've only been thinking  about it for 40 or so years. 

85:43

Not for very long, and we haven't thought  that hard about it as a civilization. 

85:52

Does it turn out that it's very narrow?  Maybe. Does it turn out that it's very broad? 

85:56

That's also a really radical expansion  that seems distinctly possible. 

86:01

Keep in mind as well, we've been doing it  without the benefit of having the devices. 

86:06

That's a pretty big bottleneck to have. If you're thinking about computer science  

86:11

in the 1700s and you're like, "it can do AND/OR,  what can come out of that?" You can't anticipate  

86:16

Bitcoin. You can't anticipate deep learning. Maybe you could if you were sufficiently bright,  

86:21

but it is a pretty hard situation. What is your inside view, having been  

86:30

in and contributing to quantum information and  quantum computing back in the '90s and 2000s? 

86:35

What is your telling of the  history of what was the bottleneck? 

86:40

What was the key transition  that made it a real field? 

86:46

How do you rank the contributions from Feynman  to Deutsch to everybody else who came along? 

86:53

Let's just focus on the question  about what actually changed. 

86:57

Why was quantum computing not a thing in  the 1950s? It could have been. Somebody  

87:04

like John von Neumann is a good example. He  was absolutely pioneering computation. He also  

87:10

wrote a very important book about quantum  mechanics and was deeply interested in it. 

87:14

He could have invented quantum  computing at that time,  

87:18

and I think there were quite a number  of people who potentially could have. 

87:21

So why do we have these papers by people  like Feynman and Deutsch in the '80s? 

87:25

Those are fairly regarded as  the foundation of the field. 

87:31

There are some partial anticipations a little  bit earlier, but they were nowhere near as  

87:36

comprehensive and nowhere near as deep. You should  ask David. You can't ask Feynman, unfortunately,  

87:46

but he'll know much better than I do. A couple things that I think are interesting. 

87:51

One is that computation became far more  salient in the late '70s and early '80s. 

87:58

It just became a thing which many more people were  interested in, partially for very banal reasons. 

88:04

You could go and buy a PC. You could buy an Apple II. 

88:06

You could buy a Commodore 64. You could buy all these kinds of things. 

88:09

It became apparent to people that these were very  powerful devices, very interesting to think about. 

88:14

At the same time, in the quantum case,  that was also the time of the Paul  

88:20

trap and the ability to trap single ions. Up to that point, we hadn't really had the  

88:26

ability to manipulate single quantum states. You got these two separate things that  

88:31

for historically contingent reasons  had both matured around 1980 or so. 

88:41

Somebody like von Neumann could have had the idea  earlier, but it is quite an interesting factor. 

88:52

There's a story about Richard Feynman. He went and got one of the  

88:55

first PCs around 1980 or 1981. He was apparently so excited with this device,  

89:04

he actually tripped and hurt himself quite  badly carrying his brand-new computing device. 

89:16

That's a very historically contingent  coincidence, having somebody who's very  

89:25

talented and understanding of quantum mechanics  also just very excited about these new machines. 

89:32

It's not so surprising perhaps  that he's thinking about it then. 

89:36

What similar story could you  have told 10 years earlier? 

89:41

The conditions don't exist for it. I mean, it's quite a banal story, but… 

89:47

One of the things we were going to discuss was  this idea you had about the market for follow-ups. 

89:53

I think this is the perfect story to  discuss it for because you wrote the  

89:58

textbook about the field. "Mike and Ike" is  the definitive textbook on quantum information. 

90:06

You presumably came in after Deutsch. But you in the '90s somehow identified  

90:12

it as the thing that is worth  following up on and building on. 

90:17

Instead of talking about it more abstractly,  I'd love to just hear the firsthand story  

90:20

of how you knew that this is the thing to do. Of all the things that were happening in physics  

90:25

and computing, how did you decide  you want to think about this problem? 

90:30

Richard Feynman writes this great paper in 1982. David Deutsch writes an absolutely fantastic  

90:35

paper in 1985 sketching out a lot of the  fundamental ideas of quantum computing. I'm  

90:44

11 in 1985. I'm not thinking about this.  I'm playing soccer and doing whatever. 

90:49

But in 1992, I took a class on quantum mechanics  that was really terrific, given by Gerard Milburn. 

90:56

I just went and asked Gerard one day  after the fifth lecture or something. 

91:02

I said, "Do you have any papers or  whatever that you could give me?" 

91:09

He said, "Come by my office  in a couple of days' time." 

91:12

I did, and he presented me with a giant stack  of papers, which included the Deutsch paper,  

91:19

the Feynman paper, and a whole bunch of other  very fundamental papers about quantum computing  

91:24

and quantum information at a time when essentially  nobody in the world was working on it. He was. I  

91:32

think he wrote the very first paper that proposed  a practical approach to quantum computing. 

91:39

It wasn't very practical, but it  was actually in a real system. 

91:44

So in some sense, I'm benefiting  from the taste of this other person. 

91:51

As soon as I read the papers…  These are exciting papers. 

91:57

They're asking very fundamental questions,  and you realize I can make progress here. 

92:03

These are things that one  could potentially work on. 

92:06

Deutsch has this conjecture, or thesis or whatever  you’d call it, that a universal model, a quantum  

92:22

Turing machine, should be capable of efficiently  simulating any physical system at all. 

92:28

This is a very provocative idea. I think in that paper,  

92:32

he more or less claims that he's proved it. I'm not sure everybody would agree with that. 

92:39

There are questions about whether or not you  can simulate quantum field theory effectively. 

92:45

That kind of question is very  interesting and very exciting. 

92:52

It's obviously a fundamental  question about the universe. 

92:56

He has some wonderful ideas in there about  quantum algorithms, where they come from,  

93:04

what they mean, and how they relate to the meaning of the wave function. 

93:07

Questions like this are still not  agreed upon amongst physicists. 

93:16

There's just some sense of, "Oh, I am in contact  with something which is (A) deeply important,  

93:20

and (B) we as a civilization don't have this." Of course, you start to focus  

93:26

your attention a little bit there. I'm not sure I got the answer to the question… 

93:35

Maybe I misunderstood the question. Maybe I'll explain the motivation first. 

93:44

In a previous conversation, we were discussing how  you could have known in the 1940s that the Shannon  

93:48

theorems and Shannon's way of thinking about a  communication channel is a deep idea that goes  

93:57

beyond the problems with pulse-code modulation  that Bell Labs was trying to solve at the time,  

94:02

and that it applies to everything from quantum  mechanics to genetics to computer science. 

94:09

One of the ideas you stated that we  didn't get a chance to talk about  

94:15

yet… Shannon published this paper. There are all these other papers,  

94:19

but there's some market of follow-ups where  people gravitate to and build upon Shannon's work. 

94:23

How do they realize that that's the thing  to do, and how does that process happen? 

94:29

I guess you gave your local answer. You read these papers, and you  

94:33

immediately realized there's work to be  done here. There's low-hanging fruit.  

94:37

There's some deep provocative idea  that I need to better understand,  

94:40

and I could tractably make progress on. To some extent, you're saying, "Okay,  

94:48

I wanted to get into this game of contributing to  humanity's understanding of the universe," and you  

94:57

are applying this low-hanging fruit algorithm. You're like, "elative to my particular set of  

95:01

interests and abilities, where should  I pick up my shovel and start digging?" 

95:08

There it was like, "Oh, this looks like  quite a good place to start digging." 

95:16

Different people, of course,  chose very differently. 

95:21

It was a very unusual choice at the time. This was  1992. Very few people were thinking about that. 

95:29

Fast-forwarding a bit, I don't know how  you think about your work on the open  

95:34

science movement now, but did it work? What does success there look like? 

95:41

What is the movement trying to accomplish? It's interesting. You didn't stop and define  

95:51

open science there, which 20 years ago you would  have had to do. People recognize the phrase.  

95:58

People have some set of associations with it. Most often, they have a relatively simple  

96:03

set of associations. It means maybe something  

96:06

about making scientific papers open access. Very often they have some set of notions about  

96:11

also making code openly available  or making data openly available. 

96:19

Those are already very large successes of the open  science movement, to make those salient issues. 

96:27

Those are issues on which people have opinions,  and there are relatively common arguments. 

96:35

This is like the meme version: publicly  funded science should be open science. 

96:42

That's a distillation of a set of ideas  which you might be able to contest. 

96:47

But if you can get people actually thinking  about it and engaged with that kind of argument,  

96:53

that's a very fundamental issue to be considering  in the whole political economy of science. 

97:01

If you go back three centuries, there  was a very similar argument prosecuted,  

97:09

which is the question: do we publicly  disclose our scientific results or not? 

97:13

If you look at people like Galileo and  Kepler, the extent to which they publicly  

97:19

disclosed was done in a very odd way. Sometimes they did bizarre things where  

97:26

they published some of their results as anagrams. They'd find some discovery, write down the result  

97:38

in a sentence, scramble it, and publish that. Then if somebody else later made the same  

98:02

discovery, they would unscramble the anagram  and say, "Oh, yeah, I actually did it first." 

98:07

This is not an ideal foundation  for a discovery system. 
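The anagram trick is, in effect, a clumsy commitment scheme. A modern analogue (the "discovery" below is purely illustrative) is to publish a cryptographic hash of the claim now and reveal the text later, letting anyone verify priority:

```python
import hashlib

# Modern analogue of the anagram: publish a hash of the claim now,
# reveal the text later. The claim here is purely illustrative.
claim = b"the orbit of Mars is an ellipse"
commitment = hashlib.sha256(claim).hexdigest()   # publish this digest

# Years later, reveal `claim`; a rival (or historian) verifies priority:
assert hashlib.sha256(claim).hexdigest() == commitment
print("commitment verified")
```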

98:12

It took a very long time, over a century, I think,  to obtain more or less the modern ideals, in which  

98:20

you disclose the knowledge in the form of a paper. There is an expectation of attribution, and a  

98:27

reputation economy gets built. "So-and-so did this  work, so they deserve the credit for that," and  

98:34

that's the basis for their careers. This is the underlying  

98:37

political economy of science. That made a lot of sense when you have a printing  

98:42

press and the ability to do scientific journals. Then you transition to this modern situation,  

98:48

where you can start to share a lot more. You can share your code,  

98:52

your data, your in-progress ideas. But there's no direct credit associated to those. 

99:00

It's not at all obvious how much reputation  should be associated to them. That's all  

99:10

constructed socially. Making it a live issue  is a very important thing to have done. 

99:18

I view that as one of the main positive  outcomes of work on open science. 

99:23

I'll give you a really practical  example to illustrate the problem. 

99:28

For a long time in physics, there was a preprint  culture in which people would upload preprints  

99:37

to the preprint archive, and  in biology, this didn't happen.  

99:42

There was no preprint culture. That's changing  now, but for a long time, this was the case. 

99:47

I used to amuse myself by asking physicists  and biologists why this was the case. 

99:54

What I would hear from biologists was they would  say, "Biology is so much more competitive than  

100:01

physics that we need to protect our priority,  so we can't possibly upload to the archive. 

100:10

We have to just publish in journals." Then I would sometimes hear from physicists,  

100:14

"Physics is so much more competitive  than biology that we need to establish  

100:18

our priority by uploading as rapidly  as possible to the preprint archive. 

100:22

We can't possibly wait to  do it with the journals." 

100:25

I think this emphasizes the extent to  which this kind of attribution economy  

100:28

is just something we construct. It's something we do by agreement. 

100:36

Any attempt to change that economy results in a  different system by which we construct knowledge. 

100:43

There is this very fundamental set of problems  around the political economy of science. 

100:51

We've got this collective project,  and how we mediate it depends upon  

100:56

the economy we have around ideas. One of the things you've emphasized  

101:01

as a part of this project of open science, and we  talked about it earlier, is collective science,  

101:06

or groups of people making progress on a  problem where no individual understands  

101:11

all the logical and explanatory levels  necessary to make a leap or a connection. 

101:20

Outside of mathematics, what is the  best example of such a discovery? 

101:24

I'm not sure I have a well-ordering  of them to give you a best. 

101:29

An example that I think is very  interesting is the LHC, where it's  

101:34

just this immensely complicated object. Years ago, I snuck into an accelerator  

101:42

physics conference. I didn't know anything at  

101:44

all about accelerator physics, but I was just  curious to see what they were talking about. 

101:49

This particular group of people  were experts on numerical methods,  

101:53

in particular on inverse methods. Inside these accelerators,  

101:59

you have these cascades. A particle will be massively accelerated, maybe  

102:04

it'll be collided, and then you'll get a shower  of particles which decays and decays and decays. 

102:10

There's just this incredible, consequential  shower, which is ultimately what you see  

102:17

at the detector. Then you have to  

102:20

retroactively figure out what produced it. There are these very complicated inverse  

102:25

problems that need to be solved. You've got this final data,  

102:29

but you need to figure out what produced it,  and that's how you look for signatures of these particles. 

102:34

Many of these people were incredibly  deep experts on simulation methods  

102:40

for following particle tracks. This was really deep and difficult stuff. 

102:46

I was like, "Wow, you could spend a lifetime  just learning how to do this and how to solve  

102:52

some of these inverse problems, and you would know  very little about quantum field theory, detector  

103:00

physics, vacuum physics, or data processing,  all these things that are absolutely essential  

103:09

to understanding, say, the Higgs boson". I don't think it's possible for one person  

103:17

to understand everything in depth. Lots of people broadly understand a  

103:22

lot of these ideas, but they don't understand  everything in the depth that is actually utilized. 

103:29

That's why there are these papers  with well over a thousand authors. 

103:34

Those people can talk to one another at a  high level, but they don't understand each  

103:39

other's specialties in all that much depth. Things like detector physics, vacuum physics,  

103:45

solving inverse problems: these areas are  incredibly different from one another. 

103:52

To understand it in real detail is serious work. How do you think about prolificness versus depth? 

104:02

Maybe Darwin's an example of somebody who's  just gestating on something for many decades.  

104:07

There are other examples. Einstein during the  year he comes up with special relativity is  

104:11

just doing a bunch of different things. And Pais talks about how they were all  

104:14

relevant to the eventual build-up. It's something I stress about a lot. 

104:20

Sometimes I feel I'm too slow. It's funny though, the Darwin  

104:24

example is really interesting. Prolific at  what? God knows how many letters he wrote. 

104:33

It must have been an enormous number. So he was certainly very active. 

104:41

There's two types of work that tend to be  involved in any kind of creative project. 

104:46

There's routine stuff, and there you  just want to avoid procrastination. 

104:49

You just want to ask, "How do I get good at this?"  or "How do I outsource it?" and "How do I do it as  

104:54

rapidly as possible?" and just avoid getting  into a situation where you're prolonging it. 

105:02

Then there's high-variance stuff where you  actually need to be willing to take a lot of time. 

105:11

You need to be willing to go to different  places and talk to different people,  

105:15

where in any given instance, most of  it is just not going to pay off. 

105:20

Somehow balancing those two things… I  think a lot of people are very good at  

105:25

doing one or the other, but it's almost like  a personality trait which one you prefer. 

105:31

People tend to end up doing a lot  of one and not enough of the other. 

105:37

So I certainly try and balance those two things. Einstein is such an interesting example. 1905  

105:45

is just this extraordinary year. You can delete special relativity  

105:48

entirely, and it's an extraordinary year. You can delete special relativity, and you can  

105:53

delete the photoelectric effect for which he won  the Nobel Prize, and it's still an extraordinary  

105:58

year, plausibly a multi-Nobel-Prize-winning  year. So what's he doing? Maybe the answer is  

106:07

just that he's smarter than the rest of us. There's a lot of luck as well. 

106:16

Certainly for myself anyway, trying to  identify those things that are routine  

106:22

that I should get good at, and then just  try to do them as quickly as possible. 

106:27

I think that's yielded a  certain amount of returns. 

106:30

But also being willing to bet a  little bit more on myself on the  

106:34

variance side has also been very, very helpful. That's really hard, because intrinsically you're  

106:41

putting yourself in situations where you  don't know what the outcome is going to be. 

106:45

If you're very driven to be productive, and  actually mostly it's not working over there,  

106:52

you think, "Let's reduce this." It doesn't  feel right. When I worked in San Francisco,  

106:58

a practice I used to have each day was  instead of taking the 15-minute walk to work,  

107:04

I would take the more beautiful 30-minute walk. 

107:08

Partially just because it was beautiful, but  partially also as just a reminder that there  

107:14

are real benefits to not being efficient. But it's not an answer to your question. 

107:19

Really, I think all I'm saying is  I struggle a lot with the question. 

107:22

I think Dean Keith Simonton has this  famous equal odds rule where he says  

107:30

the probability that any given thing you  release—any paper, book, whatever—will  

107:34

be extremely important is roughly  constant across your career. 

107:40

What really determines in what era they are the  most productive is how much they're publishing. 

107:48

Any given thing has equal odds  of being extremely important. 

107:53

I think some of the most successful creatives  or scientists, they're just doing a lot. 

107:58

Shakespeare was just publishing a lot. Of course, then there are counterexamples. Gödel  

108:03

published almost nothing. But broadly speaking,  you need a very good reason to not do that. 

108:17

It's funny, I've met a lot of people  over the years who are clearly brilliant,  

108:23

and they're just obsessed that they are going to  work on the great project that makes them famous,  

108:28

and they never do anything. That seems connected.  It's a type of aversiveness. I think very  

108:33

often they just don't want public judgment. Something that I would love to see… There's  

108:39

an awful lot of biographies and memoirs  and histories of people who achieve a lot. 

108:45

I wish there was a very large  number of biographies of people  

108:50

who are fantastically talented who just missed. I've known people who won gold medals at IMOs  

109:02

and things like that, who then tried to become  mathematicians and failed. What happened? What  

109:11

was the reason? I suspect in many cases that's  actually more informative than anything else. 

109:18

You have this essay that I was reading  before this interview about how you  

109:23

think about what the work you're doing is. And "writer" doesn't seem like the right label. 

109:28

As you say, was Charles Darwin a writer? What  exactly is that label? I'm a podcaster. In a way,  

109:36

obviously our work is very different,  but I also think a lot about what this  

109:41

work is and how I get better at it. In particular, how can I make sure  

109:45

there's some compounding between the  different people I talk to on the podcast? 

109:50

I worry that instead of this compounding, I build  up some understanding that's somewhat superficial  

109:58

about a topic, and then it depreciates. I move down to the next topic, and it depreciates. 

110:04

There are a lot of podcasters in the world who  will interview way more experts than I have,  

110:10

and I don't think they're much the  wiser or more knowledgeable as a result. 

110:15

So it's clearly possible to mess this up. I wonder if you have thoughts or takes or  

110:21

advice on how one actually learns in  a deeper way from this kind of work. 

110:29

It's an incredibly complicated and rich question. It seems like the question is,  

110:37

how do you make it a higher-growth context? How do you make it a more demanding context? 

110:42

You can do that in relatively small ways  that might yield compounding returns,  

110:47

or you can do something that is more radical. Maybe it means starting a parallel project in  

110:52

which you do something that is  actually quite a bit different. 

110:55

There is something really interesting  about how being very demanding can  

111:02

simply change your response to something. Something that I would sometimes do with  

111:07

students and sometimes with myself,  it was really aimed more at myself,  

111:10

was they would say some week, "I'm going to  try and do this work over the coming week." 

111:18

Then the next week would come by  and they hadn't solved the problem. 

111:23

If a million dollars had been at stake,  would you have put the same effort in? 

111:27

And the answer is no, invariably. They've tried, but they haven't really tried. 

111:36

I think that's a very familiar  feeling for all of us. 

111:38

You could do a lot more if you had just the right  demanding taskmaster standing by you and saying,  

111:50

"Look, you're barely operating here." I do wonder a little bit about  

111:56

what's the demanding taskmaster? What can they ask you that is going  

112:01

to make your preparation way more intense? The most helpful thing honestly is… For some  

112:09

subjects it is very clear how I prep. I'm doing an upcoming episode on chip  

112:14

design with the founder of a company that does  chip design, and he wrote a textbook on it. 

112:20

Yesterday I went over to his office, and we  brainstormed five roofline analyses I can do. 

112:27

If I understand that, I have  some good understanding. 
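A roofline analysis bounds a kernel's attainable performance by the lesser of the chip's compute peak and what memory bandwidth can feed it. A minimal sketch with hypothetical hardware numbers (not any real chip's specs):

```python
def roofline(peak_flops: float, mem_bw: float, arithmetic_intensity: float) -> float:
    """Attainable FLOP/s = min(compute roof, bandwidth * FLOPs-per-byte)."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

# Hypothetical accelerator: 100 TFLOP/s peak, 2 TB/s memory bandwidth.
peak, bw = 100e12, 2e12
# A kernel doing 10 FLOPs per byte loaded is memory-bound here...
assert roofline(peak, bw, 10) == 20e12
# ...while 80 FLOPs per byte hits the compute roof instead.
assert roofline(peak, bw, 80) == 100e12
```

Working through where real kernels land on this curve is the kind of exercise being described: a concrete problem whose answer tests whether you actually understand the hardware.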

112:31

The problem is with almost every other  field, there's not this curriculum. 

112:37

When I interviewed Ilya three, four years  ago, it was: implement the transformer,  

112:41

and if you implement it, you have some nugget  of understanding you have clamped down. 

112:45

With other fields, it's just that I vaguely  understand this. It's not clamped. There's  

112:52

no forcing function of "do this exercise,  and if you do it, you will understand." 

112:58

Really what you're saying is you can do  a good job at podcasting without actually  

113:04

attaining this kind of understanding, and  that's the problem from your point of view. 

113:07

You want to change your job description so that  you are internalizing these chunks and just  

113:13

getting this kind of integration each time. It seems to me that what that means is you  

113:18

actually want to change the structure  of the work output at some level. 

113:27

There’s this terrible idea that lots of people  have that they should be in flow all of the time. 

113:34

And as far as I can tell, high performers  just don't believe this at all. 

113:38

They're in flow some of the time. You certainly see this with athletes. 

113:41

When they're actually out there  playing basketball or tennis,  

113:46

ideally they are in flow much of the time. But when they're training they're not. 

113:51

They're stuck a lot of the time,  or they're doing things badly. 

113:55

I suppose I wonder what that looks like for you. That I would be extremely satisfied with. 

114:00

The problem is I just don't know what  the equivalent of doing 64 laps is. 

114:06

This is a thing you can change by choosing  guests where there is a legible curriculum. 

114:12

So maybe it's a mistake not to have done that. Also, there's no real way to prep for Terence Tao. 

114:19

There's no curriculum that's a plausible one. There are many failure modes, but one  

114:28

long-term dynamic I'm worried about is that you  can have a good podcast and reach a local maximum,  

114:34

but for no particular guest or  topic are you going deep enough. 

114:39

My model of learning is that if you don't really  understand the deeper mechanism, you're just  

114:44

mapping inputs and outputs of a black box. That just fades incredibly fast or is  

114:49

not worth it in the first place. You just move on and it's over. 

114:54

You need to build the intermediate connection. AI in a weird way is really easy for that reason,  

115:05

because there is a clear thing you can do. Just implement it, and then you understand it. 

115:12

If I applied that criterion elsewhere,  do I just not do history episodes? 

115:16

Exactly. Ada Palmer. Wonderful to  talk to, incredibly interesting. 

115:22

But for you personally, what changed? There are some things I learned. 

115:27

I should have allocated more time,  especially after the interview, to  

115:32

write up 2,000 words on everything I learned  and how it connects to other things I know. 

115:36

Maybe that's a thing worth doing,  spreading out the episodes more  

115:39

and spending more time afterwards consolidating. I would pay infinite amounts of money if there was  

115:46

somebody who was really good at coming up with the  curriculum, the practice problems you need to do,  

115:51

and the exercise you need to do after the  interview to clamp what you have learned. 

115:55

Have you tried doing that with somebody? It's hard to find someone. I haven't  

116:00

tried super hard, but isn't it going to be  tough to find somebody who could do that  

116:04

for every single kind of discipline? Maybe I should just hire different  

116:08

ones for different topics. Maybe. There's something about,  

116:12

what problem are you solving for each episode? As far as I can tell, that's the only way I  

116:18

really understand anything. I get interested in  something. At first, I don't even have a problem,  

116:24

but there's just some sense that there's some  contribution to make here, and gradually you  

116:27

hone in, and there's a problem. Funnily enough, spending time  

116:32

stuck is incredibly important. That used to just be annoying. 

116:39

Now it seems like it's maybe even the  most important part of the whole process. 

116:47

That hard-won nature of it means  that I internalize it afterwards. 

116:53

I've written 10,000-word essays in  a couple of days, and I've written  

117:01

them in three months or six months. I feel like I didn't learn very much  

117:08

from the ones that only took a couple of days. Whereas some of the ones that took three months,  

117:16

15 years later, I'll still remember. Can you describe outside of physics how  

117:23

you learn, of the ones that took three months? By far the most common thing is there's always  

117:31

some creative artifact. Sometimes it's a  class. Sometimes it's engagement with a  

117:37

group of people who are working on some  collective creative artifact together. 

117:45

You might not even be aware  of it, but you're acting as  

117:48

an input to their creative ends in some way. Sometimes it's an essay or a book or whatever. 

117:56

It's one of the reasons why I  often quite enjoy doing podcasts. 

118:03

I said yes to come here partially because I  know you ask unusually demanding questions. 

118:10

That's an attempt to get this sort of perspective  from a different kind of a forcing function. 

118:17

Trying to pick the most  demanding creative context. 

118:20

For this interview, I went through three  lectures of the Susskind special relativity book. 

118:24

The problem is that there's  almost no practice problems in it. 

118:27

So I hired a physicist friend. I haven't done it yet, but for every lecture  

118:33

I want a bunch of practice problems to go through,  and I'm planning on being appropriately humbled. 

118:39

How do you make it as jugular as possible? The higher you can raise the stakes, the better. 

118:46

The interview is in some  sense high stakes, but also  

118:49

it doesn't necessarily test deep understanding. I don't think the interview is that high stakes. 

118:54

You're not writing a book about special  relativity, and you're not trying to write a  

118:57

book that replaces whatever the existing standard  textbook is. That's really high stakes. By the  

119:05

way, a phrase that I find particularly difficult. People will talk about "going deep" on a subject,  

119:16

and it turns out different people have  different ideas of what this means. 

119:19

For some people it means they  read a couple of blog posts. 

119:22

For some people it means  they read a book about it. 

119:24

For some people it means  they wrote a book about it. 

119:32

The standard you hold yourself  to determines a lot about your  

119:36

ability to integrate knowledge in this way. I found that I'm in some sense able to move  

119:47

much faster on some things through the help of  AI, but I don't know if I'm learning better. 

119:51

I think it's probably because… The hardest  thing, the thing that is most demanding,  

119:56

is so aversive that you try to take  any excuse you can to get out of it. 

120:00

Just having a back-and-forth conversation  with an LLM where you gloss over… 

120:04

It’s entertaining but not  necessarily anything else. 

120:07

It’s such an easy way to get out of the thing. In fact, it makes it easier because instead  

120:12

of doing some intermediate thinking, there's  always a next question you can ask a chatbot. 

120:17

Yeah. And it's somewhat valuable. That’s part of  the seductiveness, of course. It's not actually  

120:25

useless. But it can substitute for actually  doing the thing that maybe you should be doing.  

120:33

It’s interesting. To what extent should  you be outsourcing that kind of stuff?  

120:43

It’s an interesting judgment call. There is a  whole bunch of routine work that you want done. 

120:55

It's low value for you, so if you can  get a chatbot to do it, you may as well. 

121:01

Somebody interviewed the pioneering  computer scientist Alan Kay years ago,  

121:04

and he was asked what he thought about Linux. If I remember his answer correctly, he basically  

121:09

said, "It doesn't have anything  to do with computer science. 

121:13

It's just a great big ball of mud. There are a few interesting ideas  

121:17

in there which are worth understanding, but  mostly all you're learning is stuff about Linux. 

121:24

You're not actually learning  anything which is transferable." 

121:27

I thought that was very interesting. There's a certain kind of seductiveness  

121:33

to some things where it's sort  of a Rube Goldberg machine. 

121:38

You can just learn about all the  bits, and it feels entertaining. 

121:42

But if you step back and think about  what you're actually doing here,  

121:48

it might not actually be meeting your objectives. Maybe you want to become a sysadmin, and learning  

121:52

Linux is a great use of your time. There's no harm in that at all. 

121:57

But if your objective is to understand the  fundamentals of computing, it's much less  

122:03

clear that that's a good use of your time. It was certainly an answer I've thought a  

122:08

lot about, where for a certain type of mind,  there is a seductiveness in just learning  

122:17

systems and confusing that with understanding. Okay, I'll keep you updated on how this goes. 

122:24

I owe you a text within a month  of some revamped learning system. 

122:28

I'd be really curious. It's also true  that tiny incremental improvements  

122:33

in this are just worth so much. It's the main input into the podcast. 

122:39

It's great that the bookshelves are fancy  and I've got a blackboard or whatever,  

122:43

but really the thing that makes the podcast  better is if I can improve the learning I do. 

122:47

So yes, it's worth every morsel of improvement. All right, thanks for the therapy session. Great  

122:58

note to end on. Thanks, Michael. All right. Thanks, Dwarkesh.
