
What are we scaling?


Transcript


0:00

I'm confused why some people have super short timelines yet at the same time are bullish on scaling up reinforcement learning on top of LLMs. If we're actually close to a human-like learner, then this whole approach of training on verifiable outcomes is doomed.

0:15

Now, currently the labs are trying to bake a bunch of skills into these models through mid-training. There's an entire supply chain of companies that are building RL environments which teach the model how to navigate a web browser or use Excel to build financial models. Now, either these models will soon learn on the job in a self-directed way, which will make all this pre-baking pointless, or they won't, which means that AGI is not imminent. Humans don't have to go through a special training phase where they need to rehearse every single piece of software that they might ever need to use on the job. Beren Millidge made an interesting point about this in a recent blog post. He writes, quote, "When we see frontier models improving at various benchmarks, we should think not just about the increased scale and the clever ML research ideas, but the billions of dollars that are paid to PhDs, MDs, and other experts to write questions and provide example answers and reasoning targeting these precise capabilities."

1:06

You can see this tension most vividly in robotics. In some fundamental sense, robotics is an algorithms problem, not a hardware or a data problem. With very little training, a human can learn to teleoperate current hardware to do useful work. So if you actually had a human-like learner, robotics would be in large part a solved problem. But the fact that we don't have such a learner makes it necessary to go out into a thousand different homes and practice a million times how to pick up dishes or fold laundry.

1:33

Now, one counterargument I've heard from the people who think we're going to have a takeoff within the next 5 years is that we have to do all this kludgy RL in service of building a superhuman AI researcher, and then the million copies of this automated Ilya can go figure out how to solve robust and efficient learning from experience. This just gives me the vibes of that old joke: we're losing money on every sale, but we'll make it up in volume. Somehow, this automated researcher is going to figure out the algorithm for AGI, a problem that humans have been banging their heads against for the better part of a century, while not having the basic learning capabilities that children have. I find that super implausible. Besides, even if that's what you believe, it doesn't describe how the labs are approaching reinforcement learning from verifiable reward. You don't need to pre-bake a consultant's skill at crafting PowerPoint slides in order to automate Ilya. So clearly, the labs' actions hint at a worldview where these models will continue to fare poorly at generalization and on-the-job learning, thus making it necessary to build the skills we hope will be economically useful into these models beforehand.

2:36

Another counterargument you can make is that even if the model could learn these skills on the job, it is just so much more efficient to build in these skills once during training rather than again for each user and each company. And look, it makes a ton of sense to just bake in fluency with common tools like browsers and terminals. And indeed, one of the key advantages that AGIs will have is this greater capacity to share knowledge across copies. But people are really underrating how much company- and context-specific skill is required to do most jobs. And there just isn't currently a robust, efficient way for AIs to pick up these skills.

3:15

I was recently at a dinner with an AI researcher and a biologist. It turned out the biologist had long timelines, and so we asked her why. And she said, you know, one part of her recent work in the lab has involved looking at slides and deciding whether the dot in a slide is actually a macrophage or just looks like one. And the AI researcher, as you might anticipate, responded: look, image classification is a textbook deep learning problem. This is dead center the kind of thing that we could train these models to do. I thought this was a very interesting exchange, because it illustrated a key crux between me and the people who expect transformative economic impact within the next few years.

3:53

Human workers are valuable precisely because we don't need to build custom training loops for every single small part of their job. It's not net productive to build a custom training pipeline to identify what macrophages look like given the specific way that this lab prepares slides, and then another training loop for the next lab-specific microtask, and so on. What you actually need is an AI that can learn from semantic feedback or from self-directed experience and then generalize the way a human does. Every day you have to do a hundred things that require judgment, situational awareness, and skills and context that are learned on the job. These tasks differ not just across different people but even from one day to the next for the same person. It is not possible to automate even a single job by just baking in a predefined set of skills, let alone all the jobs.

4:45

In fact, I think people are really underestimating how big a deal actual AI will be, because they are just imagining more of this current regime. They're not thinking about billions of human-like intelligences on a server which can copy and merge all their learnings. And to be clear, I expect this, which is to say I expect actual brain-like intelligences within the next decade or two, which is pretty crazy.

5:09

Sometimes people will say that the reason AIs aren't more widely deployed across firms right now, and aren't already providing lots of value outside of coding, is that technology takes a long time to diffuse. And I think this is cope. I think people are using this cope to gloss over the fact that these models just lack the capabilities that are necessary for broad economic value. If these models actually were like humans on a server, they'd diffuse incredibly quickly. In fact, they'd be so much easier to integrate and onboard than a normal human employee is. They could read your entire Slack and drive within minutes, and they could immediately distill all the skills that your other AI employees have. Plus, the hiring market for humans is very much a lemons market, where it's hard to tell who the good people are beforehand, and hiring somebody who turns out to be bad is very costly. This is just not a dynamic you would have to face or worry about if you're spinning up another instance of a vetted AI model. So for these reasons, I expect it's going to be much easier to diffuse AI labor into firms than it is to hire a person. And companies hire people all the time.

6:12

If the capabilities were actually at AGI level, people would be willing to spend trillions of dollars a year buying the tokens that these models produce. Knowledge workers across the world cumulatively earn tens of trillions of dollars a year in wages. And the reason that labs are orders of magnitude off this figure right now is that the models are nowhere near as capable as human knowledge workers.

6:39

Now you might say: look, how can the standard suddenly have become "labs have to earn tens of trillions of dollars of revenue a year"? Until recently, people were asking: can these models reason? Do they have common sense? Are they just doing pattern recognition? And obviously AI bulls are right to criticize AI bears for repeatedly moving these goalposts, and this is very often fair; it's easy to underestimate the progress that AI has made over the last decade. But some amount of goalpost shifting is actually justified. If you had shown me Gemini 3 in 2020, I would have been certain that it could automate half of knowledge work. And so we keep solving what we thought were the sufficient bottlenecks to AGI. We have models that have general understanding. They have few-shot learning. They have reasoning. And yet we still don't have AGI.

7:23

So what is a rational response to observing this? I think it's totally reasonable to look at this and say, "Oh, actually there's much more to intelligence and labor than I previously realized." And while we're really close, and in many ways have surpassed what I would previously have defined as AGI, the fact that model companies are not making the trillions of dollars in revenue that would be implied by AGI clearly reveals that my previous definition of AGI was too narrow. And I expect this to keep happening into the future. I expect that by 2030, the labs will have made significant progress on my hobby horse of continual learning, and the models will be earning hundreds of billions of dollars in revenue a year, but they won't have automated all knowledge work. And I'll be like: look, we made a lot of progress, but we haven't hit AGI yet; we also need X, Y, and Z capabilities in these models. Models keep getting more impressive at the rate that the short-timelines people predict, but more useful at the rate that the long-timelines people predict.

8:28

It's worth asking: what are we scaling with pre-training? We had an extremely clean and general trend of improvement in loss across multiple orders of magnitude of compute, albeit on a power law, which is as weak as exponential growth is strong.
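To make that contrast concrete, here is a minimal sketch, using a made-up exponent rather than a fitted one, of why a power law is so punishing: each fixed multiplicative improvement in loss costs a fixed multiplier on compute.

```python
# Minimal sketch with an illustrative, made-up exponent (not a fitted value).
# Under a power law L(C) = a * C**(-alpha), halving the loss requires
# multiplying compute by 2**(1/alpha), because (C2/C1)**(-alpha) = 1/2.

alpha = 0.05                       # hypothetical scaling exponent
factor = 2 ** (1 / alpha)          # compute multiplier needed to halve loss
print(f"Compute multiplier to halve loss: {factor:.3e}")  # ~1.05e+06
```

With an exponent this small, halving the loss costs roughly a million-fold more compute: exponentially growing inputs buy only steadily shrinking returns.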

8:44

But people are trying to launder the prestige that pre-training scaling has, which is almost as predictable as a physical law of the universe, to justify bullish predictions about reinforcement learning from verifiable reward, for which we have no similarly well-established publicly known trend. And when intrepid researchers do try to piece together the implications from scarce public data points, they get pretty bearish results. For example, Toby Ord has a great post where he cleverly connects the dots between the different o-series benchmarks, and this suggested to him that, quote, "we need something like a million-x scale-up in total RL compute to give a boost similar to a single GPT level," end quote.
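To put that figure in perspective, here is a back-of-the-envelope sketch. The assumption that one GPT level has historically corresponded to roughly 100x more pre-training compute is a common rule of thumb, not a number from the essay.

```python
import math

# Back-of-the-envelope sketch of Toby Ord's estimate. The 100x-per-GPT-level
# pre-training figure is an assumed rule of thumb, not a number from the essay.
pretrain_multiplier_per_gpt_level = 100      # assumed: ~2 orders of magnitude
rl_multiplier_per_gpt_level = 1_000_000      # Ord's estimate: ~6 orders of magnitude

gap = math.log10(rl_multiplier_per_gpt_level / pretrain_multiplier_per_gpt_level)
print(f"RL needs ~{gap:.0f} extra orders of magnitude of compute "
      "for the same one-level boost.")       # prints ~4
```

On these assumptions, RL from verifiable reward would be about four orders of magnitude less compute-efficient than pre-training at delivering a GPT-level jump.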

9:27

People have spent a lot of time talking about the possibility of a software-only singularity, where AI models write the code that generates a smarter successor system, or a software-plus-hardware singularity, where AIs also improve their successors' computing hardware. However, all these scenarios neglect what I think will be the main driver of further improvements on top of AGI: continual learning. Again, think about how humans become more capable at anything: it's mostly from experience in the relevant domain. Over a conversation, Beren Millidge made the interesting suggestion that the future might look like continual learning agents who all go out doing different jobs and generating value, then bring their learnings back to a hive mind model, which does some kind of batch distillation on all of these agents. The agents themselves could be quite specialized, containing what Karpathy called the cognitive core plus the knowledge and skills relevant to the job they're being deployed to do.
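As a toy illustration of that loop, here is a minimal sketch. The dict-of-skills representation and the max-merge "distillation" rule are invented for illustration and do not reflect any lab's actual design.

```python
# Toy sketch of the hive-mind loop described above. The skill-dict
# representation and max-merge rule are invented for illustration only.

def run_cycle(hive: dict[str, float], jobs: list[dict[str, float]]) -> dict[str, float]:
    agents = []
    for job_skills in jobs:
        # Each agent = shared cognitive core plus skills picked up on its job.
        agent = dict(hive)
        agent.update(job_skills)      # crude stand-in for on-the-job learning
        agents.append(agent)
    # "Batch distillation": merge every agent's learnings back into the hive,
    # so the next generation of agents starts from the pooled knowledge.
    for agent in agents:
        for skill, level in agent.items():
            hive[skill] = max(hive.get(skill, 0.0), level)
    return hive

hive = {"cognitive_core": 0.9}
hive = run_cycle(hive, [{"spreadsheets": 0.7}, {"microscopy": 0.6}])
print(hive)  # {'cognitive_core': 0.9, 'spreadsheets': 0.7, 'microscopy': 0.6}
```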

10:23

Solving continual learning won't be a singular, one-and-done achievement. Instead, it will feel like solving in-context learning. GPT-3 already demonstrated in 2020 that in-context learning could be very powerful; its in-context learning capabilities were so remarkable that the title of the GPT-3 paper was "Language Models are Few-Shot Learners." But of course, we didn't solve in-context learning when GPT-3 came out, and indeed there's still plenty of progress to be made, from comprehension to context length. I expect a similar progression with continual learning. Labs will probably release something next year which they call continual learning, and which will in fact count as progress towards continual learning. But human-level on-the-job learning may take another 5 to 10 years to iron out.

11:07

This is why I don't expect some kind of runaway gains from the first model that cracks continual learning getting more and more widely deployed and capable. If fully solved continual learning dropped out of nowhere, then sure, it might be game, set, match, as Satya put it on the podcast when I asked him about this possibility. But that's probably not what's going to happen. Instead, some lab is going to figure out how to get some initial traction on this problem, playing around with the feature will make it clear how it was implemented, and other labs will soon replicate the breakthrough and improve on it slightly. Besides, I just have some prior that the competition will stay pretty fierce between all these model companies. This is informed by the observation that all the previous supposed flywheels, whether that's user engagement on chat or synthetic data or whatever, have done very little to diminish the greater and greater competition between model companies. Every month or so, the big three model companies rotate around the podium, and the other competitors are not that far behind. There seems to be some force, potentially talent poaching, potentially the rumor mill in SF, or just normal reverse engineering, which has so far neutralized any runaway advantage that a single lab might have had.

12:15

This was a narration of an essay that I originally released on my blog at dwarkesh.com. I'm going to be publishing a lot more essays; I've found it's actually quite helpful in ironing out my thoughts before interviews. If you want to stay up to date with those, you can subscribe at dwarkesh.com. Otherwise, I'll see you for the next podcast. Cheers.

Summary

The video discusses the discrepancy between the rapid advancements in AI capabilities and the slower pace of real-world economic impact. It argues that current AI models, despite impressive performance on benchmarks, lack the human-like learning and generalization abilities necessary for widespread adoption and significant economic value. The speaker highlights that while AI models are being trained on specific tasks and skills, humans learn and adapt much more fluidly. The essay proposes that continual learning, rather than just scaling or specific training, will be the key driver for future AI progress, likening its development to the gradual improvement seen with in-context learning. Despite the intense competition among AI labs, the author predicts a steady, rather than explosive, progression of AI capabilities, with human-level learning taking several more years to achieve.
