Everyone is Wrong about Tokens

Transcript

0:00

You see this right here? Yeah. That's

0:02

$1.3 million

0:05

spent in OpenAI tokens in the last 30

0:08

days. 603

0:11

billion tokens spent. Now, even if I

0:13

were to try my hardest, I am not

0:15

actually sure it's possible for me to

0:18

spend this amount of money or that

0:20

amount of tokens. I have no idea how we

0:23

accomplished such things. And when I saw

0:25

this, I thought this is just the most

0:27

ridiculous thing and I

0:29

This is so stupid. But then, I started

0:31

thinking about it more and more and I

0:33

realized that there's a future that is

0:36

developing in which I think a lot of

0:37

people are wrong and I think this post

0:40

right here really helps it kind of

0:42

crystallize in my mind

0:45

where things are going. So, I got a lot

0:48

of yapping to do, so I hope you're going

0:49

to, you know, strap down, cuz I think

0:51

that, yeah, you

0:53

probably aren't going to see this one

0:55

coming. I don't think you understand

0:56

what's going to happen here in the next

0:58

year. And I think I might be right on

0:59

this one. In fact, I'm

1:00

going to do something I normally don't

1:01

do. I'm going to make a tech prediction.

1:03

I know.

1:05

Kind of dangerous. I do just got to

1:07

yap about this for a second, okay? The

1:09

reason why I have to yap about this is

1:11

that whenever a post like this happens,

1:13

there's always the exact same thing that

1:15

happens. There's this entire

1:17

influencer market when it comes to AI and

1:20

I largely think they're just simply

1:21

pull-overs from the crypto days. The

1:23

crypto NFT bros moved over to AI. When

1:26

they see someone make a post like this,

1:27

Papa Pete, of course, they go, "Oh, hey

1:30

bros. Hey bros. Everybody. Uh I don't

1:31

know if you know this. If you aren't

1:33

spending like $100,000, if

1:36

you're not even hitting 10 billion, if

1:38

you're not even in the B's when it comes

1:40

to token usage a month, you're not going

1:42

to make it. You know how I know that?

1:43

Look at Papa Pete, okay? OpenClaw guy,

1:46

he knows what he's doing. Do you know

1:47

what you're doing? Not going to make it.

1:49

Permanent underclass. Hey, buy my course

1:51

and I'm going to teach you how to do AI

1:53

properly." And it's such a bad takeaway.

1:56

And let me explain it in more simple

1:57

terms. Like, you know, the funny thing

1:58

about history and about tech, they don't

2:01

repeat but they do rhyme. I've heard

2:02

that once and it makes me feel like I'm

2:04

really smart saying that.

2:07

You know what I mean? So, what I mean by

2:09

they rhyme is that in 2016, 2018,

2:13

2020, if you looked at any startup,

2:16

if you went and talked to any of your

2:17

friends in Silicon Valley, there was

2:19

an entire culture

2:21

that had more microservices and

2:24

Kubernetes usage than they did literal

2:27

customers. I actually had a friend

2:29

lament to me that he was managing 10

2:32

different microservices and he had three

2:34

customers. Unironically, that's not me

2:36

making up or exaggerating things. He had

2:38

more than triple the services for three customers

2:41

and he was just like, "What the hell

2:43

have I done with my life?" And it's just

2:45

like, "Brother, you have to quit

2:46

listening to Google for how to run a

2:48

company. Just because it works for them

2:51

does not mean it works for you." And

2:53

this is kind of that same vibe. Just

2:55

because this works for Pete, which by

2:57

the way, guess how many dollars he paid

2:59

for those tokens? Yeah, zero. You know

3:02

how much money you're going to pay for

3:03

those tokens? Yeah, full price, okay,

3:05

buddy. It's not going to be cheap, okay?

3:06

You're not spending six hundred

3:08

three billion tokens per

3:10

month. And if you are, I mean,

3:12

well, hey, nice to meet you, sir. I did

3:14

not realize you

3:15

I was not aware of your game. And so, I

3:17

just wanted to kind of get that out of

3:18

the way, okay? Now to the future, the

3:20

thing that I think all of you have

3:22

wrong, okay? But first, the bag. You see

3:25

these people walking around with their

3:26

laptops cracked just so their agents

3:29

don't stop running?

3:30

Mine never stop running. When making

3:32

changes with Cloud Agents, you can see

3:33

the diffs inline just like with any

3:35

other agent. It will create a PR and you

3:38

can actually see your CI running live

3:40

within the Cloud Agent. You can see the

3:41

status of the CI when it completes and

3:43

you can even go back and fix the failing

3:45

CI. Not only that, but you can also just

3:47

run live commands in the terminal. That

3:49

is my project right there. This is not

3:52

on my computer. This is in the cloud

3:54

running where I can ask it to do things.

3:57

I can ask for changes. I can ask for

3:59

changes on my phone and see the game

4:01

played via MP4. What's even crazier is I

4:05

can just take over the desktop and I can

4:06

place towers and I can just play the

4:08

game. Start round. And I can watch the

4:11

bats happen. This is my game. Try Cloud

4:14

Agents today. cursor.com/agents

4:17

and never have to worry about your

4:19

laptop being open again.

4:21

Okay, welcome back. Let's talk about the

4:22

future here for a second. So, something

4:24

that you need to kind of keep in mind

4:26

when you see these things is that what

4:28

Pete's entire goal is, it's a research

4:31

project. How far can OpenAI take token

4:34

usage? Cuz remember, they believe the

4:36

future is going to be this token Utopia

4:39

where everybody just sits back and

4:42

relaxes like we're in Wall-E and we just

4:44

are able to out anything and you

4:46

have billions upon billions of tokens

4:48

for free cuz everything gets 10x cheaper

4:50

every single year, which by the way,

4:52

that promise is 2 years old and I feel

4:54

like things have never been more

4:55

expensive. I don't know. It feels that

4:58

way to me. Maybe I'm wrong, but things

4:59

kind of feel a little costly.
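
As a rough check on that promise, here is a minimal sketch of what "10x cheaper every single year" would imply if it held as a constant rate. The starting figure is the $1.3 million monthly spend quoted at the top of the video; the function name and the constant-rate assumption are illustrative only, not a measured price trend.

```python
def projected_cost(cost_today_usd: float, years: int, yearly_factor: float = 10.0) -> float:
    """Cost of the same workload after `years`, assuming prices fall by
    `yearly_factor` every year (an assumption taken from the quoted claim)."""
    return cost_today_usd / (yearly_factor ** years)


if __name__ == "__main__":
    monthly_spend_usd = 1_300_000  # figure quoted in the video
    for years in range(4):
        cost = projected_cost(monthly_spend_usd, years)
        print(f"year {years}: ${cost:,.0f} per month")
```

Under that assumption, the same month of usage would cost $13,000 two years on, which is the gap the speaker is pointing at when he says things have never felt more expensive.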

5:01

Nonetheless, 10x cheaper. Remember that.

5:03

10x cheaper every single year. And so,

5:04

at some distant point in the future, you

5:07

spending 603 billion tokens and every

5:10

last person on Earth doing that, which

5:12

by the way, we don't even have enough.

5:13

Like I don't even think there's enough

5:14

power on Earth to do that currently. We

5:16

might have to 10x all power on Earth and

5:18

only use it to power GPUs to make

5:20

this happen. But again, I digress. If

5:22

this were to come across, this is how

5:24

projects could look. So, I think a lot

5:27

of people look at this and they're like,

5:28

"Oh, well, you know, OpenAI is being

5:29

evil." No, I think they actually just

5:30

believe this, right? Like I think they

5:32

actually believe that every last person

5:34

will be using Infinity tokens at all

5:36

times. And yeah, sure. They are the

5:38

benefactors of it. And I mean, it's a

5:39

good future for them, but I actually

5:41

also think they think this is just

5:43

like how the world should work. This is

5:44

how projects should be run. And so, this

5:47

is a research project which got me to

5:49

think about something for a second. And

5:51

it's kind of this funny conundrum that

5:53

you see. Uh right now, if you go to any

5:56

of the big companies, what are they all

5:58

about? Hey, what's your token spend? I

6:00

mean, there are literally people getting

6:02

fired because they're not using AI

6:05

enough. You've seen this, you've seen

6:07

the articles, you've seen

6:09

potentially these rage posts on Reddit.

6:11

I can never tell if what I'm reading on

6:12

Reddit is real or not, if it's just

6:14

there to rage bait me into a frothy

6:16

mouth just to go off and tweet a story

6:18

that doesn't even exist. But let's

6:19

pretend they do exist. People are

6:21

getting fired for not using enough AI.

6:23

I've read stories about people who

6:24

are interviewing, if they use too much

6:26

AI, people don't like it. If they don't

6:28

use enough AI, people are not liking it.

6:29

Like interviewing sounds like hell.

6:31

Working at companies right now sounds

6:33

pretty awful cuz you're constantly being

6:34

shoved down the throat, you must use

6:35

this. People at Amazon, you better use

6:37

Kiro. Hey, if you're over at Google,

6:38

better use that Gemini, buddy. And it just

6:40

keeps on going and going and going,

6:42

right? Well, there's kind of a problem

6:43

there. I don't think people realize what

6:46

the problem is. Because right now it's

6:48

like spend all the money you want,

6:49

right? Okay. Well, let's just rewind

6:52

like 18 months, okay? Not even that long

6:54

ago. Let's just go back a little bit.

6:58

You wanted a new computer.

7:01

Oh,

7:02

you want 32 GB of RAM? Well, we're going

7:05

to have to get a vice president to sign

7:08

off on those $400. Oh, and a chair?

7:11

Yeah, that chair, it's going to be a

7:13

used Herman Miller. Okay. I'm

7:15

sorry, but those buns of yours do not

7:17

get the luxury of sitting on brand new

7:19

Herman Miller, okay? You know what?

7:21

We're getting you a lifetime chair.

7:22

That's what you get. Yeah, you. You get

7:24

a lifetime chair and I'm going to go

7:25

grab some patio furniture padding and

7:27

duct tape that right onto your chair.

7:28

That's what you get. That's what you

7:29

deserve, okay? Because let's just face

7:31

it, we can't be bothering our VPs for

7:33

these $50 upcharges. We can't do that,

7:36

okay? Us as a multi-billion dollar

7:38

company, we are very concerned if you

7:41

spend $25. And now, all of a sudden,

7:45

you can spend infinity on tokens. In

7:47

fact, you're even encouraged to do so.

7:49

Going back to this for a second, if you

7:51

really think about that, that means it

7:54

takes $1.3 million a month to run

7:57

OpenClaw. So, how many engineers is

8:00

that? Well, like if you think about

8:02

that, let's just pretend we're a big

8:03

tech Google company. An engineer costs $50,000 a

8:07

month, and you're spending $1.3 million

8:10

a month on just AI agents. To replace

8:12

those with just engineers? Well,

8:14

that kind of math, I mean, it's a

8:16

number. That kind of math you can't just

8:17

do off the top of your head. So, let's

8:18

just say 30 engineers. That's like 30

8:20

engineers worth of people working on

8:22

something. You can't just do this for

8:24

every single project. Your company at

8:27

some point's going to go, "Okay,

8:28

timeout. We've made a mistake. We have

8:31

decided that we let you use all the

8:33

tokens you want. That's bad. We're going

8:35

to go back to the old days. Who's the

8:37

most token efficient? Oh, you're not

8:39

token efficient. You're spending 603

8:42

billion tokens on maintaining a simple

8:44

project? No, we're not going to do

8:46

that."
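
The back-of-the-envelope math in this stretch can be written out as a short sketch. The $1.3 million and 603 billion tokens are the figures quoted in the video, and $50,000 a month per engineer is the speaker's own rough number; the function names are hypothetical. The result comes out to 26 engineers, which the speaker rounds up to 30.

```python
# Figures quoted in the video. The per-engineer cost is the speaker's
# rough assumption for a big tech company, not a sourced salary figure.
MONTHLY_SPEND_USD = 1_300_000
MONTHLY_TOKENS = 603_000_000_000
ENGINEER_COST_PER_MONTH_USD = 50_000


def cost_per_million_tokens(spend_usd: float, tokens: int) -> float:
    """Blended price paid per one million tokens."""
    return spend_usd / tokens * 1_000_000


def engineer_equivalents(spend_usd: float, engineer_cost_usd: float) -> float:
    """How many engineers the same monthly budget would pay for."""
    return spend_usd / engineer_cost_usd


if __name__ == "__main__":
    per_million = cost_per_million_tokens(MONTHLY_SPEND_USD, MONTHLY_TOKENS)
    engineers = engineer_equivalents(MONTHLY_SPEND_USD, ENGINEER_COST_PER_MONTH_USD)
    print(f"blended price: ${per_million:.2f} per million tokens")
    print(f"same budget in engineers: {engineers:.0f}")
```

The same two ratios are what a "token efficiency" review would likely look at: what each token costs, and what else that budget could have bought.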

8:47

You're gone. There's going to come a

8:49

world where there's an entire consultant

8:51

class going through these companies

8:54

teaching people how to be efficient with

8:56

tokens. No longer will we see this world

8:59

of infinity token usage. Instead, it's

9:01

going to be, "Okay, who's the top

9:03

performers by features and things

9:04

delivered, not just by how much you

9:06

spend." Because in the old world, we

9:08

used to do buy versus build. Do you

9:10

build the thing or do you buy the thing?

9:12

Depending on the cost and the trade-off,

9:14

sometimes it's better to, you know,

9:15

trade the time for the money or the

9:17

money for the time. But now, we kind of

9:19

have a new world. It's like buy versus

9:21

build versus vibe. Do you vibe it? Well,

9:23

vibing takes both time and money. So,

9:26

which is the proper trade-off? And I

9:27

think companies are going to quickly

9:29

snap back to the old way in which

9:31

they've always done things. It's going

9:33

to be, "Okay, who's the most efficient?

9:35

Who knows how to use these things the

9:37

best? It's not going to be the people

9:39

spending Infinity. It's not going to be

9:40

the influencers telling you

9:42

you need to run 500 agents in the cloud

9:44

at all times or you're not going to make

9:46

it. It's going to be the people that are

9:47

just being engineers. They're the people

9:50

who like learning. People that actually want

9:52

to just do good work and use things to

9:54

help speed them up in certain areas. And

9:56

that's my prediction. Yes, I'm doing a

9:59

prediction. I'm doing an actual

10:00

prediction. I know you're not supposed

10:02

to do predictions. Tech predictions

10:03

are almost always wrong,

10:06

but I do think in the near future we are

10:09

going to see token efficiency as an

10:11

entire argument as opposed to simply

10:13

token maxing. Token maxing is because

10:16

we're just trying to figure out is this

10:17

even viable? And by we, I don't mean me.

10:20

I'm out here still hand coding stuff for

10:23

my video game, okay? It's a

10:25

different world. But nonetheless, this

10:26

was very interesting to see. I was very

10:28

happy I got to read about this and kind

10:30

of see the live reaction from everybody

10:33

because people were just, you know,

10:35

instantaneously suspicious. Like, "Oh,

10:37

this is just OpenAI trying to make

10:38

money." Yeah.

10:40

They are trying to make money.

10:41

I'll tell you that much. But also,

10:42

this is just like what they think the

10:44

future looks like. You and 100 agents

10:46

non-stop doing stuff. And maybe at some

10:49

point in the future, maybe hey, you know

10:50

what? Maybe in 10 years, some large

10:53

amount of time when we have, you know,

10:55

100x more energy and 1,000x more GPUs.

10:58

Yeah, maybe that future does exist in

11:00

some far away place. But right now, to

11:02

me at least, the big takeaway here is

11:05

I think you got to start thinking about

11:06

token efficiency. You got to start

11:08

thinking about how you're actually using

11:09

it. Maybe having a kajillion agents does

11:12

work for one person.

11:13

But I'm not sure if this is really a

11:15

sustainable approach for anybody, even

11:18

if the promised 10x is going to happen.

11:20

Okay, sorry. I made a future prediction

11:22

and I'm probably going to be wrong, but

11:23

you know, honestly, I think I'm right.

11:25

Also, the consulting class, can we all

11:27

just agree that's going to be the most

11:28

annoying people in the universe?

11:29

Honestly,

11:31

I'd almost rather take the crypto bros

11:34

who are going to be like, "Oh, you got to

11:35

token max," than

11:37

the new class of agile coaches that are

11:39

going to be coming out. These agile

11:40

coaches for token efficiency are just

11:43

going to be the worst. Oh my gosh. There

11:45

are actually going to be prompt trainers.

11:47

Like

11:47

it's going to be like Pokémon trainers,

11:49

but they're going to be prompt trainers

11:50

and you're going to have to go in there

11:52

and they're going to like one-v-one you

11:53

on prompts. It's going to be so

11:55

ridiculous. It's all horoscopes, baby.

11:57

The name

11:59

is The Primeagen.


Summary

The video discusses the viral revelation of an entity spending $1.3 million on OpenAI tokens in a single month. The speaker critiques the current 'AI influencer' trend of promoting extreme token usage, comparing it to past trends of corporate over-engineering, like excessive microservice usage. The speaker predicts a shift away from 'token maxing' toward a focus on token efficiency as companies realize that infinite AI spending is unsustainable. Ultimately, the speaker anticipates the rise of a new, potentially annoying 'consulting class' focused on optimizing token usage rather than just raw consumption.
