No way this actually works

Transcript

Now, I know a lot of you have recently been hitting some limits when it comes to using Claude Code. The conventional wisdom, of course, is that you're holding it wrong, but it turns out there's actually a better way to save on tokens. The solution? Honestly, I didn't believe it, but it actually works, and it works quite well. And here's the thing: you will save actual, real money using this method. And no, I'm not exaggerating. I'm talking about Caveman. Now, you may not know what Caveman is, and hey, if you don't know what it is, there are a couple of pop references you might be familiar with. First off, GrugBrain Dev. If you haven't heard of GrugBrain Dev, I highly recommend the essays. They're about as counterintuitive... countercultural (I think I just made up a word, kind of made a baby with the two) as you can get to what's going on in today's space-age AI: 37,000 lines of code, just let AI review AI. This is for the simpler man. Okay.

0:58

>> So me think, why waste time say lot word when few word do trick?

1:02

>> That's actually what Caveman is. Instead of allowing Claude Code or Codex or whatever to go off and say a bunch of expressive statements ("Hey man, you're absolutely right!") while you spend money getting glazed like wild, it goes straight to the heart of the issue, which is to just stop saying so many things. And with the cost of output tokens, that can actually save some serious money. Okay, so what does the Caveman skill actually look like? Well, I can't show it to you on GitHub, because apparently GitHub can't render 200 lines of markdown, and when I try to go look at it raw right now, they broke raw: it just downloads the file, it doesn't even take me to the web page. Anyways, I downloaded it, and this is all it says to do right here. Watch this.

1:44

Drop all articles: just don't use "a", "an", and "the". Drop all fillers: really, basically, actually, simply. Drop all the pleasantries: sure, certainly, of course, happy to. Use short synonyms: "big", not "extensive"; "fix", not "implement a solution for". All of these are real token-dropping phrases that can save you actual money, which is kind of insane. No hedging: skip the "it might be worth considering". Fragments fine; no need full sentence. Technical terms remain the same: "polymorphism" is still "polymorphism", we don't shorten those terms. Code blocks unchanged: caveman speak around code, not in code. Error messages quoted exact; caveman only for explanation. You can get the same results. The only difference is Claude just doesn't sit there and glaze you, saying a bunch of stupid words at you like "well, actually, the fix was quite simple, and your insight into the problem space was actually the right direction, and all I had to do..." and it's just like, no, no, no, shut up. Stop saying that.
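For illustration only: the actual Caveman skill is a markdown prompt that tells the model to write this way, not code. But the rules just listed (drop articles, drop fillers, drop pleasantries, swap long phrases for short synonyms) can be sketched as a toy text filter. Everything here, the function name and the word lists, is made up for the demo; per the rules above, a real pass would leave code blocks and quoted error messages untouched.

```python
import re

# Toy sketch of a few Caveman rules applied to plain explanation text.
# (Hypothetical demo code, not the actual skill from the repo.)

FILLERS = ["really", "basically", "actually", "simply"]
PLEASANTRIES = ["sure", "certainly", "of course", "happy to"]
SYNONYMS = {"extensive": "big", "implement a solution for": "fix"}
ARTICLES = {"a", "an", "the"}

def caveman(text: str) -> str:
    out = text
    # Drop pleasantries and fillers, plus any trailing punctuation/space.
    for phrase in PLEASANTRIES + FILLERS:
        out = re.sub(rf"\b{re.escape(phrase)}\b[,!.]?\s*", "", out,
                     flags=re.IGNORECASE)
    # Swap long phrases for short synonyms.
    for long_form, short_form in SYNONYMS.items():
        out = re.sub(re.escape(long_form), short_form, out, flags=re.IGNORECASE)
    # Drop articles.
    return " ".join(w for w in out.split() if w.lower() not in ARTICLES)

print(caveman("Certainly! The issue is basically caused by a bug in the auth middleware."))
# → issue is caused by bug in auth middleware.
```

The point of the toy version is just to show how mechanical the savings are: every deleted word above is a word the model would otherwise have billed you for.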

2:42

Here's a good example. Instead of "Sure, I'd be happy to help you with that. The issue you are experiencing is likely caused by...", no, don't do that: "Yes. Bug in auth middleware token expiry check. Use this, not that fix." So often you can drop a lot of tokens: even just this alone, you can see right here, goes from 69 tokens to 19 tokens. It even allows you to do various levels of caveman. You can do light, where you're just trimming the fat; you can do kind of the full one; and you can also do the ultra-maximum one: all full rules, plus abbreviate common terms (db, auth, config, req, res, fn, impl), strip conjunctions where possible, one-word answers when one word enough, arrow notation for causality. And this just actually works. Like, this is just the free hack. They even have a basic table breaking down the various usages: explaining a React render bug goes from 1,180 to 159 tokens.
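Those numbers are easy to sanity-check with back-of-the-envelope arithmetic (the `savings` helper here is just for this demo, not something from the Caveman repo):

```python
# Percent of tokens saved when a response shrinks from `before` to `after`.
def savings(before: int, after: int) -> float:
    return (before - after) / before * 100

print(round(savings(69, 19)))     # the short auth-bug answer → 72
print(round(savings(1180, 159)))  # the React render bug row → 87
```

The 1,180 → 159 row works out to about 87%, matching the figure quoted from the table.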

3:37

87% saved, which also just shows how fluffy the language is. Like, think about it: it's bloviating, saying a bunch of nonsense with big, extravagant words without actually saying anything at all. I don't want to be much of a conspiracy theorist, but, you know, I'm just saying: Claude does make its money on output tokens. So instead of just saying "auth is broken," it goes on a rampage soliloquy to let you know every last possible thing that could possibly be said about a topic that could be covered in three words. It's truly an impressive piece of technology. Honestly, the trade of affordable computers and a rainforest for one of these little black-magic, you know, sandboxes is pretty fantastic. I would make that trade any day of the week.

4:24

Also, if you don't know anything about me, I typically don't cite studies, because largely I think studies have been gamed. The facts you're getting hit with? I'm not too sure you can really trust those, because, you know, there's lies, there's damned lies, and then there's statistics. But hey, since this one's going in my favor: in March 2026, so just a couple days ago, "Brevity constraints reverse performance hierarchies in language models." All those words simply mean that making the response brief improves accuracy by 26 percentage points. Now, what is 26 percentage points more accurate? Some would say that sounds like a lot. What does 26% even mean? It doesn't really matter. You know why? Because it's more accurate. Okay? Hey, green mean good. We got that graph that's going up and to the right, and that's all you need in life. When things get better, it's good. Things bad, not good. So, go ahead, give it a try. Go check out this Julius Brussy's caveman. Which also... can we just take a quick step to the side? We've got to chat about this for a second.

5:26

Why? Why, oh why, does every single agent program you can possibly download have its own skill directory that you put skills into? This has to be the greatest XKCD outcome that could ever be. Any project I seem to walk into has like 20 separate folders for the same text, and they're all committed. [laughter] It's just like, why? Why did we get here? [gasps] I THOUGHT WE GOT PhD-level intelligence. Instead, we just have absolutely junior-level execution. It hurts me. It hurts me deep down. Anyways, if you're struggling out there using Claude Code, and the "you're holding it wrong" message did not, in fact, help you, why don't you give this a try right here? Okay, go check it out. Don't say I never told you anything, because this is good. This is good information right here. Okay: this good thing, you download, use now. Me name Primeagen. Hey, is that HTTP? Get that out of here. That's not how we order coffee. We order coffee via SSH: terminal.shop. Yeah, you want a real experience. You want real coffee. You want awesome subscriptions so you never have to remember again. Oh, you want exclusive blends with exclusive coffee and exclusive content? Then check out CRON. You don't know what SSH is?

>> Well, maybe the coffee is not for you.

[singing]

[music]

Live the dream.

Interactive Summary

The video introduces a technique called "Caveman" to reduce token usage and save money when using AI models like Claude. The conventional advice is often to improve prompts, but "Caveman" focuses on making the AI's responses more concise by eliminating unnecessary words, pleasantries, and hedging. This method can significantly reduce the number of output tokens, leading to cost savings. The speaker explains that "Caveman" involves stripping out filler phrases, avoiding unnecessary full sentences, and abbreviating common terms, while keeping technical terms and code blocks intact. The goal is to get straight to the point, as exemplified by the "So me think why waste time say lot word when few word do trick" phrase. The video also briefly touches on the inefficiency of agent programs having individual skill directories and promotes a coffee subscription service.
