No way this actually works
Now, I know a lot of you have recently been hitting some limits when it comes to using Claude Code. The conventional wisdom, of course, is that you're holding it wrong, but it turns out there's a better way to save on tokens. The solution, and honestly I didn't believe it, but it actually works, and it works quite well. And here's the thing: you will save actual, real money using this method. And no, I'm not exaggerating. I'm talking about Caveman. Now, you may not know what Caveman is, and hey, if you don't, there are a couple of pop references you might be familiar with. First off, grugbrain.dev. If you haven't heard of Grug Brain Dev, I highly recommend the essays. They go about as counterintuitive, counter... I think I just made up a word. They're like countercultural, but I also used the word intuitive. I kind of just made a baby with them. Countercultural to what is going on in today's space-age AI, the 37,000 lines of code, the just-let-AI-review-AI kind of nature. This is for the simpler man. Okay.
>> So me think why waste time say lot word
when few word do trick.
>> That's actually what Caveman is. Instead of allowing Claude Code or Codex or whatever to go off and say a bunch of expressive statements, "Hey man, you're absolutely right," where you could spend money getting glazed like wild, it goes straight to the heart of the issue, which is to actually just stop saying so many things. And with the cost of output tokens, that can actually save some serious money. Okay, so what does the Caveman skill actually look like? Well, I can't show it to you, because apparently GitHub can't show 200 lines of markdown, and when I tried to look at it raw just now, they broke raw. It just downloads it; it doesn't even take me to the web page. Anyways, I downloaded it, and this is all it says to do right here. Watch this.
Drop all articles, so just don't use "a", "an", and "the". Drop all fillers: just really basically actually simply drop all the pleasantries. "Sure." "Certainly." "Of course." "Happy to." Short synonyms: "big", not "extensive"; "fix", not "implement a solution for". All of these are real token-dropping phrases that you can save actual money with, which is kind of insane. No hedging: skip the "it might be worth considering". Fragments fine; no need full sentence. Technical terms remain the same, so polymorphism is still polymorphism; we don't shorten those terms. Code blocks unchanged: caveman speak around code, not in code. Error messages quoted exact: caveman only for explanation. You can get the same results. The only difference is Claude just doesn't sit there and glaze you and say a bunch of stupid words at you, like "well, actually, the fix was quite simple, and your insight into the problem space was actually the right direction, all I had to do..." and it's just like, no, no, no, shut up. Stop saying that.
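If you want the gist without the skill machinery, here's a minimal sketch, assuming the official anthropic Python SDK and a placeholder model name. The real Caveman skill is a roughly 200-line markdown file; this just condenses the rules above into a short system prompt.

```python
# Minimal sketch: caveman-style brevity rules as a system prompt.
# Assumes the official `anthropic` Python SDK; MODEL is a placeholder,
# and this condensed prompt is NOT the real ~200-line Caveman skill.
import anthropic

CAVEMAN_RULES = (
    "Respond tersely. Drop articles (a, an, the). Drop pleasantries "
    "(Sure, Certainly, Of course, Happy to). No hedging. Fragments fine. "
    "Short synonyms: 'big' not 'extensive', 'fix' not 'implement a solution for'. "
    "Keep technical terms, code blocks, and error messages exactly as-is; "
    "caveman speak only around code, never in it."
)

MODEL = "claude-sonnet-4-5"  # placeholder; use whatever model you run

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
reply = client.messages.create(
    model=MODEL,
    max_tokens=512,
    system=CAVEMAN_RULES,
    messages=[{"role": "user", "content": "Why is my auth middleware rejecting valid tokens?"}],
)
print(reply.content[0].text)
```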
Here's a good example. "Sure, I'd be happy to help you with that. The issue you are experiencing is likely caused by..." No, don't do that. Yes: "Bug in auth middleware token expiry check. Use this, not that. Fix." So often you can drop a lot of tokens. Even just this alone, you can see right here, it goes from 69 tokens to 19 tokens. It even allows you to do various levels of caveman. You can do light, where you're just trimming the fat. You can do kind of the full one. You can also do the ultra maximum one: all full rules, plus abbreviate common terms (db, auth, config, req, res, fn, impl), strip conjunctions where possible, one-word answers when one word enough, arrow notation for causality. And this just actually works. Like, this is just the free hack. They even have a basic table breaking down the various usages: explaining a React render bug goes from 1,180 to 159 tokens, 87% saved, which also just shows how fluffy the language is. Like, think about it, it's bloviating. It's just saying a bunch of nonsense with these big extravagant words without actually saying anything at all.
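If you want to sanity-check savings like that yourself, here's a minimal sketch using OpenAI's tiktoken tokenizer as a stand-in. Claude's tokenizer is different, so treat the counts as rough, and the two example strings here are mine, not from the skill's table.

```python
# Rough token-count comparison: verbose reply vs. caveman reply.
# Uses tiktoken's cl100k_base encoding as a stand-in tokenizer;
# Claude's own tokenizer differs, so exact counts are approximate.
import tiktoken

verbose = (
    "Sure, I'd be happy to help you with that! The issue you are "
    "experiencing is likely caused by the token expiry check in your "
    "auth middleware, which compares timestamps in the wrong timezone."
)
caveman = "Bug in auth middleware token expiry check. Timezone mismatch. Fix comparison."

enc = tiktoken.get_encoding("cl100k_base")
v, c = len(enc.encode(verbose)), len(enc.encode(caveman))
print(f"verbose: {v} tokens, caveman: {c} tokens, saved: {100 * (v - c) / v:.0f}%")
```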
I don't want to be too much of a conspiracy theorist, but, you know, I'm just saying: Claude, they do make their money on output tokens. So instead of just saying "auth is broken", it goes on a rampage soliloquy to let you know every last possible thing that could possibly be said about a topic that could be said in three words. It's truly an impressive piece of technology. Honestly, the trade of affordable computers and rainforests for one of these little black magic, you know, sandboxes is pretty fantastic. I would make that trade any day of the week. Also, if you don't know anything about me, I typically don't cite studies, because largely I think studies have been gamed. The facts you're getting hit with, I'm not too sure you can really trust those, because, you know, there's lies, there's damned lies, and then there's statistics. But hey, since this one's going in my favor: in March 2026, so just a couple days ago, "Brevity constraints reverse performance hierarchies in language models." All of those words just simply mean that making the response brief improves accuracy by 26 percentage points. Now, what is 26% more accurate? Some would say that sounds like a lot. What does 26% even mean? It doesn't really matter. You know why? Because it's more accurate. Okay? Hey, green mean good. Okay? We got that graph that's going up and to the right, and that's all you need in life. Okay? When things get better, it's good. Things bad, not good.
So, go ahead, give it a try. Go check out Julius Brussy's Caveman. Which also, can we just take a quick step to the side? We've got to chat about this for a second. Why, oh why, does every single agent program you can possibly download have its own skill directory that you put skills into? This has to be the greatest XKCD outcome that could ever be. Any project I seem to walk into has like 20 separate folders for the same text, and they're all committed. [laughter] It's just like, why? Why did we get here? [gasps] I thought we got PhD-level intelligence. Instead, we just have absolutely junior-level execution. It hurts me. It hurts me deep down.
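If you want to see how deep the duplication goes in your own repo, here's a minimal sketch. The candidate list is just the agent-config spots I'd think to check, not an exhaustive or official set.

```python
# Count how many separate agent-instruction files/dirs one repo carries.
# The candidate list below is illustrative guesswork, not a standard.
from pathlib import Path

CANDIDATES = [
    "CLAUDE.md", "AGENTS.md", ".claude/skills", ".cursor/rules",
    ".windsurf/rules", ".github/copilot-instructions.md",
]

root = Path(".")
found = [p for p in CANDIDATES if (root / p).exists()]
print(f"{len(found)} agent config locations committed:")
for p in found:
    print(" -", p)
```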
Anyways, if you're struggling out there using Claude Code and the "you're holding it wrong" message did not, in fact, help you, why don't you give this a try right here? Okay, go check it out. Don't say I never told you anything, okay, because this is good. This is good information right here. Okay: this good thing. You download, use now. Name Primeagen. Hey, is that HTTP? Get that out of here. That's not how we order coffee. We order coffee via SSH: terminal.shop. Yeah, you want a real experience. You want real coffee. You want awesome subscriptions, so you never have to remember again. Oh, you want exclusive blends with exclusive coffee and exclusive content? Then check out Cron. You don't know what SSH is?
>> Well, maybe the coffee is not for you.
[singing]
[music]
Live the dream.