Why Your AI Gets Dumber After 10 Minutes
Your AI is literally suffocating right
now. I'm not even joking. Let me show you
something
that's actually going to change how you
use Claude Code and Cursor. If you've been
using
ChatGPT as well, any AI system can feel like it's getting dumber after just 10 minutes of coding. And what I've basically done
is created this nice little visualizer to
kind
of show us how this stuff works. All you do is just run a simple query, and you have, like, a 200k token context window. The agent takes in what you're targeting to create, and then it does some tool calling as well, and then comes back and actually does some more thinking to make sure it's the right thing, and then produces some code output.
And so as it does code output, depending
on whatever type of code you're doing,
let's just say it puts a bunch of tokens
here.
Then you say, oh, can you go ahead and do this type of thing here? It's like, yeah, go ahead. Cool.
And can you also grab these other files and do these other things? So now that it's grabbing files from your code base, that starts to fill up, and so on.
And so the agent does another 8,000
tokens worth of thinking, does a couple of
tool calls
and stuff and does some more code output.
And then it starts to output the code or
output the changes.
Now this token window starts to actually
fill up pretty fast.
And we're basically at 96,000 tokens out
of 200.
And you'll probably see a little thing that says, yeah, you're using 48%.
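The math behind that percentage is nothing fancy, just a running sum against the window size. A minimal sketch (the per-turn token counts below are illustrative numbers matching the walkthrough, not real measurements):

```python
CONTEXT_LIMIT = 200_000  # e.g. a 200k-token context window

def context_usage(turns: list[int], limit: int = CONTEXT_LIMIT) -> float:
    """Sum the tokens each turn added and return the percent of the window used."""
    return 100 * sum(turns) / limit

# Illustrative per-turn totals: thinking + tool calls + code output.
turns = [24_000, 40_000, 32_000]
print(f"{context_usage(turns):.0f}%")  # 96,000 of 200,000 tokens = 48%
```

The point is that every turn only ever adds to the sum; nothing in a normal conversation subtracts from it.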
You're like, okay, what if I just want to do another quick request?
So then you start to grab some more stuff
from your database.
You start to now connect all the front
end and the back end.
Your agent does some thinking.
You say, I want it to think ultra hard. So then you basically pass in the most amount of tokens, like this. And, you know, if you're thinking ultra hard, you're going to use 32,000 tokens. So that's 8, 16, 24, and then one more like this.
And then it does some tool calls to grab
some additional things.
And then it's going to try to do the code output, and boom, just like that, with a couple more requests, now you're basically hitting almost 98% right at the very end.
And as you can kind of see, this type of
workflow starts to add up super fast, and
we've only done a couple of turns.
And this is the point sometimes where maybe Claude Code or Claude Sonnet, or maybe even Cursor itself, will start to kind of say, hey, we're going to roll the context window. You're like, dude, I just started the conversation. Why is the AI getting dumber?
And it's like, maybe it's not that the AI is getting dumber, but I think it's more that the context windows are kind of limited here, and a lot of these apps in the background are trying to do different things for you, to try to maybe not take up so much space. And on top of that,
there's this interesting study that I want to go ahead and show you. This was produced by Chroma, and in this study (it's a really lengthy report, and I definitely would recommend you check it out) they did this comparison called the needle in a haystack,
and so on.
One of these things is you can have these
large context windows from Google
or these other companies, millions of
tokens.
You're like, okay, if I throw everything in there, it should be okay.
The thing that I learned about this and
what I really want to share with you that
I don't
hear many people talking about other than
in these really deep reports is that when
you have a single question and you have a
single thing for it to search at, that
accuracy
is extremely high.
And so what that means for you and your
codebase is that maybe if you have one
specific
narrow task and you have this giant
codebase, maybe just giving it that one
thing is going
to make it much more accurate.
And that actual thesis was proven through
their research and this paper here.
And so basically there's this similarity bias, and as this type of thing kind of goes on, the model starts to get a little more distracted when you start to give it more tasks.
This is kind of where we're going to
approach this thing called subagents and
kind of why this is such a big movement in
this realm.
It's not only about the context window,
but it's also about accuracy and achieving
the nice things that you need to do in
your code.
So we can kind of unpack this.
This is going to be a little bit more
theory type of things, but I kind of want
to do this a little bit more with visuals
because there's a lot to unpack here.
And I feel like, as a live stream, I can answer some of your questions back and forth, maybe some of the stuff you've been running into. And I'm hoping that this can sort of influence and change a little bit of your behavior for how you prompt the models, and how you can get the most out of them, so that way you could just be really buzzing and kind of going. I feel like sometimes, for a lot of people, it's really easy to use an off-the-shelf framework or an off-the-shelf MCP that does all these tool calls. So, what you'll actually end up discovering is that maybe some of this type of stuff is abstracting it a little too much for you, and maybe it's kind of going off the deep end, like a big old machine gun just spraying and praying, as the old gamers used to say.
So, in this haystack question, obviously, what they're saying is: if there's a bunch of irrelevant context, this is the needle. And whenever you have something called a distractor, something that sounds similar, this is kind of where it starts to get thrown off a little bit.
So that's sometimes the problem with some of your queries: you're not really clear at the very beginning about what you want. And if you kind of give two similar things, that's where it starts to say, maybe he wants this one, and maybe he also wants this. And if you're confused, the AI will be just as confused about the task when you hand it off.
So they say here: the best writing advice I got from my college classmate was to write every week. And then here it says: I think the best writing tip I received from my college professor was to write every day. See, this is the needle and this is the distractor, all in the same paragraph. When you ask, you know, what's the advice I got from my college professor, it's gonna see these two things. So it's gonna be like, oh, I've seen that in the training set, I see these two things. And you're like, wait, which one should I give back as the answer? Because they're asking about the college professor. I'm like, well, they said I should write every week, but also, and so on. So, you know, the answer is that the best writing tip is that I should write every day. And so that's, you know, the type of thing we call hallucinations. And this is kind of why it's a little bit more important.
And this is kind of why I want to share this type of information: because not only did they do the study, they actually went super ham, and they're like, how many distractors can we have?
And what's the actual accuracy, right? They did this whole study, and it's just blown my mind right now. So check this out. If you have what they call four distractors, as the input token length starts to add up, look how fast the accuracy degrades. It goes crazy if you have four or more distractors.
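For context, a needle-in-a-haystack eval like Chroma's is built roughly like this: bury one target sentence among near-duplicate distractors and irrelevant filler, then ask the model to retrieve it. A toy sketch (the filler text and the scoring note are my own illustration, not taken from their paper):

```python
import random

def build_haystack(needle: str, distractors: list[str],
                   filler: list[str], seed: int = 0) -> str:
    """Shuffle one needle in among distractors and filler to form the long context."""
    rng = random.Random(seed)
    segments = [needle, *distractors, *filler]
    rng.shuffle(segments)
    return "\n".join(segments)

needle = ("The best writing tip I received from my college professor "
          "was to write every day.")
distractors = [
    "The best writing advice I got from my college classmate "
    "was to write every week.",
]
filler = [f"Unrelated note #{i} about something else entirely." for i in range(50)]

prompt = build_haystack(needle, distractors, filler)
# The eval then asks, e.g., "What advice did the college professor give?"
# and scores whether the model returns the needle rather than a distractor.
```

Adding more distractor lines is what drives the accuracy drop the study measures; the filler mostly just inflates token count.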
And so think about this in your own code base; think of repeated code as distractors, right? Whenever you have code that's repeated across all these different components and you say, go and update this type of thing, and you just YOLO it, it's going to update, and so on.
And then you have some type of layout,
you know, you're supposed to kind of sort
of have
some hierarchy and some components you
reuse.
But it's like, oh, I'm going to make another progress bar over here in the admin panel, because a user requested it. And it's like, well, I have a whole directory of components that I use from ShadCN. You should just initiate one of those and make it there.
The AI will be really happy to fulfill your request, and it just starts to actually code up the button and everything that it wants to do in that separate area for the admin panel. So now you actually have the user side and the admin side with literally a copy of each other's code. And this is what I call a distractor. And this is how crazy it is, and how important it is.
And Chroma is actually showing that, one, the accuracy stays really high if you just have one specific thing you're trying to achieve, and then it drops as you add two, and then four. And then think about what people are saying about AI. This is kind of why people coined that term AI slop: because as an engineer, you'll kind of start to see these patterns develop. You're like, well, the AI isn't really good, because it just keeps rewriting code. And I feel like most of what I'm doing as an engineer is writing more rules to guide the model right now.
There is some good news out of this. Every single year, these models have been getting more and more intelligent, so you don't have to guide them with as many rules as I previously did, and so on.
So, models are probably going to get smarter, and the tooling is going to get a lot better to support these types of things, so that, like, you don't have to worry about this too much. But as of right now, the tooling that I use a lot is Claude Code, and subagents is one of those keys.
So I want to show you what that looks like in my context visualizer, because I think that's going to be the key here for kicking off subagents and trying to basically narrow that scope, and so on.
So, you can have very defined things that
you do in your codebase, generating code,
generating UI components, working with
your backend, understanding how those
types of
things flow together.
So when I hit reset and I go down, when I
basically do a code generation task, you
can
kind of see how many tokens you take up,
right?
44%.
But then now when you spawn a subagent
down below, you'll actually see this type
of thing
start to take place.
And so you say, I'm going to go ahead and
add a request.
Now that subagent starts to take on that
request.
And then if there's another concern that
that request takes on, you just do add
request
as well.
And you can kind of see how the subagents
really start to unlock this whole type of
thing.
So you're basically getting this main
request up here, and then you're handing
off these
individual components, right?
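That hand-off pattern can be sketched in a few lines. `run_agent` here is a hypothetical stand-in for spawning a real subagent; the point is that each subtask gets its own narrow prompt and fresh context, instead of everything sharing one bloated 200k-token window:

```python
def run_agent(system_prompt: str, task: str) -> str:
    """Stand-in for a model call; a real subagent would start with an empty context."""
    return f"[{system_prompt}] handled: {task}"

def dispatch(main_request: str, subtasks: dict[str, str]) -> dict[str, str]:
    """Hand each concern to its own narrowly scoped subagent."""
    return {
        name: run_agent(
            system_prompt=f"You only do {name} work for: {main_request}",
            task=task,
        )
        for name, task in subtasks.items()
    }

results = dispatch(
    "Add a progress bar to the admin panel",
    {
        "design-system": "Reuse the existing ShadCN progress component.",
        "backend": "Expose the completion percentage from the database.",
    },
)
```

Only the main agent sees the whole request; each subagent sees one concern, which is exactly the single-needle, few-distractors regime where accuracy stays high.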
So in this case, in my situation, I
started to make one specifically around a
design system.
And then I'm going to now try to make a
subagent for my convex database, because
the convex
and so on.
So, I want to make sure that when I do make a feature request, it does pick these types of rules up, so I can actually learn how to interact with my codebase in smaller segments, so that I can keep that context window really tightly focused, and that just ensures higher accuracy.
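For reference, Claude Code subagents live as markdown files with YAML frontmatter under `.claude/agents/`. A sketch of what a Convex-focused one might look like (the name, tools list, and rules below are my own assumptions for illustration, not taken from the stream):

```markdown
---
name: convex-db
description: Handles Convex schema, queries, and mutations. Use for backend data work.
tools: Read, Edit, Grep
---

You are the Convex database specialist for this repo.
- Only touch files under `convex/`.
- Read the project's Convex rules file before writing any query or mutation.
- Keep your responses narrowly scoped to the data layer.
```

Keeping the description specific matters, since that's what the main agent uses to decide when to delegate to this subagent.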
Because I'm even noticing that with some of the code generation, even though I have these rules files, it may not pick them up, and it may not pass them into those subagents.
So, I want to kind of share this as a primer for y'all, because I feel like this type of thing, and so on. So, you know, that visualization hasn't really been talked about too much, and it's still very confusing, because I see lots of graphs and things. But the best thing is literally just this type of thing: you just do a tool call, the agent does some thinking, you do some more input, and then the code, you know, comes out, and so forth.
So this stuff adds up pretty fast, as you
can kind of see just from a 200k token
context window.
And that's why it's important to start
new chats.
When you start a new chat and you want to
do new code generation task, you know,
this
is the type of thing you'll do.
and so forth.
And that type of stuff happens.
Now with these thinking models, I'll show you real quick: after the user input, they'll basically do some thinking. You may generate some code, and the agent will come back to do some more thinking to understand what it just did. And that's sort of the power of Claude Code, and this is the power of the new Sonnet models, with the additional thinking and the tool calling. And this is also very transparent in Cursor as well.
So let me just go ahead and catch up with
the comments
because I know a lot of people have been
trying to catch up here
and wondering how everyone's doing here.
So, alright, so let's see what's going
on.
I need to dial in my subagents more
around their tool calling access.
I've given them a set to inherit all AI
tools
even though I only have two MCPs enabled.
Eating tokens.
Yeah, that's what's up, man.
Right, guess what? After some stress and
work, I need to recharge seeing your fire
cooking apps and producing some sharing
contents every day.
Thank you so much, Alan, for becoming a
member and also kind of catching up here.
What if we were to tell the AI to
reevaluate its solution? Does it still
hallucinate?
Yes, it can still hallucinate, especially if you have all of this conversation where it can be thrown off. So in this context, the conversation is that needle-in-the-haystack stuff, remember.
So if you're continuing the same
conversation, if you take some of that
context and pass
it on into a new chat, or if you're just
continuing on, you're going to keep adding
these distractors.
And these distractors in your chat, if
you're telling it to fix itself, are still
going
to be in there.
And so, basically, for every single turn in the conversation, that's every single, hey, go fix yourself, okay, just change this one, not this one. You've seen that vibe coding meme; this is basically what's happening. You're creating more opportunities for the accuracy to continue to go down. So the further you get in your chat, the more it keeps going down in accuracy. And so at that point, you just got to create a new chat and pretty much just focus again and try to have it rebuild its context. Because at this point, you're doing some complex debugging task, right? You're doing this type of thing, and you're just, okay, user input this, the agent does thinking, it does some tool calling, some more code output. You're like, no, try again, think harder. And it's like, okay, at this point, bro, anything that you've done before, you've just introduced massive rot. And that's basically where you're at: just crazy, crazy amounts of rot. But it's important to know why that maybe is happening. And then at every turn of the conversation, what I call a distractor is going to keep throwing the language model off, until you basically start a new window.
In Claude Code you can hit escape twice.
So hitting escape twice in Claude Code
will basically let you go back to that
conversation and truncate things out.
But I would sometimes recommend just starting a new window. Right now, Claude Code doesn't have anything visual that shows you how much of the context window you're taking up, which I feel is super important.
Cursor has recently introduced that feature into their agent, and I don't think it's live for everyone yet, but you can kind of see it down below: it shows context usage at 83%. So that is super duper valuable, because once you start passing that mark... I'll show you really quick, in Chroma's paper here, there's, you know, this performance over three models with over 500k tokens. So this is those 500k tokens; once you start to pass that,
and so on.
But the accuracy rate, once you start to
go up in the percentage, right?
50% is your golden zone, really.
But when you have more distractors, that
goes down really, really fast.
So you're talking about 30%, right?
So if you have one dedicated thing that you're saying you're going to do, you're like, in this admin panel, I want you to do this specific thing on this component, and just add a for loop or something, it's going to be really, really good, because it's just that one thing, right?
and so on.
So, that's just what it is.
But once you start to continue the conversation, like, it hasn't been going well, go fix yourself and try to reimagine and rethink everything that you possibly can so you can be the best version of you, AI... that's not going to help. That's actually not going to help it at all.
It's been really good to back up some of the vibes that we've been feeling with actual research, to better understand what this whole vibe shift is really about, because I feel like that's what's been happening now.
Yo, how's it going, Vlad? Thank you so
much for joining on the stream. I've
started playing
around with Gemini CLI and not going to
lie, I'm impressed how good it is at
reinforcing
or refactoring. Oh, I'll have to try that
out again. I'm using Google Code Assist
plan
at 20 something dollars a month. Yeah,
I'm going to have to check that out again
because
I do want to kind of compare some of the
notes as well. How does it compare to
Claude CLI?
Yeah, I'm just curious as well. I took
the Cursor Ultra package, but in 20 days
limit
I was not even using Opus. Wow, lots of
agentic coding probably. I'm just really
curious how that works.
So are you able to select other models?
Would love to know more about that
workflow and kind of what's going on.
Yeah. OK, that's interesting. Well,
there's a lot kind of going on here.
Have you considered developing any local
Mac apps too, or are you mostly focused on
developing on the web for mobile apps?
Mobile apps. So right now I'm thinking about doing a Mac app and a Swift app, but at the same time, it's gonna be more on demand. Because right now, with my app RayTranscribes, for those who aren't familiar, this is basically my app right now. So my app is basically to just transcribe a lot of my videos, and I have a lot of videos and a lot of live streams and stuff, so they can be hours and hours long. I found a hack.
It's basically you can get discovered
literally
Just by uploading your transcripts and
putting them into YouTube.
Because when YouTube processes them, the
keywords that you say
will get picked up when people search for
them. So if I do Claude Limits,
I bet you my video shows up from
yesterday or something that. Yep, see?
There it is.
So it's the second video, right? And I
could even do a private mode or something.
And then you'll see Claude Limits shows up; it's the second video. And in this type of thing, my video is only 21 hours old and it's in some of the top results. And that's because of, you know, having those transcripts in there; it's super important. But I also have timestamps in there as well, so that's super nice, because Google likes to chunk out the timestamps to help with their AI overviews. Basically, you're helping their rankings.
So any type of reinforcement that you do,
you basically start to get ranked higher.
It's sort of a secret sauce.
I need to write an actual blog post about
it.
But my Transcriber app basically does
this type of thing.
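The timestamp part is simple to sketch. Assuming the transcriber produces `(start_time, text)` segments (an assumption; the app's real data model isn't shown in the stream), formatting them into YouTube-friendly lines might look like:

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as H:MM:SS, or M:SS under an hour, as YouTube expects."""
    s = int(seconds)
    h, rem = divmod(s, 3600)
    m, sec = divmod(rem, 60)
    return f"{h}:{m:02d}:{sec:02d}" if h else f"{m}:{sec:02d}"

def transcript_lines(segments: list[tuple[float, str]]) -> str:
    """Turn (start_time, text) segments into timestamped transcript lines."""
    return "\n".join(f"{to_timestamp(t)} {text}" for t, text in segments)

print(transcript_lines([(0, "Intro"), (75, "Claude limits"), (3720, "Q&A")]))
# 0:00 Intro
# 1:15 Claude limits
# 1:02:00 Q&A
```

Pasting output like this into a YouTube description is what gives search engines the chunkable, timestamped text mentioned above.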
And right now, this is extremely
efficient.
You can have your own custom dictionary.
So I have these different words that need
to be in there so that Claude Code, o3 are
actually in the transcripts specifically
for these type of keyword things.
And so, yeah, when it's processing the
audio, it's doing its stuff here.
So this is a multi-hour stream.
I think it's a four hour, three hour live
stream.
So it does this stuff pretty fast.
And you can see all these transcripts
that you're able to kind of do the
things.
So I've done hours and hours and hours of
stuff.
And this is basically my little app here.
And I think I do want to do this.
I'm almost done with my refactor, by the
way.
So this started out as a bunch of slop, and this is kind of why I started to get into this context rot stuff: because when I'm done with the refactor, I can now make the iOS app. And I'm using a database by the name of Convex. For those who don't know, you may hear this a lot, but I'm just a huge advocate of this company, because they're just amazing. They have really great database software. So one of the things that they have here is, like, fix the bug and set complete to true. Okay, so it's a TypeScript database, so you're literally just basically playing around, and this is the actual dashboard for the database.
So as you make changes here it's
basically happening in real time on the
back end. So
As you add these different things, it's
all happening. So someone on their iPhone
can be
uploading, doing a transcript. Someone on
the web can be doing that. Someone on an
Android phone can
be doing that. And so this is kind of why
I wanted to make sure that everything was
kind of settled
on my backend side. And so when I build a
Swift app, I could just hit this database.
I have Clerk for my authentication. Clerk has an
SDK on the iOS side. Everything would just
be a smooth
experience and it would just be super
duper fast to hit my endpoint, send all
that stuff up,
And if they're on the web... because I know a lot of people who are content creators, they have stuff on the phone, right? They have a short they just recorded. They want to be able to do it from their iOS app, and then they just want to be on the web and copy and paste that transcript and put it into YouTube or something like that, or whatever other workflow they do. And they can do it because of the way I'm setting everything up. So yeah, long story short, that's kind of the way I was going with this. I have yet to try the latest versions of Swift that
and so on.
So, I'll be spending some time, probably in the next couple of months, studying that, figuring out what's going on with all the latest APIs and animations, because the dust tends to settle on those right before the fall. So yeah, that's kind of why I've been waiting a little bit.
But yeah, I don't know.
I was thinking about doing some iOS
stuff, but I've been kind of trying to
stay away
from it for now.
I don't want to get in trouble.
What's your marketing strategy for
RayTranscribes?
I think right now is just literally just
trying to code with it and then tell
people about it.
I want to reach out to more content
creators to try the app. And if you want
to try the app right
now, you can actually try it with the
code RayCooks. So if you put in the code
RayCooks,
what that will enable you to do is when
you go here to Ray Transcribes in my app,
let me just put this up again as well. So
if you start any of the plans today, go
start today,
That'll give you basically $15 off. So
you can get into the beta for $5 basically
for the first month or just get into the
starter plan for free for the first month
and you can start cooking.
And I give a very generous amount of minutes. I mean, there's 6,000 minutes for the beta and 3,000, which is way more than enough for anyone.
And then, yes, obviously you have all the
cool stuff that you have here and so
forth. So yeah. Oh my gosh, check it out.
Dennis just became a member.
Thank you so much, Dennis, for becoming a
member.
If you want to become a member, you can
secretly slide in through my Discord.
I still have a member post up.
And right now what's happening is
basically people are able to slide into my
Discord for $6.99 right now.
It's kind of a secret perk, but it's in
my member post.
If you go to my member post, there's some
instructions there that you can send to
get into the Discord.
There's now over 100-plus people in my
Discord because I finally reached my first
100 members.
And so my promise then was to double the price, so that, you know, we have this exclusivity for folks who are early. So if you're watching this right now, it's still early, so this is your time to sign up. Once the price doubles, I'm basically going to use Polar.sh. Polar has a nice Discord integration that you can basically hook up, and it's super easy to code, and a lot of the community stuff will be built using Polar and these types of things. And I'll be showing you this stuff along the way, which will be really, really cool.
So yeah, that's kind of where that's at.
Ray Transcribe is built with Next.js, or
do you use another?
No, Next.js.
So I built my app, Ray Transcribes, with
Next.js.
And yeah, it's basically using a route
handler.
I use the API from there.
And I have some client-side work that I
do.
So I do some processing on the client
side.
And I'm transitioning that to Vercel so
that everything is going to be handled
through Convex.
So a lot of the pre-processing stuff will
be kind of minimal now on the client side
and it's going to move a lot of it to the
back end side.
So a lot more updates on that. But yeah, Next.js as far as the framework, and ShadCN for some of the components.
I think I have, let's see, Tailwind CSS
v4 so that I have this cool tokenizing
system.
I'm using Clerk for authentication. The
back end database is Convex.
By the way, when I launched the app, I had no database. It was literally a v0 slap-on, like, hey, I want to get some paid users. I sent Stripe links. People signed up with the Stripe links. I went into Clerk, and I just added them as part of their getting started. I was doing it all by hand. And it wasn't until I was on vacation in Croatia, and I lost three users who paid. Three users paid for my app, and I couldn't
and so on.
I only got to them after a couple of hours, because I was remote and in a different time zone. And I was like, man, I didn't want to set up the Stripe database, all that stuff.
But I'm glad I did it actually because
now it's much more scalable and I can
actually
do more things, which I plan to do
anyways, but it's starting to actually get
traction,
which is cool.
So yeah, I'm going to reach out to more content creators and people who are, like... if you do this, you get a lot of great minutes for the cost, and I kind of wanted to pass that on as well.
Hi from France, you made me discover Convex, thank you. I use it for an upcoming personal project, and I think you get, I don't know, 20 free projects. Something ridiculously cool for free. So it's great.
I've been using your transcribed
platform, Ray. Loving it. How is the
architecture behind it? Using Groq for
Faster Whisper?
Yes, Groq is one piece of the story. So
part of the reason why I was using Groq is
because it's just super duper fast.
Two is I was actually setting up this
cool thing and I prototyped it and that's
kind of how Ray Transcribes got started.
I wanted it to be even faster than any other transcriber that's out there, and I kind of broke the record, because I set up a WebSocket, and I would literally stream everything up in chunks to be processed as it came in.
And so I had this cool workflow that I don't think anyone really optimized, but as a nerd, I was like, this is going to be fun to do.
So, yeah, it's a pretty nice, simple type
of thing that you can set up.
And that's one piece of it, but then I'm moving into more features now. So I realized that speed isn't really the big thing; it's more about workflows. Maybe fast isn't really the big deal; it's more just about getting that transcript and then doing other cool things with it: adding timestamps, maybe having some stuff where it writes for you, and, you know, collecting these types of things.
I was trying to set up a personal project with Groq, but streaming to Groq did not work.
Oh, you may want to check out the AISDK
from Vercel. That's one of them.
The other thing too, I think, is with
Groq there's a limit. You have a 25
megabyte limit to stream a file up.
So you'll have to do some chunking with
the file. That took me a long time to
figure out.
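That chunk-and-reassemble bookkeeping can be sketched like this (the 25 MB figure is the limit mentioned above; note that naively splitting on byte boundaries can cut mid-audio-frame, so real code would split on container-friendly boundaries or re-encode):

```python
MAX_CHUNK_BYTES = 25 * 1024 * 1024  # per-upload size limit mentioned above

def split_into_chunks(data: bytes,
                      max_bytes: int = MAX_CHUNK_BYTES) -> list[tuple[int, bytes]]:
    """Split raw audio bytes into indexed chunks so each upload stays under the limit."""
    return [
        (i, data[start:start + max_bytes])
        for i, start in enumerate(range(0, len(data), max_bytes))
    ]

def reassemble(results: dict[int, str]) -> str:
    """Join per-chunk transcripts back together in their original order."""
    return " ".join(results[i] for i in sorted(results))
```

The index attached to each chunk is the whole trick: results can come back in any order, and you still stitch the transcript together correctly.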
And then once I figured it out, I kind of understood what to do from there. But basically, you have to keep track of those files that you chunk up, and then when the results come back down, you'll be able to actually put them together as one type of thing. It sounds more complicated than it is, but it took a long time to figure out, I guess, I don't know. Whenever I try to start a web project, all these tools, like Lovable and v0, generate and suggest Supabase. Is there any tool that uses Convex, to let you connect the way they integrate with Supabase? I think the best tool right now is called Chef, from Convex. So, chef.convex.dev. I'm going to put this into the chat here, and I'm going to show you real quick. So if I go to chef.convex.dev,
I'll show you. And so this is basically their vibe-code version. I think they forked it from Bolt, and you can sign in with your thing, and you can make a Slack clone, Instagram clone, Notion clone, everything you would normally do, and it hooks it up right into your database.
They have their own auth stack, so Convex has their own Convex Auth. You can later hook that up to Clerk, which I love, and it allows you to do way more, in my opinion.
So yeah, if I sign in, I should be able to sign in. Cool. Yeah, so that's kind of Chef. I might do some videos on Chef. But actually, I code so much in Cursor and Claude Code that I just take the rules files, and I just find I could do more with it, and so forth.
There's also a template that you can use
to get started.
So GitHub, I forked one and I want to
show you this and I'll put this in the
chat so that you guys can have it too.
So one of them, it's called, what is it
called?
I based it off this.
It is this one.
So this is from this guy.
This guy is elite.
and so on.
And he basically made the Elite Starter
Kit.
And so go ahead and click this one.
And so this one actually has everything
set up.
And this is what we based off the...
I have a four-hour livestream for members
only, and we started with this actual
repo.
So we started this repo.
I walk you through getting this stuff set up with Clerk, getting the secret keys from Clerk input in, getting the Convex keys, configuring those two, configuring a webhook,
and so on.
But afterwards, what we actually got out
of it was the Stripe stuff.
So you can actually sign up with Stripe
and it's actually built right in.
It syncs to the database.
And then it has a dashboard, with log-ins and all that stuff, which you can use to just add your features and various things.
So yeah, this is a great starter kit.
This is pretty much my stack as well.
And when I saw this, I was like, bro, this is exactly my stack.
I want to keep building stuff off of
this.
And so I basically forked this, and I'm going to update my own and add my own rules and everything that I normally would use.
And then I'm going to try to see if I can
make my own fork have my actual Claude
Code subagents, the design enforcer.
I want to have it so that it has this Convex rules type of thing, so my subagents will be more accurate. Because the goal here is really for me to just start a repo, not have to create these rules files all the time, have this nice groundwork to start from, and just start to say, OK, I want to add this feature.
I want to do this. Okay, I want to connect to the new images API and just add something so that it generates images, you know, like what I was doing for my thumbnail. So I want to do stuff like that, and I need to have a good base set. And so this would have it all hooked up, and I just go.
So that's the end goal.
It's going to start with my members first
content and then eventually I'll roll that
out to the general public.
My members are kind of helping me shape
the feedback, which is really great.
And so right now you can join in as a
member, which is super awesome.
And it's pretty busy right now in the Discord. It's really nice. I have these different channels and everyone's participating, so I definitely appreciate that. I spent my night watching, yeah, that video yesterday. Oh, it's a four-and-a-half-hour video, but it actually could have been a couple of hours. The last half of the video is literally taking that 380-page PRD that we generated. There was some good prompting that happened in there, and I'll give you the long story short. But basically, the director's cut is, in the prompt I
and so on.
And I said, make sure to include the UI,
UX trees for all of the different
components
that we're going to make.
So when you're making this PRD file,
we'll be able to understand that.
So then I gave it my sort of prompt of
what I was doing, and then I gave it this
prompt to generate a PRD, and that
generated the 300 something line PRD file.
We gave that to Claude Code and Claude
Opus for 33 minutes in that live stream
generated
the code.
So that was the last half of that stream.
So the last half of that stream is
literally just us waiting for that whole
thing to generate
because we weren't really sure.
I thought it was going to generate
everything in one minute.
It generated the full-blown app from
everything.
So we took that template.
On that template, it built everything on
top of that, all these different trees and
everything.
It spawned off four sub-agents, and each
of those sub-agents went in this
procedural order
from working on the UI, working on the
backend.
It referenced all these rules.
So that was a really deep stream.
There's just so much in there, and I feel like that's a perfect members-only stream.
I did go live to the public and then it
just went to members only afterwards.
So I want to do more of those at least
once a week or once a month for members.
And, you know, that was actually by
request from several members who wanted to
just figure out something to get started.
So that was our massive thing.
DontStopBelieving became a member.
Thank you so much, man, for becoming a
member.
I really appreciate that.
You guys are the real MVP.
I really need to have a watch of the 4
hour videos this weekend.
Yeah, I'm also, I just hired an editor
because my goal is actually to get my
first 100 members
and so I achieved that goal which is
really awesome.
And so you may see Dan in the chat.
So Dan is going to be cutting up some of
the videos and I got to speak with them to
figure
out what parts we want to kind of, how we
can trim that video down to make it
really great.
My goal might be that I just re-record the whole thing myself offline, and that way we get that video.
So I think today is July 29th and I've
been streaming every single day from the
beginning
of July 1st and the growth has been
incredible.
My channel has grown almost 40%.
So I started off with around 10,000 subs.
Now we're at almost 14,000 subscribers
here on YouTube, which is a huge
milestone.
This is amazing.
Thanks to you guys who have been joining in, but also thanks to everyone showing up and all the rest.
So, I'm just going to be showing up every day, making this chat really fun, participating, and helping ask questions to drive some of the discussions, which has been great.
The second part of it is I wanted to have
my first 100 paying members, and we've
totally
blown past that.
I think we're almost at 135 or 140 paying
members, so you get access to the Discord,
and that has actually helped basically
paying for Dan to do some of the video
editing, because
I want to save some of your time as well.
So the members will get to see those videos edited first, and then they'll be released to the public later on. So yeah, there's
some really good meaty topics in there.
There's a lot of stuff I want to cover.
But today's video was sort of a dry run of the recording that I'm going to do for a context rot video, so you got to see a little bit of that. For those who don't know, this is how seriously I take my craft: I started this video off wanting to do a context window visualizer. I originally did this in Eraser.io or one of these platforms where you literally draw, and I felt it didn't really do it justice. So I just went into v0 and created the app. And it's just much easier to have
a real time app that you can actually do
and just deploy it to production. And so
that way,
you're able to play around with this and
then I could share the link as well. So
I'm actually
going to turn this Vercel app into a real
website that you can actually go interact
with and start playing with and actually
use it to maybe teach other people about
context windows. I think the best form of
learning is teaching other people. So if
you can take what I just explained to you
and explain it to someone else, then
you've really mastered it.
It's part of the reason why I like to just turn on the camera to try to explain these things, because then it helps me solidify these thoughts. And that's just so awesome.
So yeah, I appreciate y'all for chilling on the growth here. "I really need to watch..." Okay, yeah. "If you design in Claude Code, it will build it." Yeah, yeah, for sure.
All the Claude Code vids are popular.
Yeah, I think a lot of people are trying
to figure out
kind of how to use it, how to best
utilize it. And right now, for me, I'm
taking the less is
more approach. You can see why this is
really valuable for the less is more
approach, because
any type of tool that you add, each tool call will start to eat up the window,
and then you add some thinking tokens,
and then you add some more code output,
and you add more
agent thinking. And look at that, we're just filling this thing up fast, you know. And let's just say it spawns off all these sub-agents.
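The arithmetic behind this filling-up can be sketched as a toy token-budget tracker. Every number here is made up for illustration, not measured from any real model:

```python
# Toy model of a 200k-token context window filling up over one session.
# Event sizes are illustrative guesses, not real token counts.
CONTEXT_LIMIT = 200_000

events = [
    ("initial query + rules files", 3_000),
    ("agent thinking", 8_000),
    ("tool calls (read/grep files)", 12_000),
    ("code output", 20_000),
    ("grab more files from the codebase", 25_000),
    ("ultra-hard thinking budget", 32_000),
]

used = 0
for name, tokens in events:
    used += tokens
    pct = 100 * used / CONTEXT_LIMIT
    print(f"{name:36s} +{tokens:>6,} tokens -> {used:>7,} used ({pct:.0f}%)")
```

With these made-up numbers, one modest session already burns half the 200k window, which is why handing concerns to sub-agents, each with its own fresh window, helps.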
Do you want to be that guy in the 5% who's actually eating up all the windows and everything like that? You're gonna get that bill from OpenAI or ChatGPT or someone, or even Claude Code, and they'll be like, yo, give it up. Pay up, or we're not gonna give you any more tokens. And we're like, yo, yo, yo, man, chill, chill, chill. It's not like that in here, you know what I mean?
But if you are one of those people, we are here for you. This is the AI Anonymous group, and we're here to support people like you. We think that you're misunderstood, just the way we are misunderstood. Yeah, I might have a little bit of an addiction to AI products, but I think that's natural, right? This is the new age that
we're in. So if you're in here, smash
that thumbs up, definitely drop the
comments because it's
super helpful for everyone in here to
support each other. And I'm just in love with the community that's showing up right now. I like this vibe. This is such a good vibe because we're
all trying to learn and share best
practices.
And I think this is a really great way to
kind of start on that track, if you know
what I mean.
So, yeah, it's been crazy exponential.
Yeah, I mean, you got something good
going.
I'm fixing to join. Yeah, for sure, man.
You'll see that AI updates itself paper,
AlphaGo.
I've seen something like this where it rewrites its own weights.
And I'm really curious to read that
paper.
"I wish you grow this channel fast and spin off more fun content." That's the goal right now. I think it's to focus in a little bit and start to produce some content. And so far everyone's been doing a great job participating and joining in, so I really appreciate that. A new member's request went to spam, so I'm going to go ahead and double check my spam folder. I just reached out to the recent people who reached out to me, and they were in that spam folder, so please forgive me for that as well.
"Can you go more in depth on how to make Groq streaming work? I think many people are looking for this info. Nobody's talking about Groq, even though it's the most insane tech for AI." Yes. OK, cool. You know what I'm going to do? I'm going to do a couple of things. I'm going to reach out to them, because I'm very close to the Groq team; they're literally in Mountain View, down the street from my house. I'm in Los Altos, and they're very nearby.
I'm going to see if they can hook it up
or do something where I can have a session
with one of their engineers and we can
kind of talk a little bit more.
Last time the Kimi model got released on Groq, they hit me up. They're like, yo, this thing is ready. Can we get an engineer on your stream? I was like, hell yeah, let's go.
And so I want to do more about that
because I feel ...
Let me just... man.
OK, if I get some time at the end of the
stream, my people who are here
to clean my room are going to kick me out
in a couple of minutes.
Let me just go through the rest of the
comments
and let me just show you how I get
started real quick,
because you could just literally throw
the documentation into v0 and have it do a
quick prototype for you.
And you'll be halfway there. And that's how I've coded my app.
Yeah, that's what's up, man.
Yo, ChordsMaze, oh my God, with the super
chat, bro.
Yesterday, here's some updates.
OK, this OK, for those who don't know,
ChordsMaze yesterday dropped in the chat.
I call ChordsMaze Mr. Three Comma Club.
If you don't know about the three commas,
that means you're a billionaire.
This guy is a billionaire but from a
tokens perspective. This guy literally
yesterday dropped 5,703,144,183 total
tokens used in Claude Code, my friends.
That's 10K, 10 bands.
The boy cooks. And I also want to say as a disclaimer, he's not proxying tokens. He's not running this stuff overnight. He's not doing all the crazy stuff that Anthropic says.
He's literally locked in 1000% and he's
just up, no touching grass, just
constantly
on this thing. I don't know what type of
app you're shipping, but I told you
yesterday,
Make sure you link that below, because you're on it and you have such a small window. You have less than 30 days right now. The clock is counting down, ChordsMaze.
This is why I started the AI Anonymous group yesterday, because I feel like you're misunderstood. Anthropic is misunderstanding us. They think we're using AI to do all this stuff. It's like, no, we're just locked in, bro. We're locked in. Let's go.
Need an app engineer to talk about in
five chips.
Hey, they're not going to talk.
PDF on the right.
CleanYourRoom, NiceFlexRay: "love the content bro." I'm just living my life. I don't know if you live in the Bay Area, but housing and everything is expensive here, and engineers are paid very well, right? I worked at Apple for 12 and a half years, and I also invested a lot and, you know, did pretty well at Apple. But at the same time, they let me go, which was insane. I was dealing with long COVID and all that crazy stuff, and they were just like, bye. I'm like, bro, I saved you half a billion dollars, how are you gonna do that to me? They were just like, let us know when you get better, we'll be happy to hire you back. I'm like, all right, I see you.
And then I see this AI wave, and it's literally that meme: I'm checking out with my chick, and it's like, wait, there's this AI wave, so now I'm looking this way, like, let's go. So that's why I'm here. I started this channel, y'all are showing up, and this is such an amazing moment. OK: "I love Groq, Kimi K2 is on fire." Yeah. Ruben just became a member. Thank you so much. Y'all are the real MVPs right now in July.
Congrats on your success. I'm going to
follow for that four hour stream after
this. Yeah,
check out that four hour stream. There's
so much sauce in there. Also check out
that members only
post to get into the discord, ask some
questions. I have some sections in there.
A lot of people are
starting to hop in right now and
participate. "I thought my ccusage API estimation of 1,700 last month was impressive." No, bro. My guy is locked in, locked in back there. I think ChordsMaze just became a member. Yo, ChordsMaze, bro, I'm so excited to see you in the Discord right now.
You gotta pop in and drop that comment and just say yo. And I'm gonna make a tag just for you in my Discord saying you're Mr. Three Comma Club. So three commas is gonna be an actual Discord tag for anyone who's ever broken the billion-token barrier, which, look at this dude. He didn't do just one billion. He did two billion. He did three. Bro, my guy is locked in, has no chill. No chill. Five billion, bro. Five billion.
How do I get into the Discord? There's a
post that I have. Just go to my YouTube
channel. You'll see members only.
That's where you'll see the members only
content as well on the YouTube channel.
There's a members only post.
And then there's just an email that you send with your information. YouTube doesn't provide emails or anything like that. They just show me your name and, I think, your channel.
And so if you just provide that in the
email, that I will be able to link you up
and stuff.
Orange hands go right.
So yeah, yeah.
AI is doing something to us for sure.
Day by day, raised vibes getting darker.
I used to go over 38 million tokens alone
last night, but only 100%.
Also I think there's somebody else
asking, I think if you remember, I might
just do
and so on.
So, I'm going to go ahead and show you how I prototype with v0 and stuff like that, because I've got to roll out in a sec.
Hacking Claude Code to get that Opus LLM
download for sure, man.
Yo, man, let me see if, let me just kind
of catch up with the other chats as well.
Just make sure I didn't miss anything.
And a lot of people kind of popping in as
well and making sure we're good because we
cover quite a bit today and I just wanted
to give you that quick AI context
visualizer
type of thing.
And I think the app is technically
deployed live right now.
I'm just going to go through some
changes, but you can kind of play around
with it right now if you're just cooking.
So, yeah, it may get taken down when I
redeploy the project, so just FYI.
But this is a great way to kind of
visualize and teach your friends.
I feel like if you can teach this to other engineering friends just the way I'm teaching you, it's going to help you go a long way in terms of visualizing these things.
So the biggest takeaway, I'd say, is with your context: if you don't know and it's not really clear what you're trying to solve,
I'm coining this thing called a discovery prompt. Literally just say: I don't know where to go with this. Can you help me? I need to break this down so I don't eat up the token context window. Just tell it what your problem is. It's literally the confession booth. I think AI is our confession booth. You sit down, you tell it, and it's going to be like, okay, let me use my AI smart brain to break this apart into different pieces.
Once it comes back to you with that, then
start the new chat window, start the new
Claude Code window, start the new cursor
window.
Take one of those concerns and just pop
it in and just cook.
Just let it cook.
Say, in your first request, before you
pop it into these new windows, you should
also tell it the tooling you have
available.
So say you have access to subagents which
can take some of these requests or these
concerns,
researching, access to the web, searching
through codebases.
They can do these tasks in parallel. So the more you describe what that scenario looks like, and then transcribe over what your problem is or what feature you want to ship, the better results you're going to get.
So Opus will then take that whole thing and write what it's going to send to the subagents for you. That is what you'll take,
and you can just save that to a markdown
file so that you can either hand that off
to Claude Code.
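To make that concrete, here's a hypothetical discovery prompt along the lines of what's described above. The feature and tooling named here are just placeholders:

```text
I don't know where to go with this yet. Can you help me? I want to add
image generation for thumbnails to my app, but I need to break this
down so I don't eat up the token context window.

Tooling you have available: subagents that can research, access the
web, and search through the codebase, and they can run tasks in
parallel.

Don't write code yet. Split this into separate concerns, and for each
one, write the prompt you would hand to a subagent. I'll save those to
a markdown file and run each one in a fresh Claude Code window.
```

Each concern then gets its own fresh window, so no single context window has to hold everything at once.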
You can see how this kind of keeps compounding. But a year from now, it's not going to be like this. A year from now, you'll just say whatever you want, and it's going to figure all that stuff out for you.
But this is where we're at right now. We
have to do these little things piecemeal
because the tooling is currently just not
there.
And everyone's just learning as we go. These studies that were done by Chroma were just recently published.
and so on.
So, everyone had a feeling that this was
the vibe around AI, but they couldn't
really pinpoint
why.
And so, now that we have this good data,
the best practices, the engineering
efforts, everything,
every single company is going to add
tooling around this.
It only gets better for us.
So that's what's up.
Let's see.
I'm just kind of catching up some more
here.
So I'm struggling with Claude Code in the
Windows terminal.
Can't paste screenshots.
Does anyone know how to do it?
I'm on the Mac.
I can do it.
Are you using it in Cursor? Because when
I use Cursor, as soon as I drag it, I hold
shift.
But another person said you can copy and
then paste it in. And basically, as long
as you have
the directory, it'll go to that directory
and scan it. And it'll even ask saying,
hey, I'm
checking out your desktop. Is that okay?
So it will put it in single quotes. It'll
say this,
this, this, this, and then it should be okay. I think when you drag it in the terminal, it'll even put those quotes in there for you. So play around with that. But as long as you put the directory path for your screenshot in quotes, it should be able to know what that means and do something about it.
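For example, the quoted-path pattern might look like this inside a Claude Code prompt (the path and filename here are hypothetical):

```text
What does the error in '/Users/you/Desktop/Screenshot 2025-07-29 at
9.41.00 AM.png' mean, and where in my code is it coming from?
```

The single quotes keep the spaces in the screenshot filename from breaking the path.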
Claude Code needs an interface to uncheck
context you don't need.
That'd be cool.
But at that point, I think the whole goal
of it is to be more hands off, which is
kind
of why they exposed subagents to us, so
that they can try to intelligently pull
that out
from whatever we're saying.
But I'd be interested.
You got to paste it into Claude Code by
dragging from the file from the directory
view.
Really hope Claude ends up in the IDE.
Yeah, that'd be so fire. How many years
you code?
Let's see. I mean, I've been in the
industry, shipping software, probably a
decade plus.
And as far as coding, a lot of that was
more early on. So I think initially I
started with,
what was it? C? Yeah, I did C code and
then C++ and then got into some Swifty
things, Python,
I did a lot of shell scripting and then I
got into Ruby on Rails, I got into PHP
because I needed tooling and dashboards
for internal tools.
And then less of that was... I started
getting more responsibility around
shipping software and then quality and
stuff.
So then there became less coding, more
understanding the business of apps and
quality and output and how that affects
users.
So that was kind of where I started to
spend more time in that arena.
and then later on was because I knew so
much of the stack all over the company
it was just fighting fires I would be the
person that would go to to figure out
what the heck happened so debugging was
my biggest skill it was okay I know what's
going on at this part of the stack you
know you know pulling all the people in
the room to figure
this stuff out as well so yeah so just a
wide range of experiences there's some
really
nice terminal windows that are better
than the standard Windows terminal but I'm
old school I
I just use the standard zsh shell. Maybe throw some PowerShell or something on there, but I'm good with that.
"How do I check my token usage in Claude?" You just run bunx ccusage if you have bun, or npx. So ccusage is the command, and bunx ccusage is the one I do as well. Or you can just do npx instead of bunx if that's your thing.
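As a quick sketch, those commands as they'd be typed. This assumes you have bun or Node installed; ccusage is a community CLI, and its exact subcommands can vary by version:

```shell
bunx ccusage          # run via bun
npx ccusage@latest    # or run via npm/npx instead
```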
I gotta head on out and I will see you on
tomorrow's stream. Thank you very much.
Peace out.