OpenAI Is Slowing Hiring. Anthropic's Engineers Stopped Writing Code. Here's Why You Should Care.
Sam Altman, CEO of OpenAI, made a
confession recently. He shared that
despite being the CEO, despite having
the best access to the most capable AI
tools on the planet, despite his own
internal data showing that AI now beats
human experts on three quarters of well-scoped
knowledge tasks, he still
hasn't really changed how he works.
Altman admitted at a recent town hall
that he still runs his workflow the
same way, even though, quote, "I know
that I could be using AI much
more than I am." That's Sam Altman. This
is the strange paradox at the center of
AI right now. Something fundamental
shifted in December 2025. The people
closest to technology are calling it a
phase transition, a threshold crossing,
a break in the timeline. Andrej Karpathy,
who helped build OpenAI and has been
writing code professionally for decades,
says his workflow inverted in just a
matter of a couple of weeks, from
80% manual coding to 80% AI agents.
Ethan Mollick, the Wharton professor who
tracks AI adoption, has put it really
bluntly: projects from six weeks ago may
already be obsolete. And yet most
people, including the CEO of OpenAI,
haven't caught up. The capability is
there. The adoption is not. It's just
going too fast. Understanding this gap
and what to do about it is the real
story of January 2026. So what actually
happened in December? The shift was not
just one thing, and I think that by
itself is part of the story, because
previously I could point to a single model
release and say that was the change. Not
anymore. This was a convergence of model
releases, orchestration patterns, and
proof points that together crossed their
respective thresholds in the
same compressed window. This is exactly
what AI accelerationists have been
telling us is coming. Change will happen
slowly and then all at once. This is one
of those all at once moments. Start with
the models. In the space of just 6 days
late last year, three frontier releases
landed: Google's Gemini 3 Pro, OpenAI's
GPT-5.1 Codex Max (with 5.2 out soon
after that), and Anthropic's Claude Opus 4.5.
All of these models are explicitly optimized for
something previous models could not do
well: sustained autonomous work over
hours or days rather than minutes. The
GPT-5.1 and now 5.2 class models are
designed for continuous operation,
more than a day of autonomous work. Claude
Opus 4.5 has introduced an effort
parameter that lets developers dial
reasoning up or down, and Anthropic has
priced it two-thirds cheaper than the previous
version. And now we have techniques like
context compaction, from both OpenAI and
Anthropic, that let the model summarize
its own work as sessions extend, so it
can more easily maintain coherence over
longer time frames.
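To make the idea concrete, here's a minimal sketch of what a compaction step can look like. This is not OpenAI's or Anthropic's implementation; the token heuristic, the `summarize` callable, and the budget are all illustrative assumptions.

```python
def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token. Real systems use a tokenizer.
    return sum(len(m["content"]) for m in messages) // 4

def maybe_compact(messages, summarize, budget=150_000):
    """If the conversation is close to the context budget, replace older turns
    with a model-written summary and keep only the recent turns."""
    if estimate_tokens(messages) < budget:
        return messages
    old, recent = messages[:-10], messages[-10:]
    summary = summarize(old)  # ask the model to summarize its own work so far
    return [{"role": "system", "content": f"Summary of prior work: {summary}"}] + recent
```

The point is simply that the session keeps running while the effective context stays bounded.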
Are you getting the theme? Look, the
Cursor team has tested these models.
Other teams have tested these models.
We're seeing reports come back of models
being able to do a week of work
autonomously and code up to three
million lines before coming back for
more. This is not the same category of
work as we were seeing even in September
and October of 2025. It's a new
category. Things have changed all at
once. And you know what? Better models,
as much as I like them, were necessary,
but they were not sufficient. The real
unlock came from orchestration patterns
that went viral in late December. The
first was Ralph, named after the
Simpsons character known for cheerful
obliviousness. Geoffrey Huntley, an
open-source developer out in rural
Australia, grew frustrated with agentic
coding's central limitation: models keep
stopping to ask permission, or they
report progress and they're wrong or
overoptimistic. Every pause
requires human attention, and often
you're frustrated because you're telling
the model the same thing. So all
Geoffrey did was write a bash script
that runs Claude Code in a loop, using
git commits and files as memory between
iterations. When the context window
fills up, a fresh agent picks up where
the last one left off: you just wipe the
previous context window and keep going
against the same task. The
technique is embarrassingly simple for
an engineer. And while the AI industry
was building elaborate multi-agent
frameworks, all Geoffrey did was discover
that you can just be really persistent.
You can repeat the goal. You can wipe
the context window and you're going to
get somewhere. A loop that keeps running
until tests pass is more reliable than
very carefully choreographed agent
handoffs. VentureBeat called it the
biggest name in AI right now, and they
weren't wrong. The pattern spread
because it enabled much more
autonomous work over long stretches of time.
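To ground the pattern, here's a minimal Ralph-style loop sketched in Python. It is not Huntley's actual script (his is a bash loop around Claude Code); the `claude -p` invocation, the PROMPT.md file, and the pytest check are illustrative stand-ins for whatever agent CLI and test suite you actually use.

```python
import subprocess

PROMPT = open("PROMPT.md").read()   # the fixed goal, restated to every fresh agent
MAX_ITERATIONS = 50

def tests_pass() -> bool:
    # Progress is judged by an objective check, not by the agent's own status report.
    return subprocess.run(["pytest", "-q"]).returncode == 0

for i in range(MAX_ITERATIONS):
    # Each iteration is a brand-new agent with an empty context window;
    # the repository and its git history are the only shared memory.
    subprocess.run(["claude", "-p", PROMPT], check=False)
    subprocess.run(["git", "add", "-A"])
    subprocess.run(["git", "commit", "-m", f"ralph iteration {i}"], check=False)
    if tests_pass():
        break   # stop only when the work is verifiably done
```

The whole trick is that the loop never argues with the model; it just restates the goal and lets the tests decide when to stop.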
The second viral piece was Gas Town,
released by Steve Yegge on
January 1st. While Ralph is minimalist,
Gas Town is unabashedly maximalist.
It's a completely insane workspace
manager that spawns and coordinates
dozens of AI agents working in parallel.
And honestly, Gas Town is something that
reflects Steve Yegge's brain more than it
reflects a coherent enterprise agentic
pattern. But it's still relevant
because both patterns share the same
core insight. The bottleneck has
shifted. You are now the manager of
however many agents you can keep track
of productively. Your productive
capacity is limited now only by your
attention span and your ability to scope
tasks well. And then things kept
changing because in late January,
Anthropic shipped Claude Code's new task
system. And suddenly even Ralph looked
like a clever workaround to a problem
that now has native infrastructure. You
don't need Ralph anymore. CJ
Hess, a developer who stress-tests new
AI tooling, was in the middle of a large
refactor when Claude Code's task
system shipped. He pushed it to its
limits: he created a massive task list.
He had it orchestrate sub-agents to
execute the entire thing. And he reports
that it completely nailed it. And that's
weird. We're used to things where agents
fumble, where they don't get it done.
And in this case, a simple task system
that just looks like a to-do list was
what it took to coordinate agents across
a complex multi-agent problem. Now, to
be fair, the task list that Anthropic
released is more than just a simple tick
box. Under the surface, each task can
spawn its own sub-agent, and each
sub-agent can get a fresh 200,000-token
context window that's completely
isolated from the main conversation. So
you can have a clean focused job for
that sub agent. Let's say agent one is
digging through authentication code and
agent two is refactoring database
queries and agent three is working
through tests. None of them are
polluting each other's context or
getting confused by what the others are
doing, because they don't know the others
exist, which is the same insight Yegge
had in Gas Town. The old approach was
Claude trying to hold everything in one
long threaded conversation, remembering
decisions from earlier while
implementing new things and it just got
complicated and Claude lost the plot.
That still works for small stuff, but
for stuff that's complex, context
management becomes the bottleneck and
stuff falls through the cracks. The task
system changes that architecture. Each
agent focuses on just one thing. When a
task completes, anything blocked by it
then automatically unblocks and the next
wave of agents just magically kicks off.
So you can have between seven and ten
sub-agents running simultaneously, and the
system just picks the right model for
the job. Haiku for quick searches,
Sonnet for implementation, Opus for
reasoning. All you do is define your
dependencies and the system handles all
of that orchestration for you. Look, the
key innovation here is the realization
that dependencies are structural.
They're not cognitive. Without them,
Claude has to hold the entire plan in
working memory. And the plan will
degrade the moment the context window
fills up. You end up reexplaining over
and over to the agent. This is what's
done. This is what's left. This is what
depends on what. But when you
externalize the dependencies, the graph
doesn't forget and doesn't drift. You
never need to re-explain to the agent,
because the plan was never stored in the
model's memory to begin with. It's just a task sheet.
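Here's a rough sketch of that idea in miniature. This is not Anthropic's implementation, just the externalized-dependency pattern; the task names, the `run_subagent` stub, and the model-routing choices are all illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Illustrative task graph: each task declares what it depends on.
# The graph, not the model's context window, is what remembers the plan.
TASKS = {
    "map_auth_code":     {"deps": [], "model": "haiku"},
    "refactor_db_layer": {"deps": [], "model": "sonnet"},
    "update_tests":      {"deps": ["refactor_db_layer"], "model": "sonnet"},
    "final_review":      {"deps": ["map_auth_code", "update_tests"], "model": "opus"},
}

def run_subagent(name: str, model: str) -> str:
    # Stub: a real system would spawn an agent here with a fresh, isolated
    # context containing only this task's instructions.
    print(f"[{model}] running {name}")
    return name

done, running = set(), {}
with ThreadPoolExecutor(max_workers=8) as pool:
    while len(done) < len(TASKS):
        # Dispatch every task whose dependencies are all satisfied.
        for name, spec in TASKS.items():
            if name not in done and name not in running and set(spec["deps"]) <= done:
                running[name] = pool.submit(run_subagent, name, spec["model"])
        finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
        for fut in finished:
            task = fut.result()
            done.add(task)       # finishing a task unblocks the next wave
            del running[task]
```

Completing `refactor_db_layer` automatically makes `update_tests` eligible; nothing has to hold that ordering in working memory.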
Going back to Ralph: Ralph was a bash-loop
workaround to this same problem. The
task system is Anthropic's answer. It's
native platform infrastructure for the
same capability and it illustrates how
fast things are moving. Patterns can go
viral and just a couple of weeks later
they're obsolete because they've been
absorbed into the platform. Cursor is
carrying the flag for very large running
autonomous projects. I've talked about
their project to build a browser and how
it took 3 million lines of code. They've
written about it extensively, but
they're not done with the browser.
Cursor is running similar experiments
using AI agents to build a Windows
emulator. They're building an Excel
clone. They're building a Java language
server. These are big code bases. They
range from half a million to one and a
half million lines. They're all being
generated autonomously. Now, the point
here is not that Cursor is immediately
going to start shipping Excel and
competing with Windows. The point is
that they are proving that autonomous AI
agents can build complex software. At
Davos in late January, Dario Amodei
described what he called the most
important dynamic in AI today: the
self-acceleration loop. And it's important
that we understand it. He said, I have
engineers at Anthropic who tell me, I
don't write code anymore. I let the
model write the code. Now, we've heard
that on Twitter a lot and the mechanism
is simple. But the fact that Anthropic
is doing it is really important to
understand because fundamentally they
are accelerating the production of the
next AI systems using AI. AI has entered
a self-acceleration loop. This is also
why OpenAI is starting to slow hiring.
Just this past week, Altman announced
that OpenAI plans to dramatically slow
down hiring. And he said he did it
because of the capabilities and the span
he sees from existing engineers. Now,
they're not stopping hiring altogether,
but one of the things he shared is that
the expectation he has for new hires is
now sky-high because of what AI tooling
can give. He said that if you're in the
interview loop, they're literally having
candidates sit down, and, in his words,
"We're asking them to do something that
would normally take weeks using AI tools
in 10 or 20 minutes." That's a reasonable
request. I've shared earlier how you can
use Claude in Excel to do weeks' worth of
work in 10 to 15 minutes. This is the
reality of work in 2026. And what Sam is
choosing to do is responsible because as
he said, he doesn't want to have awkward
conversations and overhire. He would
rather hire the right people, keep them
around and expand their span with AI
tooling. The numbers behind this
decision come from OpenAI's own
benchmark, GDPval. It measures how often
AI output is preferred over human expert
output on well-scoped knowledge work.
And we see the tipping point hitting
in this same window, the last few weeks
of 2025. GPT-5 Thinking, the model from
the fall, tied or beat humans only 38% of
the time. GPT-5.2 Pro, released at the
very end of the year and into early this
year, reached 74%. It doubled. So on
three quarters of well-scoped knowledge
tasks, the AI is now preferred. And you
can read that as a general pattern for
cutting-edge models, not just ChatGPT. And as Sam
put it, if you can assign your AI
co-worker something that takes an hour
and you get something that's better than
what a human would do 74% of the time
and it's taking vastly less time, it's
a pretty extraordinary feeling. And this
brings us back to the paradox. If models
are beating human experts like this on
scoped tasks and doing it faster, why
hasn't work transformed more? Why is the
CEO of OpenAI, Sam himself, still
running his workflow, as he says, in
much the same way? This is a capability
overhang: capability has jumped way ahead
and adoption hasn't, because humans
don't change that fast. Most knowledge
workers are still using AI at, I would
say, a ChatGPT-3.5 or ChatGPT-4 level. Ask
a question, get an answer, move on.
Summarize this document for me. Please
draft this email. They're not running AI
agent loops overnight. They're not
assigning hour-long tasks to their AI
co-workers. They're not managing fleets
of parallel workers across their
backlog. The overhang explains why the
discourse feels so disconnected. Why it
feels like you have constant jet lag if
you are living at the edge of the
capability and you're going back to look
at how work looks today. Someone running
task loops in Anthropic's task system or
Ralph is living in a different technical
reality than someone who queries ChatGPT four
or five times a day even though they
have daily access to the exact same
underlying tools. One person is seeing
the acceleration, everything happening
all at once, the other is seeing
incremental improvement and wondering
why AI is such a big deal. This creates
a very temporary arbitrage. If you
figure out how to use these models
before your competitors do, if you can
get your teams to do that, you have a
massive edge. And if you're waiting for
AI to get smart enough before changing
the workflow, you are already behind and
you're showing that you're not using AI
well. So what does closing this overhang
that's developed especially in the last
few weeks look like? What are specific
skills that power users describe? Well,
a few patterns emerge. Number one, power
users that are really on the edge are
assigning tasks. They are not asking
questions. When you treat AI as an
oracle, you are in the wrong mental
model. The shift is very much toward
what I would call declarative spec.
Describe the end state you want, provide
the success criteria, and let the system
figure out how to get there. This is
sort of a post-prompting world. It's
still prompting, but it looks a lot more
like a specification.
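As a rough illustration of the difference, compare an oracle-style question with a spec-style task. Everything here is invented for the example; the point is only the shape of the request.

```python
# Oracle mode: a question, answered once, with no definition of "done".
question = "How should I add rate limiting to our API?"

# Spec mode: desired end state plus success criteria the system can check itself against.
spec = """
Goal: add per-API-key rate limiting to the public HTTP API.

End state:
- Requests over 100/minute per key receive HTTP 429 with a Retry-After header.
- Limits are configurable per key in config/rate_limits.yaml.

Success criteria:
- All existing tests still pass.
- New tests cover the 429 path and the config override path.
- No endpoint's p95 latency regresses by more than 5% in the benchmark suite.
"""
```

The second version is something an agent loop can grind against overnight, because it can tell on its own whether it is finished.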
Number two, accept
imperfections and start to iterate.
Ralph works because it embraces failure.
The AI will produce broken code, so
we're just going to make it retry till
it fixes it. And it never gets tired and
it keeps retrying. And you go and make
coffee or lunch and you come back and
it's done. This requires abandoning the
expectation that AI should get things
right the first time. It often won't and
it doesn't matter because it doesn't get
tired. Third, invest in specification.
Invest in reviews. Invest less in
implementation. The work is shifting.
It's less time writing code. It's much
more time defining what you want. It's
much more time evaluating whether you
got there. This is a real big skill
change. Most engineers have spent years
developing their intuitions around
implementation and those are now not
super useful. The new skill is
describing the system precisely enough
that AI can build it and then writing
tests that capture the real success
criteria and reviewing AI generated code
for subtle conceptual area and then
reviewing AI generated code for subtle
conceptual errors rather than simple
syntax mistakes. The errors get very
interesting here. Maggie Appleton is a
designer who's been analyzing these
tools for a bit. I think she puts it
really well. When agents write the code,
design becomes a bottleneck. And so the
questions that slow you down are less
and less about the details of code
syntax. They're more and more about
architecture, about user experience,
about composability. What should this
feel like? Do we have the right
abstraction here? These are the
decisions that agents cannot make for
you. And they require your context and
your taste and your vision. I will say
the speed is dangerous in and of itself.
Watch out for the foot gun. You can move
really really fast with AI agents and
you can forget how much trash you are
putting out there. To be honest, if you
are not thinking through what you want
done, the speed can lead you to very
quickly build a giant pile of code that
is not very useful. That is a superpower
that everyone has been handed for better
or worse and we are about to see who is
actually able to think well. Yes, it is
time to use multiple agents in parallel.
That's another lesson. It's
transformative because every single one
stacks your capability. Some developers
are going from a few PRs per day to
dozens. The constraint moves from coding
to coordination. How can you scope your
tasks? How can you review outputs? Etc.
Fundamentally, even if it's tricky and
you have to figure out what review looks
like in this new world, this is where
we're all going, because of the
multiplicative effect of agents all
pointed in the direction you're actually
trying to go, stacking on top of each
other and solving multiple tasks at once.
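One common way to keep parallel agents out of each other's way is to give each its own git worktree and launch them concurrently. This is a minimal sketch, not any particular tool's implementation; the `claude -p` call and the task list are illustrative stand-ins for whatever agent CLI and backlog you use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

TASKS = {
    "fix-auth-timeout": "Fix the session timeout bug described in ISSUE-142.",
    "add-csv-export":   "Add CSV export to the reports page per the spec in docs/export.md.",
    "speed-up-ci":      "Cut CI wall-clock time by caching dependency installs.",
}

def run_in_worktree(name: str, prompt: str) -> None:
    path = f"../agent-{name}"
    # Each agent gets its own checkout and branch, so parallel edits never collide.
    subprocess.run(["git", "worktree", "add", "-b", f"agent/{name}", path], check=False)
    subprocess.run(["claude", "-p", prompt], cwd=path, check=False)

with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
    for name, prompt in TASKS.items():
        pool.submit(run_in_worktree, name, prompt)
```

Review then happens per branch, PR by PR, which is where your attention actually goes.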
And this includes letting agents run all the
time. Ralph was designed for overnight
sessions. Define the work, start the
loop, and go to bed: that's the new
engineer's day. Now, of course, this only works
with proper guardrails, but when it
works, you're getting productive hours
around the clock from time that was
previously idle. And look, the last
thing from power users, which I think is
true, is you got to actually try it.
This sounds incredibly obvious, but it
is the main barrier. Most people haven't
run an agent loop for more than a couple
of minutes, and the models improved a
lot in December. If you have not
revisited your AI workflow since, you're
probably operating on stale assumptions
about what is actually possible. To be
honest with you, the shape of work
itself is changing. Andrej Karpathy noted
something really important about the
errors that current models make. They're
not these simple syntax errors. And he
thinks, and I think he's correct, that a
hasty junior developer would make very
similar conceptual errors to the quality
of errors the models are making now. And
that's a good thing. It means the models
are getting stronger and getting to the
level of a junior developer because
they're making wrong assumptions and
they're running without checking.
They're failing to surface trade-offs
sometimes. Those are things that junior
developers do. These are supervision
problems, not capability problems. And
the solution isn't to do the work
yourself. It's to get better at your
management skills. You do have to watch
the agents, but if you do, you can catch
the moments when you've implemented a
thousand lines to solve a problem that
could have taken a hundred. And this is
something where, to be quite frank, our
technical teams need to level up, so
they're able to do this kind of
management of agents and to write evals
that test the right things. There
are evals you can write that test
whether the agent is writing a simple
enough solution for this problem. Those
are the kinds of evals we need to think
about, not just traditional functional tests.
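Here's a rough sketch of what such a non-functional eval could look like: alongside the functional test suite, gate the agent's change on how much code it actually took. The size budget and helper names are illustrative assumptions, not an established standard.

```python
import subprocess

def changed_lines(base: str = "main") -> int:
    # Count lines added on the agent's branch relative to the base branch.
    diff = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True,
    ).stdout
    return sum(int(row.split()[0]) for row in diff.splitlines() if row.split()[0].isdigit())

def simplicity_eval(budget_lines: int = 200) -> bool:
    """Pass only if the solution stays within a size budget for this task.
    A thousand-line fix to a hundred-line problem fails here even if every
    functional test is green."""
    added = changed_lines()
    print(f"agent added {added} lines (budget {budget_lines})")
    return added <= budget_lines

if __name__ == "__main__":
    raise SystemExit(0 if simplicity_eval() else 1)
```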
This is what Sam means when he
talks about the job of being an engineer changing
so quickly. You're not spending time
typing. You're not debugging. You're
spending most of your time, frankly, as
a manager. And yes, we should be honest,
the ability to code manually is going to
start to atrophy as a skill set because
you're just not using it as much.
Generation and discrimination are very
different skill sets, and you're using
those every day. This is not a failure
and it's not something to be embarrassed
about. It's a reallocation of very
scarce human cognitive resources toward
a skill that has higher leverage. Now,
this obviously leads to a debate. How
close should developers stay to the
code? There are widely differing
opinions among senior developers here,
and I would argue that the right
answer is a function of what you are
building. If your risk tolerance for a
mistake is very low, you are going to
have to watch the agent coding in an IDE
and write your evals super carefully if
you want to leave it alone. If you are
trying to write really good front-end
code, that is more complicated right now
than backend code because defining what
something looks like remains a
challenge. But if you're willing to
experiment, if you're willing to
iterate, if it's a green field project
and it's a prototype, you really can
step back. And so I think what this
calls for is another level of
abstraction from engineering. We need to
think as technical leaders about where
engineers should stand in relation to
the code based on the risk profile of
that codebase itself. That becomes
something we can intentionally set as a
policy for teams. Hey, this is
production. This is not something we can
mess up. This is our expectation as
leadership for how you code with agents
against this codebase. That is something
we're going to have to start to do
because otherwise it's just going to be
a free-for-all and everyone will make
their own rules and you're going to get
all sorts of issues in production.
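To make that tangible, here is a hypothetical sketch of what such a policy could look like if you wrote it down as configuration. The tier names, fields, and thresholds are invented for illustration; the point is that the expectation is explicit rather than a free-for-all.

```python
# Hypothetical per-codebase agent policy, set by technical leadership.
AGENT_POLICY = {
    "payments-service": {          # production, low risk tolerance
        "autonomy": "supervised",  # engineer watches the agent in the IDE
        "max_parallel_agents": 2,
        "required_checks": ["unit", "integration", "security-scan", "human-review"],
    },
    "internal-dashboard": {        # lower stakes, faster iteration
        "autonomy": "task-loop",   # overnight loops allowed, review PRs in the morning
        "max_parallel_agents": 6,
        "required_checks": ["unit", "human-review"],
    },
    "prototype-sandbox": {         # greenfield experiments
        "autonomy": "hands-off",   # step back and let the agents run
        "max_parallel_agents": 12,
        "required_checks": ["unit"],
    },
}
```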
So where does all of this leave us? The
December convergence of models,
orchestration patterns, and tools like
Ralph established a new baseline. Models
can now maintain coherence for days.
Orchestration patterns do exist that
manage fleets of agents and the
economics absolutely work. This doesn't
mean you have to use Ralph specifically.
The point is that the problems these
tools wrestle with are fundamentally
different and point to a very rapid
change in how we work particularly in
technical domains. If you've been wrestling
with context persistence and parallel
coordination, those problems suddenly
got an order of magnitude easier,
because of exactly what I've described:
how we handle tasks and workflow, and
more capable models designed for
long-running work. Suddenly it's like the
ceiling lifts. Everything gets an order
of magnitude easier when you're building
big stuff. And the overhang this
generates when it all happens at once
is real. If Amodei is right and AI can
handle end-to-end software engineering
tasks within 6 to 12 months, then the
gap between what we are doing today and
full automation has never felt larger.
If the overhang feels big after the last
few weeks, as you listen to what I'm
describing here, the overhang is only
going to get bigger because AI is
continuing to accelerate. Look at how
quickly Anthropic was able to turn
around and ship Cowork in just 10 days.
Look at how quickly they turned around
and shipped their version of Ralph that
was more natively integrated. Yes, the
people who are building this moment
sometimes aren't fully into it yet.
They're still moving their furniture
into the new AI way of working to use a
metaphor. Sam Altman admitted that about
himself. But the future is here now. And
if you can get through the overhang and
start to accelerate into a world where
you are asking the AI to do big tasks
for you, you're moving from prompting
with questions to defining
specifications. You're running multi-agent
patterns. This is going to
fundamentally change your day. On a
personal note, if you have not felt the
power of having five or six Claude Code
windows up on your screen at once, it's
hard to get past it. There's nothing like
how fast you feel you can go. And the
future belongs to people who know how to
handle that speed responsibly and be
thoughtful with it. The overhang is
going to continue, and the benefits to
those who can get over it are just going
to get greater and greater,
because these are exponential gains that
we're looking at. Every single agent you
can run in parallel multiplies your
productivity. And so this is the future
we're looking at. A future made not by
one model maker or one breakthrough,
but by a collective phase transition,
where model capabilities as a whole over
the last five or six weeks
have moved us from a world where it was
kind of irrational to run a dozen agents
to a world where if you're not running a
dozen agents doing autonomous tasks for
days at a time, you're behind and things
are only going to go faster from here.
Good luck. I have a full write-up on
this on Substack. Let me know if you
have questions, and we'll all get through it.
The video discusses a significant shift in AI capabilities occurring around late 2025 and early 2026, characterized by a 'capability overhang' where AI's potential far outpaces human adoption. Despite major model releases from OpenAI, Google, and Anthropic that support autonomous work for days at a time, many users—including Sam Altman—have yet to fully transform their workflows. The narrative explores new orchestration patterns like 'Ralph' and 'Gas Town,' the move toward native task systems that manage parallel sub-agents with isolated contexts, and the emergence of a 'self-acceleration loop' where AI is used to build the next generation of AI. To bridge this gap, technical workers must transition from manual coding to a managerial role focused on high-level specification, evaluation, and coordinating fleets of autonomous agents.