I Spent 200 Million Tokens Vibe Coding With Gemini 3.1 Pro
In this video, I'm going to be sharing with you what I built yesterday after working for over 17 hours and spending 214.6 million tokens on the newly released Gemini 3.1 Pro.
So the first thing I want to talk about is the benchmarks
with this model.
And I'm going to be showing you guys the benchmarks and I'm
also going to be showing you what I built with it yesterday
after spending this amount of tokens.
So let's just dive in, take a look at the benchmarks.
One thing I want to say is I have been absolutely blown
away by how this model has performed so far.
I was expecting the model to release yesterday,
but I wasn't expecting it to be this good.
It's a massive jump from Gemini 3 Pro to Gemini 3.1 Pro.
So if you look at ARC-AGI 2, look at this: 31.1% to 77.1%. That tests abstract reasoning puzzles. Humanity's Last Exam, same thing, a 7% and 6% jump respectively. LiveCodeBench Pro went from 2,439 to 2,887. And then SWE-Bench Pro, a large jump of 4.4%. On SWE-Bench Pro it does score under Opus 4.6 by a little bit.
But one thing I want to say is that there are areas where
this model excels in coding and I'm going to be sharing
with you guys a couple of the instances that I had
yesterday where this model just performed incredibly well
in real world vibe coding tasks.
So with that being said, guys, I do have a like goal of 200 likes on this video. And if you haven't already joined the fastest growing vibe coding community on the internet, make sure you check the link in the description down below, as well as in the pinned comment, and join The Bridge, my Discord community. And with that being said, let's dive right into the video.
All right.
So the first thing that I want to cover is a little bit of the results from the bridge bench. I have put it through the creative HTML tasks.
So for example,
here is what it did for the space invaders demo.
So you can see this is what it came up with.
And what I will say is that this model's ability to write unique, modern UI elements is very, very noticeable.
Okay.
So you can go over here, right? Let's just go to the lava lamp. I actually already have it pulled up. So this is the Gemini 3.1 Pro lava lamp, and this is the Opus 4.6 lava lamp. You guys can make your own judgment about which you think is better, but I can tell right off the bat: the Gemini one is the better lava lamp.
Now let's go a little bit past that. You guys can go check out the bridge bench. I haven't put it on the leaderboard yet for the benchmark results, where we put it through 130 tasks associated with vibe coding, but I have put it in the creative HTML section. You can check it out at bridgemind.ai.
But I want to show you what I did yesterday and how I spent all of these tokens. The first thing is I had it completely refactor probably 20 to 30 pages on the website.
And I'm going to show you just a couple of the highlights.
So first of all, do you guys see this video? This video was created using Gemini 3.1 Pro and Remotion. All I had to do was use Cursor and say: look at the website, look at the products in the bridge mind vibe coding suite, and create a marketing video that is accurate and represents the bridge mind brand and theme. And it came up with this.
And I would say this is where you can see that as these models get better, the capabilities are going to go beyond just coding, right? We're creating marketing videos now.
So that's one thing that it created.
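For context on how these Remotion videos work under the hood: the video is just code, and every animated value is a pure function of the current frame number. Here's a small, library-free sketch of that core idea (my own illustrative re-implementation, not Remotion's actual API):

```javascript
// Minimal sketch of frame-based interpolation, the core idea behind
// tools like Remotion. This is an illustrative re-implementation.
function interpolate(frame, [inStart, inEnd], [outStart, outEnd]) {
  // Clamp the frame into the input range, then map it linearly
  // onto the output range.
  const clamped = Math.min(Math.max(frame, inStart), inEnd);
  const progress = (clamped - inStart) / (inEnd - inStart);
  return outStart + progress * (outEnd - outStart);
}

// Fade a title in over the first 30 frames (1s at 30fps),
// then slide it up over the next 30.
function titleStyle(frame) {
  return {
    opacity: interpolate(frame, [0, 30], [0, 1]),
    translateY: interpolate(frame, [30, 60], [0, -50]),
  };
}

console.log(titleStyle(15)); // halfway through the fade-in, slide not started
```

Remotion itself ships an `interpolate` helper along these lines and renders the React component tree frame by frame into an MP4, which is why a coding model that's good at React can produce a whole marketing video.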
Another thing is I want to go over to the bridge MCP and
show you guys a really interesting example.
So if you see this, this entire UI was created by Gemini 3.1 Pro. And if you look at the OpenCode, Codex, Cursor, Claude, and Windsurf cards: prior to using Gemini 3.1 Pro, I didn't have the actual brand assets for each of these brands, just placeholder icons. But I told Gemini 3.1 Pro: I want you to go on the internet, grab the actual logos for each of these companies, and then create a unique, customized component for this. And this is what it came up with.
It was able to actually go and grab the logos off the internet, which I think shows a lot more than the benchmarks alone. That example shows the model's ability to understand: okay, I need to go here, look at the header, go to the brand assets, download this file, copy it over to the project, and drop that PNG in here. And it did it flawlessly.
You guys can see this looks great, right?
It was also able to create another marketing video. So this entire UI, and the marketing video that you're seeing, was created using Gemini 3.1 Pro. You can see this here, all of it. This animation was also created with Gemini 3.1 Pro, a very, very good animation reflecting the Kanban capabilities of the bridge mind MCP.
So let's go back up.
And another thing that I want to show you guys is these other pages, right? Look at this animation that it created. It used Three.js to build this unique animation showing the capabilities of bridge space and its ability to run 16 agents in parallel. It created this unique component here to give it that fresh, unique look.
And here's what we're seeing from Gemini 3.1 Pro in terms of styling. Gemini 3 Pro was already good at styling; this is a step up, 100%. This model is incredibly good at styling. I'm not going to be using Opus 4.6 for my styling ever again. This is the go-to model for styling now.
Okay.
And I'm going to get into backend and database in a little
bit because I did try it there.
But I do want to continue to just show you guys what it was
able to do.
Look at this. Another video, right? Bridge space. It created a unique video showing the capabilities of bridge space for marketing purposes. It was even able to improve performance by compressing the video so that it rendered faster and improved my site speeds.
Okay.
That's another thing.
Bridge voice, same thing. Look at the Three.js animation, look how unique it is. Just look at what it's doing, right? A lot of people say, oh, your website looks vibe coded, it's not unique. That was before we had this model. I just went through and revamped the website, had it create unique custom components, and I've been very, very impressed with what it's been able to do for me.
I had it rewrite the pricing page here.
And this is what I'm seeing.
This is a very, very good model in terms of UI.
And a lot of people say, oh, Gemini models can't be used for backend purposes, right? They're great at frontend, but not backend.
And this is one example I'll show you. So I'll go over to Cursor. It's not a great example just because all I can show you is the conversation, but I used Gemini 3.1 Pro to completely refactor my auth system. I had an issue with my auth system, a complicated issue. It was around 1 AM last night, I was using Opus 4.6, throwing everything I had at it, and it couldn't get it. I gave Gemini 3.1 Pro the issue, put it in plan mode in Cursor, had it generate the plan, and ran the plan. It was able to refactor the entire auth system across the API, the bridge mind web app, the bridge mind admin portal, and the bridge mind UI: four different repos. It refactored the entirety of the auth system, backend, frontend, auth guards, complex logic, in a sensible way that completely fixed the issues I was experiencing, in one shot.
And I can't really show that to you guys because it was
just something that I experienced offline,
but I want you to know that that is what I experienced.
So we'll see in the coming days if that continues.
But what I will say is that I'm very impressed so far with
that.
But with that being said, I think that gives you guys a look at just some of the examples, right? Even the bridge bench was all refactored; the styling here was rewritten by Gemini 3.1 Pro. All of this stuff was redone by Gemini 3.1 Pro. It completely revamped my website.
And then I even used what's called a copywriting skill. I created a copywriting skill inside of Cursor, gave the skill to Gemini 3.1 Pro, and had it rewrite all of the content on my website so that it better fit my brand. So even lines like "bridge code, your terminal, your AI teammates," or, on bridge space, "run 16 agents in parallel," it was the one doing the copywriting for those. So it's also good at writing.
So if you are a vibe coder, hey,
if you're doing copywriting,
just create a skill in your brand voice and Gemini 3.1 is
just going to do a phenomenal job.
That's what I'm seeing.
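If you want to try the same thing, a "skill" is essentially a reusable instruction file the agent pulls in when a task matches. The exact format depends on your tool, so treat this as a hypothetical sketch of a brand-voice copywriting skill, not the author's actual file:

```markdown
---
name: brand-copywriting
description: Rewrite site copy in the brand voice for marketing pages
---

# Brand voice

- Audience: developers who vibe code. Confident, energetic, direct.
- Lead with the benefit, then the feature ("Run 16 agents in parallel").
- Keep headlines under eight words; keep body copy punchy.
- Avoid filler words like "leverage", "seamless", and "robust".
- Always reference products by their exact names (bridge space, bridge voice).
```

The point is that the voice lives in the skill file, so any model you hand it to writes copy in the same register.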
So now I've covered a little bit about what I did yesterday, and there's a lot more; I literally can't show you guys everything I did. I did so much, but those are some of the highlights.
Okay.
Now I want to get a little bit into what we're seeing on
the benchmarks in terms of speed.
And this is one thing to definitely highlight: this model is very, very fast. If you look at Artificial Analysis, the speed here is 106 tokens per second, compared to Opus 4.6 at 73 and GPT at 85. But look at OpenRouter; this is actually a better place to look for speed. Look at Google Vertex: 60 tokens per second on Google Vertex. That's the best place to look. Compare that to Sonnet 4.6 at 42 tokens per second, and you're looking at roughly a 43% improvement, and that's big. That is noticeable. I noticed the speed improvement, so know that it's a big speed improvement.
And then when you look at the cost, this is another reason to be using this model: $2 per million on input and $12 per million on output, compared with Opus 4.6 at $5 and $25. Does it make sense to use a model that is highly performant at less than half the cost? Yes, it does.
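It's worth making the arithmetic behind those speed and cost claims explicit. A quick sketch using the numbers quoted above (the 100M-in/100M-out token split is made up just to show the shape of the math):

```javascript
// Throughput quoted from OpenRouter: Gemini 3.1 Pro on Vertex vs Sonnet 4.6.
const geminiTps = 60;
const sonnetTps = 42;
const speedup = (geminiTps - sonnetTps) / sonnetTps; // ~0.43, i.e. ~43% faster

// Pricing quoted per 1M tokens: Gemini $2 in / $12 out, Opus $5 in / $25 out.
function runCost(pricing, inputMTokens, outputMTokens) {
  return inputMTokens * pricing.in + outputMTokens * pricing.out;
}

const gemini = { in: 2, out: 12 };
const opus = { in: 5, out: 25 };

// A hypothetical 200M-token day: 100M input, 100M output.
console.log(Math.round(speedup * 100) + '%'); // "43%"
console.log(runCost(gemini, 100, 100)); // 1400  ($1,400)
console.log(runCost(opus, 100, 100));   // 3000  ($3,000)
```

On that split, Gemini comes out at roughly 47% of the Opus price, which is where the "less than half the cost" framing comes from; a different input/output ratio shifts the exact percentage.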
Now, a lot of people don't want to use Gemini models because they've had bad experiences with them, but I want to draw your attention to probably one of the biggest improvements that I've seen with Gemini 3.1 Pro.
So let's add this model here real quick. Check this out, guys: it ranks number one on the Artificial Analysis coding index, which is a very, very important benchmark. Look at this: 56, compared with GPT 5.2 at 49 and Opus 4.5. Can I add Opus 4.6 to this list? I should be able to, but the page is having an issue here. Let's scroll all the way up and add it. Artificial Analysis is being so annoying; they definitely vibe coded this up. So I can't add Opus 4.6 just because of this bug with the scroll bar, but I think it's around 53, and Gemini definitely does beat it out. I think I actually have it on my X.
Hold on, let me pull up my X real quick so you guys can see it here. Where is it... right here. All right, here it is. Opus 4.6 got 48 on the Artificial Analysis coding index, and Gemini 3.1 Pro preview got 56, which is just insane.
And I have another benchmark that I want to look at, this one here, also from Artificial Analysis. This is one of the biggest benchmarks to look at, because it measures the hallucination rate. One thing a lot of people have struggled with in Gemini models is their incredibly high hallucination rates, which means the model hallucinates more and just runs into more issues, right? You ask it to do something, and it goes off on a bunny trail you never wanted. It misunderstands your prompt.
Look at how Gemini 3.1 Pro performs: 50%. That's the lowest hallucination rate we're seeing for a frontier model. Opus 4.5 is at 58%, GPT 5.2 is at 78%, and Gemini 3.1 is at 50%. They are getting this hallucination problem figured out, and the biggest thing I look at is the comparison with the last iteration: 88% down to 50%. Google is doing something here. Do not count Google out; they are going to do very well. And if you wrote them off just because their previous models had high hallucination rates, you have to stay up to date with this stuff, and this is why.
And I was noticing the hallucination rate, like when I had it one-shot that auth issue across four different repositories. I noticed it, right? Opus 4.6 couldn't solve my problem; Gemini 3.1 Pro did.
So take note of these benchmarks because they are very
important.
Another benchmark that I'm going to take a quick look at, and it's not a great benchmark, but I do want to share it and give my perspective on it. So here is Gemini 3.1 Pro in LM Arena. You can see it's sitting sixth, but it's a little preliminary: scores are based on pre-release testing and may shift as community prompts and votes evolve after public launch. So it's about a hundred points behind Opus 4.6, which is a massive difference, right? But we need to give it time. One thing to notice, though: LM Arena is where models get put head to head, and it does not perform that well there.
It also hasn't performed very well in Design Arena. Gemini 3.1 Pro scored 1321, which is way behind Opus 4.6 at 1392. So again, this is very preliminary. With some of these benchmarks it's people voting, right? You have to give it a week or so to get a really accurate representation of the model.
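For anyone unfamiliar with how these arena leaderboards work: models are paired off, humans vote on which answer they prefer, and the votes feed an Elo-style rating (LM Arena actually fits a Bradley-Terry model, but plain Elo gives the right intuition). A from-scratch sketch:

```javascript
// Expected win probability for a player rated `ra` against `rb` (standard Elo).
function expectedScore(ra, rb) {
  return 1 / (1 + Math.pow(10, (rb - ra) / 400));
}

// Nudge both ratings toward the observed result (scoreA: 1 = win, 0 = loss).
function eloUpdate(ra, rb, scoreA, k = 32) {
  const delta = k * (scoreA - expectedScore(ra, rb));
  return [ra + delta, rb - delta];
}

// A 100-point gap, like the one on the board, means the higher-rated
// model is only expected to win about 64% of head-to-head votes.
console.log(expectedScore(1400, 1300).toFixed(2)); // "0.64"
```

That's also why early scores drift: with few votes, a handful of upsets moves a rating a lot, so a week of public voting really does firm up the number.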
But one thing I will say is that my personal experience with the model, and its ability to create these unique UI components, has been very impressive so far. You can look at this and say, yeah, it's behind in LM Arena and behind in Design Arena, but I think we need to give it a little bit of time, because what I've seen so far has really impressed me.
It's very hard for models to get the bridge mind stamp of approval; I don't give it to a lot of models. Opus 4.6 has the stamp of approval. Sonnet 4.6 has the stamp of approval. GPT 5.3 Codex has the stamp of approval. And I am going to give the bridge mind stamp of approval to this Gemini 3.1 model. I'm going to be using it in my vibe coding workflows because of its capability to create unique UI components, its writing capabilities, and its instruction following. I've been very impressed, and I'm going to be using this in my daily vibe coding workflow.
Now I want to actually put Gemini 3.1 Pro through my vibe coding workflow, and I want to do it inside of Antigravity. I have used it there: I used it for fixing a video preview bug, and I also had it redesign that bridge bench page I showed you guys. Before yesterday, I had not used Antigravity for probably a couple of months, because up to that point, every time I used it, it had a bunch of issues. But I did use Antigravity yesterday, and I was thoroughly impressed by its ability to use the Antigravity browser tools. I've never seen anything like it. They have significantly improved this, and I want to show you guys what I saw yesterday.
So the first thing I'm going to do is just add bridge mind UI: "I want you to navigate the website, particularly the blog page, and I want you to review the blog page and come up with improvements to the UI for the blog page as well as the blog ID pages. It needs to be very modern and unique." So let's just drop it in, just like this, right?
It's going to open that up and you'll see this in a second. And again, I'm using bridge voice right now; bridge voice is one of the tools in the bridge mind suite of projects. It gives you pretty much near-perfect, near-instant voice-to-text transcription. It's faster than Whisper Flow and cheaper than Whisper Flow, so I highly suggest you guys use it. You can see here that it immediately starts, and it's going to run Antigravity in the browser and navigate through the website. So that's going to run.
I'm going to launch another conversation.
I'm going to add bridge mind UI.
Okay.
Look at this, guys. Okay, this just launched; we need to let it load. So this is Antigravity right now. You can see that it's getting the DOM. Actually, it says the site can't be reached, it refused to connect. So let's see if it figures that out.
But let's drop in bridge mind UI, and I'm going to say: "I want you to use the Chrome DevTools MCP, navigate through every single page of the website, evaluate any console errors that are happening, and fix them."
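For a prompt like that to work, the Chrome DevTools MCP server has to be registered with the editor first. The config usually lives in a JSON file whose location varies by tool (e.g. `.cursor/mcp.json` in Cursor); assuming the `chrome-devtools-mcp` npm package, a typical entry looks something like this sketch:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest"]
    }
  }
}
```

If the server isn't registered, the agent will throw exactly the kind of "MCP server not found" error that shows up later in this session.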
So I'm going to give it that prompt as well, then kick off another one and drop in the bridge mind API: "I want you to review this NestJS project and evaluate it for bugs and security vulnerabilities. I also want you to evaluate it for performance improvements we could make toward a more performant API. Do an in-depth review; do not update any code, just output your findings."
So we're going to drop in that one as well.
And then I'm also going to start another conversation. I'm going to drop in bridge mind... actually, let's do bridge voice. And I want to say: "I want you to review the themes and the different options that users can use to customize bridge voice and bridge space. I want you to create better themes that are more customizable, offer more themes that are cooler and more techno, review what's in existence, then add new themes and improve the existing ones." So we're going to drop in both the bridge space Tauri app and bridge voice, and improve the themes using that styling capability I talked to you guys about. I also want it to make sure the themes are consistent in both Tauri applications.
So we're going to paste that in, and then let's launch another one. We're going to do bridge mind... let me see here. Okay, I have an idea; this is one thing that definitely needs to be done. Let's go to the bridge space Tauri app. I'll work on this on stream today: "I want you to review the logic associated with dragging and dropping a workspace up to the header in order to create a new workspace with that pane. This functionality does not work; I've tried to fix it, but it continues to have errors. So I want to start over and remove all of the functionality associated with grabbing a terminal pane and dragging it up to the header to create a new workspace. Completely remove the existing functionality, because we are going to rebuild it from scratch. Do an in-depth deep dive, create a structured plan for what code you need to remove without breaking functionality, and then update the project accordingly."
So we're going to drop that in. And again, you guys can see bridge voice has pretty much near-immediate transcription times. We used Sonnet 4.6 the other day to 10x the responsiveness and performance of bridge voice; it has gotten very good. So right now we have one, two, three, four, five agents working inside of Antigravity. And again, I have noticed improvements inside of Antigravity. I'll probably use it a little bit more in my workflow; it's been very impressive to use.
You can see here it says, hey, MCP server not found. Let's see if it knows how to add that MCP; it'll be interesting if it understands how to add it. But all these agents are working, and I'm going to let them work; we'll probably continue this on stream. What I want to talk to you guys about is that Google Antigravity has gotten better, especially with this model. I think that if you want the best harness for Gemini models, you want to use their native suite, right? It's like, if you want the best out of Anthropic models, you probably want to be using Claude Code; if you want the best out of Gemini models, you probably want to be using the Gemini CLI or Antigravity.
One thing to note is that Gemini 3.1 Pro is still not available in the Gemini CLI. I have the Google AI Ultra plan and it's still not available there. I got Gemini 3.1 Pro a little bit early inside of Antigravity yesterday, which was nice, but in the Gemini CLI it's still not available, so that's an important thing to note.
So these are all working. Let's check this one. It wasn't able to navigate at first, so it said, oh, I wasn't able to open this, but look: it said, hey, now navigate here, and now it's navigating. And look at this, this is the cool part: you're able to go back and see what happened, because it records it. It's able to navigate the site. Do you guys see this? It's already improving the page. Gemini was clicking around the website; it has full control of my browser and the ability to click through. It's literally navigating my website, scrolling down, evaluating it, screen recording it, taking screenshots. And this is something about Antigravity that is completely different from what you'd get out of something like Cursor.
Cursor does not have this functionality. So I'm going to let this work; I think I actually just interrupted it and probably messed it up. But what I will say is: do not sleep on Google Antigravity, because that browser-use tool is very impressive. Look at this, it clicked on one of the elements; that's just incredible, right? This is different, this is definitely an improvement, and it will continue to get better. Based on my usage of Antigravity yesterday, I just want you guys to know that this tool is getting better, and especially with this new model, Gemini 3.1 Pro, it's improving even more. So that's one thing you want to be clued in on: if you want to use Gemini 3.1 Pro, or Gemini models in general, this is going to keep getting better. We're going to start using Antigravity and the Gemini CLI a little bit more on stream, just because I think this model is really good. Again, I'm putting the bridge mind stamp of approval on this model, I'm going to be using it in my vibe coding workflow, and I'm going to start using Antigravity more. Because look at this use case: a couple of months ago, this browser tool use didn't work consistently, and what you guys are seeing is that now it does. So this is just one thing I wanted to highlight: Antigravity is getting better, chat, and we want to stay up to date with the latest tools, because you can't do this inside of Cursor. Cursor doesn't do this, guys; it uses Playwright and some goofy browser tools. Antigravity is able to do screen recordings and navigations, and that is incredible. You could test entire UI and auth flows, and pretty much everything about your site, just by prompting Antigravity.
So definitely take note of this: Gemini 3.1 Pro is bridge mind approved, and we're going to be using Antigravity more. I'm very impressed with how it performs. I'm impressed with the speed, the intelligence, and the cost. We'll continue to use it, and that could change over time as new models release, as workflows improve, or as different tools come out. But I'm going to be integrating this model into my vibe coding workflow, I'm going to start using the Gemini CLI more, and I'm going to start using Antigravity more, because Google is cooking and we need to stay up to date with what they're releasing. I've been thoroughly impressed by this latest release. So with that being said, guys, I'm going to end the video here. If you haven't already liked and subscribed, or joined the bridge mind Discord community, make sure you do so. And with that being said, I will see you guys in the future.