The GPT-OSS Open Source models are finally here! – Local AI Just Leveled Up
Hello everyone, this is Professor Patterns, and in this video we're covering the new GPT-OSS models. Two models have been released: a GPT-OSS 20 billion parameter model and a 120 billion parameter model. Both of these are Apache 2.0 licensed, which is great. I didn't think they would be Apache licensed, so I'm pretty excited about that.
We can see that a bunch of documentation has been released about this, and the best part is that they actually partnered with all of these third-party inference providers. So Hugging Face, vLLM, Ollama: it's available through all of them starting today. I've actually been using this model, so why don't we get started by first making sure we can download it, and then try a couple of different prompts to see how well it actually performs. Now, I am using Ollama for this, but honestly I could use vLLM or llama.cpp; I just use Ollama because it's convenient. The only thing is that if you're going to be using Ollama for the first time, or after a really long time, you'll want to install the latest version of Ollama, or it's probably not going to work for you.
Once you download the latest version, you'll be able to pull the model. All you do is open up your command prompt and type `ollama pull` followed by the name of the model. The best part is that it should also be listed on the Ollama website. If you go to ollama.com and search for the model there, you should see gpt-oss. The one you want is the 20 billion parameter version, unless you have hardware that can fit the 120 billion parameter model. Personally, I have an NVIDIA RTX 4090 GPU with about 24 GB of VRAM, and this model is about 16 GB, so you could try running it on a somewhat less beefy GPU. You can even run it on a CPU; it's just going to be extremely slow. So all I did was copy that command, come over here, and run the pull with the name of the model. In this case I don't want the 120 billion parameter model, just the 20 billion one. And because I already have this model, it just says success.
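By the way, if you'd rather script the pull and a quick smoke test instead of typing into a terminal, here's a minimal sketch using the official `ollama` Python package; the `gpt-oss:20b` tag matches what the Ollama site lists at the time of writing, but verify it against the model page:

```python
# Pull the model and run a quick local chat via the official
# Ollama Python client (pip install ollama).
import ollama

ollama.pull("gpt-oss:20b")  # same as `ollama pull gpt-oss:20b` in a terminal

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Give me three resources for learning Python."}],
)
print(response["message"]["content"])
```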
Okay, so after you download the model, if I go to Open WebUI over here, I should be able to see it and interact with it. I can say something like "hey" and get a response. This model does support reasoning, so you should be able to see that as well. Let's ask it to give me some resources for learning Python. Sure, why not? Now, in the documentation they said these models are great for things like agentic use, so I'm pretty excited to try some of that out as well. They used benchmarks to measure the models' capabilities in coding, competition math, health, and agentic tool use compared to other models, and that's something I'm going to try right after this video: how well these models actually perform in those cases. But here we can see it gives really good results in a couple of different formats: some online courses and MOOCs it recommended, some interactive platforms, some supplemental readings. Let's say we wanted to create a study plan for my Python exam. Let's see how long the reasoning step takes. It does the thinking, you can see the thought process here, and the inference is super fast. It pins down the details: for the exam format, it asks a couple of follow-up questions and then gives a quick-start plan: six weeks, five days a week, one to two hours per day, and these are all the things it recommends. Okay, not bad. Let's see if any of these links actually work. I'm guessing not. If I export to CSV... oh, nice, that actually does work. Perfect.
Now let's also see whether this model is available through OpenRouter. I'm going to go to OpenRouter, search the models, and there we go: GPT-OSS. And it's pretty cheap for the 120 billion parameter model. Claude Opus, in comparison, is $15 per million tokens, and the Claude Sonnet 4 model is about $3 per million tokens, so this is significantly cheaper. I'm going to copy this and try it for a local agentic-coding-type application, which basically runs through Cline. So I'm going to type in this model, the 120 billion parameter one. I did try using the one hosted with Ollama, but it just timed out, so I'm going to try again here. Let's use the same exact prompt I had before: create a website for my t-shirt store.
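For anyone who wants to hit this through the API directly rather than through a coding agent, here's a rough sketch using the OpenAI SDK against OpenRouter's OpenAI-compatible endpoint; the model id "openai/gpt-oss-120b" follows OpenRouter's usual naming scheme, but check the model page for the exact string:

```python
# Calling gpt-oss-120b through OpenRouter's OpenAI-compatible API
# (pip install openai).
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder key
)

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed OpenRouter model id
    messages=[{"role": "user", "content": "Create a website for my t-shirt store."}],
)
print(completion.choices[0].message.content)
```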
Okay, super fast thinking. And then it's creating the index.html. Wow, that is blazingly fast; it kind of reminds me of the Gemini Flash models. Oh my god, that is insane. But let's see how good this is actually going to be, because if the CSS file is only about 100 lines, I don't know how it's going to turn out. Let's open the index.html. Okay, awesome: "Awesome T-Shirt Store." And how much did this actually cost me? $0.0083.
Let's see if I can say something like: make the website super nice, make it extremely awesome, add lots of interactions, I want to make this website look super duper fancy, think step by step, and I'll tip. I'm going to say $1,000. Why not? Okay, so it goes into this whole thinking process, and the inference through OpenRouter is insanely fast. Wow. Okay, it added a couple more things to the HTML, but nothing too intense. Let's see how much it actually updates the CSS here. And so far I haven't even spent a cent, which is going to be interesting. There we go.
If you take a look at the providers here, let's click and see who is actually serving this model: we have it through Groq, Cerebras, and Fireworks. So I'm guessing that as more people watch this video and start using these models, the overall latency is going to go up. But for now, it is running super fast.
Okay, "Awesome T-Shirt Store." It's got these sort of cool interactions. Not sure what this dark mode thing is about. Yeah, not amazing, I would say. I guess I can see why this model is a little on the cheaper side, but it's really fast. I'm thinking maybe I could use this model for something like quickly checking for bugs, but I don't really see myself using it as a replacement for something like Claude Sonnet or Opus for coding tasks. Now, there's also fine-tuning material available. There's an article on fine-tuning GPT-OSS with Hugging Face Transformers, and they have full documentation: you can install all the required libraries, prepare the dataset, and then fine-tune this specific model for your application. I'm also going to create a follow-up video on fine-tuning this model; I think that would be interesting.
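As a preview of what that setup roughly looks like, here's a minimal LoRA fine-tuning sketch with Transformers, PEFT, and TRL; the repo id "openai/gpt-oss-20b" and the placeholder dataset are assumptions, so follow the official article for the exact recipe:

```python
# A minimal LoRA fine-tuning sketch
# (pip install transformers peft trl datasets).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your/dataset", split="train")  # placeholder dataset

peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",  # assumed Hugging Face repo id
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gpt-oss-20b-lora",
        per_device_train_batch_size=1,
    ),
)
trainer.train()
```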
But these are just some preliminary projects that I've been working on with the GPT-OSS models. Try the local variants. There's also gpt-oss.com, where you'll find an interface for trying out different things, like reasoning levels and showing or hiding the reasoning, and for trying both models. Give it a couple of prompts, like "how far away is the sun from Earth?" This way you don't even need to download the model locally; it just runs through this interface.
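If you want to play with those reasoning levels programmatically, the gpt-oss model card describes reasoning effort as something you set in the system prompt; here's a sketch against a local Ollama install, where the "Reasoning: high" convention is my assumption from the model card, so double-check the docs:

```python
# Requesting high reasoning effort from gpt-oss through Ollama
# (pip install ollama). The system-prompt convention is assumed.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # low | medium | high
        {"role": "user", "content": "How far away is the sun from Earth?"},
    ],
)
print(response["message"]["content"])
```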
But yeah, that's pretty much it for this video. More to come; I'm going to be covering this in a lot more detail. I think this is a really exciting update. I'm not mind-blown or anything based on how well this model is performing so far: it's decent, it's fast, and I like the fact that it's local, but does it actually replace my existing workflow? Probably not. But let's see. I'm still going to be experimenting with this a little more, and I'll keep you updated. Thank you all for tuning in. This is Professor Patterns; have a good rest of your day. Goodbye.