
The GPT-OSS Open Source models are finally here! – Local AI Just Leveled Up


Transcript


0:00

Hello everyone, this is Professor Patterns, and in this video we're covering the new GPT-OSS model. Two models have been released: a GPT-OSS 20 billion parameter model and a 120 billion parameter model. Both of these are Apache licensed, which is great. I didn't think they would be Apache licensed, so I'm pretty excited about that.

0:21

We can see that there's a bunch of documentation that's been released about this, and the best part is that they actually did partner with all of these third-party inference providers. So Hugging Face, vLLM, Ollama: it's available through all of these from today. I've actually been using this model, so why don't we get started by first making sure that we can download it, and then try a couple of different prompts to see how well it actually performs.

0:51

Now, I am using Ollama for this, but honestly, I could use vLLM, or I could use llama.cpp. I just use Ollama because it's convenient. The only thing is that if you're going to be using Ollama for the first time, or after a really long time, you'll want to install the latest version of Ollama, or it's probably not going to work for you.

1:14

Once you download the latest version, you will be able to pull the model. All you do is open up your command prompt window and type ollama pull followed by the name of the model. The best part is that this model is also available on the Ollama website. So if you go to ollama.com and search for the model there, you should be able to see gpt-oss. The one that you want is the 20 billion parameter version, unless you have something that can fit the 120 billion parameter model.

1:44

Personally, I have an Nvidia 4090 GPU that has about 24 gigs, and this model is about 16 gigs. So you can actually try to run it on a somewhat less beefy GPU, and you can even run it on a CPU; it's just going to be extremely slow.

2:05

So all I did is copy that command, come over here, and say pull with the name of the model. In this case, I don't want the 120 billion; I'm just going to do the 20 billion parameter one. And because I already have this model, it just says success.

2:20

Okay, so after you download the model, if I go to Open WebUI directly over here, I should be able to see this model and interact with it. I can say something like "hey" and get a response. This model does support reasoning, so you should be able to see that as well. Let's ask it to give me some resources for learning Python. Sure, why not?

2:46

Now, in the documentation, they said that these models are great for things like agentic use, so I'm pretty excited to try some of that out as well. They ran benchmarks to measure the models' capabilities in coding, competition math, health, and agentic tool use compared to these other models. That's something I'm going to try out right after this video: how well these models actually perform in those cases. But here we can see it does give really good results, in a couple of different formats: some online courses and MOOCs that it recommended, some interactive platforms, some supplemental readings. Let's say that we wanted to create a study plan for my Python exam.
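The Open WebUI chat above talks to a locally running Ollama server, so the same interaction can be scripted against Ollama's REST API. This is a minimal sketch, assuming Ollama's default endpoint at localhost:11434 and the gpt-oss:20b model tag; the helper names build_request and ask are my own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the model's reply."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server with the model already pulled:
# print(ask("Give me some resources for learning Python."))
```

The actual call is left commented out since it needs the server running; the payload builder is the reusable part.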

3:33

Let's see how long it actually does the reasoning step for. It does the thinking, and you can see the thought process here; super fast inference. It pins down the details: for the exam format it's asking a couple of follow-up questions, and then there's a quick-start plan. Six weeks, five days a week, one to two hours per day, and these are all the things that it recommends.

3:59

Okay, not bad. Let's see if any of these links actually work; I'm guessing not. If I export to CSV... oh, nice, that actually does work. Perfect.

4:12

Let's also now see if this is available through OpenRouter. So I'm going to go to OpenRouter and search for models, and there we go: GPT-OSS. It's pretty cheap for the 120 billion parameter model. Claude Opus in comparison is $15 per million tokens, and the Claude Sonnet 4 model is about $3 per million tokens. So this is significantly cheaper.

4:44

I'm going to copy this, and I'm actually going to try it for a local agentic-coding type application, which basically means through Cline. So I'm going to type this model in, the 120 billion parameter model. I did try to use the one with Ollama, but it just timed out; I'm going to try that again later. But let's try the same exact prompt that I had before: create a website for my t-shirt store.
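Outside of Cline, the OpenRouter route can also be hit directly through its OpenAI-compatible chat completions endpoint. A minimal sketch, assuming the model id openai/gpt-oss-120b as listed on OpenRouter and an OPENROUTER_API_KEY environment variable you supply yourself; the helper names are my own.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenAI-compatible endpoint

def build_chat_request(user_message: str, model: str = "openai/gpt-oss-120b") -> dict:
    """Build an OpenAI-style chat completion payload for OpenRouter."""
    return {"model": model, "messages": [{"role": "user", "content": user_message}]}

def complete(user_message: str) -> str:
    """Call OpenRouter and return the assistant's reply (needs an API key set)."""
    data = json.dumps(build_chat_request(user_message)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# complete("Create a website for my t-shirt store.")  # requires an API key
```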

5:17

Okay, super fast thinking. And then it's creating the index.html. Wow, that is blazingly fast. It kind of reminds me of the Gemini Flash model. Oh my god, that is insane. But let's see how good this is actually going to be, because if the CSS doc is only like 100 lines... yeah, I don't know how this is going to turn out.

5:46

Let's open the index.html. Okay, awesome: awesome t-shirt store. And how much did this actually cost me? $0.0083.

5:56

Let's see if I can say something like: make the website super nice, make it extremely awesome, add lots of interactions. I want to make this website look super duper fancy. Think step by step. I'll tip... and I'm going to say $1,000. Why not?

6:19

Okay. So it goes into this whole

6:20

thinking process. Um the inference

6:23

through open router is insanely fast.

6:26

Wow.

6:29

Okay. It added a couple more things here

6:32

to HTML, but nothing really too intense.

6:35

Let's see how much it actually updates

6:36

the CSS here.

6:41

And so far I haven't even spent a scent

6:44

yet. So that's going to be interesting.

6:47

There we go.

6:49

If you take a look at the providers here, let's click on this and see who is actually serving this model: we have it through Groq, Cerebras, and Fireworks. I'm guessing that the more people that watch this video and start using these models, the more the overall latency here will go up. But for now, it is running super fast.

7:20

Okay, awesome t-shirt store. It's got these sort of cool interactions. Not sure what this dark mode thing is about. Yeah, not amazing, I would say. I guess I can see why this model is a little bit on the cheaper side, but it is really fast. I'm thinking maybe I could use this model for something like really quick bug checking, but I don't really see myself using it as a replacement for something like Claude Sonnet or Opus for coding tasks.

7:56

Now, there is also fine-tuning material available. If you look, there's an article on fine-tuning GPT-OSS with Hugging Face Transformers, and they actually have full documentation: you can set up and install all the different libraries, prepare the dataset, and then fine-tune this specific model for your application. So I'm also going to create another follow-up video on fine-tuning this model; I think that would be interesting.
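The fine-tuning walkthrough boils down to: install the libraries, format your dataset as chat conversations, and run a supervised fine-tuning loop. Here is a rough sketch of the dataset-preparation step, using the chat-message convention Hugging Face chat templates expect; the trainer call itself needs a GPU and the real libraries installed, so it is only indicated in comments, and format_example plus the toy Q/A pairs are my own illustrative assumptions.

```python
def format_example(question: str, answer: str) -> dict:
    """Turn a Q/A pair into the chat-message format Hugging Face chat templates expect."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# A tiny illustrative dataset (replace with your own domain data):
train_data = [
    format_example("What does 'ollama pull' do?", "It downloads a model to your machine."),
    format_example("Is GPT-OSS open weight?", "Yes, both sizes are Apache licensed."),
]

# The actual fine-tune (GPU required) would then look roughly like:
#   from trl import SFTTrainer, SFTConfig
#   trainer = SFTTrainer(model="openai/gpt-oss-20b",
#                        train_dataset=train_data,  # typically a datasets.Dataset
#                        args=SFTConfig(output_dir="gpt-oss-finetuned"))
#   trainer.train()
```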

8:28

But these are just some preliminary projects that I've been working on with the GPT-OSS model. Try the local variants. Also, if you go to gpt-oss.com, you'll be able to see an interface where you can try out different things, like reasoning levels (show reasoning, don't show reasoning), and try these two models out as well. Give it a couple of prompts, like how far away is the sun from Earth. This way you don't even need to download the model locally on your computer; it will just run here through this interface.
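The reasoning levels shown on gpt-oss.com can reportedly also be set when running the model yourself, via a system prompt such as "Reasoning: high" as described in the gpt-oss documentation. A sketch of how that could be attached to a local Ollama /api/chat payload; this only builds the request, and the helper name is my own.

```python
import json

def build_chat_payload(prompt: str, reasoning: str = "medium",
                       model: str = "gpt-oss:20b") -> dict:
    """Build a payload for Ollama's /api/chat endpoint with a reasoning-level
    system message, per the convention described for gpt-oss."""
    assert reasoning in ("low", "medium", "high")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": f"Reasoning: {reasoning}"},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
    }

payload = build_chat_payload("How far away is the sun from Earth?", reasoning="high")
print(json.dumps(payload, indent=2))
```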

9:00

But yeah, that's pretty much it for this video. More to come; I'm going to be covering this in a lot more detail. I think this is a really exciting update. I'm not mind-blown or anything like that based on how well this model is performing so far. It's decent, it is fast, and I like the fact that it's local, but does it actually replace my existing workflow? Probably not. But let's see; I'm still going to be experimenting with this a little bit more and I'll keep you updated. Thank you all for tuning in. This is Professor Patterns, and have a good rest of your day. Goodbye.

Interactive Summary

The video introduces the new GPT-OSS models, available in 20 billion and 120 billion parameter versions, both Apache licensed. The models are accessible through platforms like Hugging Face, vLLM, and Ollama, with a focus on local inference. The presenter demonstrates downloading and using the 20 billion parameter model via Ollama, showcasing its capabilities by generating Python learning resources and a study plan. The model's reasoning process and quick inference are highlighted. The video also explores using GPT-OSS through OpenRouter, noting its competitive pricing compared to models like Claude. A live demonstration shows the model generating a website for a t-shirt store, praised for its speed, though the presenter notes limitations in the website's sophistication. The potential for fine-tuning GPT-OSS models is also mentioned, with resources provided for further exploration. The presenter concludes that while the model is fast, local, and decent, it doesn't yet replace their existing workflow but warrants further experimentation.
