The Ultimate Local AI Tier List For 2026

Transcript

Today, I'm ranking the most popular local AI use cases based on real engineering experience. For each category, I'll recommend the one model or tool that I would start with, so you can skip hours of research and just get going. Personally, I've spent hundreds of hours testing on my RTX 5090, but a lot of these use cases work on weaker hardware, too. So, with that being said, let's get right into it.

First up: local code autocomplete. This is straight-up S tier. Code autocomplete was one of the first ways we used AI for automated coding, with something like GitHub Copilot. As you type, the model finishes your line or even fills in a function body before you can think about it. And while agentic coding seems to be taking over the world, code autocomplete is still extremely useful, and it works great with local models. One model you could use is Qwen2.5-Coder, because the small 7-billion-parameter variant can run at sub-100-millisecond latency, even on GPUs with only a couple of gigabytes of VRAM. You'll sometimes even get results faster than network-based autocompletes, because there's no round trip; everything executes locally, and that gives you an experience similar to the autocomplete models that were state-of-the-art a couple of years ago. You can pair your local model with something like Continue.dev for the editor integration, and then you've got a self-hosted Copilot replacement, at least for the autocomplete features, that costs nothing after hardware. While this doesn't beat the newest agentic features you get with things like Claude Code, Copilot, or Codex, it's a great start, and it genuinely works well locally. Local can beat cloud models here on the metric that matters most for autocomplete, which is response speed, and you can fully customize it yourself as well.
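As a concrete sketch, Qwen2.5-Coder autocompletes via fill-in-the-middle (FIM) special tokens: the editor sends the code before and after the cursor, and the model generates what belongs in between. The token names below follow Qwen2.5-Coder's documented FIM format, but treat the surrounding inference setup as an assumption.

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt for Qwen2.5-Coder.

    The model is expected to generate the code that belongs between
    prefix and suffix, i.e. the text that follows <|fim_middle|>.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Code around the cursor in the editor:
before_cursor = "def add(a, b):\n    return "
after_cursor = "\n\nprint(add(2, 3))\n"
prompt = fim_prompt(before_cursor, after_cursor)
print(prompt)
```

A tool like Continue.dev builds this kind of prompt for you; the point is that a single short completion request is all autocomplete needs, which is why a small local model can answer in well under 100 ms.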

1:51

enhancement. This is something that I

1:53

would comfortably place in a tier. Photo

1:56

enhancement covers things like

1:57

upscaling, face restoration, background

2:00

removal, noise reduction, colorization,

2:02

anything basically where you take an

2:04

existing image and you make it better

2:06

without generating new content. I'll

2:08

focus on upscaling for now as this is an

2:10

example that is very common but the

2:11

whole category is strong locally. A tool

2:14

that you can get started with easily is

2:16

something like upscaly. It's a free open

2:18

source desktop app that uses real ESR

2:21

GAN models. You don't have to screw

2:23

around too much with Python. You don't

2:24

need the command line yet. No custom GPU

2:26

configuration just to see what's

2:28

possible. You drag a photo in and you

2:30

will get a 4x or even 8x upscaled

2:32

version out of it. From there, because

2:35

you can see what's possible, you can go

2:36

ahead and try and play around with the

2:38

models yourself and create a custom

2:40

workflow that works for whatever you

2:41

need to do. It's great to be able to

2:44

enhance your photos without having to

2:45

rely on only a cloud-based version of

2:47

Photoshop. And any dedicated GPU with

2:50

something like 4 GB of VRAM can handle

2:53

this use case in just seconds. Another

Another great use case is home automation. Home automation covers many different tasks, but the whole category is definitely A tier. With home automation, you can have your security cameras detect a person in your driveway and get a notification with a snapshot on your phone, all processed on a box in your closet. The great part is that you no longer need a cloud subscription, and maybe more importantly, your security footage is no longer leaving your network to be stored on some random cloud that you don't know about. Object detection is the headline feature here, but local home automation can also cover voice control, presence sensing, and even energy monitoring. In any case, the stack to start with is Frigate NVR plus Home Assistant. This is probably the most mature local AI ecosystem that exists right now, especially for beginners. With it, you get person detection, vehicle detection, pets, packages, license plates, and even some basic facial recognition in the latest Frigate versions, all processed entirely on your hardware. Home Assistant has over 2 million active installations, and Frigate has over 30,000 stars on GitHub, so you're not going to be alone. It's not just a hobby project; it's a full ecosystem you can plug into. The only thing keeping this from S tier is the setup complexity. You often have to configure Docker, camera streams, and detection zones, and even once it's running, it might break in a couple of weeks, because things update fast in the home automation space.
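To make the person-in-the-driveway example concrete: Frigate publishes detection events as JSON over MQTT, and a small script can decide which of them deserve a phone notification. This is a minimal sketch; the `type`/`after.label`/`after.camera` fields are an assumption based on Frigate's documented event payload, so check the version you run.

```python
import json

def should_notify(event_json: str,
                  wanted_label: str = "person",
                  wanted_camera: str = "driveway") -> bool:
    """Decide whether a Frigate MQTT event warrants a notification.

    Frigate publishes JSON events on its `frigate/events` topic; the
    field names used here are an assumption based on its documented
    payload shape.
    """
    event = json.loads(event_json)
    if event.get("type") == "end":  # ignore events that are wrapping up
        return False
    after = event.get("after", {})
    return (after.get("label") == wanted_label
            and after.get("camera") == wanted_camera)

sample = json.dumps({"type": "new",
                     "after": {"label": "person", "camera": "driveway"}})
print(should_notify(sample))
```

In a real setup, an MQTT client library would feed each incoming message through a filter like this and hand the matches to a Home Assistant notification service.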

Now, you might already be thinking, "This is a nice tier list so far, but how do I get started?" Well, I put together a collection of my own open-source projects for many of the local AI use cases we're covering today. The link is in the description, and you should definitely check it out.

A whole different use case is video generation, and this one is a little disappointing. I'm going to put it in C tier. The idea behind video generation is that you type a prompt and get a full video clip out, something you can use for product demos, short-form content, or concept visualization. This is the category everyone wants to work locally, because cloud video generation is expensive and rate limited. The problem is that it's still very expensive to generate locally, in terms of the time it takes, and the quality is just not great. One model you can try is Wan 2.1 from Alibaba, which can beat Sora, a state-of-the-art model from a while back, on several benchmarks. The issue is that even on my 5090, I can't really use the full 14-billion-parameter model. I have to drop down to the smaller 5-billion-parameter one, and the quality is just a lot worse. In addition, because it takes so long to generate video, it's also much more time-expensive to generate many variants in search of a clip that actually looks good, compared to image generation, which we'll get to later. You don't just generate one frame; to get a good video, you need to generate many frames, and that just takes a long time locally. This might be usable for some experimental social media clips and concept work, but real professional production is still cloud territory, and that's why this is going in C tier.

Speaking of something related: image generation. Now, this works much better; I would put image generation in S tier. With image generation, you describe an image and the model creates it. You can use it for thumbnails, marketing assets, product mockups, and concept art. This can also include inpainting and outpainting, which is basically editing an existing image, and even generating a new variant based on an image you pass in. I'll focus on text-to-image generation, because that's where most people start. The model you can use here is FLUX.2 [dev], which you can run through ComfyUI. On something like my 5090, this is a very comfortable use case, and it's easy to generate an image in just a couple of seconds. Unlike video generation, this means you can generate hundreds of variants in a short amount of time, which increases the chances that you actually get an image you're happy with. And in fact, Flux is a really good model: in some blind tests, it achieved a 71% win rate over some of the older Midjourney versions for editorial photorealism. So, the models are really getting much better. With image generation, I've also found there's a really great community where folks have fine-tuned a lot of models for custom characters and styles. There are even a lot of image generation models with fewer content filters than the ones in the cloud. It's good that cloud models have content filters, especially for copyright, but sometimes they go a little too far and become a bit unusable, to be honest. And actually training a custom LoRA, or fine-tuning one, only takes 15 to 20 images, and you can do that on consumer hardware as well. Now, one thing I've learned from using AI image generation for my own content is that iterative editing is where it gets a bit tricky. Image properties like gradients and complex textures make it very difficult for local models to edit images without losing quality. Personally, this is still where I use the latest Nano Banana model, or just Photoshop, so that I can lean on some of the cloud features. So, even though it's S tier, there are some use cases where it falls behind a little, but it's getting better every single month.

A completely different but fun use case is voice agents. This one I'm going to place in C tier, because of the wide variety of use cases, which work to varying degrees. The idea behind voice agents is that you talk to your computer and it talks back, like a local Alexa or a customer support bot running on your hardware. Ideally, it should also be able to take actions for you; for example, hands-free home automation. The best open-source option right now is something like Pipecat, which can chain a speech-to-text model, an LLM, and a text-to-speech model into a single pipeline. That can get you something like sub-800-millisecond voice-to-voice latency on fairly standard hardware, even on macOS. The problem is that the model size constraint means the AI responses you get locally are noticeably dumber than what you get from the best local models you interact with over chat, and of course the best cloud models. In fact, this is even an issue with cloud voice agents: they can achieve 200-to-500-millisecond latency, but their intelligence is more around the GPT-4o level, not the latest models out there. And this intelligence gap is much wider for local voice agents. That being said, if you just have a single command you want to run, voice agents can work pretty well. Where they lack quality is in longer conversations, because with local models you're much more constrained on context window. As a long conversation unfolds, a local voice agent will simply go off track; you don't have that issue if your voice agent is just there to process one command and execute it. So that would be my recommended use case.

Now, one component of voice agents is a text-to-speech model, and you can use text-to-speech on its own. On its own, this is definitely A tier. With text-to-speech, you feed text in and get natural-sounding audio out. You can use this for audiobook narration, voiceover for video, accessibility features in your app, or even trying to clone your own voice for content, which I've tried, but that one is a bit ambitious; it doesn't work very well. I won't cover music generation and sound effects here, because they're not really my specialty. In general though, voice synthesis is where local models have made the biggest gains. Text-to-speech has had the most dramatic transformation, in my opinion, of any local AI category in the past 18 months; I would say it's almost a solved problem at this point, especially for English. The model I would start with is Chatterbox from Resemble AI, because it beats ElevenLabs in a couple of blind tests, with over 60% listener preference rates. The base model might be English-only, but you can go to Chatterbox Multilingual, which covers 23-plus languages. So the gap versus a cloud offering like ElevenLabs has closed for a lot of use cases, and in some cases the gap is gone entirely. There are some gotchas: a lot of models will hallucinate or really degrade in quality past about a thousand characters. But there are ways to solve that; you can split your text into multiple batches and process them one by one.
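That batching step can be sketched as a small helper that splits text at sentence boundaries so no chunk exceeds the model's comfortable length. The 1,000-character default mirrors the rough threshold mentioned above; treat it as an assumption to tune per model.

```python
import re

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at
    sentence boundaries so the TTS model never sees an overlong input.
    (A single sentence longer than max_chars is kept whole.)"""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would overflow.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

parts = chunk_text("One. Two. Three. " * 100, max_chars=200)
print(len(parts), max(len(p) for p in parts))
```

You then feed each chunk to the TTS model separately and concatenate the audio, which sidesteps the long-input degradation entirely.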

Now, if you're finding this tier list useful, make sure to hit subscribe, because over 90% of you are missing out on the latest AI engineering news that's based on reality and not hype. So, make sure to join the club.

Next, we're going to be talking about speech-to-text, another component of voice agents that you can extract on its own, and another solved problem in my opinion. Speech-to-text is absolutely S tier. You record audio or pass in an audio file, and you get a text transcript back: meeting notes, podcast transcription, subtitle generation, or just turning your voice memos into searchable text. I use this constantly myself for processing my own YouTube video content, and for English it works very well. A model you can use is Faster-Whisper with Large-v3 Turbo, which gives you over four times the speed of the original Whisper model. The real workflow I use is a two-stage pipeline: I use Whisper to get a fast, accurate raw transcription, and then I use a local LLM to clean up filler words and extract the core meaning. That way I can store the notes in something like my Obsidian vault to reference later. The transcription itself is almost instant, but the LLM cleanup step does add some time. The thing is, I can just run it in the background, so that's not an issue at all for me. The gap to cloud models is mainly speaker diarization, which means splitting the audio up by speaker. Say you have a meeting with 10 people: you want to know who's speaking which sentences, right? Cloud models are a little better at that, but I fully believe local models will catch up very quickly. The core use case of transcribing audio just works very well nowadays with local models.
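The two-stage pipeline can be sketched as follows. Stage one would call Faster-Whisper, left here as a comment so the sketch stays self-contained; stage two is shown as a deterministic filler-word scrub standing in for the LLM cleanup step, which in the real workflow can also rephrase and extract meaning.

```python
import re

# Stage 1 (requires the faster-whisper package and a downloaded model):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v3-turbo")
#   segments, _info = model.transcribe("memo.mp3")
#   raw = " ".join(seg.text for seg in segments)

FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b[,.]?\s*", re.IGNORECASE)

def clean_transcript(raw: str) -> str:
    """Stage 2 stand-in: strip filler words and tidy whitespace.
    A local LLM does this step in the real pipeline."""
    text = FILLERS.sub("", raw)
    return re.sub(r"\s+", " ", text).strip()

print(clean_transcript("So, um, this is, you know, the raw uh transcript."))
```

Because the cleanup stage is decoupled from transcription, it can run in the background on a batch of transcripts, exactly as described above.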

Next, we're going to be talking about OCR, optical character recognition, which I'm going to comfortably place as our first B tier. OCR covers table extraction, formula recognition, and converting scanned documents into structured data, which is useful for so many use cases. One tool to start with is Surya, or you can use the DeepSeek-OCR model that was released recently. Now, a lot of the use cases are a bit on the boring side, like transferring and processing invoices, but it does work pretty well.

Next, we're going to be talking about agentic coding, which is much more interesting than plain code autocomplete and seems to be hyped up more and more for local models. The one disappointing thing about agentic coding is that it doesn't work that well unless you have very good hardware. A lot of YouTube videos show agentic coding generating some Python code, and that's all well and good, but with agentic coding you really want a model competent enough to read your entire codebase, write code, run tests, and iterate until the feature you asked for in plain text actually works. The issue is that most hardware is simply not up to agentic coding. I've made many videos on my channel covering how performant agentic coding is on my 5090, and it's definitely getting closer, but it's nowhere near as useful as code autocomplete or an AI chat, because eventually local AI agents choke on larger projects. That being said, this is something that improves every single month, so definitely check out the videos on my channel, linked in the description below, to get the most out of your local setup, because it's very difficult to set up properly, and you really need a bit of an expert guide to do it right. With agentic coding, you might be able to match something like GPT-4o locally, but Claude Opus 4.6 and similar models have really upped the game here, and there's just a huge gap between what you can do locally and the state-of-the-art cloud models. Even if your local model is intelligent enough, the problem is that it gets slow as the context window fills up. And unlike other use cases, where you can clear the context window after each small bit of the task, with agentic coding you cannot clear the context window at every step: you need the coding model to keep a good picture of how your codebase works. So in the end, these models run very slowly on local hardware, especially compared to cloud models, unless you have a very beefy PC. Most people promoting local models for agentic coding are not using them for serious projects, and if you don't believe me, check out some of my masterclasses on this channel to learn the truth. If you do want to get started with agentic coding, I recommend some of the latest Qwen models, but again, I cover that extensively in the videos on my channel already, so go check those out. As models get stronger, I expect this to move to A or S tier, especially as local models catch up as well. You know what isn't A tier yet, though? AI chats.

A traditional use case. With AI chats, you can ask questions, brainstorm ideas, summarize documents, and draft emails: the same stuff you'd use something like ChatGPT for, but it runs entirely on your machine, with no data leaving your network. I would recommend newer models, something like Qwen3 30B through LM Studio. That's a pretty optimized model that runs fast, but honestly, there are many choices for AI chat models; you can use a newer Mistral model, or even check out the older but still competent open-weights OpenAI model that's out there. In any case, locally you can really mimic what GPT-4o was able to do a year-plus ago, now fully on your own hardware. You can use it for so many different purposes that it's just a great use case altogether, and a very beginner-friendly one, because everyone understands the concept of an AI chat, right? The great part, too, is that you have many choices for the user interface. You can use something like LM Studio, a desktop app that already has a chat UI, or even create your own web app and customize it to your liking. There are so many open-source repos, and again, check out the link in the description if you want a good start on that, because I've got many templates for you to customize.
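If you go the build-your-own-UI route, the glue code is small: LM Studio exposes an OpenAI-compatible server (by default at `http://localhost:1234/v1`), so your app just posts a standard chat-completion payload to it. The model name below is an assumption and should match whatever you have loaded; the sketch builds the payload and leaves the actual HTTP call as a comment so it stays self-contained.

```python
import json

def chat_request(prompt: str, model: str = "qwen3-30b") -> dict:
    """Build an OpenAI-style chat completion payload for a local server.
    The model name is an assumption; use the identifier shown in LM Studio."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful local assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }

payload = chat_request("Summarize my meeting notes in three bullets.")
print(json.dumps(payload, indent=2))

# To actually send it (requires LM Studio's local server to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:1234/v1/chat/completions",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Because the API shape matches OpenAI's, most open-source chat frontends can be pointed at the local server just by changing the base URL.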

A specific subset of AI chats is RAG, retrieval-augmented generation, a fancy term some of you probably already know: you point the AI at your own files, company documents, research papers, or notes, and it answers questions based on that content. It's pretty similar to AI chat, but it's generally a bit more expensive and complex to set up. You might have to put your documents into a vector database, and you have to make sure the right documents are retrieved so the AI chat can actually answer questions about them. But if you do it right, this works very well for your own use cases, and it keeps local models up to date with the latest knowledge in your domain, knowledge that isn't in the model's training data, especially since some of these open-source models were trained half a year or a year ago. So, RAG is a very important paradigm, and I'm putting it in B tier because of the technical complexity of setting it up, but it's kind of a requirement for a lot of real AI chat use cases. It's definitely something you want to learn to set up yourself.
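The retrieval half of RAG can be sketched in a few lines. This toy version scores documents with bag-of-words cosine similarity; a real stack swaps in a neural embedding model and a vector database, but the flow (embed the question, rank the documents, paste the top hits into the prompt) is the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real RAG stacks use a neural
    embedding model plus a vector database instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question; these get
    pasted into the chat prompt so the model can answer from them."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = ["the vpn config lives in the infra repo",
        "quarterly revenue grew in the last report",
        "the coffee machine is on the third floor"]
print(retrieve("where is the vpn config?", docs))
```

The "make sure the right documents are retrieved" part of the setup work is exactly tuning this ranking step, since the chat model can only answer from what retrieval hands it.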

One tool you can start with is Open WebUI, which gives you a full RAG pipeline out of the box with a clean interface. And the nice part is that because it's all running locally, you keep data privacy, which is consistently the top reason enterprises use these kinds of self-hosted LLMs. So if you learn these skills, you can definitely land an AI engineering job with them as well.

So, we've now covered several AI use cases: AI chat, RAG chat, agentic coding, and code autocomplete. But what about the ability to create any AI agent of your dreams, where I'm defining an agent as a system that can autonomously make decisions, execute actual actions for you, and solve a problem in many different ways? Well, the thing is, a true AI agent is very difficult to run locally. I'm going to put this in C tier, but I wanted to make sure I explained it, because you've probably seen many videos here on YouTube talking about AI agents. The problem is that most of those videos are not about real agents. They're deterministic workflows on something like n8n, with a small LLM component in the middle that might make one or two decisions. A true AI agent that can run like Claude Code simply requires a very good language model, or else it gets confused about all the tools it has access to, it can't run autonomously without you pushing it in the right direction, and you'll just find there are many issues with it in general. That being said, I'm not saying AI agents are incapable of working locally. It depends on your use case, and most people I've talked to who want to build their first AI agent are a little too ambitious. They might want to build a research agent that works better than what you get with OpenAI's GPT Pro. But to be quite honest, it's very difficult to build a better AI agent than a platform you can access over the cloud, again, something like Claude Code, because even with Claude Code, you can point it to a local model, but it just won't perform in the same autonomous way. That being said, AI agents in particular change all the time, and I expect that as models get better, this will move up to B tier and eventually A tier. But even then, AI agents simply won't run on weak hardware, because of literal mathematical constraints that I've covered in other videos on my channel. So depending on your use case, this is just not going to work very well. But if you think that's not true and you've had good experiences creating local AI agents, I would love to hear from you in the comments down below. Please tell me what use case you have, though, because most people trying to run local AI agents aren't really running true agents; they're running regular workflows with LLMs sprinkled somewhere in between.

But how about a simpler AI assistant that you can run locally, like OpenClaw: an always-on personal AI that manages your calendar, triages your email, summarizes your day, and handles whatever you throw at it for personal use cases. The local version of Siri or Google Assistant, but running private and 24/7 on your own hardware. Tools like OpenClaw aim to be exactly this, and they sound great on paper. The issue is that setting them up properly, in a way that actually keeps your accounts secure, is pretty difficult; it does require a little bit of security knowledge. Because of that complexity, I would right now put AI assistants into B tier, because they can work quite well if you are very aware of the security problems with something like OpenClaw. Now, personally, I would still use OpenClaw with a state-of-the-art cloud model, because those are much better protected against things like jailbreaks. But there is one exception where I would use local models, which is well-defined cron jobs: things that run on a schedule, like every 8 hours. This might be summarizing your feed into a daily digest or classifying your incoming emails. For something like that, a local 14-billion-parameter model works just fine, and yes, you can use a local model for it.
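A scheduled job like that is mostly prompt assembly. The sketch below builds the daily-digest request a cron entry (for example `0 */8 * * *`) could fire at a local OpenAI-compatible endpoint; the model name is an assumption, and sending the request is omitted so the sketch stays self-contained.

```python
import json

def digest_prompt(items: list[str]) -> dict:
    """Build a chat payload asking a local model to summarize a feed.
    The model name is an assumption; any local ~14B instruct model
    served over an OpenAI-compatible endpoint would fill this role."""
    feed = "\n".join(f"- {item}" for item in items)
    return {
        "model": "qwen2.5-14b-instruct",
        "messages": [{
            "role": "user",
            "content": f"Summarize these items into a short daily digest:\n{feed}",
        }],
    }

payload = digest_prompt(["New Frigate release", "Qwen model update"])
print(json.dumps(payload, indent=2))
```

Because the task is narrow and the context is assembled fresh each run, this is exactly the well-defined, single-shot shape where a small local model holds up.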

So far, we have a lot of use cases, and they're all pretty competent, right? Even the ones in C tier are very usable, and they're getting better every single day. But then what is D tier for? Well, to be honest, that's for vibe coding. Vibe coding with local models just doesn't work. The idea with vibe coding is that you describe an app in plain English, the AI builds the whole thing, and you never read the code; you just judge whether the result works. This is different from agentic coding because you are not explicitly reviewing or steering anything. Now, honestly, this workflow needs a frontier model to cover for the fact that you're not reviewing what it writes at all. A small language model can code quite well with the right guidance across a couple of files, but models under 14 billion parameters can't even use tool calling properly, which is a requirement for proper agentic coding, let alone vibe coding, where you're not guiding the model at all. Vibe coding with cloud models already has serious problems: researchers found security vulnerabilities in one out of 10 Lovable-generated apps, for example. With weaker local models, those problems multiply.

So, let's recap this tier list a little. The three S tier use cases generally match or sometimes even beat cloud models: code autocomplete, image generation, and speech-to-text. The pattern here is that some of the more boring use cases consistently outperform the hyped ones for local models. However, as models get better, some of the more complex workflows, like AI agents and voice agents, will just get better over time, and hopefully everything will be in B tier or above a couple of years from now. And if you want to get started with local AI projects, you should check out the link in the description below and get started.

Summary

The video ranks popular local AI use cases based on engineering experience, recommending specific models or tools to get started. S tier covers code autocomplete, image generation, and speech-to-text, which often match or surpass cloud models, especially on speed and privacy. A tier covers photo enhancement, home automation, and text-to-speech, offering strong local performance and benefits. B tier contains OCR, AI chats (mimicking GPT-4o from a year ago), retrieval-augmented generation (RAG), and AI assistants, which are competent but involve more complexity or security considerations. C tier includes video generation, voice agents, and true AI agents, which currently face challenges with quality, generation time, model intelligence, or hardware requirements locally. Finally, D tier is reserved for vibe coding, which is deemed unfeasible with current local models due to the lack of guidance and inherent security risks. The speaker emphasizes that "boring" use cases often outperform hyped ones for local models, but complex workflows are continuously improving.