The Ultimate Local AI Tier List For 2026
Today, I'm ranking the most popular
local AI use cases for you based on real
engineering experience. For each
category, I'll recommend the one model
or tool that I would start with so you
can skip hours of research and just get
going. Personally, I've spent hundreds
of hours testing with my RTX 5090, but a
lot of cases work on weaker hardware,
too. So, with that being said, let's get
right into it. First up, we're going to
start with local code autocomplete. And
this is straight up S tier. Code
autocomplete is one of the first ways
that we were using AI for automated
coding with something like GitHub Copilot. When you type code, the model
finishes your line or even fills in a
function body before you can think about
it. And while agent coding seems to be
taking over the world, code autocomplete
is still extremely useful. And the great
part is that it absolutely works great
with local models. One model you could
use is something like Qwen 2.5 Coder, because it's a small 7-billion-parameter model that can run at sub-100-millisecond latency, sometimes even on GPUs with just a couple gigabytes of VRAM. Now, you're sometimes even going to
get results that are faster than
network-based autocomplete, because there's no round trip; you're executing everything locally. This gives you a similar experience to the code autocomplete models that were state-of-the-art a couple of years
ago. You can pair your local model with something like Continue.dev for the editor integration, and then you've got a self-hosted Copilot replacement, at least for the autocomplete features, that costs nothing after hardware. While this doesn't beat the newest agentic features you get with things like Claude Code, Copilot, or Codex, it's a great start, and it actually works well locally. Local can beat cloud models here on the metric that matters most for autocomplete, which is response speed, and you can fully customize it yourself as well. We're going to be talking about
other coding use cases later, but first,
let's move to something different, photo
enhancement. This is something that I
would comfortably place in A tier. Photo
enhancement covers things like
upscaling, face restoration, background
removal, noise reduction, colorization,
anything basically where you take an
existing image and you make it better
without generating new content. I'll
focus on upscaling for now as this is an
example that is very common but the
whole category is strong locally. A tool
that you can get started with easily is
something like Upscayl. It's a free, open-source desktop app that uses Real-ESRGAN models. You don't have to screw
around too much with Python. You don't
need the command line yet. No custom GPU
configuration just to see what's
possible. You drag a photo in and you
will get a 4x or even 8x upscaled
version out of it. From there, because
you can see what's possible, you can go
ahead and try and play around with the
models yourself and create a custom
workflow that works for whatever you
need to do. It's great to be able to
enhance your photos without having to
rely on only a cloud-based version of
Photoshop. And any dedicated GPU with
something like 4 GB of VRAM can handle
this use case in just seconds. Another
great use case is home automation. Home
automation covers many different tasks,
but the whole category is definitely A tier. With home automation, you can
have use cases like having your security
cameras detect a person in your driveway
and your phone getting a notification
with a snapshot, which will all be
processed on a box in your closet. And
the great part about this is that you will not have a cloud subscription anymore. And maybe more importantly, your security footage won't leave your network to be stored on some random Chinese cloud that you don't know about. Object detection is the headline
feature here, but local home automation
can also cover voice control, presence
sensing, and even energy monitoring. In
any case, the stack that you can start
with is Frigate NVR plus Home Assistant.
This is probably the most mature local
AI ecosystem that exists right now,
especially for beginners. With this, you
can have use cases like person
detection, vehicle detection, pets,
packages, license plates, and even some
basic facial recognition as part of some
of the latest Frigate versions. And all
of this will be processed entirely on
your hardware. Home Assistant has over 2
million active installations, and Frigate
has over 30,000 stars on GitHub. So,
you're not going to be alone. It's not just a hobby project; it's a full ecosystem you can plug into. The only thing keeping this from S tier is the setup complexity. You'll often have to configure Docker, camera streams, and detection zones, and even once it's running, it might break in a couple of weeks because things update fast in the home automation space. Now, you
might be already thinking, "This is a
nice tier list so far, but how do I get
started?" Well, I put together a
collection of my own open-source
projects for many of the local AI use
cases we're covering today. The link is
in the description, and you should
definitely check it out. A whole
different use case is video generation,
and this one is a little bit
disappointing. I'm going to put it in the C tier. The idea behind video generation is that you can type a prompt and get a full video clip out, something you can use for product demos, social media content, or concept visualization. This is the category everyone wants to work locally, because cloud video generation is expensive and rate-limited. The problem, though, is that it's still very expensive to generate locally in terms of the time it takes, and the quality is just not great.
One model you can try is Wan 2.1 from Alibaba, which can beat Sora on several benchmarks, and Sora was, you know, a state-of-the-art model a while back. The issue, though, is that even on my 5090, I can't really use the full 14-billion-parameter model. I already have to go down to the smaller 5-billion-parameter one, and the quality is just a lot lower. In
addition, because it takes a long time to generate video, it's also much more time-expensive to generate many variants and figure out which clip actually looks good, compared to image generation, which we'll get to later. You don't just generate one frame; to get a good video, you need many frames, and that just takes a long time locally. This might be usable for some experimental social media clips and concept work, but real professional production is still cloud territory, and that's why this is going in C tier.
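To put the iteration cost in perspective, here's some back-of-envelope arithmetic; the timings below are illustrative assumptions for a mid-range consumer GPU, not benchmarks:

```python
# Why iterating on video is so much more expensive than iterating on images
# locally. All numbers are illustrative assumptions, not measurements.
SECONDS_PER_IMAGE = 5   # one image variant on a decent local GPU
MINUTES_PER_CLIP = 15   # one short video clip on the same hardware
VARIANTS = 20           # tries before you usually get a keeper

image_iteration_hours = VARIANTS * SECONDS_PER_IMAGE / 3600
video_iteration_hours = VARIANTS * MINUTES_PER_CLIP / 60

print(f"images: {image_iteration_hours:.2f} h, videos: {video_iteration_hours:.1f} h")
```

With these assumed numbers, twenty image variants cost a couple of minutes, while twenty video variants eat a full working day, which is exactly why iterating toward a good clip is so painful locally.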
Speaking of something that's related,
image generation. Now, this works much
better. I would put image generation in
the S tier. With image generation, you
describe an image and the model can
create it. You can use it for
thumbnails, marketing assets, product
mockups, and concept art. This can also include inpainting and outpainting, basically editing existing images, and even generating a new image variant based on an image that you pass to it. Now, I'll focus on text-to-image generation because that's where most people start. The model you can use here is Flux 2 dev, which you can run through ComfyUI on something like my 5090. This is a
very comfortable use case and it's very
easy for me to generate an image in just
a couple of seconds. Unlike video
generation, this means that you can
generate hundreds of variants in, you
know, a short amount of time, which
increases the chances that you actually
get an image that you're happy with. And
in fact, Flux is a really good model
because in some blind tests, it achieved
a 71% win rate over some of the older
Midjourney versions for editorial photo
realism. So, the models are really
getting much better. And with image
generation, I've also found that there's
a really great community where folks
have fine-tuned a lot of models for
custom characters and styles. And
there are even a lot of image generation models that have fewer content filters than the ones in the cloud. It's good that cloud models have content filters, especially for copyright, but sometimes they go a little too far and become a bit unusable, to be honest. And actually training a custom LoRA model, fine-tuning one, only takes 15 to 20 images, and you can do that on consumer hardware as well. Now, one
thing that I've learned from using AI
image generation for my own content is
that iterative editing is where it gets
a bit tricky. Image properties like
gradients and complex textures make it
very difficult for those local models to
edit images without losing quality.
Personally, this is still where I use the latest Nano Banana model or just use Photoshop, so that I can rely on some of the cloud features. So, even
though it's S tier, there are some use
cases where it's going to fall behind a
little bit, but it is really getting
better every single month. A completely
different but fun use case is voice
agents. Now, this one is something that
I'm going to place in C tier because of
the wide variety of use cases which can
work in a variety of ways. Basically,
the idea behind voice agents is that you
can talk to your computer and it will
talk back like a local Alexa or customer
support bot that runs on your hardware.
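Under the hood, these systems are usually a chain of speech-to-text, a language model, and text-to-speech. A toy sketch of one voice-to-voice turn, with stubs standing in for the real models:

```python
import time

# Toy sketch of the voice-agent loop: speech-to-text -> LLM -> text-to-speech.
# The three stubs are placeholders; in a real setup each would wrap a local
# model (e.g. a Whisper variant, a small chat model, a TTS model).

def stt(audio: bytes) -> str:
    return "turn off the kitchen lights"   # stub transcription

def llm(text: str) -> str:
    return f"Okay, executing: {text}"      # stub response/decision

def tts(text: str) -> bytes:
    return text.encode()                   # stub audio

def voice_turn(audio: bytes) -> tuple[bytes, float]:
    """Run one voice-to-voice turn and report its latency in seconds."""
    start = time.perf_counter()
    reply_audio = tts(llm(stt(audio)))
    return reply_audio, time.perf_counter() - start

audio_out, latency = voice_turn(b"...")
print(audio_out, f"{latency * 1000:.1f} ms")
```

The voice-to-voice latency people quote is just the sum of these three stages, which is why every component has to be fast for the whole thing to feel responsive.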
But ideally, it should also be able to,
you know, do actions for you. For
example, you can use it to do hands-free
home automation. The best open-source option right now would be something like Pipecat, which can chain together a speech-to-text model, an LLM, and then a text-to-speech model into one single pipeline, and that will allow you to get something like sub-800-millisecond voice-to-voice latency on standard hardware, on something like macOS. The
problem is that the model size constraint means that the AI responses locally are noticeably dumber than what you can get from the best local models you interact with over chat, and of course the best cloud models. In fact, this is even an issue with cloud voice agents: they can achieve 200-to-500-millisecond latency, but their intelligence is more around GPT-4o level, not the latest models that are out there. And of course, this intelligence gap is much wider for local voice agents. That being said, if
you just have a singular command that
you want to run, voice agents can work
pretty well. The main issue where they lack quality is in longer conversations: with local models, you're much more constrained on your context window, which means that in a long conversation, a voice agent will simply go off track. You don't have that issue if your voice agent is just there to process one command and execute it, so that would be my recommended use case. Now,
one component of voice agents is a text
to speech model and you can use text to
speech on its own. And on its own, this is definitely A tier. With text to
speech, you feed text in and you get
natural sounding audio out. You can use
this for audiobook narration, voice over
for video, accessibility features in
your app, or even trying to clone your
own voice for content, which I've tried,
but that one is a bit ambitious. It
doesn't work very well. Now, this category also incorporates music generation and sound effects, but I'll skip those because they're not really my specialty. In general, though, voice synthesis is where local models have made the biggest gains. Now,
text to speech has had the most dramatic
transformation in my opinion of any
local AI category in the past 18 months
because I would say that it's almost a
solved problem at this point, especially
for the English language. The model that
I would start with is Chatterbox from Resemble AI, because it beats ElevenLabs in a couple of blind tests with over 60% listener preference rates. The base model might be English-only, but you can even go to Chatterbox Multilingual, which covers 23-plus languages. And so the gap versus a cloud offering like ElevenLabs has closed for a lot of use cases. And in
some cases the gap is gone entirely.
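Since many local TTS models degrade on very long inputs, a common workaround is to split text into sentence-aligned chunks and synthesize them one at a time. A minimal sketch of such a splitter; the 1,000-character default is an assumption, since the safe budget varies by model:

```python
import re

def chunk_text(text: str, budget: int = 1000) -> list[str]:
    """Split text into sentence-aligned chunks no longer than `budget` chars."""
    # Naive sentence split on ., !, ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > budget:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

parts = chunk_text("First sentence. " * 100, budget=200)
print(len(parts), max(len(p) for p in parts))
```

You'd then feed each chunk to the TTS model in order and concatenate the audio, which keeps every individual request well inside the model's comfort zone.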
There are some gotchas: a lot of models will hallucinate or really degrade in quality past about a thousand characters. But there are ways to solve that; you can split your text into multiple batches and just process them one by one. Now, if you're
finding this tier list useful, make sure
to hit subscribe because over 90% of you
are missing out on the latest AI
engineering news that's based on reality
and not hype. So, make sure to join the
club. Next, we're going to be talking
about speech to text, which is another
component of voice agents that you can
extract on its own, and it's another
solved problem in my opinion. Speech to
text is absolutely S tier. You record
audio or you pass an audio file, you can
get a text transcript back for meeting
notes, podcast transcription, subtitle
generation, or just turning your voice
memos into searchable text. I use this
constantly myself for processing my own
YouTube video content. And for English,
it works very well. A model you can use
is faster-whisper with Large-v3 Turbo, which gives you over four times the speed of the original Whisper model. And the
real workflow that I use is a two-stage pipeline: I use Whisper to get a fast, accurate raw transcription, and then I use a local LLM to clean up filler words and extract the core meaning. This way, I can store the notes in something like my Obsidian vault to reference later. The transcription itself is almost instant, but the LLM cleanup step does add some time. The thing is, though, I can just run it in the background, so that's not an issue at all for me. The
gap to cloud models is mainly with speaker diarization, which means splitting audio into its different speakers. So let's say you
have a meeting with 10 people, you want
to be able to know who's speaking which
sentences, right? Cloud models are a
little bit better at doing that, but I
fully believe that local models will
catch up very quickly. The core use case
of transcribing audio just works very
well nowadays with local models. Next,
we're going to be talking about OCR,
optical character recognition, which I'm
going to comfortably place as our first
B tier. OCR covers table extraction,
formula recognition, and just converting scanned documents into structured data,
which is useful for so many use cases.
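Once OCR gives you raw text, the real value is usually in turning it into structured fields. A toy sketch of that downstream step, with illustrative regex patterns and a made-up invoice; real documents need far more robust handling:

```python
import re

# OCR output for a hypothetical scanned invoice (made up for illustration).
ocr_text = """
ACME Tools Ltd.
Invoice No: 2024-0117
Date: 2024-03-05
Total: EUR 1,249.50
"""

def extract_invoice_fields(text: str) -> dict:
    # Illustrative patterns; real invoices vary wildly in layout.
    patterns = {
        "invoice_no": r"Invoice No:\s*(\S+)",
        "date": r"Date:\s*([\d-]+)",
        "total": r"Total:\s*([A-Z]{3}\s[\d,.]+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

print(extract_invoice_fields(ocr_text))
```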
One tool to start with is Surya, or you can use the DeepSeek-OCR model that was released recently. Now, a lot of the use
cases are a little bit more on the boring side, like being able to extract and process invoices, but it does work
pretty well. Next, we're going to be
talking about agentic coding, which is
much more interesting than just doing
code autocomplete and seems to be hyped
up more and more for local models. Now, the one disappointing thing about agentic coding is that it doesn't work that well unless you have very good hardware. A lot of YouTube videos show agentic coding generating some Python code, and that's all good, but with agentic coding, you really want a model competent enough to read your entire codebase, write code, run tests, and iterate until the feature you asked for actually works. The issue, though, is that most hardware is simply not up to agentic coding. Now, I've
created many videos on my channel
covering how performant agent coding is
on my 5090, and it's definitely getting
closer, but it is nowhere near as useful
as using code autocomplete, nor
something like an AI chat, because
eventually local AI agents choke on
larger projects. That being said, this is something that improves every single month. So definitely check out the videos on my channel, which I'll link in the description below, to get the most out of your local setup, because agentic coding is very difficult to set up properly and you really need a bit of an expert guide to do it right. With agentic coding, you might be able to match something like GPT-4o locally, but Claude Opus 4.6 and similar models have really upped the game here, and there is just a huge gap between what you can do locally versus the state-of-the-art cloud models. Even if your local model is
intelligent enough, the problem is that
it will get slow as the context window
fills up. And unlike with these other
use cases where you can clear out the
context window after you improve small
bits of the task, with a coding, you
cannot clear the context window with
every step. You need the coding model to
have a good idea of how your codebase
works. So in the end, these models are
going to work very slowly on your local
hardware, especially compared to cloud
models, unless you have a very beefy PC.
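To see why a filling context window hurts so much, you can estimate the size of the KV cache, which grows linearly with context length. The model dimensions below are assumptions for a typical 7B-class model with grouped-query attention and an fp16 cache, not the specs of any particular model:

```python
# Back-of-envelope KV cache size. Dimensions are assumptions for a typical
# 7B-class model with grouped-query attention; fp16 cache.
LAYERS = 28
KV_HEADS = 4
HEAD_DIM = 128
BYTES_PER_VALUE = 2   # fp16

def kv_cache_gib(context_tokens: int) -> float:
    # 2x for keys and values, per layer, per attention head dim, per token.
    bytes_total = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * context_tokens
    return bytes_total / 1024**3

for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx):.2f} GiB of KV cache")
```

Under these assumptions, a 32k context already costs about 1.75 GiB of VRAM on top of the model weights, and a 128k context costs four times that, which is memory a consumer GPU usually doesn't have to spare.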
Most people promoting local models for agentic coding are not using them for serious projects, and if you don't believe me, check out some of my masterclasses on this channel to learn the truth. If you do want to get started with agentic coding, I recommend some of the latest Qwen models. But again, I cover
that extensively in the videos on my
channel already, so go check those out.
As models get stronger, I expect it to
move to A or S tier, especially as local
models will catch up as well. You know
what isn't A tier yet, though? AI chats.
A traditional use case. With AI chats,
you can ask questions, brainstorm ideas,
summarize documents, draft emails, the
same stuff you would use something like ChatGPT for, but it can run entirely on
your machine with no data leaving your
network. I would recommend newer models, something like Qwen3 30B through LM Studio. This is a pretty optimized model that can run pretty fast, but honestly, there are many choices for AI chat models. You can use a newer Mistral model or even check out the older but still competent open-source OpenAI model that's out there. In any case, you can really mimic locally what GPT-4o was able to do a year-plus ago, and now
fully locally. You can use it for so
many different purposes that it's just a
great use case altogether and a very
beginner friendly one because everyone
understands the concept of an AI chat,
right? The great part too is that you
have many choices for the right user
interface here. You can use something
like LM Studio which is a desktop app
that already has a chat UI or even
create your own web app and customize it
to your liking. There are so many open
source repos and again check out the
link in the description if you want to
make a good start on that because I've
got many templates for you to customize.
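As a concrete starting point, LM Studio and similar local servers expose an OpenAI-compatible HTTP API, so a minimal chat client is just a JSON POST. The endpoint and model name below are assumptions; adjust them to your own setup:

```python
import json
import urllib.request

# LM Studio and similar local servers speak the OpenAI chat-completions format.
# Endpoint and model name are assumptions; change them to match your setup.
LOCAL_ENDPOINT = "http://localhost:1234/v1/chat/completions"

def build_chat_payload(user_message: str, model: str = "qwen3-30b") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise local assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
    }

def ask_local(user_message: str) -> str:
    """POST one chat turn to the local server and return the reply text."""
    request = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(build_chat_payload(user_message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

payload = build_chat_payload("Summarize my meeting notes in three bullets.")
print(payload["messages"][1]["content"])
```

Because the request format matches the cloud APIs, any OpenAI-compatible client library or UI can usually be pointed at the local server just by swapping the base URL.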
A specific subset of AI chats is RAG, retrieval-augmented generation, which is a fancy term some of you probably already know: you can point an AI at your own files, company documents, research papers, or notes, and it will answer questions based on that content. It's pretty similar to AI chat, but the thing is that it's generally a bit more expensive and complex to set up. You
might have to put your documents into a
vector database and you have to make
sure that those documents are retrieved
properly to make sure that the AI chats
can answer questions about them. But if you do it right, well, this can work very well for your own use cases, and it makes sure these local models are up to date with the latest knowledge in your domain, knowledge that isn't in the model's training data, especially because some of these open-source models were trained half a year or a year ago. So, RAG is a very important paradigm. I'm putting it in B tier because of the technical complexity to set it up, but it's kind of a requirement for a lot of real AI chat use cases, so it's definitely something you want to learn to set up yourself.
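To make the moving parts concrete, here's a deliberately tiny sketch of the retrieval step using bag-of-words vectors and cosine similarity; real setups use embedding models and a vector database, but the pipeline shape is the same: embed, retrieve, then feed the top chunks to the LLM as context.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Frigate does person detection on local camera streams.",
    "Chatterbox is a local text to speech model.",
    "faster-whisper transcribes audio locally.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, top_k: int = 1) -> list[str]:
    scored = sorted(index, key=lambda item: cosine(embed(query), item[1]), reverse=True)
    return [doc for doc, _ in scored[:top_k]]

print(retrieve("transcribe audio to text locally"))
```

In a real stack you'd swap `embed` for an embedding model and `index` for a vector database, then prepend the retrieved chunks to the chat prompt so the model answers from your documents instead of its training data.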
One tool you can start with is something like Open WebUI, which gives you a full RAG pipeline out of the box with a clean interface. And the nice part about this is that it's all running locally; data privacy is consistently the top reason enterprises use these kinds of self-hosted LLMs. So if you learn these skills, you can definitely get an AI engineering job with them as well. So, we have talked about some AI
use cases now, like AI chat, RAG chat, agentic coding, and code autocomplete. But what about the ability to create any AI agent of your dreams? Here I'm defining an agent as a system that can autonomously make decisions, execute actual actions for you, and solve a problem in many different ways. Well, the thing is, a true AI agent is very difficult to run locally. I'm going to put it in C tier,
but I wanted to make sure I explained
this to you because you've probably seen
many videos here on YouTube talking
about AI agents. The problem, though, is that most of those videos are not about real agents. They're more deterministic workflows on something like n8n, with a small LLM component in the middle that might make one or two decisions. But a true AI agent that can run like Claude Code simply requires a very good language model, or else it will get confused about all the tools it has access to. It will not be able to run autonomously without you pushing it in the right direction, and you will just find that there are many issues with it in general. That being said, I'm not
saying that AI agents are incapable of working locally. It just depends on your
use case and most people that I've
talked to who want to build their first
AI agent are a little bit too ambitious.
They might want to build a research
agent that works better than something
you can use with OpenAI GPT Pro. But to
be quite honest, it's very difficult to
build a better AI agent than using a
platform that you can access over the
cloud. Again, take something like Claude Code: even with Claude Code, you can point it to a local model, but it just won't perform in the same autonomous
way. That being said, AI agents in
particular change all the time, and I
expect that as models get better, this
will move up to the B tier and the A
tier eventually. But even then, AI
agents simply won't run on weak hardware
because of literal mathematical
constraints that I've covered in other
videos on my channel. So depending on
your use case, this is just not going to
work very well. But if you think that
that's not true and you have had good
experiences creating local AI agents, I
would love to hear it from you in the
comments down below. But please tell me
what use case you have, because most people trying to run local AI agents aren't really running true agents. They're just running regular workflows with LLMs sprinkled somewhere in between. But how about a simpler AI
assistant that you can run locally like,
you know, OpenClaw, an always-on personal AI that manages your calendar, triages your email, summarizes your day,
and handles whatever you throw at it for
personal use cases. The local version of
Siri or Google Assistant, but running
private and 24/7 on your own hardware.
So tools like OpenClaw aim to be exactly
this, and they sound great on paper. The issue, though, is that setting them up in a way that actually keeps your accounts secure is pretty difficult; it does require a little bit of security knowledge.
Because of that complexity, I would right now put AI assistants into B tier, because they can work quite well if you are very aware of the security problems with something like OpenClaw. Now, personally, I would still
use OpenClaw with a state-of-the-art
cloud model because they're much better
protected against things like
jailbreaks. But there may be one
exception where I would use local
models, which is for well-defined cron
jobs, which are things that run on a
schedule like every 8 hours. This might
be something where you are summarizing
your feed into a daily digest or just
classifying your incoming emails. For something like this, a local 14-billion-parameter model works just fine. So far, we
have a lot of use cases, and they're all pretty competent, right? Even the ones in C tier are very usable, and they're getting better every single day.
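Going back to those well-defined scheduled jobs for a second, here's a sketch of the shape such a task takes; the keyword rules are illustrative placeholders for where a small local model call would go, and the schedule itself would come from cron or a task scheduler, not from this script:

```python
# Sketch of a scheduled assistant task: classify incoming emails into buckets.
# The keyword rules stand in for where you'd call a small local model.
RULES = {
    "invoices": ("invoice", "payment", "receipt"),
    "newsletters": ("unsubscribe", "digest", "weekly"),
}

def classify(subject: str) -> str:
    lowered = subject.lower()
    for bucket, keywords in RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return bucket
    return "inbox"

emails = [
    "Your March invoice is attached",
    "Weekly AI digest: local models edition",
    "Lunch on Friday?",
]
print({subject: classify(subject) for subject in emails})
```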
But then what is the D tier for? Well, to
be honest, that's for vibe coding. Vibe
coding with local models just doesn't
work. The idea with vibe coding is that
you just describe an app in plain
English and the AI will build the whole
thing and you never read the code. You
just judge whether the result works. And
this is different from agentic coding
because you are not explicitly reviewing
or steering anything. Now, honestly,
this workflow needs a frontier model to
cover for the fact that you're not
reviewing at all what it writes. A small
language model can code quite well with
the right guidance in a couple of files.
And with models under 14 billion parameters, you often cannot even use tool calling properly, which is a requirement for proper agentic coding, let alone vibe coding, where you're not guiding the model at all. Vibe coding with cloud models already has serious problems.
Researchers found security
vulnerabilities in one out of 10 Lovable-generated apps, for example. But with
weaker local models, those problems
multiply. So, let's recap this tier list
a little bit. The three S tier use cases generally match or sometimes even beat cloud models: code autocomplete, image generation, and speech to text. The
pattern here is that some of the more
boring use cases consistently outperform
the hyped ones for local models.
However, as models get better, some of
the more complex workflows like AI
agents as well as voice agents will just
get better over time and hopefully
everything will be in the B tier or
above in a couple of years from now. And
if you want to get started with local AI
projects, you should check out the link
in the description below and get started.