the steps I use to solve any Linux issue
Yo, what's up? I'm going to give you
three steps to follow to troubleshoot,
diagnose, and solve any particular Linux
issue that you are encountering. I'm
also going to go over the most common
issues that I have personally
encountered and how I would approach
troubleshooting those particular issues.
My goal with this video, my approach,
if you want to call it that, is
essentially narrowing down the scope of
the problem until we figure out what the
actual problem is. Whether that's going
to be, you know, a particular program or
a particular configuration or whatever
else. Because my logic is once you
actually know what the problem is, it's
generally pretty trivial to solve
because chances are somebody else has
already encountered that problem before
you and they've probably posted their
solution on the internet for anybody to
go find. The majority of the headache of
troubleshooting is actually figuring
out what the source of the problem is.
Once you know the source of the problem,
chances are you can solve it from there
just by looking through a wiki or a
forum or whatever else. So anyways, this
is going to be a pretty long video. I
will put timestamps on as much as
possible and I'm also going to bother
with editing the video for once. The
first step is identifying the precise
problem that you are experiencing. So no
generalized "oh, my audio doesn't work" or
"my display doesn't turn on." No, you have
to figure out what exactly is going
wrong. And I'm going to show you a
couple of examples to illustrate how you
want to be doing this. To start off,
there's an example on the Arch forums. I
was just looking through the recently
posted threads to find a decent example
of somebody giving a detailed
explanation of the problems they are
actually experiencing. So, this user had
a bunch of strange problems happening
with their system that didn't
necessarily seem to be correlated at
first at least. They had, you
know, stuff refusing to launch, but I
wanted to point out a couple of different
things that they did really well in
terms of analyzing the problems going
wrong with their system. For
example, when they ran killall
Thunar and then relaunched
Thunar, it worked, but only one time.
It is really important to note
something like this: it works but only
one time, or anything else that involves
a logical order of causation. So for
example, this thing happens, then this
next thing happens, and only then does
this other thing not work because that's
going to point to, okay, either
something is impacting all three of
those things or the first two are
impacting the third. This just gets
into the logic of how these things might
potentially work. So as you are trying to
diagnose your issue, it's very important
to note down what is
happening and in what order. So that's
the first thing that this user did
really well. Something else that I
think is really important to note is
that they actually tested it on their
laptop. It is really useful to
test the same thing, if you can, on
another system. If you don't have
another system to test on, you can
always just use a fresh user account on
your current system. You could also
plug in a live USB drive
of any distribution and test
on that. Or you can back up your current
configuration, blank out your
configuration files, and retest on
your current user account if that's
applicable to your situation. For
this person, it turned out the issue was
just that the latest version of their
window manager had a bug, and
they downgraded and it was fine. But
anyways, I think they did a really good
job searching through and figuring out,
okay, what are all of the various things
going wrong here and making a careful
list of what was going wrong. I also
want to talk about error messages a
little bit, since I think a lot of us
have a tendency to kind of shut off
our brains when some huge error
message is getting spit out.
Me included: I will often kind of skim
through an error message before I
actually try to go carefully read it.
And I think it is important to really
carefully read what the error message is
telling you because a lot of the time
it's going to tell you exactly what went
wrong. So I ran a few commands here and
screenshotted the error messages. I
specifically had to try to break some
stuff, since I didn't have anything
broken at the time. Anyways, for
example, the classic: if you run two
pacman instances at once, well, you
can't run it twice at the same time. So
the database lock is present, it's going
to give you the path of that database
lock, and then it's just going to tell
you, okay, there's another pacman
instance running. So imagine if you had
this error message, but you know, maybe
there's 50 lines of text before and
after it that are telling you all sorts
of various things. Okay, that's when it
starts to get, you know, a little bit
confusing to kind of sift through all of
that text. But if you can look for a
line that either gives you a file path
to something relevant in the error or it
tells you, okay, there could be this
particular thing going wrong, that is
what you want to be looking for. It's the
same thing with these other two error
messages. For example, trying to run
MPD when I already have MPD running,
well, it's going to fail to bind the
socket because the address is already in
use. MPD is already running on that
address. Same thing with trying to
start X when I'm currently
in an X server. Okay, it can't
connect to the X server because there is
one already running. And that is because
this is in a graphical terminal here.
I'm not on the TTY
console. So, it tells me only console
users are allowed to run the X server.
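
The idea of sifting a long error dump for the one or two lines that matter can be sketched in a few lines of Python. This is only a rough illustration, not a real tool: the keyword list, the path pattern, and the function name `interesting_lines` are all assumptions made up for this example, and the sample text is modelled on the database lock message described above rather than copied from real output.

```python
import re

# Phrases that often mark the line explaining the failure.
# Illustrative assumption, not an exhaustive list.
KEYWORDS = ("error", "failed", "unable", "denied", "already in use", "not found")

# Rough pattern for an absolute path such as /var/lib/pacman/db.lck.
# The lookbehind skips slashes inside words like "read/write".
PATH_RE = re.compile(r"(?<![\w.])/(?:[\w.+-]+/)*[\w.+-]+")


def interesting_lines(output: str) -> list[str]:
    """Return the lines that contain an absolute path or one of KEYWORDS."""
    hits = []
    for line in output.splitlines():
        lowered = line.lower()
        if PATH_RE.search(line) or any(word in lowered for word in KEYWORDS):
            hits.append(line.strip())
    return hits


# Hypothetical sample text, modelled on a package database lock message.
sample = """\
:: Synchronizing package databases...
error: failed to synchronize all databases (unable to lock database)
  if you're sure a package manager is not already
  running, you can remove /var/lib/pacman/db.lck
"""

for line in interesting_lines(sample):
    print(line)
```

A filter like this will produce false positives on noisy logs, so it is a way to find where to start reading, not a replacement for reading the message carefully.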
Step two, you want to gather and note
down any relevant context or
information. And in my opinion, one of
the first things that you want to start
with, assuming it is relevant to your
problem, is diagnosing whether this is
potentially a hardware issue, which is
incredibly important and will save you a
lot of headache if it turns out that
yes, it is a hardware issue and you
don't have to go through all of the
various system troubleshooting just to
hit a dead end of realizing, oh, this
was a hardware issue. I want to
give you an example of this from
a couple weeks ago. I had a drive I was
working with and I kept getting a ton of
read/write errors. Like I would plug it in, I
would try to transfer some files, and
halfway through I would start getting
read/write errors, and I'm like, what is
going on? You know, I couldn't figure
out, okay, is my drive the issue? Do I
have something wrong with my system here
that I don't know about? It turns out it
was the cable. And by figuring out that
it was the cable within like 20
minutes or something, I saved myself
potentially hours of trying to diagnose
some problem that didn't even exist
because it turns out, okay, it was just
hardware. It was just the cable having
issues. So that's the first thing you
want to start with when you're gathering
your context and your information,
assuming that is actually relevant. Of
course, if it is some very clear issue
with a particular program, then yes,
it's unlikely to be your hardware.
I don't want to say it's never going to
have anything to do with your
hardware, but if it's a particular
program having issues, hardware is
unlikely and it is probably related to the
software. So the next thing that you
want to ask yourself is, has this ever
worked in the past? And if it has, have
you changed any relevant configuration
files? Have you accidentally or
intentionally modified system files that
could impact this particular program?
Have you installed any new hardware or
software that could somehow be impacting
this? And you notice that I say, you
know, this "could" be impacting, you "might"
have done this. I'm using that
wording because a lot of the time you
might modify something that
unintentionally impacts a bunch of other
stuff. And I've had this experience a
lot where I will be trying to do one
particular thing and then that
unfortunately has unintended
consequences elsewhere and I managed to
mess up something that would seemingly
be unrelated because I modified one
thing that somehow impacted a bunch of
other things. So it's important to keep
in mind, okay, even if you modify one
thing that you think is unrelated to
everything else, it could still be
related. Have you experienced
correlated issues? That's essentially
what I was just discussing. Or have you
performed any system updates that may
have impacted what you're trying to
figure out here? Whatever is going
wrong, if you've performed any updates
that have impacted packages that could
be related to that, well, you might want
to check your update logs and see if
anything relevant was there. Because one
of the early steps you might want to
take is just downgrading any relevant
packages to see if that fixes the issue.
And if so, figure out, okay, is this a
bug with the package? Did I, you know,
somehow perform a partial upgrade?
Figure out if updates are relevant in
your particular situation.
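
Checking update logs for recently upgraded packages could look something like the following Python sketch. It assumes the log uses the `[timestamp] [ALPM] upgraded name (old -> new)` line shape that recent pacman versions write to /var/log/pacman.log; other package managers and older pacman versions use different formats. The function name `recent_upgrades`, the package names, and the version numbers in the sample are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed line shape: [2024-03-10T18:40:11+0000] [ALPM] upgraded pkg (old -> new)
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def recent_upgrades(log_text, since):
    """Return (package, old_version, new_version) for upgrades at or after `since`."""
    found = []
    for line in log_text.splitlines():
        match = UPGRADE_RE.match(line.strip())
        if not match:
            continue  # installs, removals, and other log lines are skipped
        when = datetime.strptime(match["when"], "%Y-%m-%dT%H:%M:%S%z")
        if when >= since:
            found.append((match["pkg"], match["old"], match["new"]))
    return found


# Hypothetical log excerpt; names and versions are made up for illustration.
SAMPLE_LOG = """\
[2024-03-01T09:15:02+0000] [ALPM] upgraded examplelib (2.0.1-1 -> 2.0.2-1)
[2024-03-01T09:15:03+0000] [ALPM] installed exampletool (3.3.0-1)
[2024-03-10T18:40:11+0000] [ALPM] upgraded examplewm (1.2.0-1 -> 1.2.1-1)
"""

cutoff = datetime(2024, 3, 5, tzinfo=timezone.utc)
for pkg, old, new in recent_upgrades(SAMPLE_LOG, cutoff):
    print(f"{pkg}: {old} -> {new}")
```

On a real system the text would come from reading the log file itself. The old version in each tuple is the one you would downgrade to if you want to test whether the update caused the problem.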
You also want to check: okay, is this a
system-wide issue or is this a
user-specific issue? This is something
pretty easy to test. You can either
create a new user account and test it on
that user account or you could carefully
go on your root account and test it
there. However, do keep in mind
obviously the root account is
essentially the admin account. So that's
going to have all sorts of other
permissions that could impact, you
know, whether something does or doesn't
work. For example, it might work on root
but not work on your user account. So in
general, it's better to just test on a
brand new fresh user account. If you have
the option, you could always test on
alternate hardware. If you have, you
know, a separate system or a laptop that
you could test on, that's generally
useful, especially if you're hitting
dead ends on your current system. And I
also want to stress you only want to
change one thing at a time. You don't
want to be changing a bunch of different
things at once, for two reasons.
First of all, say you
actually do find something that fixes
your problem. Well, you won't know what
it was if you just changed like five
things, and one of those other
things could have impacted something you
didn't want to impact. Second, if you're just
changing things randomly and you're
changing so many different things you
don't even know what you're doing, well,
you could be doing all sorts of damage
to your system. So, carefully change one
thing at a time and note down all of the
changes you're making and really take
notes on this whole process as much as
possible if you're trying to
troubleshoot something super complex.
Step number three, now that you have
narrowed down hopefully where your issue
is actually happening, check any
pertinent logs and configuration: either
program logs for relevant programs, or
relevant configuration files, either
user configuration or global program
configuration. And if you're on a
distribution that uses systemd, which is
most major distributions out of the box,
check with journalctl. Learn to use
journalctl properly. The ArchWiki
has a great article about it if
you want to learn how to read through
journalctl logs and
understand exactly what they are
telling you. As usual, the ArchWiki
does a really good job of explaining how
to use the systemd-related commands. I
also want to mention you should just
research as much as possible as
you're trying to solve your issue, and
also to figure out what the right
solution should be, because you want to
find a solution, not just a workaround.
A workaround can work in a pinch: if
you need to solve something so that you
can get on a work call in two
hours, well, yes, a workaround will
do for now. But long term you do
want to be finding real solutions rather
than just, you know, duct tape
workarounds. Of course, I
say this as I currently have my
microphone sitting on a camera battery
because my microphone stand is broken.
So, anyways, find solutions, not
workarounds, guys. But you should
search through wikis, search through
forums, and even search engines. You can
actually search through a search engine
and filter by particular sites or exact
matches to your search terms. So it is
worth learning how to use search
engines really well, especially in
the age of a lot of search engines being
really trash and flooding everything
with AI results, which of course is
super annoying. But you can still use
search engines if you filter for exact
matches, search on specific sites,
etc. And of course, I did want to point
out you can search the Arch forums
for keywords
in the particular forum area that you
want to search in. I'm highlighting
the Arch forums here since I'm
personally using Arch, but of course
search on whatever forums are relevant
to either your distribution or your
program, whether it is GitHub
issues that you need to be searching or
even just some other forum
where people are offering assistance.
Later on
I'm going to get to this: I kind of
want to turn this comment section, if we
can, into a little bit of a help
exchange, you know, back and forth. But
I'm going to get to talking about
that a little bit later. Anyways, if
all else fails, make a post on forums
and chances are if you have done all of
the work on your end to figure out
exactly what is going wrong and you know
where the problem is happening, somebody
will likely be inclined to help you.
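
Going back to the journalctl check in step three, here is a small Python sketch of how you might assemble a narrowed-down query. It only builds the command; it does not run it. The flags used (`-b` for the current boot, `-p` for a priority filter, `-u` for a unit, `--no-pager`) are standard journalctl options, while the helper name `journalctl_command` and the unit name `example.service` are placeholders invented for this example.

```python
import shlex


def journalctl_command(unit=None, priority="err", current_boot=True):
    """Build a journalctl argument list that narrows the log output."""
    argv = ["journalctl", "--no-pager"]
    if current_boot:
        argv.append("-b")          # only messages from the current boot
    if priority:
        argv += ["-p", priority]   # this priority and anything more severe
    if unit:
        argv += ["-u", unit]       # only messages from one systemd unit
    return argv


# Print the commands instead of running them, so this works on any machine.
print(shlex.join(journalctl_command()))
print(shlex.join(journalctl_command(unit="example.service", priority="warning")))
```

To actually run one of these you could pass the list to `subprocess.run(argv, capture_output=True, text=True)` on a systemd machine. Starting with errors from the current boot and then widening the filter fits the same narrowing-down approach as the rest of the steps.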
Let's talk through a bunch of the common
issues and solutions to those issues
that I have encountered. I want to start
with a file that I made a while ago to
go with a video essentially going
through Arch related problems and
solutions. But the thing is a lot of
these apply to much more than just Arch,
especially stuff like booting. For
booting issues, for example, you want to
use Super Grub to get back into your
system. What is Super Grub? It is an ISO
that you can slap onto a USB disk and it
is essentially going to detect any
potential boot entries and allow you to
get back into your system in case of
some serious issues where you're just
unable to get in. So, Super Grub is what
you want to go get. I will link that in
the description. I also have a bunch
of other boot troubleshooting steps in
this guide, so I'm just going to link
this guide directly, and I'm also
going to link the video where I go over
this guide in a lot more depth. I
wanted to point out another thing in
this guide, which is pacman and package
issues, mirror-related issues
specifically. If you have mirror-related
issues, resync your mirrors with
reflector. And beyond that, you probably
want to resync your mirrors pretty
regularly, as in every few
months to every year or so. I've found
that this actually helps me avoid errors
with mirrors. It's possible that
maybe in my particular location, the
mirrors that I am getting often have
errors. So, I'm not going to say that's
a hard rule, but I have found that if I
resync my mirror list every so often, you
know, every few months when I remember
to, that is generally pretty helpful
for avoiding mirror-related errors.
Anyways, I will link this file in the
description as well as my video where I
went over it in depth, just because
I'm not going to waste your time and
remake a video on something that I've
already made that chances are a bunch of
you have already seen. Anyways,
let's go to some other errors that are
not in that list. So, if you have lots
of I/O errors, as I said earlier, you
want to check your cable. That's
probably the first thing that you want
to check if you're having weird errors
with any piece of hardware. So, for
example, your keyboard is
double typing or your mouse is moving
weirdly. Or, for example,
my webcam here was having this
issue where it just kept shutting off.
It's actually a camera plugged in
with a cable, but anyways, it was the
cable that was the issue. So, always
check your cables: moral of the
story here. But if you are having a
bunch of weird I/O errors on a drive and
it's not the cable, then run fsck on the
drive and diagnose if you have any
issues with your drive. Because if your
drive is failing, you want to make sure
you back up all of that data as much as
you can and then get a new drive so
you don't run into a situation where
you're losing data. If a file system is
remounted as read-only by the kernel, you
don't want to immediately remount that
as read-write. You want to actually see why
the kernel did that, because the kernel
is generally trying to protect your
drive if it's going to remount it as
read-only. It is to prevent damage
on your drive. You want to run fsck,
as I said, to find errors, but you want
to make sure, okay, is this a failing drive
or do you just have some random bad
sectors after you had, you know, some sort
of a crash or power loss? It's
important to distinguish between the two,
since obviously if you have a failing
drive, well, then you need to get a new
drive. But if you just have some bad
sectors after a power loss, you can
generally repair those. It's also
important to check if you're having
mount issues overall. Is your /etc/fstab
file still correct? Do you need to
be updating this file to regenerate
your fstab entries? If you have a
generally slow system for no apparent
reason, you should see if your temp
directory is full. I encountered
this a couple times. The last time, I was
working on a huge video editing project,
my temp directory kept getting full, and
I kept having, you know, a super slow
system and I was like, "What is
going on here?" Well, it turns out the
temp directory was full. So, if you
have a really slow system and you have
no clue what is possibly going wrong
because there are no, you know, rogue
programs or anything like that, then
check to see if your temp directory is
full. If you have a library mismatch
error or a library not found error as
you are trying to run a program...
Something I should mention, by the way:
if you are trying to run a graphical
program and it doesn't launch, run it
directly from the command line, because
that will show you if there are any
errors as it is trying to launch.
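
If you want to capture what a program prints as it fails to launch, rather than just eyeballing it in a terminal, something like this Python sketch would do it. The helper name `run_and_capture` is made up for this example, and the demo child process is a stand-in that just writes a fake error and exits, not a real graphical program.

```python
import subprocess
import sys


def run_and_capture(argv, timeout=10):
    """Run argv and return (exit code, stderr text); exit code is None on timeout."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # The program was still running when the timeout hit, so it was killed.
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode(errors="replace")
        return None, stderr
    return result.returncode, result.stderr


# Stand-in for a program that fails on launch (hypothetical error text).
demo = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('cannot open display\\n'); sys.exit(1)",
]
code, err = run_and_capture(demo)
print(code, err.strip())
```

A program that launches fine will simply keep running until the timeout, at which point `subprocess.run` kills it, so this is only useful for the "it dies immediately" case. A nonzero exit code plus the captured stderr text is usually the exact string worth searching for.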
Anyways, if you have a library mismatch
or the library isn't found, you want to
reinstall both the package and the
relevant library, and that generally will
fix it. I've found that
sometimes when I update a library
but haven't
reinstalled another program that uses
that library, I have to then reinstall
that other program because the library
was updated. So just make sure you have
everything updated and everything
reinstalled if you're getting any sort
of weird version mismatch errors. If
you're getting permission denied
errors, you probably know how to solve
those: you either want to change the
ownership of the file or you want to run
the command as root. Of course, be
careful when you're running commands as
root. Only run things that you trust as
root. But you may hit weird
permission-related errors, especially if
you're trying to do something
with hardware. So an example with my
camera again: I have to run the
command to mount my camera with sudo.
It has to be done with root. So
sometimes you will have to do things as
root. So if you're having all sorts of
permission denied errors, just try
running it as root. Just make sure you
trust the program that you're running.
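
Before reaching for root, it can help to look at who owns the file and what the mode bits are, since that tells you whether changing ownership is the better fix. Here is a minimal Python sketch of that check; the function name `describe_access` and the file name `example.conf` are made up for illustration.

```python
import os
import stat
import tempfile


def describe_access(path):
    """Summarize ownership, mode bits, and what the current user can do with path."""
    info = os.stat(path)
    return {
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),       # e.g. "-rw-r--r--"
        "owned_by_me": info.st_uid == os.geteuid(),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# Demo on a throwaway file that is read-only for its owner.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.conf")
    with open(path, "w") as handle:
        handle.write("demo\n")
    os.chmod(path, 0o400)
    report = describe_access(path)

print(report)
```

If the file is owned by your user but not writable, a mode change is probably enough. If it is owned by another user or by root, that points toward changing ownership or running the command as root. Keep in mind that root passes the `os.access` checks regardless of the mode bits.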
If you have high CPU or memory usage,
check and see if you have any rogue
processes or a memory leak somewhere.
I was actually working on a bash
script a while ago in which I had
accidentally left something that was
just launching process after process
after process, and my memory was just
going up and up and up, and I was like,
what is going wrong? Well, it turned out
I had like a thousand processes running
that I had launched from that shell
script that I was working on. So, be
careful you don't do that. But overall,
if you do have some weirdly high CPU or
memory usage, check the
various Linux commands, such as top or ps, to figure out,
okay, what is going on with my
processes? What is using high memory?
What is using high CPU? And then kill
the process as needed using the relevant
kill signal. So, if you really need to
kill a process that is not responding,
use kill signal 9 (SIGKILL). Otherwise, look
through the various available kill
signals and figure out what is the best
matched kill signal for your use case.
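
One common way to pick a kill signal is to ask politely first and only force it if the process ignores you. The Python sketch below shows that pattern: send SIGTERM (signal 15), wait a few seconds, then fall back to SIGKILL (signal 9). The helper name `stop_process` and the grace period are assumptions for this example, and the child process is just a stand-in that sleeps.

```python
import signal
import subprocess
import sys


def stop_process(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.send_signal(signal.SIGTERM)   # polite request; the process can clean up
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                    # SIGKILL: cannot be caught or ignored
        proc.wait()
    return proc.returncode


# Stand-in for a runaway process: a child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
code = stop_process(child, grace_seconds=2.0)
print(code)
```

On Linux the return code is the negated signal number that ended the process, so you can tell whether the polite request was enough. SIGKILL gives the program no chance to save state or remove lock files, which is why it is the last resort here rather than the first.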
Okay, so as I was mentioning, it would
be really cool to try to turn this
particular comment section into kind of
an exchange to help each other learn to
troubleshoot. Not necessarily for
very particular issues; obviously,
different distributions and different
programs have dedicated forums to go to
for help. But if anybody is just trying
to learn, in general, how to troubleshoot
better on Linux, or just in general on whatever
operating system, I think the
troubleshooting concepts are really the
same across the board. If anybody has
troubleshooting questions, I would
really encourage those of you who know a
lot about Linux, which I'm sure is many
of you, to help. I'm sure a lot of you probably
know a lot more about Linux than I do.
So, I would encourage anyone who already
has that knowledge and is already really
experienced with troubleshooting to help
anyone with questions who has less
knowledge and is trying to learn how to
do stuff. Anyways, I hope this guide was
helpful to you and I will see you next
time. Peace.
This video provides a three-step approach to troubleshooting, diagnosing, and solving Linux issues. The core strategy is to narrow down the scope of the problem until its source is identified, making it easier to find a solution. The video details each step: 1. Precisely identify the problem, noting specific symptoms and their order. 2. Gather relevant information, considering hardware issues, recent changes (software/hardware/configuration), system updates, and differentiating between system-wide and user-specific problems. It emphasizes changing only one thing at a time and taking meticulous notes. 3. Check pertinent logs and configurations, including program logs, systemd journalctl (if applicable), and researching solutions thoroughly to find fixes rather than workarounds. The video also covers common issues like boot problems, package manager errors, I/O errors, file system remounts, slow system performance, library mismatches, permission errors, and high CPU/memory usage, offering specific troubleshooting steps for each. Finally, it encourages the community to use the comment section as a help exchange for learning troubleshooting skills.