the steps I use to solve any Linux issue

Transcript

0:00

Yo, what's up? I'm going to give you

0:02

three steps to follow to troubleshoot,

0:04

diagnose, and solve any particular Linux

0:06

issue that you are encountering. I'm

0:08

also going to go over the most common

0:10

issues that I have personally

0:12

encountered and how I would approach

0:13

troubleshooting those particular issues.

0:16

My goal with this video, my approach,

0:18

if you want to call it that, is

0:19

essentially narrowing down the scope of

0:22

the problem until we figure out what the

0:24

actual problem is. Whether that's going

0:26

to be, you know, a particular program or

0:28

a particular configuration or whatever

0:30

else. Because my logic is once you

0:33

actually know what the problem is, it's

0:35

generally pretty trivial to solve

0:37

because chances are somebody else has

0:39

already encountered that problem before

0:41

you and they've probably posted their

0:43

solution on the internet for anybody to

0:44

go find. So once you actually have that

0:47

problem figured out, that's going to be

0:49

the majority of the headache of

0:51

troubleshooting. It's actually figuring

0:53

out what is the source of the problem.

0:55

Once you know the source of the problem,

0:56

chances are you can solve it from there

0:59

just by looking through a wiki or a

1:01

forum or whatever else. So anyways, this

1:03

is going to be a pretty long video. I

1:05

will put timestamps on as much as

1:07

possible and I'm also going to bother

1:09

with editing the video for once. The

1:10

first step is identifying the precise

1:13

problem that you are experiencing. So no

1:15

generalized, oh my audio doesn't work or

1:18

my display doesn't turn on. No, you have

1:20

to figure out what exactly is going

1:22

wrong. And I'm going to show you a

1:24

couple examples to illustrate how you

1:26

want to be doing this. Um, to start off,

1:28

there's an example on the Arch forums. I

1:29

was just looking through the recently

1:31

posted threads to find a decent example

1:33

of somebody giving a detailed

1:35

explanation of the problems they are

1:37

actually experiencing. So, this user had

1:40

a bunch of strange problems happening

1:42

with their system that didn't

1:43

necessarily seem to be correlated at

1:45

first at least. Um, so they had, you

1:47

know, stuff refusing to launch, but I

1:49

wanted to point out a couple different

1:51

things that they did really well in

1:53

terms of analyzing the problems going

1:55

wrong with their system. Um, so for

1:57

example, when they were using killall

2:00

Thunar and then they were relaunching

2:01

Thunar, it works but only one time,

2:04

which is really important to note

2:06

something like this. It works but only

2:08

one time or anything else that involves

2:11

a logical order of causation. So for

2:13

example, this thing happens, then this

2:16

next thing happens, and only then does

2:18

this other thing not work because that's

2:20

going to point to, okay, either

2:22

something is impacting all three of

2:24

those things or the first two are

2:26

impacting the third. Um, this just gets

2:29

into the logic of how these things might

2:31

potentially work. So it's very important

2:33

to as you are going through to try to

2:35

diagnose your issue, note down what is

2:37

happening and in what order. So that's

2:40

the first thing that this user did

2:41

really well. Um, something else that I

2:43

think is really important to note is

2:44

that they actually tested it on their

2:46

laptop. Um, which is really useful to

2:48

try to test the same thing if you can on

2:51

another system or if you don't have

2:53

another system to test on, you can

2:55

always just use a fresh user account on

2:57

your current system. Um, you could

2:59

actually plug in just a live USB drive

3:02

of any distribution to go ahead and test

3:04

on that. Or you can back up your current

3:06

configuration, blank out your

3:08

configuration files and retest just on

3:10

your current user account if that's

3:12

applicable to your situation. Um, for

3:14

this person, it turns out the issue is

3:16

just the latest uh version of their

3:18

window manager had a bug with it and

3:20

they downgraded and it was fine. But

3:22

anyways, I think they did a really good

3:24

job searching through and figuring out,

3:25

okay, what are all of the various things

3:27

going wrong here and making a careful

3:30

list of what was going wrong. I also

3:32

want to talk about error messages a

3:34

little bit since I think a lot of us

3:35

have a tendency to kind of uh shut off

3:37

our brain when there is some huge error

3:39

message you know just getting spit out.

3:41

Um me included I often will kind of skim

3:43

through an error message before I

3:45

actually try to go carefully read it.

3:47

And I think it is important to really

3:48

carefully read what the error message is

3:50

telling you because a lot of the time

3:52

it's going to tell you exactly what went

3:54

wrong. So I did a few commands here and

3:57

screenshotted the error messages. I

3:58

specifically had to try to break some

4:00

stuff since I didn't have anything like

4:02

broken at the time. So I, you know, I

4:04

had to I had to break some stuff to get

4:05

some screenshots. Anyways, so uh for

4:07

example, the classic if you run two

4:09

pacman instances at once, well, you

4:11

can't run it twice at the same time. So

4:13

the database lock is present, it's going

4:15

to give you the path of that database

4:17

lock, and then it's just going to tell

4:18

you, okay, there's another pacman

4:20

instance running. So imagine if you had

4:22

this error message, but you know, maybe

4:24

there's 50 lines of text before and

4:27

after it that are telling you all sorts

4:29

of various things. Okay, that's when it

4:31

starts to get, you know, a little bit

4:32

confusing to kind of sift through all of

4:34

that text. But if you can look for a

4:36

line that either gives you a file path

4:39

to something relevant in the error or it

4:41

tells you, okay, there could be this

4:43

particular thing going wrong, that is

4:46

what you want to be looking for. Um, the

4:48

same thing with these other two error

4:49

messages. So, for example, trying to run

4:51

MPD when I already have MPD running,

4:53

well, it's going to fail to bind the

4:55

socket because the address is already in

4:57

use. MPD is already running on that

4:59

address. Um, same thing with trying to

5:02

start X um when I'm, you know, currently

5:04

in an X server. Okay, it can't

5:06

connect to the X server because there is

5:08

one already running. And that is because

5:11

um this is in a graphical terminal here.

5:13

Um I'm not on the, you know, TTY

5:15

console. So, it tells me only console

5:17

users are allowed to run the X server.
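
As a rough illustration of the "find the line that matters" idea from step one, here is a minimal Python sketch that filters a long error dump down to the lines that mention an absolute file path or a failure keyword. The keyword list, the function name `interesting_lines`, and the sample text are assumptions made up for this example; they are not output captured in the video.

```python
import re

# Hypothetical keyword list; tune it for the program you are debugging.
KEYWORDS = ("error", "failed", "unable", "denied", "cannot", "already in use")

# Matches absolute paths with at least two components, e.g. /var/lib/pacman/db.lck
PATH_RE = re.compile(r"(?<![\w.])/[\w.\-]+(?:/[\w.\-]+)+")


def interesting_lines(output: str) -> list[str]:
    """Return the lines that mention a file path or a failure keyword."""
    hits = []
    for line in output.splitlines():
        lowered = line.lower()
        if PATH_RE.search(line) or any(k in lowered for k in KEYWORDS):
            hits.append(line.strip())
    return hits


# Made-up sample that resembles a package manager lock error.
sample = """\
:: Synchronizing package databases...
error: failed to synchronize all databases (unable to lock database)
  if you're sure a package manager is not already
  running, you can remove /var/lib/pacman/db.lck
"""

for line in interesting_lines(sample):
    print(line)
```

A filter like this only narrows down where to look. The surviving lines still need to be read carefully, and anything the keyword list misses has to be checked by hand.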

5:20

Step two, you want to gather and note

5:22

down any relevant context or

5:24

information. And in my opinion, one of

5:26

the first things that you want to start

5:28

with, assuming it is relevant to your

5:29

problem, is diagnosing whether this is

5:32

potentially a hardware issue, which is

5:34

incredibly important and will save you a

5:36

lot of headache if it turns out that

5:38

yes, it is a hardware issue and you

5:40

don't have to go through all of the

5:42

various system troubleshooting just to

5:44

hit a dead end of realizing, oh, this

5:46

was a hardware issue. Um, and I want to

5:47

give you an example of this, which, uh, is

5:49

a couple weeks ago. I had a drive I was

5:51

working with and I kept getting a ton of

5:53

read/write errors. Like I would plug it in, I

5:56

would try to transfer some files,

5:57

halfway through I start getting

5:58

read/write errors and I'm like, what is

6:00

going on? You know, I couldn't figure

6:01

out, okay, is my drive the issue? Do I

6:04

have something wrong with my system here

6:05

that I don't know about? It turns out it

6:07

was the cable. And by figuring out that

6:09

it was the cable uh within like 20

6:11

minutes or something, I saved myself

6:13

potentially hours of trying to diagnose

6:15

some problem that didn't even exist

6:17

because it turns out, okay, it was just

6:19

hardware. It was just the cable having

6:21

issues. So that's the first thing you

6:23

want to start with when you're gathering

6:25

your context and your information,

6:27

assuming that is actually relevant. Of

6:29

course, if it is some very clear issue

6:31

with a particular program, then yes,

6:33

it's unlikely to be your hardware.

6:35

I don't want to say it's never going to

6:36

have anything to do with your

6:38

hardware, but if it's a particular

6:40

program having issues, potentially

6:42

unlikely and is probably related to the

6:44

software. So the next thing that you

6:46

want to ask yourself is, has this ever

6:48

worked in the past? And if it has, have

6:50

you changed any relevant configuration

6:52

files? Have you accidentally or

6:55

intentionally modified system files that

6:57

could impact this particular program?

7:00

Have you installed any new hardware or

7:02

software that could somehow be impacting

7:04

this? And you notice that as I say, you

7:07

know, this could be impacting. You might

7:09

have done this. Um, I'm using that

7:11

wording because a lot of the time you

7:13

might modify something that

7:14

unintentionally impacts a bunch of other

7:16

stuff. And I've had this experience a

7:18

lot where I will be trying to do one

7:20

particular thing and then that

7:22

unfortunately has unintended

7:24

consequences elsewhere and I managed to

7:26

mess up something that would seemingly

7:29

be unrelated because I modified one

7:31

thing that somehow impacted a bunch of

7:33

other things. So it's important to keep

7:34

in mind, okay, even if you modify one

7:36

thing that you think is unrelated to

7:38

everything else, it could still be

7:40

related. Um, have you experienced

7:42

correlated issues? That's essentially

7:43

what I was just discussing. Or have you

7:46

performed any system updates that may

7:48

have impacted what you're trying to

7:50

figure out here? Whatever is going

7:52

wrong, if you've performed any updates

7:53

that have impacted packages that could

7:55

be related to that, well, you might want

7:57

to check your update logs and see if

7:59

anything relevant was there. Because one

8:01

of the early on steps you might want to

8:02

do is just downgrade any relevant

8:04

packages. See if that fixes the issue.

8:07

And if so, figure out, okay, is this a

8:09

bug with the package? Did I, you know,

8:11

somehow perform a partial upgrade?

8:13

Figure out if updates are relevant in

8:15

your particular situation.
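
To make the "check your update logs" suggestion concrete, here is a small Python sketch that pulls upgrade entries for chosen packages out of a pacman-style log. It assumes the line shape commonly seen in `/var/log/pacman.log` on Arch (`[timestamp] [ALPM] upgraded name (old -> new)`), so verify that against your own log first. The package names and versions in the sample are placeholders, and other distributions keep their update history elsewhere.

```python
import re

# Assumed line shape of /var/log/pacman.log; check it against your own log.
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def recent_upgrades(log_text: str, packages: set[str]) -> list[tuple[str, str, str, str]]:
    """Return (timestamp, package, old_version, new_version) for the named packages."""
    found = []
    for line in log_text.splitlines():
        m = UPGRADE_RE.match(line.strip())
        if m and m.group("pkg") in packages:
            found.append((m.group("when"), m.group("pkg"), m.group("old"), m.group("new")))
    return found


# Made-up log excerpt; package names and versions are placeholders.
sample_log = """\
[2024-05-01T10:15:30+0200] [ALPM] upgraded examplewm (1.2.0-1 -> 1.3.0-1)
[2024-05-01T10:15:31+0200] [ALPM] upgraded otherpkg (2.0-1 -> 2.0-2)
[2024-05-01T10:15:32+0200] [ALPM] installed newpkg (0.1-1)
"""

for when, pkg, old, new in recent_upgrades(sample_log, {"examplewm"}):
    print(f"{when}: {pkg} {old} -> {new}")
```

The old version in each entry is the one you would try downgrading to when testing whether an update introduced the problem.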

8:17

You also want to check okay is this a

8:20

system-wide issue or is this a user

8:22

specific issue. Um this is something

8:24

pretty easy to test. You can either

8:26

create a new user account and test it on

8:29

that user account or you could carefully

8:31

go on your root account and test it

8:33

there. However, do keep in mind

8:35

obviously the root account is

8:36

essentially the admin account. So that's

8:38

going to have all sorts of other various

8:40

permissions um that could impact, you

8:42

know, whether something does or doesn't

8:44

work. For example, it might work on root

8:46

but not work on your user account. So in

8:48

general, it's better to just test on a

8:50

brand new fresh user account. You could

8:53

always test if you have the option on

8:55

alternate hardware. If you have, you

8:57

know, a separate system or a laptop that

8:58

you could test on, that's generally

9:00

useful, especially if you're hitting

9:02

dead ends on your current system. And I

9:05

also want to stress you only want to

9:07

change one thing at a time. You don't

9:09

want to be changing a bunch of different

9:10

things at once because two things

9:13

with that. Okay, first of all, say you

9:14

actually do find something that fixes

9:16

your problem. Well, you won't know what

9:18

it was if you just changed like five

9:20

things and one of those other five

9:22

things could have impacted something you

9:24

didn't want to impact. Or if you're just

9:26

changing things randomly and you're just

9:28

changing so many different things you

9:29

don't even know what you're doing. Well,

9:31

you could be doing all sorts of damage

9:33

to your system. So, carefully change one

9:35

thing at a time and note down all of the

9:37

changes you're making and really take

9:39

notes on this whole process as much as

9:41

possible if you're trying to

9:42

troubleshoot something super complex.
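
One way to follow the "one change at a time, with notes" rule is to back up a file and write down why before touching it. The sketch below is only an illustration: the helper name `backup_and_note`, the notes file format, and the demo config file are all hypothetical, and the demo runs in a temporary directory so it never touches real configuration.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_and_note(config: Path, note: str, notes_file: Path) -> Path:
    """Copy `config` to a timestamped backup and append one line to a notes file."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = config.with_name(f"{config.name}.{stamp}.bak")
    shutil.copy2(config, backup)  # copy2 also preserves timestamps and mode
    with notes_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{config}\t{note}\n")
    return backup


# Demo in a throwaway directory with a hypothetical config file.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "example.conf"
    cfg.write_text("option = old\n", encoding="utf-8")
    notes = Path(tmp) / "troubleshooting-notes.tsv"

    saved = backup_and_note(cfg, "about to change 'option'", notes)
    cfg.write_text("option = new\n", encoding="utf-8")  # the single change

    print(saved.read_text(encoding="utf-8"), end="")   # original content
    print(notes.read_text(encoding="utf-8"), end="")   # what was changed and when
```

If the change does not help, restoring the backup puts you back at a known state before you try the next thing.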

9:45

Step number three, now that you have

9:47

narrowed down hopefully where your issue

9:49

is actually happening, check any

9:51

pertinent logs and configuration, either

9:54

program logs for relevant programs,

9:56

relevant configuration files, either

9:58

user configuration or global program

10:01

configuration. And if you're on a

10:03

distribution that uses systemd, which is

10:05

most major distributions out of the box,

10:07

check with journalctl. Um, learn to use

10:10

journalctl properly. Um, and the ArchWiki

10:13

as usual has a great article about it if

10:15

you want to learn how to read through

10:16

journalctl logs and, you know, follow

10:19

through understand exactly what it is

10:21

telling you. As usual, the ArchWiki

10:23

does a really good job of explaining how

10:25

to use the systemd related commands. I

10:28

also want to mention you should just

10:30

research as much as possible um as

10:33

you're trying to solve your issue and

10:34

also to figure out what the right

10:36

solution should be because you want to

10:38

find a solution, not just a workaround.

10:40

Um, a workaround can work in a pinch if

10:43

you need to solve something so that you

10:44

can like get on a work call in two

10:46

hours. Well, yes, a workaround uh better

10:49

do it for now, but uh long term you do

10:52

want to be finding real solutions rather

10:53

than just, you know, duct tape

10:55

workarounds. Um, of course, you know, I

10:57

say this, um, as I currently have my

10:59

microphone sitting on a camera battery

11:01

because my microphone stand is broken.

11:02

So, um, anyways, find solutions, not

11:05

workarounds, guys. Uh but you should uh

11:08

search through wikis, search through

11:09

forums and even search engines. You can

11:12

actually search through a search engine

11:13

and filter out particular sites or exact

11:16

matches to your search terms. So it is

11:18

worth uh learning how to use search

11:21

engines really well. Um especially in

11:23

the age of a lot of search engines being

11:25

really trash and flooding everything

11:26

with AI results, which of course is

11:28

super annoying. But you can still use

11:30

search engines if you filter out exact

11:32

matches, you search on specific sites,

11:34

etc. And of course, I did want to point

11:36

out you can search the Arch forums and

11:38

you can search through for keywords and

11:40

in the particular uh forum area that you

11:43

want to search in. Um I'm highlighting

11:44

the Arch forums here since I'm

11:46

personally using Arch, but of course

11:48

search on whatever forums are relevant

11:50

to either your distribution or your

11:52

program or whether it is in GitHub

11:54

issues that you need to be searching or

11:56

even on just some other unrelated forum

11:58

where people are offering assistance. Um

12:00

I also wanted to talk about later on.

12:02

I'm going to get to this. I kind of

12:03

want to turn this comment section if we

12:05

can into like a little bit of a help

12:07

exchange, you know, back and forth, but

12:09

um I'm going to get to talking about

12:10

that a little bit later. Anyways, so if

12:13

all else fails, make a post on forums

12:15

and chances are if you have done all of

12:17

the work on your end to figure out

12:19

exactly what is going wrong and you know

12:22

where the problem is happening, somebody

12:24

will likely be inclined to help you.
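
For the log-checking part of step three, the sketch below builds a few common `journalctl` invocations as argument lists. It uses only options from the journalctl manual (`-b` for the current boot, `-p` for a priority threshold, `-u` for a unit, `--no-pager`). The helper name `journalctl_cmd` and the unit name `example.service` are made up for illustration, and the code prints the commands rather than running them.

```python
import shlex
from typing import Optional


def journalctl_cmd(unit: Optional[str] = None, priority: str = "err",
                   this_boot: bool = True) -> list[str]:
    """Build a journalctl argument list; running it is left to the caller."""
    cmd = ["journalctl", "--no-pager"]
    if this_boot:
        cmd.append("-b")          # only messages from the current boot
    cmd += ["-p", priority]       # this priority and anything more severe
    if unit:
        cmd += ["-u", unit]       # restrict output to one systemd unit
    return cmd


# Print the commands so they can be reviewed or pasted into a terminal.
print(shlex.join(journalctl_cmd()))
print(shlex.join(journalctl_cmd(unit="example.service", priority="warning")))
```

Starting with errors from the current boot and then narrowing to a single unit mirrors the scope-narrowing approach described above.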

12:26

Let's talk through a bunch of the common

12:28

issues and solutions to those issues

12:30

that I have encountered. I want to start

12:32

with a file that I made a while ago to

12:35

go with a video essentially going

12:37

through Arch related problems and

12:38

solutions. But the thing is a lot of

12:40

these apply to much more than just Arch,

12:43

especially stuff like booting. Like

12:45

booting issues, for example, you want to

12:47

use Super Grub to get back into your

12:49

system. What is Super Grub? It is an ISO

12:51

that you can slap onto a USB disk and it

12:54

is essentially going to detect any

12:55

potential boot entries and allow you to

12:57

get back into your system in case of

13:00

some serious issues where you're just

13:02

unable to get in. So, Super Grub is what

13:04

you want to go get. I will link that in

13:06

the description. Um, I also have a bunch

13:08

of other boot troubleshooting steps in

13:10

this guide. So, I'm just going to link

13:12

this guide directly. Um, and I'm also

13:13

going to link the video where I go over

13:15

this guide in a lot more depth. Um, I

13:18

wanted to point out uh another thing in

13:20

this guide, which is pacman and package

13:22

issues. Um, mirror related issues

13:24

specifically. If you have mirror related

13:26

issues, resync your mirrors with

13:28

reflector. And beyond that, you probably

13:31

want to resync your mirrors pretty

13:33

regularly. Um, regularly as in every few

13:36

months to every year or so. I've found

13:39

that this actually helps me avoid errors

13:41

uh with mirrors. It's possible that

13:43

maybe in my particular location, the

13:45

mirrors that I am getting to often have

13:46

errors. So, I'm not going to say that's

13:48

a hard rule, but I have found that if I

13:50

resync my mirror list every so often, you

13:52

know, every few months when I remember

13:54

to, um, that is generally pretty helpful

13:56

for avoiding mirror related errors.
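
If you tend to forget when the mirror list was last refreshed, a quick age check can serve as a reminder. This is a minimal sketch that assumes the usual Arch location `/etc/pacman.d/mirrorlist`. The 90-day threshold is an arbitrary stand-in for "every few months", and the helper names are hypothetical.

```python
import os
import time
from typing import Optional

MIRRORLIST = "/etc/pacman.d/mirrorlist"  # usual location on Arch
MAX_AGE_DAYS = 90                        # assumption: "every few months"


def age_in_days(path: str, now: Optional[float] = None) -> float:
    """Days since `path` was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 86400


def is_stale(path: str, max_age_days: float = MAX_AGE_DAYS,
             now: Optional[float] = None) -> bool:
    """True if the file has not been modified within `max_age_days`."""
    return age_in_days(path, now) > max_age_days


if os.path.exists(MIRRORLIST):
    days = age_in_days(MIRRORLIST)
    state = "stale, consider resyncing" if is_stale(MIRRORLIST) else "recent enough"
    print(f"{MIRRORLIST}: {days:.0f} days old ({state})")
else:
    print(f"{MIRRORLIST} not found (probably not an Arch-based system)")
```

The modification time only tells you when the file last changed, not whether the mirrors in it are healthy, so treat it as a reminder rather than a diagnosis.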

13:59

Anyways, I will link this uh file in the

14:01

description as well as my video where I

14:02

went over it in depth. Um, just because

14:04

I'm not going to waste your time and

14:06

remake a video on something that I've

14:08

already made that chances are a bunch of

14:09

you have already seen. Anyways,

14:11

let's go to some other errors that are

14:13

not in that list. So, if you have lots

14:16

of I/O errors, um, as I said earlier, you

14:18

want to check your cable. That's

14:20

probably the first thing that you want

14:21

to check if you're having weird errors

14:23

with any piece of hardware. So, for

14:25

example, like your keyboard is like

14:27

double typing or your mouse is moving

14:29

weirdly or say, you know, for example,

14:32

my webcam here, how it was having this

14:33

issue where it just kept shutting off.

14:35

Um, it's actually a camera plugged in

14:37

with a cable, but anyways, it was the

14:39

cable that was the issue. So, um, always

14:41

check your cables. Um, moral of the

14:43

story here, but if you are having a

14:45

bunch of weird I/O errors on a drive and

14:48

it's not the cable, then run fsck on the

14:51

drive and diagnose if you have any

14:53

issues with your drive. Because if your

14:55

drive is failing, you want to make sure

14:56

you back up all of that data as much as

14:59

you can and then you get a new drive so

15:00

you don't run into a situation where

15:02

you're losing data. If a file system is

15:06

remounted as read-only by the kernel, you

15:08

don't want to immediately remount that

15:10

as read-write. You want to actually see why

15:12

the kernel did that because the kernel

15:14

is generally trying to protect your

15:16

drive if it's going to remount it as

15:18

read-only. So it is to prevent damage

15:22

on your drive. Um you want to run fsck

15:24

as I said to find errors but uh you want

15:27

to make sure okay is this a failing drive

15:29

or do you just have some random bad

15:31

sectors after you had you know some sort

15:33

of a crash or power loss. So, it's

15:36

important to distinguish between the two

15:38

since obviously if you have a failing

15:39

drive, well then you need to get a new

15:41

drive. But if you just have some bad

15:43

sectors after a power loss, you can

15:44

generally repair those. It's also

15:46

important to check if you're having

15:48

mount issues overall. Is your /etc/fstab

15:51

file still correct? Do you need to

15:53

be updating this file to um regenerate

15:56

your fstab entries? If you have a

15:59

generally slow system for no apparent

16:01

reason, you should see if your temp

16:03

directory is full. Um, I encountered

16:05

this a couple times. The last time I was

16:07

working on a huge video editing project,

16:10

my temp directory kept getting full and

16:12

I kept having, you know, a super slow

16:14

system and I was like, "What is

16:15

going on here?" Well, it turns out the

16:17

temp directory was full. So, um, if you

16:19

have a really slow system and you have

16:20

no clue what is possibly going wrong

16:23

because there's no, you know, rogue

16:24

programs or anything like that, then

16:26

check to see if your temp directory is

16:28

full. Um, if you have a library mismatch

16:32

error or a library not found error as

16:35

you are trying to run a program, uh,

16:37

something I should mention by the way,

16:38

if you are trying to run a graphical

16:40

program, it doesn't launch, run it

16:42

directly from the command line because

16:43

that will show you if there are any

16:45

errors as it is trying to launch.
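
The same idea can be scripted: run the program, then look at its exit code and at what it wrote to standard error. The sketch below is a hedged illustration using Python's `subprocess.run`. The "application" is a stand-in child process that prints a made-up error, and the helper name `run_and_capture` is hypothetical.

```python
import subprocess
import sys


def run_and_capture(argv: list[str], timeout: float = 30.0) -> tuple[int, str]:
    """Run a command and return (exit_code, stderr_text)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stderr


# Stand-in for a GUI program that fails at startup: a tiny Python child
# process that writes a made-up error to stderr and exits with status 1.
fake_app = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('example: cannot open display\\n'); sys.exit(1)",
]

code, err = run_and_capture(fake_app)
print(f"exit code {code}")
print(err, end="")
```

A program that starts successfully keeps running until it is closed, so with a real GUI application the timeout would eventually raise `subprocess.TimeoutExpired`. For a quick look, simply typing the program name in a terminal, as described above, is usually enough.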

16:47

Anyways, if you have a library mismatch

16:49

or the library isn't found, you want to

16:51

reinstall both the package and the

16:53

relevant library and that generally will

16:55

fix it. Um, I've generally found

16:57

sometimes even when um I reinstall

17:00

something or sorry not when I reinstall

17:02

when I update something but I haven't

17:04

reinstalled another program that uses

17:06

that library, I have to then reinstall

17:08

that other program because the library

17:11

was updated. So just make sure you have

17:13

everything updated and everything

17:15

reinstalled if you're getting any sort

17:16

of weird version mismatch errors. If

17:19

you're getting uh permission denied

17:21

errors, you probably know how to solve

17:22

those. you either want to change the

17:24

ownership of the file or you want to run

17:27

the command as root. Um, of course, be

17:29

careful when you're running commands as

17:30

root. Only run things that you trust as

17:32

root. But if you're having weird

17:34

permission related errors, especially if

17:36

you're trying to do something uh when it

17:38

comes to hardware. So an example with my

17:40

camera again, um I have to run the

17:42

command to mount my camera with sudo.

17:44

It has to be done with root. So

17:46

sometimes you will have to do things as

17:48

root. So if you're having all sorts of

17:50

permission denied errors, just try

17:51

running it as root. Just make sure you

17:53

trust the program that you're running.
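
Before reaching for root, it can help to see who owns the file and what the mode bits allow. The sketch below reports that with standard-library calls (`os.stat`, `stat.filemode`, `os.access`). The helper name `describe_permissions` is hypothetical, and the demo uses a temporary file rather than anything on the real system.

```python
import os
import stat
import tempfile


def describe_permissions(path: str) -> dict:
    """Summarize ownership and access for `path` as seen by the current user."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),   # e.g. "-rw-r-----"
        "uid": st.st_uid,                    # numeric owner
        "gid": st.st_gid,                    # numeric group
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        "executable": os.access(path, os.X_OK),
    }


# Demo on a throwaway file with owner read/write and group read.
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o640)
    info = describe_permissions(tmp.name)
    print(info["mode"], "uid", info["uid"], "gid", info["gid"])
    print("current uid:", os.getuid())
```

If the owner or mode is simply wrong, fixing ownership is the narrower change. Running as root bypasses the check entirely, which is why the caution above about trusting the program matters.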

17:55

If you have high CPU or memory usage,

17:58

um, check and see if you have any rogue

18:00

processes or a memory leak somewhere.

18:03

Um, I was actually working on a bash

18:05

script a while ago that I had

18:06

accidentally left something in that was

18:08

just launching process after process

18:10

after process and my memory was just

18:12

going up and up and up and I was like,

18:14

what is going wrong? Well, it turned out

18:16

I had like a thousand processes running

18:18

that I had launched from that shell

18:20

script that I was working on. So, um, be

18:22

careful you don't do that. But overall,

18:24

if you do have some weirdly high CPU or

18:27

memory usage, um, check all of the

18:29

various Linux commands to figure out,

18:31

okay, what is going on with my

18:32

processes? What is using high memory?

18:34

What is using high CPU? And then kill

18:37

the process as needed using the relevant

18:39

kill signal. So, if you really need to

18:41

kill a process that is not responding,

18:43

use kill signal 9. Otherwise, look

18:45

through the various available kill

18:46

signals and figure out what is the best

18:49

matched kill signal for your use case.
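
The "polite signal first, signal 9 only if needed" pattern can be sketched as follows. This assumes a POSIX system, where `Popen.terminate()` sends SIGTERM and `Popen.kill()` sends SIGKILL (signal 9). The helper name `stop_process`, the five-second grace period, and the sleeping child process are assumptions for the demo.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask a process to exit with SIGTERM; fall back to SIGKILL if it ignores that."""
    proc.terminate()                      # SIGTERM on POSIX: lets the process clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()


# Stand-in for a runaway process: a child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
code = stop_process(child)

# On POSIX a negative return code means "terminated by that signal number".
print("terminated by SIGTERM:", code == -signal.SIGTERM)
```

Trying SIGTERM first gives the program a chance to flush files and release locks. SIGKILL skips all of that, which is why it is best kept for processes that really are not responding.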

18:51

Okay, so as I was mentioning, it would

18:53

be really cool to try to turn this

18:55

particular comment section into kind of

18:57

an exchange to help each other learn to

18:59

troubleshoot. Um, not necessarily for

19:01

very particular issues. Obviously,

19:04

different distributions and different

19:05

programs have dedicated forums to go to

19:08

for help. But if anybody is just trying

19:10

to in general learn how to troubleshoot

19:12

better on Linux, those of us who are

19:14

using Linux or just in general whatever

19:16

operating system, I think the

19:18

troubleshooting concepts are really the

19:20

same across the board. If anybody has

19:22

troubleshooting questions, I would

19:23

really encourage those of you who know a

19:26

lot about Linux, which I'm sure is many

19:27

of you. I'm sure a lot of you probably

19:29

know a lot more about Linux than I do.

19:31

So, I would encourage anyone who already

19:34

has that knowledge and is already really

19:35

experienced with troubleshooting to help

19:38

anyone with questions who has less

19:40

knowledge and is trying to learn how to

19:41

do stuff. Anyways, I hope this guide was

19:44

helpful to you and I will see you next

19:46

time. Peace.

Interactive Summary

This video provides a three-step approach to troubleshooting, diagnosing, and solving Linux issues. The core strategy is to narrow down the scope of the problem until its source is identified, making it easier to find a solution. The video details each step: 1. Precisely identify the problem, noting specific symptoms and their order. 2. Gather relevant information, considering hardware issues, recent changes (software/hardware/configuration), system updates, and differentiating between system-wide and user-specific problems. It emphasizes changing only one thing at a time and taking meticulous notes. 3. Check pertinent logs and configurations, including program logs, systemd journalctl (if applicable), and researching solutions thoroughly to find fixes rather than workarounds. The video also covers common issues like boot problems, package manager errors, I/O errors, file system remounts, slow system performance, library mismatches, permission errors, and high CPU/memory usage, offering specific troubleshooting steps for each. Finally, it encourages the community to use the comment section as a help exchange for learning troubleshooting skills.
