99% of Developers Don't Get Docker
99% of developers don't get Docker.
You've seen the meme. A developer tells
a project manager, "It works on my
machine." And the PM responds, "Great.
We'll ship your machine." Many
developers think Docker is just a way to
wrap their messy code and dependencies
in a box. But they're still wondering
why it breaks in production. You're
probably building 2 GB images for a
simple node app, hard- coding
environment variables, and treating
containers like lightweight virtual
machines. But a container isn't a
virtual machine. It doesn't need a
hypervisor or a bloated guest OS. It's a
process that shares the host's kernel
directly through namespaces and control
groups. Today, we're stripping away the
black-box mystery. We're going to do a
deep dive into how Docker engines, images,
and layers actually work so you can stop
shipping bloat and start shipping
infrastructure. Starting with the
foundations, hardware versus OS
virtualization. To understand Docker, we
have to understand the evolution of
isolation and where its predecessor, the
virtual machine, fell short. Hardware
virtualization is a technology that uses
a software layer called a hypervisor,
like ESXi or KVM, to simulate physical
hardware: CPU, memory, storage, and network.
Each virtual machine runs its own
independent operating system and
applications acting like an independent
computer unaware of other virtual
machines on the same hardware or
physical machine. The hypervisor sits on
the physical hardware and carves it into
multiple isolated virtual computers,
abstracting software from the underlying
infrastructure to allow virtual machines
to operate independently. To the guest
operating system sitting on top, it
thinks it is running on real bare metal
hardware. This can be full
virtualization, where the guest operating
system is unaware it's virtualized, or
paravirtualization, where the operating
system is modified to communicate with
the hypervisor for better performance.
But why were virtual machines not
enough? The first reason is the resource
tax. Because every virtual machine
carries its own full kernel, you lose
significant RAM and CPU just to keep the
OS alive. Running 10 VMs means 10 copies
of the Linux kernel and 10 sets of
background drivers, wasting resources.
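To make that resource tax concrete, here is a back-of-the-envelope calculation; the per-guest-OS and per-app figures are assumptions for illustration, not measured numbers:

```python
# Rough illustration of the VM "resource tax" (all figures are assumptions).
GUEST_OS_RAM_MB = 700   # assumed RAM just to keep one full guest OS alive
APP_RAM_MB = 200        # assumed RAM used by the application itself
INSTANCES = 10

# 10 VMs: 10 kernels plus 10 apps; 10 containers: apps only, one shared host kernel.
vm_total = INSTANCES * (GUEST_OS_RAM_MB + APP_RAM_MB)
container_total = INSTANCES * APP_RAM_MB

print(f"VMs: {vm_total} MB, containers: {container_total} MB")
# → VMs: 9000 MB, containers: 2000 MB
```

Under these assumed numbers, more than three quarters of the VM fleet's memory goes to keeping guest operating systems alive rather than running applications.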
There's also startup latency. Booting a
full operating system takes minutes. In
a modern microservices world, waiting
minutes to scale up for a traffic spike
is unacceptable. And finally, size. A
virtual machine image is usually several
gigabytes, making it heavy to store and
slow to move across a network. Now,
let's discuss the shift to operating
system virtualization. Docker
virtualizes the operating system
instead. Virtualizing the operating
system means using containerization to
isolate applications by sharing the host
machine's OS kernel rather than
virtualizing full hardware like a
virtual machine. Containers act as
isolated user spaces, allowing multiple
lightweight applications to run
independently on one OS. Containers
create isolated user spaces, or
namespaces, for processes, libraries, and
dependencies, making them faster and
more resource efficient than VMs.
They're also highly portable. The
application and its dependencies are
packaged together, ensuring they run
consistently across any environment.
Containers also provide resource
efficiency: because they don't require a
separate OS, they use significantly less
RAM and CPU compared to full hardware
virtualization. The lineage of isolation
is built on concepts that existed long
before Docker. There was chroot in 1979.
This is the oldest ancestor. It's short
for change root. It allows you to
change the apparent root directory for a
process and its children. While it
isolates the file system, effectively
creating a chroot jail where a process
cannot see files outside its assigned
directory, it is leaky. It doesn't
isolate networking, users, or process IDs,
and a root user can easily break out of
a chroot environment using a second
chroot call. And then there were
FreeBSD jails in 2000. This was a massive
leap forward that introduced the concept
of OS-level virtualization. Jails didn't
just isolate the file system. They
partitioned the network stack, giving
each jail its own IP, as well as the user
subsystem and the process tree. Each jail has its
own root user and host name, but they
all share the same FreeBSD kernel. It
proved that you could have high density
isolation without the overhead of a VM.
Docker took these concepts to the Linux
kernel to create a workflow where a
container carries exactly what it needs,
like a specific version of Node.js 16 and
OpenSSL, without touching the host's
global libraries. So what is Docker
exactly? At its core, Docker is a
platform designed to package,
distribute, and run applications in
standardized units called containers. It
acts as the translation layer between
your code and the infrastructure,
providing a consistent interface for
managing the software life cycle. When
people say Docker, they are usually
referring to four distinct things.
First, the Docker Engine, also known as
the heart of Docker. It is a client-server
application consisting of a
long-running background daemon process
(dockerd), APIs that specify interfaces for
programs to talk to the daemon, and a CLI
client. Then there is the Dockerfile.
This is a text-based manifest that
defines the source of truth for your
environment. It documents every step
required for your app to run, making your
environment infrastructure as code.
Third is images. This is the blueprint
of your application. An image is a
read-only executable package that
includes everything needed to run an
application: code, runtime, libraries,
environment variables, and config files.
When you run an image, it becomes a
container. And fourth, Docker Hub. This
is a centralized registry, similar to
GitHub but for binary images, that hosts
official, security-scanned images for
databases like PostgreSQL, web servers
like Nginx, and runtimes like Node.js.
Before we look at how the Linux kernel
actually draws these isolation
boundaries, let's talk about another
place you're quietly bleeding
engineering cycles, your CI/CD pipeline.
Just like how wrestling with environment
drift is a massive waste of your time,
so is staring at a terminal waiting 20
minutes for a Docker image to build.
That context switching completely
destroys developer momentum. If you are
building multi-architecture container
images in standard CI environments like
GitHub Actions, you already know the
pain of relying on slow, buggy software
emulation like QEMU. That's why you need
to look at Depot, the sponsor of this
video. Depot is a drop-in replacement
for your Docker builds that makes them
up to 40 times faster. Instead of
wrestling with standard CI runners,
Depot routes your builds to remote
machines equipped with native Intel and
ARM processors, completely eliminating
the need for slow emulation. But the
real secret weapon is their caching.
Instead of wasting time saving,
compressing, and loading cache layers
over a network, which sometimes takes
longer than the build itself, Depot uses
a shared, blazingly fast NVMe cache that
is instantly available across all of
your parallel builds. There is no
complex migration. You literally just
swap docker build for depot build in
your workflow and your pipeline is
instantly faster. Stop burning money on
CI minutes and shattering your team's
focus. Go to depot.dev to get your time
back. Now, back to how the kernel makes
Docker possible. Let's talk about the
anatomy of isolation and kernel
primitives. The Linux kernel draws these
boundaries using two secret weapons.
First is namespaces. We're going to talk
about isolation or what a process can
see. Namespaces provide the virtual
reality goggles for a process. In Linux,
many resources are global like the list
of all running processes or the network
card. Namespaces wrap these global
resources in an abstraction so that a
process inside a namespace thinks it has
its own private isolated instance of
that resource. First, let's talk about
the PID namespace. On your host, your
app might be process ID 4502.
But inside the container's PID namespace,
the app sees itself as PID 1, the system's
init process. It cannot see or interact
with any processes outside its own
bubble. Second, there's the net
namespace. It provides a private network
stack. This includes its own IP address,
routing table, and firewall rules. This
is why you can run three separate
containers, all listening on port 80,
without a "port already in use" error.
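You can see the underlying collision on a single, non-namespaced network stack with a few lines of Python: two sockets, one stack, one port:

```python
import socket

# On one shared network stack, a given port can only be bound once.
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))            # let the OS pick a free port
port = s1.getsockname()[1]
s1.listen()

s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clashed = False
try:
    s2.bind(("127.0.0.1", port))     # same stack, same port: EADDRINUSE
except OSError:
    clashed = True                   # this is the "port already in use" error
print("clash on shared stack:", clashed)

s1.close()
s2.close()
```

Because each container gets its own net namespace, and therefore its own stack, this clash never happens between containers.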
Each is on its own private network. Then
we have the mnt or mount namespace. This
isolates the mount points. The process
sees a completely different file system
root, or slash, than the host machine. It
cannot see the host's /etc/shadow
or /home directories unless you
specifically mount them. And then we
have the UTS namespace. This allows the
container to have its own host name and
domain name separate from the host
machine. A container's host name is just a
unique label assigned to a container
within a network, identifying it for
inter-container communication, and
typically defaulting to the container ID
in Docker. And then we have control groups,
or cgroups. If namespaces are about what
you can see, cgroups are about what you
can use. They set hard limits on RAM and
CPU, which prevent a single buggy
container from crashing your entire prod
node. Now let's talk about the union
file system. A Docker image is tiny
because of the union file system. Images
are composed of layers. Each instruction
in a Dockerfile creates a new layer.
These are stacked and treated as one
system. If you have three apps based on
Ubuntu 24.04, Docker stores one read-only
layer of Ubuntu on your disk shared by
everyone. When a container is launched,
Docker adds a thin unique writable layer
on top of the immutable image layers.
All changes made within that specific
container are written to this top layer.
This is the copy on write strategy. If
the app changes a file, Docker copies it
to the top layer and modifies it there
leaving the base image untouched. Let's
dive a little bit deeper into the
copy-on-write mechanism. First, reading:
if an application needs to read a file,
Docker first looks in the writable
layer. If it's not there, it accesses
the file from the shared read-only layers
below. Then, writing: the first time an
application modifies an existing file
from a lower layer, Docker's storage
driver performs a copy-up operation,
copying the file to the
container's unique writable layer. The
modification then occurs on this copied
file in the writable layer, leaving the
original file in the read-only layer
untouched and available for other
containers to use. This ensures each
container has its own isolated data
state while maximizing efficiency. And
this in turn provides reduced storage
consumption because images are shared.
Launching 10 containers based on a 1
GB image does not require 10 GB of
storage. It only requires the base 1
GB plus small individual write layers.
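The read and copy-up paths described above can be sketched as a toy model; this is pure illustration with dicts standing in for filesystem layers, not how a real storage driver like overlay2 is implemented:

```python
# Toy union filesystem: shared read-only layers plus one writable layer per container.
image_layers = [                          # bottom-to-top, shared by every container
    {"/etc/os-release": "ubuntu 24.04"},
    {"/app/server.js": "v1"},
]

def make_container():
    return {}                              # thin writable layer, empty at start

def read(writable, path):
    if path in writable:                   # check the top (writable) layer first
        return writable[path]
    for layer in reversed(image_layers):   # then the read-only layers, top-down
        if path in layer:
            return layer[path]
    raise FileNotFoundError(path)

def write(writable, path, data):
    writable[path] = data                  # copy-up: the image layers stay untouched

c1, c2 = make_container(), make_container()
write(c1, "/app/server.js", "v2")          # c1 modifies its own copy
print(read(c1, "/app/server.js"))          # → v2
print(read(c2, "/app/server.js"))          # → v1 (shared base image untouched)
```

Both containers share the same image layers on disk; only c1's small writable layer holds the changed file.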
It also provides fast container startup.
Containers start almost instantly
because they do not need to copy the
entire file system image, only create a
new empty writable layer. Now, let's
talk about designing the build, or
optimizing the Dockerfile. A
professional Dockerfile is optimized
via layer caching. Docker caches each
instruction, and if you change a line,
Docker invalidates the cache for that
line and everything below it. Let's look
at this sample Dockerfile. The FROM
instruction sets the base image. This is
the foundation of your stack. Choosing a
small base like Alpine significantly
reduces your image size and attack
surface. The WORKDIR instruction sets the
execution context. Any following RUN, CMD,
or COPY instructions will happen inside this
folder, ensuring your file structure is
predictable. COPY moves files from the
host machine into the image. By copying
package.json before the rest of the
code, we ensure npm install only reruns
if our dependencies actually change. RUN
executes commands in a new layer on top
of the current image and commits the
results. This is used to install
packages or build your application.
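Putting the instructions together, the file being narrated likely looks something like this sketch; the exact base image tag and file names are assumptions:

```dockerfile
# Small base image keeps size and attack surface down
FROM node:16-alpine

# All following instructions run relative to this directory
WORKDIR /app

# Copy the manifest first so `npm install` stays cached until dependencies change
COPY package.json ./
RUN npm install

# Now copy the rest of the source; edits here don't bust the install cache
COPY . .

ENV NODE_ENV=production

# Document the port the app listens on
EXPOSE 3000

# Default command when the container starts
CMD ["node", "server.js"]
```

With this ordering, editing server.js only invalidates the layers from the second COPY onward, so rebuilds skip the dependency install entirely.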
ENV NODE_ENV=production sets an environment
variable named NODE_ENV to the value
production within the image. This is a
standard practice for Node.js
applications, as it triggers several
optimizations, such as improved
performance, disabled development-only
warnings and logging, and the omission
of development dependencies during the
installation process, and this produces
a smaller final image size. EXPOSE 3000
informs Docker that the container is
expected to listen for network traffic
on TCP port 3000. CMD provides the
default for an executing container.
Unlike RUN, which executes during the
build, CMD is the entry point of the app
once the container is actually launched.
In this case, it just runs the server.js
file as a Node.js program. Now, let's
talk about data persistence and
orchestration. Starting with Docker
volumes. By default, data inside a
container is ephemeral. If the container
is deleted, the data is gone. Volumes
are the preferred mechanism for
persisting data generated by and used by
Docker containers. They are stored on
the host but managed by Docker, allowing
you to swap out containers without
losing your database records or uploads.
For orchestration, let's touch on Swarm
versus K8s. As you scale from one
container to hundreds, you need
orchestration. Docker Swarm is Docker's
native orchestration tool. It's easier
to set up and great for smaller, simpler
clusters. Kubernetes, also known as K8s,
is the industry standard. It is highly
complex but offers immense power for
automated scaling, self-healing, and
managing massive distributed workloads.
There's also the question of Docker
versus Podman. Docker uses a client-server
architecture with a centralized
daemon known as dockerd. Podman, on the
other hand, is daemonless. It uses a
fork-exec model where the CLI directly
interacts with the OCI (Open Container
Initiative) runtime. For Docker, the
daemon typically runs as root, posing a
single point of failure and a higher
security risk. Podman, on the other
hand, is rootless by default,
significantly reducing the potential
attack surface. For Docker, all
containers are managed by a single
persistent background service, while for
Podman, each container runs as a regular
isolated user process manageable with
standard Linux tools like systemd.
Docker primarily uses Docker Compose for
multi-container orchestration, which
requires a separate YAML file.
Podman is built with Kubernetes in mind;
it supports native pods and can
automatically generate Kubernetes-compatible
YAML files from local
workloads. But what are the actual next
steps in your Docker journey? First is
to realize that Docker is not just a
tool. It's a mental shift from managing
servers to managing artifacts. By
mastering namespaces, layers, and
volumes, you gain total predictability
over your software's life cycle. If you
want to begin your journey in becoming a
10x engineer, I highly recommend
checking out CodeCrafters, where you'll
learn how to build Git, Docker, and Redis
from scratch. They are hands down the
best project-based coding platform out
there. Check out my link below for 40%
off. As always, thank you very much for
watching this video and happy coding.