
Frontend is HARDER for AI than backend (here's how to fix it)


Transcript


0:00

There's a weird problem with AI coding agents at the moment, which is that they are way better at writing back-end code than they are at writing front-end code.

0:07

I'm not talking about producing like

0:09

front-end slop like basic marketing

0:11

pages and just, you know, the demos that

0:13

you see people oneshotting. I'm talking

0:14

about complicated user interactions,

0:16

multi-step forms, annoying animation

0:19

things. If you ever tried using an LM

0:21

for this, then you will find it is not

0:23

very good at handling that stuff. The

0:24

reason for this is that the feedback

0:26

loops that the LLM is using are way more

0:29

precise on the back end than they are on

0:31

the front end. On the back end, you can

0:32

write some code, right? And then that

0:34

code is often tested by more code. The

0:37

outputs of those tests are written in

0:39

text and the entire loop, the entire

0:42

environment is encoded in text. Even the

0:44

information about the documentation that

0:46

you're using is written in text. And the

0:48

LLMs are really good at writing and

0:50

understanding text. The difficulty comes

0:52

when you start adding visual elements to

0:55

this. A feedback loop in a front end. If

0:57

you've ever been a front-end developer,

0:58

you know this is to write some code,

1:00

then look at the UI to see what changed,

1:02

go back, write a bit more code, look at

1:03

the UI again. In other words, as humans,

1:05

we're not just looking at text. We're

1:07

also taking in visual information. We're

1:09

able to use our innate design sense and

1:12

get a feel for what is supposed to be

1:13

where and try out user interactions

1:15

ourselves. And in fact, we're not just

1:17

processing images, we're processing

1:19

video as well. We're trying to feel how

1:20

things scroll, how an animation feels

1:23

when we hover over it. In other words,

1:24

so much of this feedback loop is visual,

1:27

not textual. And this is a problem for a

1:29

lot of AI coding agents. With a back-end

1:31

feedback loop, you can add unit tests,

1:33

you can add linting, you can add type

1:35

checking. You can add these in a

1:36

pre-commit hook to make sure when the

1:38

coding agent commits its code, it's

1:40

going to see the errors come up if there

1:42

are any. We can hook up these same

1:44

things here to the feedback loop, but

1:46

they're all textual, right? We still

1:48

need a visual element to make the AI

1:50

coding agent understand what it's done

1:52

and whether what it's done actually

1:54

works. I've been working with AI to

1:55

rebuild my blog recently and it kind of

1:58

looks like this. Pretty basic, but I

1:59

quite like the minimalist design. I'm

2:01

going to get claw code to do some QA for

2:03

me. I'm going to ask it to double check

2:04

if light mode and dark mode are working

2:06

okay on the homepage and check if all

2:08

the content is rendered acceptably on

2:10

both. I'm then just going to pass it my

2:11

local server which is localhost 5175.
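As an aside, the text-based checks mentioned earlier (unit tests, linting, type checking in a pre-commit hook) might be wired up like this. A minimal sketch of a `.git/hooks/pre-commit` file, assuming a Node project with `lint`, `typecheck`, and `test` scripts (the script names are hypothetical):

```
#!/bin/sh
# Run the text-based feedback loops before every commit.
# If any step fails, the commit is aborted and the error
# output lands back in the coding agent's context.
set -e

npm run lint        # linting
npm run typecheck   # type checking, e.g. tsc --noEmit
npm test            # unit tests
```

Because of `set -e`, the first failing command stops the hook, and its non-zero exit code blocks the commit, which is exactly the moment the agent sees the errors.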

2:14

What it does is check the homepage in both light and dark mode using Chrome DevTools. Its first tool use is to spin up a new page, and I say yes, and don't ask again. It then takes a screenshot with full-page set to true, capturing the entire current page. It spots: "I don't see a visible toggle. The page likely uses system preference. Let me emulate dark mode using a script to check the prefers-color-scheme behavior." And it continues in this vein, doing more screenshotting and checking all the pages.
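For context on what the agent runs into next: dark mode can be implemented either by toggling a class on the page or purely via the `prefers-color-scheme` media query. A minimal sketch of the media-query approach (hypothetical selectors and colors), where adding a `dark` class has no effect because no rule keys off it:

```css
/* Colors follow the OS/browser preference, not a class. */
body {
  background: #ffffff;
  color: #111111;
}

@media (prefers-color-scheme: dark) {
  body {
    background: #111111;
    color: #eeeeee;
  }
}
```

With styles written this way, the only way for a tool to flip the theme is to emulate the media feature itself, which is what the agent ends up doing.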

2:42

"Adding the dark class didn't change anything. The site is using media-query-based dark mode." Very good. It finally figures out that we're using @media (prefers-color-scheme: dark), manages to emulate that, takes a screenshot, confirms both modes look good, cleans up, and provides a summary.

2:56

The way I've got this set up is with an .mcp.json inside my project, which runs the Chrome DevTools MCP server and points it at a running browser URL.
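As a sketch, such an .mcp.json might look like the following. The npm package name `chrome-devtools-mcp` and the `--browser-url` flag reflect how the Chrome DevTools MCP server is commonly invoked, but check the server's own docs for the exact names; the port is an assumption too:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": [
        "chrome-devtools-mcp@latest",
        "--browser-url=http://127.0.0.1:9222"
      ]
    }
  }
}
```

Pointing the server at an already-running browser (rather than letting it launch its own) is what makes the headless-Chrome skill below necessary.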

3:05

Alongside this, I also have a skill in here for Chrome debugging. If you don't know, a skill is kind of like something you would put in your AGENTS.md file, except the agent can pull it in on demand when it needs to.

3:17

This one, for instance, says: "Before using Chrome DevTools MCP, you must launch Chrome in headless mode with remote debugging enabled." And that's it. That's the entire setup. So alongside the text-based feedback loops like testing, linting, and type checking, we've now got an actual browser that the LLM can go and use. And having the browser there makes the LLM much more like a human coder: it gets access to all of the feedback loops that you have, too.
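The Chrome-debugging skill described above could be as small as a single markdown file. A hypothetical sketch (the frontmatter fields follow Claude Code's skills convention; the Chrome flags are the standard remote-debugging ones, and the binary name varies by platform):

```markdown
---
name: chrome-debugging
description: Use when QA-ing or debugging the frontend in a real browser
---

Before using Chrome DevTools MCP, you must launch Chrome in
headless mode with remote debugging enabled:

    google-chrome --headless --remote-debugging-port=9222

Then connect the MCP server to http://127.0.0.1:9222.
```

Because it's a skill rather than always-on instructions, the agent only pulls this into context when it actually reaches for the browser.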

3:38

And bear in mind, the LLM is not actually writing tests for this stuff, or writing end-to-end tests you need to run on every commit. It's doing the same kind of ad hoc testing that human devs do: checking that the feature they just built actually works.

3:53

Now, I posted this on X and got some really great responses, basically saying that the Playwright MCP, which is what I was using at the time, is not very good, and that there are several tools fighting in the same space that are maybe a little bit better. The one Damian pointed out here, dev-browser, is supposed to be pretty good, but I have not yet tried it. The other one he pointed out, playwriter, another MCP, is also supposed to be really good. What people say about the Playwright MCP server, and the Chrome DevTools server too, is that they're pretty context-hungry. They take up a lot of tokens pumping their screenshots back into the LLM, and their MCP servers are configured in a way that makes them really context-heavy: really detailed tool descriptions, lots of different tools, that sort of thing. You can also use Claude Code directly with Chrome using the Claude Chrome extension. I can't use it at the moment because I'm on WSL, and it's not supported on WSL.

4:45

The last thing to say here is that this front-end feedback loop makes AFK AI coding a lot more powerful. If you're looping an AI, for instance in the Ralph Wiggum setup, then plugging a browser into your front-end or full-stack work is a massive improvement, because without it your LLM is essentially flying blind: it can't see the execution environment in which its changes are being made. I've noticed an enormous improvement when I plug a visual feedback loop into this setup.

5:11

If you dig this, then you'll dig all the stuff I'm putting out on AI Hero. There are a ton of free tutorials and guides that help you start building with AI, and especially putting AI into applications, which is where things start getting really gnarly and really fun. Thank you so much for watching, and I will see you in the next one.

Interactive Summary

The video addresses the issue where AI coding agents are significantly better at generating back-end code than front-end code. This disparity arises because back-end development uses text-based feedback loops that LLMs excel at, while front-end development heavily relies on visual feedback and human design intuition, which current LLMs struggle to process. The speaker demonstrates a solution by integrating a browser-based visual feedback loop using the Chrome DevTools MCP server. This allows the AI agent to "see" and evaluate UI changes, such as checking light and dark modes on a blog, effectively performing ad hoc visual testing akin to human developers. While acknowledging that current tools like the Playwright MCP and Chrome DevTools servers can be context-hungry, consuming many tokens for screenshots and detailed tool descriptions, the speaker emphasizes that adding this visual feedback loop vastly improves the AI's capability for front-end tasks, preventing it from "flying blind."
