Developing Taste in Coding Agents: Applied Meta Neuro-Symbolic RL

Ahmad Awais, CommandCode

Transcript

0:00

Well, hello there. Today I am really, really excited to both launch and share with you what we have been working on for maybe over a year now. It's called Command Code, a coding agent with taste.

0:11

So, who am I? I am Ahmad, creator of Command Code and CEO and founder of Langbase. I've been around the block for, I don't know, 20 years, building one thing after another. I've written hundreds of open-source packages with millions of downloads; maybe you like my Shades of Purple code theme. I love the color purple. I'm an engineer at the end of the day: I write a lot of code, and I've been building in the LLM space for about five years now. I think one of the first tools I actually ended up building was a coding agent. I'm very technical; I got to contribute to NASA's Ingenuity Mars helicopter mission, so my code lives on Mars. So when I'm writing code, no matter what LLM or coding agent I'm using, I want it to learn from me. I want it to learn how I am editing its code, to understand my preferences and continuously adapt to that preference set, the invisible architecture of choices that I have. That is what I'm excited to demo today.

The story actually begins in 2020, when Greg Brockman gives me access to GPT-3. This is about three years before ChatGPT and a year before GitHub Copilot, and one of the first things I tell him is that I want to build something with GPT-3 that suggests the next line of code. But let's jump into a demo right away. Let's look at what this actually looks like, and then I'll explain how we ended up here.

1:44

So on the left here you see Claude Code, and this is Command Code, which is what we are building. As you can see, it is continuously learning; "taste is on," as we call it. If you know anything about me, you know that I'm all about automation, and I have built a lot of CLIs over the course of my career. So let's build a CLI. Command here actually knows how I built a CLI yesterday, or before that; it understands my preferences for building a CLI. So let's give both of them the same prompt: "make me a CLI that can tell date in ISO format."

2:30

Right, so look at what is happening here. One of the first things that happens is that Command picks up on my taste file; I'm going to share a little more about that later. I'm going to enable all these settings, so let's turn "accept edits" on for both of these coding agents. You can see what Command is doing: it's using tsup, it's using TypeScript, it's building an ASCII art banner, and it's going to npm link this particular CLI as well. These are all things that I care about. Claude has done something really good, and it's very fast, but, I don't know, man, this is not what I wanted. It's a console.log of this or that. When I build a CLI, I don't want to build a CLI like this. I'd have to tell it: please use TypeScript, and I want tsup. What else? I want Commander, because I like to have more control over my CLIs. What else? I want a lowercase -v for the version number, because Commander does this capital -V thing by default. I have so many preferences here, and by this time Command has already done what I wanted it to do. How about we jump into the code and see what it has actually done? Let's open this up in VS Code.

4:12

And this is what Command did for me. It is using tsup, it is using TypeScript, and it knows that I prefer pnpm, which I completely forgot to tell Claude. If we go into this particular CLI, you can see what it is doing: it is using -v for the version, and it is not hard-coding the package version in here. One more thing it should have picked up is that I want all of these commands in a separate directory called commands. There you go: the date command is here. So when I grow this CLI into something like "tell me the human date" or whatnot, it is going to put all of those commands here, which makes them very, very easy to test. I wonder if it is also using Vitest. There you go, because I prefer Vitest for writing a lot of tests. And one more of those things: it is using version 0.0.1. I like to start there instead of 1.0.0.
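The demo keeps each CLI command in its own module under a commands directory, which is what makes the commands easy to test. A minimal sketch of that layout in TypeScript, assuming hypothetical names (isoDate, humanDate, run) rather than the code Command Code actually generated: each command is a pure function that takes the current time as an optional parameter, and a plain lookup table stands in for the Commander wiring so the sketch has no dependencies.

```typescript
// Hypothetical layout: each command is a pure function in its own module
// (e.g. commands/date.ts, commands/human-date.ts), and the CLI entry point
// only maps command names to handlers. All names here are illustrative.

// commands/date.ts: the given instant in ISO 8601 (UTC), e.g.
// "2024-01-02T03:04:05.000Z". `now` is injectable so tests can pin the clock.
function isoDate(now: Date = new Date()): string {
  return now.toISOString();
}

// commands/human-date.ts: a later addition, e.g. "Tuesday, January 2, 2024".
function humanDate(now: Date = new Date()): string {
  return now.toLocaleDateString("en-US", {
    weekday: "long",
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  });
}

// In the real CLI these handlers would be registered with Commander; a plain
// lookup table stands in for it here so the sketch has no dependencies.
const commands: Record<string, (now?: Date) => string> = {
  date: isoDate,
  "human-date": humanDate,
};

// Dispatch a command by name; an undefined `now` falls back to the real clock.
function run(name: string, now?: Date): string {
  const handler = commands[name];
  if (!handler) {
    throw new Error(`Unknown command: ${name}`);
  }
  return handler(now);
}
```

For example, run("date", new Date(Date.UTC(2024, 0, 2, 3, 4, 5))) returns "2024-01-02T03:04:05.000Z". Because the clock is a parameter, a Vitest test for a command can pass a fixed Date instead of mocking time.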

5:10

And that is probably not what Claude was doing on this side. If I open the same CLI that Claude built for me, you will see that it's 1.0.0, and again it's not using Vitest. Pretty much every single preference I have, it is not going to follow. And again, everything is in one place here; I don't want it like this. Claude is an amazing model, and it knows what to do. With Command, we are also using Claude right now. But on its own, I have to steer it so much that I feel like it should be learning from me.

By the way, it is quite transparent. If you look at this, we have a Command Code folder in here, and in it there's a taste file. If you go inside of it, there's a CLI taste that it has picked up. These are all my preferences, and I can assure you none of this was written by me. Command Code is continuously learning from me and creating a lot of these taste-like things. This is not a spec. This is not a skill. It's my intuition built into a meta neuro-symbolic model, an architecture that is more deterministic. It's more like a regex of my preferences, and it figures out what I want when I'm building and writing code with AI.

So let's take a step back: why and how did we get here? We are going to publish a paper about it as well. I'm going to share a little more about where we are, how we think about it, why this matters, and what the architecture behind all of this is. Again, I started in 2020. The first thing I built was a coding agent, and that led to so many things. I ended up building Langbase, and we raised $5 million from all these amazing people.

7:05

In fact, a founder of GitHub led our round, and founders of all these amazing companies supported our mission. The problem we were trying to fix was memory, and this memory was not just RAG: it was a serverless RAG store that can reason over your data, reason over how to help you, and continuously learn. We saw a lot of things, and I think this is the biggest problem in AI. I think the best thing [laughter] AI has learned from humans is that humans are lazy, and that is what AI is: AI is lazy by default. It's very sloppy. If you ask for a horse on a staircase banister, this is what you get, and then you have to prompt it again and again and again to get to the left side here. This is sort of what you saw me do with Claude when I was trying to build that CLI.

To fix this problem, we launched a bunch of primitives: threads, workflows, memory, what have you. Our hope was that people would start building amazing agents. And then we saw, I think, 700 terabytes and 1.2 billion agent runs a month. So we saw major scale, but we saw another problem. We studied that problem, and you can go to stateofiagents.com and study all of our research into how people were building agents. This is all public, by the way.

8:29

And we figured out that even agents were very, very sloppy. I use AI for everything except when I am writing, because every time I build an agent to write, or use an LLM to write something, this sort of slop is what I get back. "We have a collaborative dev tool; can you write me a fun headline for it?" What I'd get back is something like "the power of synergistic teamwork" or whatnot. This is my friend; I actually saw him do this, and he was like, "Oh god, no. Please fix it." And it got even worse, right? To fix this, we tried Command. We launched it as Chai and rebranded it to Command.new within the last five months. It is an agent of agents: you give it a prompt like "this is the kind of agent I want to build," and it provisions and creates all of the infrastructure for you. I gave a talk about it as well. In five months, we have seen 150,000 agents vibe coded with it. But there's just something missing, right?

9:28

Vibe coding, I think, is better than slop, but it's not better than the rules and choices I have made and built my career around. So we set out to fix this problem again, and this is what my five years of learning come down to. I think AI is sloppy by default; this is the default setting of almost every LLM. They're trying to be correct, and to be correct as soon as possible, which I think doesn't really work with code. Then we get this vibe coding thing where somebody does the context engineering. Everybody has a different name for it, but behind the scenes it's context engineering, memory, and a bunch of prompts, and most of the time you don't really have a lot of control over it. To get that control, a lot of developers start writing rules files like CLAUDE.md and AGENTS.md. And rules are never enough.

10:23

I often joke that our justice system sucks because our rules are not enough, and then we have to go out with a human lawyer, a human judge, and a jury of humans to figure out what to do in that particular situation. So I feel there should be something that learns rules from us, and it should be learning our taste in writing code. That is why I've put this word, taste, here. What does that look like? I think this should be something that acquires our taste. So: Command Code, a coding agent with taste, or, if I'm bold enough to say it, a coding agent with an acquired taste. It learns your taste in writing code, and this is sort of what it looks like. I know this might be a very silly and bad example; I didn't want to put a lot of text here.

11:21

But when I look at this AI-generated code, I'm like, no, no, no, this is not good. I want JavaScript object parameters any time there are more than two parameters. But AI won't listen to me; LLMs don't know this preference of mine. So when I ask, "make me a sum.js function" (again, a very dumbed-down example), Claude Code won't do what I want it to do, but Command just naturally knows this is what I prefer, because it has seen me go and edit AI code and fix it this way.
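The slide's code isn't captured in the transcript, so the following is a hypothetical reconstruction of the preference being described, written in TypeScript (the same pattern applies to a plain sum.js without the type annotations): positional parameters as an LLM tends to write them, versus a single destructured object parameter once a function takes more than two arguments.

```typescript
// The shape an LLM typically produces: positional parameters, so a call
// site like sumPositional(1, 2, 3) does not say which argument is which.
function sumPositional(a: number, b: number, c: number): number {
  return a + b + c;
}

// The preferred shape: with more than two parameters, take one object and
// destructure it. Call sites become self-describing, argument order no
// longer matters, and optional fields can be added later without breaking
// existing callers.
interface SumParams {
  a: number;
  b: number;
  c: number;
}

function sum({ a, b, c }: SumParams): number {
  return a + b + c;
}
```

Both functions return 6 for the inputs 1, 2 and 3; only the call sites differ, for example sum({ a: 1, b: 2, c: 3 }) versus sumPositional(1, 2, 3).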

11:55

Similarly, we saw this happen when I asked for a date CLI. Claude basically started with a console.log, and I had to tell it, "No, I want pnpm, I want TypeScript," and all of that fun stuff, whereas Command just knows that I prefer Commander and all of the other things I demoed earlier in this talk. Okay.

12:21

So, to sum it up: when programmers talk about good code, they're not talking about code that is merely correct. They're talking about the invisible architecture of choices they have made throughout their career to make their code readable, maintainable, humane, and more like them. That, I think, is what stops me from writing a lot more code. My mission is: what if I could do a lot of things in one day? What if I could have a thousand pull requests merged to main, and my review time went down by 90% or 99%, because a coding agent was doing what I wanted it to do instead of picking up some sloppy code from a 2015 Stack Overflow answer and slapping it onto every request I have? And I don't have time to teach it all the rules: I can either write code, or I can teach it to write code. I cannot be the one telling it, when I'm using Next.js, "Oh no, even though both of these create API route files, here is the difference between this project and that project." It should just learn that in this situation, this is the confidence level it has around the conflicts that arise from different rules and different projects. I don't think I can do that.
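The talk does not say how Command Code resolves conflicting preferences, so this is only one simple policy, sketched under assumptions: every learned rule is hypothetically scoped to one project or marked global ("*") and carries a confidence between 0 and 1. The Next.js case from the talk, where two projects both create API route files but in different ways, then becomes two rules on the same topic in different scopes, and the resolver prefers the more specific scope before comparing confidence.

```typescript
// Hypothetical shape of one learned preference. `project` is a project name,
// or "*" for a rule that applies everywhere. `confidence` (0..1) would grow
// as the developer's edits keep confirming the rule.
interface TasteRule {
  topic: string;
  choice: string;
  project: string;
  confidence: number;
}

// Project-scoped rules are more specific than global ones.
function specificity(rule: TasteRule): number {
  return rule.project === "*" ? 0 : 1;
}

// Pick the rule to apply for `topic` inside `project`: ignore rules below
// `minConfidence`, prefer project-scoped over global rules, and break ties
// by confidence. Returns undefined when nothing is confident enough.
function resolveRule(
  rules: TasteRule[],
  topic: string,
  project: string,
  minConfidence = 0.5,
): TasteRule | undefined {
  const candidates = rules.filter(
    (r) =>
      r.topic === topic &&
      (r.project === project || r.project === "*") &&
      r.confidence >= minConfidence,
  );
  // Higher specificity first; within the same scope, higher confidence first.
  candidates.sort(
    (x, y) => specificity(y) - specificity(x) || y.confidence - x.confidence,
  );
  return candidates[0];
}
```

With a global rule at confidence 0.8 and, for a project named legacy-site, one rule at 0.7 and another at 0.3, the 0.7 project rule wins inside legacy-site, the 0.3 rule is ignored, and every other project falls back to the global rule. The topic names, threshold, and tie-breaking order are all illustrative choices.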

13:50

excites the hell out of me I think this

13:52

is the invisible architecture of choices

13:54

that every programmer is making and that

13:57

is that is what we are trying to build

13:59

here uh you know a meta neuros symbolic

14:03

reasoning space with reinforcement

14:05

learning. This is this is a very dumbed

14:08

down version uh a formula of how we have

14:11

set this objective. Uh if if you don't

14:14

know trans you know neurosymbolic

14:16

architecture is a more deterministic

14:18

inexplainable architecture than

14:19

transformers. Transformers are

14:22

generative. They they they are very

14:23

probabilistic right. So what we are

14:25

trying to do here is we are trying to I

14:28

think claude and GPD are good enough

14:29

really they are really good and you can

14:31

use whatever LLM with command code but

14:34

that LLM will be combined with your

14:36

taste which is built up o upon this meta

14:39

neurosy symbolic space you can think of

14:41

it like uh you know a reax of your uh

14:45

you know choices in petrit right and we

14:47

have a kale divergence loop here as you

14:49

can see like if you do end up doing

14:51

something wrong we want the lm to you

14:54

know [clears throat] correct you as
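The slide's formula is not in the transcript, so this does not reproduce Command Code's objective. For reference, the KL divergence of a distribution Q from a distribution P over discrete outcomes is D_KL(P || Q) = sum over i of P(i) ln(P(i) / Q(i)); it is zero when the two match and grows when Q assigns little probability to outcomes that P says are common. A sketch of how such a term could measure drift, assuming P is the developer's observed choices and Q is the agent's current prediction (both distributions and the option list are hypothetical):

```typescript
// D_KL(P || Q) = sum over i of P(i) * ln(P(i) / Q(i)), measured in nats.
// Terms with P(i) = 0 contribute nothing (0 * ln 0 is taken as 0).
// If P(i) > 0 while Q(i) = 0, the divergence is infinite.
function klDivergence(p: number[], q: number[]): number {
  if (p.length !== q.length) {
    throw new Error("Distributions must have the same length");
  }
  let total = 0;
  for (let i = 0; i < p.length; i++) {
    if (p[i] === 0) continue;
    if (q[i] === 0) return Infinity;
    total += p[i] * Math.log(p[i] / q[i]);
  }
  return total;
}

// Hypothetical options: [Commander, meow, yargs].
// observed: share of recent edits in which the developer chose each option.
// predicted: what the agent currently assumes the developer prefers.
const observed = [0.9, 0.1, 0.0];
const predicted = [0.2, 0.7, 0.1];
const drift = klDivergence(observed, predicted);
// A large `drift` suggests the agent's model of the developer is stale and
// should be updated toward the observed choices.
```

With these numbers drift is about 1.16 nats, almost all of it from the agent underrating Commander. D_KL(P || Q) is not symmetric, so which distribution goes first matters.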

14:54

It's this amazing continuous learning tool that learns from both your explicit and your implicit feedback. And again, it creates that neuro-symbolic space to enforce the invisible logic around your choices: the architecture that is in your head, in your brain, like, "Oh yeah, when I'm building a TypeScript project, this is the type of thing I do." Your brain can never really translate that kind of thing into a rules file; otherwise you wouldn't be writing code, you'd be writing a lot of rules files. And then, at the end, to use the neural part, the LLM part, we have reflective context engineering, which is self-aware, continuously learning and adapting: "Oh, this guy used to use meow for writing CLIs, and I don't know what happened, but two months ago he switched to Commander." I'm talking about this guy, by the way; this literally happened. And it automatically updates my rules, what it has learned from me, my taste: now Ahmad prefers Commander over meow. I don't need to go and teach it.
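The meow-to-Commander story implies that recent evidence should outweigh an older habit. A minimal sketch of one way to get that behavior, assuming exponential decay with a hypothetical 30-day half-life (neither the decay scheme nor the parameter comes from the talk): each observed choice contributes a weight of 0.5^(daysAgo / halfLifeDays), and the option with the highest total is treated as the current preference.

```typescript
// One observation: the developer chose `option` (for example, by editing
// the agent's code to use it) `daysAgo` days before now.
interface Observation {
  option: string;
  daysAgo: number;
}

// Each observation's weight halves every `halfLifeDays`, so a few recent
// choices can outweigh a long history of old ones.
function preferredOption(
  observations: Observation[],
  halfLifeDays = 30,
): string | undefined {
  const scores: Record<string, number> = {};
  for (const { option, daysAgo } of observations) {
    const weight = Math.pow(0.5, daysAgo / halfLifeDays);
    scores[option] = (scores[option] ?? 0) + weight;
  }
  // Return the option with the highest decayed score (undefined if empty).
  let best: string | undefined;
  for (const option of Object.keys(scores)) {
    if (best === undefined || scores[option] > scores[best]) {
      best = option;
    }
  }
  return best;
}
```

With the default half-life, twenty meow choices made 150 days ago score 20 * 0.5^5 = 0.625, while three Commander choices made 5 days ago score about 2.67, so Commander wins despite being chosen less often.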

16:06

I should be writing code at, I don't know, godspeed, and it should be learning all of this from me. Over time, we believe this will turn into a skill of intuition that Command Code has and that you can share with your team. Our mission is to build a huge ecosystem around this. Imagine there's a developer out there whose React code you really like. I love what Tanner is doing with TanStack, right? So what if I could have Tanner's taste when I'm writing React code? You can do that with Command Code. One of the things I have been using it for a lot: my design engineer has much better design skills than I do, so whenever I'm writing any kind of front-end code, I borrow my design engineer's taste, which is messy, with all those margins and paddings and amazing tiny little details that I now don't need to care about. Command Code, my coding agent, puts the LLM and that meta neuro-symbolic design taste alongside my request, like "build me a modal that does this," and it does it with my design engineer's taste, which is unbelievable, right?
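Putting a borrowed taste "alongside my request" can be pictured as a context-assembly step before the LLM call. A minimal sketch, assuming taste is stored as short natural-language rules with confidence scores; the TasteProfile shape, the 0.6 threshold, the example rules, and the prompt layout are all hypothetical and not Command Code's actual format:

```typescript
// Hypothetical storage format for a shared taste.
interface TasteEntry {
  rule: string;       // e.g. "Use 8px spacing increments"
  confidence: number; // 0..1, how strongly the learned evidence supports it
}

interface TasteProfile {
  name: string; // whose taste this is, e.g. "design-engineer"
  entries: TasteEntry[];
}

// Build the context sent to the LLM: confident rules from each selected
// profile (strongest first), then the user's request. Rules below
// `minConfidence` are dropped so weak guesses do not override the request.
function buildPrompt(
  request: string,
  profiles: TasteProfile[],
  minConfidence = 0.6,
): string {
  const sections: string[] = [];
  for (const profile of profiles) {
    const rules = profile.entries
      .filter((e) => e.confidence >= minConfidence)
      .sort((a, b) => b.confidence - a.confidence)
      .map((e) => `- ${e.rule}`);
    if (rules.length > 0) {
      sections.push(`Taste: ${profile.name}\n${rules.join("\n")}`);
    }
  }
  sections.push(`Request: ${request}`);
  return sections.join("\n\n");
}
```

For example, a design-engineer profile with a 0.9-confidence spacing rule and a 0.2-confidence typography rule contributes only the spacing rule, followed by the request.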

17:25

So this is where we are today. Today we are launching Command Code; feel free to go to commandcode.ai and check it out. This is the very beginning of all of it. I think large language models have captured the world's stacks, everything out there, all of Stack Overflow and whatnot, and I believe what we are building with taste models is the world's intuition, and its intentions. What do you intend to do, and how do you generally do it? What are the patterns? What is your taste? That taste, with your preferred LLM, is, I think, the next frontier of coding. I totally believe taste is going to really, really speed up how we write code, and create those neuro-symbolic guardrails: again, the invisible architecture of choices that you have as a team, as a project, as a famous library, or maybe as an enterprise that cares about doing things in a particular way. That is the kind of thing you would be able to build taste around and share, either as an open-source taste or with just your team.

18:40

like for example uh for example if you

18:42

go sign up. Uh again, this is very very

18:44

new. Uh this is potentially it will look

18:46

like right. Uh we've already kind of

18:48

moved away from uh sharing all of this

18:50

and we are figuring out I would love

18:52

your help to figure out what is the

18:54

right mix of uh having all of this

18:56

metalarning uh you know uh be part of

19:01

your projects. Right now it kind of ends

19:03

up as more of a you know what should I

19:06

say a transparent markdown file but it

19:09

could exist in any which way. It's a

19:11

metano symbolic space in a model that is

19:14

continuously learning your preferences

19:17

and we can dump that learning in any

19:19

particular form. Right now this is

19:21

potentially what it looks like. You

19:23

should be able to you know npx taste and

19:25

install my CLI taste and then you can

19:28

use command code and the CLI that you

19:30

will build will be very very close to

19:33

you know how I would build that CLI

19:35

using your favorite LLMs. So yeah,

19:38

So yeah, that's pretty much it. As you can see, I am pretty excited. The biggest gain we have seen internally at Langbase is that we have probably 10x'ed the amount of code we are merging into our main repository, into our main branch. We joke about it: when we "disagree and commit to main," the amount of that happening has increased 10x. And I'm feeling a lot more confident when I'm reviewing a lot of code, so our review time for pull requests has gone down significantly. I can't wait to see what everybody out there builds with it. Again, we're very excited. We want LLMs to continuously learn from our taste in writing code, and I would love to see what you build with Command Code. That's pretty much it. Feel free to reach out; maybe send me a tweet, or a post, or whatever we call it these days. I would love to see what everyone builds. This is me; thanks for having me. Ciao, peace.

Summary

Ahmad, the founder of Langbase, introduces Command Code, a new coding agent designed to learn and adopt a developer's specific coding 'taste'. Unlike traditional coding assistants that often generate generic or 'sloppy' code, Command Code observes how a developer corrects and refines AI-generated suggestions to build a persistent, neuro-symbolic model of their preferences. This allows the tool to automatically align with a user's architectural choices, such as preferred libraries, project structures, and testing frameworks, thereby increasing productivity and reducing review times. The vision for Command Code includes an ecosystem where developers can share these 'taste' profiles, allowing others to leverage the expertise and coding styles of experienced peers.