HomeVideos

Why Testing Is Hard and How to Fix It with Will Wilson

Now Playing

Why Testing Is Hard and How to Fix It with Will Wilson

Transcript

3259 segments

0:03

Welcome to Signals and Threads, in-depth

0:05

conversations about every layer of the

0:07

tech stack from Jane Street. I'm Ron

0:09

Minsky. All right, it is my pleasure to

0:11

introduce Will Wilson, who's the

0:12

co-founder and CEO of Antithesis,

0:14

someone who started out studying math

0:16

and then somehow found himself working

0:18

on distributed databases and now running

0:20

a startup that is trying to change how

0:22

we all do testing, hopefully for the

0:24

better. Uh, Gene is actually both a

0:26

customer of antithesis and an investor,

0:28

something I want to talk about a little

0:29

bit further in, but thanks for joining

0:31

me.

0:32

>> Yeah, hopefully for the better, but I

0:33

think it would be hard to make it a

0:34

whole lot worse.

0:37

>> Fair.

0:39

>> So, let's just talk a little bit about

0:41

kind of how you got here. You started

0:42

off studying mathematics. You've done a

0:45

bunch of other things. You're now doing

0:46

a lot of what what seems to me is really

0:47

hardcore systems work.

0:49

>> Um,

0:50

>> tell us a little more about that

0:51

journey.

0:52

>> Sure. So when I got to college, um it

0:54

was, you know, it was the time when

0:56

everybody was super super excited about

0:58

computer science. Like Facebook was new,

1:01

Google was new, everybody was going off

1:02

and joining those companies and, you

1:04

know, making a lot of money and and

1:05

doing really cool stuff. And you know, I

1:08

basically made a a very large mistake,

1:10

which was I got to college and I was

1:11

like, "Wow, that computer science stuff

1:13

seems really cool. Too bad it's over,

1:15

right? Too bad. Too too bad. Too bad all

1:18

the interesting problems have been

1:19

solved already. Like look, somebody's

1:21

already made Google. like what else

1:22

could there be to do? Um, so I basically

1:24

ran kind of in the opposite direction. I

1:26

knew a little bit about how to program.

1:28

I taught myself when I was a kid, but I,

1:30

you know, I basically avoided studying

1:32

computer science at all and ran into

1:35

like the most abstr forms of

1:37

mathematics, which just seemed, you

1:39

know, more intellectually interesting

1:41

and also like nobody was going to run

1:44

out of math anytime soon.

1:46

>> That's true. Although this whole thing

1:47

of like maybe AIS will run us out of

1:48

math, but that's like a

1:50

>> Yeah. Yeah. Well, if I were

1:51

>> if I were making that decision again

1:52

today, I might have I might have picked

1:54

something different that AI is not so

1:55

good at.

1:56

>> So, when you say like obstuse

1:57

mathematics, like what kind of stuff

1:58

were you interested in?

1:59

>> Um, I was, you know, I did a bunch of

2:01

different things. I liked a lot

2:02

something called representation theory,

2:04

which is something very useful in

2:05

mathematical physics. It's basically the

2:07

study of like homorphisms from general

2:11

abstract groups into vector spaces,

2:14

either finite or infinite dimensional.

2:15

Um, it's pretty neat. That was actually

2:17

a little bit too useful. that was a

2:19

little bit too applied. So I also, you

2:21

know, I also got into some like

2:22

mathematical

2:23

>> there like actual matrices there,

2:24

>> right? Well, there's actual matrices and

2:26

you can actually use this to like, you

2:27

know, do particle physics, which, you

2:29

know, I don't know. So I I I also did a

2:32

little bit of set theory. I got into

2:33

something called large cardinal theory,

2:35

which is so abstract it almost sounds

2:37

like a parody, right? It's basically

2:40

what you know what new forms of

2:42

mathematics can we develop if we add

2:45

assumptions that certain very large

2:47

infinite numbers exist and the Wikipedia

2:50

pages on this stuff are a total hoot if

2:52

you want to look at it.

2:53

>> I have I have sadly looked more than a

2:55

little bit at large cardinal theory and

2:57

it is fun and wild and indeed not the

2:59

most practical of all.

3:01

>> This is the only podcast I can imagine

3:02

where the host might say that as a

3:04

response.

3:06

>> All right. So, you had like a promising

3:08

start of a career in mathematics. Why

3:09

did that not go anywhere? Oh, well, you

3:11

know, I basically I got to my senior

3:13

year and I did actually apply to grad

3:15

school and I actually got into grad

3:16

school a few different places and I was

3:18

all set to go off and do my PhD in math

3:20

and then I just looked around and I

3:23

looked at my fellow classmates who were

3:25

going to grad school and I looked at my

3:28

professors and I looked at myself and I

3:31

had a very important moment of

3:33

self-realization which was that I am

3:36

never going to be a world-class

3:37

mathematician because basically I mean

3:40

basically for the same reason that I'm

3:41

never going to dunk, right? I'm never

3:43

going to be a world-class basketball

3:44

player. There's a certain measure of

3:46

natural talent and random variation that

3:49

is just required. And like, yes, you can

3:51

definitely get better at basketball or

3:53

better at math by working very very

3:54

hard, but these are both professions

3:57

with this like incredibly skewed return

3:59

distribution where if you're not in the

4:02

top 0.00001%

4:04

of people, you're just never actually

4:06

going to have a great time. And so, you

4:09

know, I I realized I realized that I

4:12

could spend six years in grad school or

4:15

longer and, you know, eventually get

4:17

some job, you know, teaching somewhere

4:19

as an adjunct or something and, you

4:21

know, or I could not do that and I could

4:23

sort of bail out of this process sooner

4:26

and I just I I realized that was what I

4:28

had to do.

4:29

>> Got it. And then you transitioned into

4:31

what what did you do from there? Well,

4:33

you know, I basically I actually

4:35

initially was off doing a little bit of

4:37

biomedical research. I had uh I had

4:39

interned when I was in college and and

4:42

actually before college at a small

4:43

biotech startup and I'd done a bit of

4:46

that and I'd sort of and then I after

4:48

that I bopped around in a few different

4:50

sort of deadendish jobs and it was at

4:53

one of those that I had this crucial

4:55

realization and the crucial realization

4:58

was that actually my ability to write a

5:00

janky Python script was unbelievably

5:02

economically valuable. Right? Like I was

5:05

sitting at my job and you know my boss

5:08

had assigned me some like enormous pile

5:10

of drudgery and you know I I looked at

5:12

it and I so I wrote a Python script and

5:14

it took me 45 minutes and it automated

5:16

the enormous pile of drudgery and I was

5:17

like okay here I'm done and he looked at

5:20

me with this like expression of dread

5:21

and was like that was supposed to be

5:22

your work for the next 3 months like and

5:25

that was that made me maybe something in

5:27

my head I was like ah interesting maybe

5:30

I should get better at this programming

5:31

thing that seems you know that seems

5:33

like it could be good. So, you know, I

5:34

went and I I I taught myself how to code

5:36

for real and I, you know, did some

5:38

online classes and then I eventually got

5:41

my way into a number of tech startups.

5:43

>> So, how do you actually learn how to

5:44

program? My overall sense of the world

5:46

is that the world is actually very bad

5:47

at teaching people how to program.

5:49

Universities, I feel like, are

5:50

especially bad at it. Uh, they do this

5:52

weird form of performance art where you

5:54

like professors hand out assignments and

5:55

then students fill in and resolve it and

5:58

then it's given back and looked at once

6:00

and then it vanishes like a puff of

6:01

smoke. It's like the eancence is part of

6:03

the art of it all. Um, and real software

6:05

is nothing like that, right? It's a

6:07

thing where you the the kind of

6:09

permanent evolving state of the software

6:10

is like part of what's important about

6:12

it. Part of what you need to like

6:14

optimize for when you're writing

6:15

software are these not like just the

6:18

functional properties of what the

6:19

software does, but the non-functional

6:20

properties around how extensible is it

6:23

and how easy will it be for be for

6:26

people in the future to understand and

6:27

what kind of performance problems are

6:29

you creating in the future and all these

6:30

things that like don't show up in the

6:32

kind of very smallcale fake environments

6:34

where you learn how to code and and you

6:36

need to do very different things to

6:37

learn to be good at it and and what do

6:39

you do?

6:40

>> Yeah. No, that is super super true. And

6:42

so I will I will I I did actually try to

6:45

solve that problem a little bit, but I

6:46

will also qualify my answer by saying

6:48

that my main goal was to get hired at a

6:50

software company, not to become a great

6:52

engineer yet. I think I knew somewhere

6:55

in the back of my head that becoming a

6:56

great engineer would require working

6:58

with other great engineers and you know

6:59

being mentored by them as indeed it did.

7:02

Um but basically what I did was was I I

7:05

I followed two tracks and I was on

7:06

paternity leave at the time which made

7:08

it easier because I could sort of do

7:09

this nights and weekends and like you

7:11

know basically I I studied a lot of

7:14

academic knowledge right all the stuff

7:16

that I had missed in college. I went and

7:18

learned about complexity theory and I

7:20

learned about the theory of algorithms

7:22

and I learned what a data structure is

7:24

and like all the stuff that everybody

7:25

else learns their sophomore year. Um, so

7:28

I sort of, you know, I jammed all that

7:29

into my head, you know, using a bunch of

7:31

YouTube videos and, you know, online

7:33

resources and and so on, which there's a

7:36

lot of these days. And then I also just

7:38

tried building things and I mostly

7:42

focused on things that were interesting

7:44

to me and things that were hard. And I

7:46

tried to pick a pretty broad set of

7:48

things that would force me to to learn

7:51

different skills. So, you know, I wrote

7:53

my own little ray tracer and it was like

7:55

a classic

7:56

>> pretty crappy ray tracer, but like I did

7:58

learn C++, you know, and I did learn a

8:01

lot about, you know, how to how to do

8:03

object-oriented programming and how to

8:04

do memory management and so on in the

8:06

course of that. Then I wrote a little

8:08

toy compiler, you know, and I, you know,

8:10

I wrote a little computer game and I

8:11

wrote, you know, I wrote like a bunch of

8:12

different I wrote a little graph

8:14

database. Um, you know, I did I did this

8:16

>> precient that was

8:18

>> Yeah, that's right. That's right. Turns

8:19

out turns out that those those well

8:21

those were actually a fad. They never

8:22

really took off.

8:23

>> Sure. Graph databases has not really

8:25

taken off but you know there's a lot of

8:26

database theory.

8:27

>> There's a lot of database theory. That's

8:28

right. Um and that actually is part of

8:30

what got me interested in databases and

8:32

what eventually led me to working at

8:34

Foundation DB which is where I did find

8:36

really great engineers who were able to

8:38

mentor me and and who made me actually

8:40

somewhat competent.

8:42

>> Got it. And then somehow from the work

8:44

at Foundation DB you ended up eventually

8:47

founding antithesis. Mhm.

8:49

>> So tell us about that.

8:50

>> Yeah, so Foundation DB was a magical

8:52

place. Um it was I mean I think in some

8:54

ways a little bit like Jane Street,

8:55

right? Like it's just one of these

8:57

places that you walk into and everybody

8:59

is brilliant and everybody is incredibly

9:01

humble and everybody is incredibly nice

9:06

and good at their jobs and it just hums

9:08

with this extraordinary energy. And one

9:12

of the brilliant things that had

9:15

happened at Foundation DB, it's a thing

9:18

that should happen in more software

9:19

projects, I think. You know, they sat

9:21

down and were like, we're going to build

9:22

a new kind of database. This is a kind

9:24

of database which at the time people

9:27

believed was literally physically

9:28

impossible to build because of a

9:30

misunderstanding of something called the

9:31

CAP theorem. And we we can get into that

9:34

more if you want. Um, but basically

9:36

basically they were like, okay, we're

9:37

going to try and build this new kind of

9:38

database. what do we need to have in

9:41

order to build this database? And they

9:43

realized that in order to build such a

9:45

system, you would be totally foolish to

9:48

do it without a powerful deterministic

9:51

simulation framework that could sort of

9:53

test the database in every possible

9:55

configuration, in every possible mode of

9:58

operation, you know, in all possible

10:00

network conditions and failure

10:01

conditions and so on, you know, with any

10:03

amount of concurrent user activity and

10:05

have that all be replayable

10:06

deterministically. And if you think

10:08

about for a second, it's like, yeah, you

10:09

would be foolish to build a database

10:10

without that. But, you know, they were

10:13

the only people I knew of who had

10:14

actually acted on that insight. And so,

10:17

they built this extraordinary system and

10:19

say like what is a deterministic

10:20

simulation framework,

10:22

>> right?

10:22

>> Right. There's like a few words there,

10:23

deterministic, simulation. I feel like

10:25

understanding how those play out is

10:27

maybe useful.

10:28

>> Right. Right. Right. Sure. So, um,

10:30

basically, let's let's start by talking

10:32

about property based testing in general

10:34

in the abstract. um like you know quick

10:37

check right from Haskell or I think

10:39

Okamel has its own property based

10:41

testing system right

10:42

>> every functional programming language

10:43

has at least three of them

10:45

>> right and then in Python you've got

10:47

hypothesis um so property based testing

10:50

the basic idea of it is I have some

10:53

piece of code rather than sit there and

10:55

write a bunch of unit tests that do like

10:57

particular things that I've thought of

10:59

ahead of time that take particular

11:00

actions I'm going to just tell my

11:03

testing framework what you can do to my

11:05

code like what actions you can take

11:07

right if it's like a little data

11:08

structure it's like maybe I can insert

11:10

an item and I can pop an item and I can

11:12

query for some item or something and

11:14

then you set up a bunch of randomized

11:17

generators which do all these things in

11:20

random orders and then you figure out

11:23

what the invariance of your program are

11:25

right like probably an easy one is it

11:28

shouldn't crash but like maybe a more

11:31

interesting one for a data structure is

11:32

like if I insert five things then

11:34

there's there's five things in it. But

11:36

actually, that's not a great one, right?

11:39

There's a higher order one, which is if

11:40

I insert n things and don't remove

11:42

anything, there's n things there. But

11:44

then we can make that even more abstract

11:46

and be like, if I insert n things and

11:48

then remove m things, so long as n is

11:51

bigger than m, you know, I'll have n

11:53

minus m things in there, right? And so

11:56

you can you can sort of get quite clever

11:57

with these things. And then the magic is

12:00

you now have not a test. You have a

12:03

thing that will produce an infinite

12:04

number of tests like so long as you keep

12:06

running it and it will basically try

12:09

your thing in many many more

12:11

permutations and combinations than you

12:13

would ever have thought of. That's the

12:15

basic idea of property based testing.

12:16

Right.

12:17

>> That's right. And these like classic

12:18

frameworks like quickjack in some sense

12:19

automate the the hardest part of this is

12:22

generating a good probability

12:23

distribution. And you were framing this

12:25

in terms of operations where you have

12:27

like sequences of operations on some

12:29

kind of system and that's already like

12:31

leaning a little more systemsy. I feel

12:32

like the classic functional programming

12:34

version is more like I'm going to test

12:35

my map data structure or whatever. And

12:38

then often like what you're putting in

12:39

is just you know like lists and whatever

12:43

shapes of containers or whatever that

12:44

you want to use for doing

12:45

straightforward things. And often you're

12:46

thinking about it less in terms of

12:48

sequences of operations and just like

12:50

some fairly broad shape of data that you

12:52

might want to put in. and you want nice

12:54

ways of generating good probability

12:56

distributions. The question of what

12:57

counts as a good probability

12:58

distribution is actually quite a

12:59

complicated one.

13:00

>> It is very complicated.

13:02

>> And so in some sense there's like two

13:04

things you need to specify. There's like

13:05

the properties are supposed to be true

13:07

and the probability distributions for

13:08

generating examples. And that's kind of

13:10

the whole bulk,

13:11

>> right? And so then one of the rules of

13:13

all human endeavors is that every good

13:15

idea is like rediscovered 17 different

13:17

times by different people who are in

13:19

slightly different subdomains and so

13:20

they didn't talk to each other and then

13:21

they create their own language and set

13:23

of concepts for it and it's all very

13:25

confusing

13:26

>> and this is also true of property based

13:27

testing which has been reinvented tons

13:30

of times and one of the most common

13:33

other you know one of the most

13:34

well-known other times it was invented

13:36

it was called fuzzing which is a very

13:38

very similar thing conceptually right

13:40

fuzzing is like more from the security

13:42

world. But if you squint, it's the same

13:44

thing. Like I have a property which is

13:46

my program shouldn't crash, shouldn't

13:48

have memory corruption, shouldn't have

13:50

security vulnerabilities. And then I'm

13:52

going to feed in a distribution and the

13:54

distribution happens to like look like

13:57

stuff to parse maybe that has errors in

14:00

it or has maliciously crafted content.

14:02

And I'm going to have a random generator

14:04

which is my fuzzer which is going to

14:06

like keep sending in stuff until I find

14:08

a failure of the property that I care

14:10

about. And this is like a totally

14:14

separate group of people who like solved

14:16

many very similar problems in some

14:19

different ways and in some similar ways.

14:21

And like the two sides just never talk.

14:22

>> That's right. And like the early

14:23

versions of fuzzing were like very

14:25

simple on the probability distribution

14:26

side. It's just like you know white

14:27

noise basically for throwing into things

14:29

for some of the very early research and

14:31

just like take the Unix utilities and

14:33

throw white noise at them and see what

14:34

happens. Y

14:35

>> uh and the language of properties was

14:37

incredibly impoverished. It was like not

14:39

much better than doesn't crash.

14:41

>> Yep. But the fuzzing people had a clever

14:43

trick which the property based testing

14:45

people did not have. The fuzzing people

14:47

realized that you don't need to make

14:49

this a blackbox process. You can

14:51

actually track things like code coverage

14:53

and you can see what your inputs make

14:57

your code do and then you can use like a

15:00

genetic algorithm or an evolutionary

15:02

algorithm to adapt your input

15:04

distribution as you go to find more and

15:06

more interesting behaviors.

15:07

>> That's right. You basically like have

15:09

these tentacles into the program and you

15:11

feel out where you are in the state

15:12

space and try and explore more of the

15:14

state space of which branches you've

15:15

gone through and all that.

15:16

>> Right. It's definitely like an extra

15:18

idea and and like you know a bunch of

15:20

the property based stuff came out of the

15:21

functional programming world which has

15:23

this oh we're going to derive prob

15:24

probability distributions from types

15:26

totally makes sense from that and this

15:28

like no no we're going to modify the

15:30

compiler and we're going to like do a

15:32

bunch of weird ad hoc stuff to like try

15:33

and exploit the state space it's a very

15:35

different but very good idea

15:36

>> yeah well the interesting thing is like

15:38

you are actually I mean you are trying

15:39

to solve the turn halting problem here

15:41

right we know you cannot do it we know

15:43

that there is no one technique that's

15:45

going to find all the bugs And so I

15:47

actually believe that the correct

15:48

response to that is just to like throw

15:50

everything at the wall and see what

15:52

sticks. like you should try and have

15:53

very clever probability distributions

15:55

and you should try to have you know

15:57

evolutionary algorithms and you should

15:59

have you know constraints and you know

16:02

constraint solvers and like I mean you

16:04

like do everything you can add some ML

16:06

like whatever like this is we're up

16:08

against a very hard problem and the nice

16:11

thing about a basket of tools is that if

16:14

you're careful about how you architect

16:16

them

16:18

no tool can like make the situation that

16:21

much worse But there are certain

16:23

situations where it can make it much

16:24

better. And so by having a broad

16:27

distribution of techniques, you're

16:29

likely to have something that works on a

16:30

larger space of programs.

16:32

>> Right. Particularly because we're doing

16:33

testing, right? It's just like you do an

16:35

extra thing. It takes some time. That's

16:37

right.

16:37

>> But it doesn't break anything. It's just

16:38

like if if it was the worst thing it can

16:40

do is not find any bugs for you.

16:41

>> That's right. And you have to be a

16:43

little bit more careful about that once

16:44

you have like sophisticated evolutionary

16:47

tactics, right? Because it could be that

16:50

some technique you use like pollutes

16:52

your distribution in some way that makes

16:53

it harder to find other bugs. But you

16:56

know that just means you don't have to

16:57

be you have to be not be totally naive.

16:59

>> Got it.

16:59

>> Yeah. So okay. So there's all these

17:02

people doing randomized testing. And

17:05

what's interesting

17:07

is nobody

17:10

until very recently had ever applied any

17:13

technique like this to what I would call

17:16

real software. And this is like not a

17:19

knock on hasll or you know or or or

17:21

small functional data structures.

17:22

Certainly not a knock on parsers written

17:24

in C and C++. What I mean by that is

17:26

like nobody fuzzed or used property

17:29

based testing on a database or on a

17:32

computer game or on a large distributed

17:34

system or on an operating system or a

17:37

kernel like people people have lately

17:40

started to do these things but by and

17:42

large it was not happening until quite

17:44

recently. I feel like it wasn't common,

17:46

but is it really that it wasn't done at

17:47

all? Like I' I've talked to like John

17:49

Hughes about stuff that the quick check

17:50

folk did where they like, you know,

17:52

worked with like auto manufacturers for

17:54

fuzzing their like, you know, super

17:56

weird network inside of the computer

17:58

>> and things like that. So, I feel like

17:59

there is stuff that like I think should

18:01

qualify as real software that's more

18:03

than like the traditional like toys to

18:05

which the stuff is applied. There's at

18:06

least been some commercial applications.

18:07

>> I think I think people did some of it,

18:09

but I would say it was vanishingly rare.

18:12

Um, I mean, all of these techniques

18:13

maybe arguably are like vanishingly

18:15

rare. Like to a first order

18:16

approximation, like 0% of people use

18:18

them,

18:19

>> but but but I think it was especially

18:21

uncommon to try and use it on big stuff.

18:24

>> Yeah. I mean, I think it's felt

18:26

relatively niche. I think that's I think

18:28

there are things that qualify as more

18:29

serious applications of it, but like

18:31

much rarer than they deserve to be

18:33

applied or something.

18:33

>> And and basically, I think that this is

18:35

actually for somewhat good reason. Um,

18:37

so when it when you have big software,

18:39

big complicated software, you sort of

18:42

have and I I I promise I'm getting back

18:43

to your original question, which is what

18:44

is deterministic simulation testing? Um,

18:47

basically when you have big complicated

18:48

software, there's two things that get

18:52

dramatically harder.

18:54

The first thing is the state space of

18:56

the software that you are trying to

18:58

explore is really complicated. And it is

19:02

probably complicated in such a way that

19:04

the fuzzing trick of just you know

19:08

recording code coverage is no longer a

19:10

very good map for where you have gotten

19:12

in the software. Right? Consider

19:14

something like a Python interpreter. If

19:17

you hit 100% code coverage in that you

19:20

have not gotten anywhere close to

19:21

exhausting its behavior. or consider

19:24

something like

19:24

>> and that's and that one is just because

19:26

like the state space is much bigger than

19:28

just like where you are in each branch

19:30

of the code like your code location

19:32

doesn't tell you that much about the

19:33

state space there's like lots of other

19:35

things going on that

19:36

>> what's what's in various variables like

19:38

what's in memory like all this other

19:39

stuff and if you try and like take the

19:41

cartisian product of that with all the

19:42

coverage you're just it's like way too

19:44

big and you're not going to make any

19:45

progress um you know or consider a

19:48

distributed system right where just what

19:51

coverage you have gotten might be less

19:53

important than like what order you have

19:56

encountered coverage across different

19:57

nodes in some distributed algorithm. Um,

20:00

and so basically

20:03

knowing where you are and fully

20:04

exploring the program becomes harder

20:08

both from the fuzzing philosophy of

20:11

we're going to use signals like coverage

20:13

to determine where we are and it also

20:15

gets harder from the like PBT philosophy

20:18

of we're going to have really clever

20:20

intelligent random distributions because

20:22

basically you have to just get lucky so

20:24

many times in a row to get something

20:27

useful happening that you're you you

20:30

just kind of it's intractable to solve

20:32

the problem purely that way.

20:33

>> Right. You more or less probably can't

20:35

do it fully obliviously. Right. That's

20:37

right. The oblivious thing where you

20:38

have the distribution chosen ahead of

20:39

time and you're just throwing things at

20:40

the system like you kind of have to be

20:42

responsive to the state of the system if

20:44

you're going to get the right kind of

20:45

coverage. Although it's worth saying

20:46

like when you say covering the like you

20:48

never actually cover the state space,

20:49

right? The thing that you're doing is

20:50

always weirder and more huristic because

20:53

the actual state space is like highly

20:55

exponential. Yes. And so you will not in

20:58

any reasonable testing budget be able to

21:00

test any appreciable fraction of it. So

21:02

there's some weird question of like

21:03

taste of like which vanishingly small

21:05

subset of the scenarios is it important

21:07

for you to cover.

21:08

>> Yes, totally true. And and we will come

21:11

back to that. That is like there's that

21:12

that's like right you want to cover all

21:14

of the interesting parts of the state

21:16

space and you want to try and do it as

21:18

quickly as you can and and that is a

21:20

whole another dimension along which this

21:22

is hard. Um, okay. So, then there's a

21:25

second problem with these larger

21:26

systems, more quote unquote real

21:28

systems, which is that they don't really

21:32

look they don't really look like the

21:35

kinds of systems that people have

21:36

traditionally applied fuzzing and

21:38

property based testing to in in two kind

21:40

of ways. One is that they tend to be

21:42

interactive, right? They tend to not be

21:45

things that accept an input and then do

21:48

a bunch of computation and then crash or

21:50

don't, right? which is kind of what

21:52

fuzzing is optimized for, right? They

21:54

tend to be things that take a little bit

21:55

of input and then send you a response,

21:56

then get a little more input and then do

21:58

something. Like imagine a web server or

21:59

a computer game. It's like got this

22:01

interactive flow to it. Um, which makes

22:03

the whole fuzzing model of like I'm

22:05

going to come up with what is a good

22:08

input to break the system and send it in

22:10

and see what happens a little bit more

22:12

complicated. Then the second thing which

22:16

which makes the state space exploration

22:18

problem even harder is that these

22:19

systems are all non-deterministic. And

22:21

this is like this is in some ways I

22:23

think the crux of it because basically

22:26

computers are machines right they're

22:28

like real physical machines in the real

22:31

world and in order to make those

22:33

machines really efficient you know CPU

22:37

designers have done all kinds of evil

22:38

and awful things to make them that that

22:41

have this side effect of making them

22:43

non-deterministic meaning that if you

22:45

try and perform the same computation on

22:48

the same computer twice with all the

22:51

same inputs. Once you have things like

22:53

threads involved, once you have things

22:55

like timers, once you have things that

22:57

need to interact in any way with the

22:59

real world, with network sockets, with

23:01

hard drives, suddenly your computer

23:04

program is not a pure function, right?

23:06

Unless you have written it in Haskell

23:08

and have been very very careful. Um it's

23:10

it's g it's a big complicated weird

23:13

state machine with all kinds of

23:14

co-effects from the environment that can

23:17

mean mean it does something totally

23:19

different each time you run it.

23:20

>> Yep.

23:21

>> Okay.

23:22

>> Although although one of the weird

23:23

paradoxes of this is it is often the

23:26

case that the individual components are

23:28

actually all very close to

23:30

deterministic. It's just that they

23:32

wildly depend on initial conditions

23:35

>> and their behavior is kind of chaotic

23:38

and diverges from predictable things. So

23:40

it's like you know the you know actually

23:42

the thread scheduler is a completely

23:44

deterministic program in some sense

23:46

right except and timers the timers like

23:49

work largely deterministically but like

23:51

your memory you know it doesn't always

23:53

have the same latency there's like a

23:55

cycle where the memory gets refreshed

23:56

and it'll block out for a very little

23:58

piece of time and you know did you start

24:00

your program at exactly the same time in

24:02

the memory refresh cycle the two times

24:03

that you ran it like probably not and

24:05

then like all of these things compound

24:07

and multiply as you have multiple

24:08

systems talking to each other the small

24:10

differences become big differences and

24:12

effectively this nondeterminism kind of

24:14

gets like pulled almost out of nothing.

24:16

>> Yeah, that that is a fantastically

24:18

accurate intuition and we have actually

24:21

we haven't started talking about our

24:22

technology yet but like we have actually

24:25

we we were able to measure that

24:26

intuition like we can empirically tell

24:28

you what the leaponov exponent of your

24:30

software is and like what its chaotic

24:32

doubling time is. And it turns out that

24:34

for Linux it's insanely fast. Like

24:37

basically if you change one bit in the

24:41

memory of a Linux computer, the whole

24:45

state of the system is completely

24:46

different like within tens of

24:48

microsconds. It's it's actually crazy.

24:51

>> That's shocking.

24:51

>> Yeah, it's it's nuts. It I I did not

24:54

believe it. Um but it's true.

24:55

>> Yeah, I'm still not sure I do, but

24:57

>> I I can I can I can show you I can show

24:59

you. Um okay, anyway. So So why is this

25:02

nondetermism so bad? So it's bad for two

25:04

reasons. The more obvious reason is it

25:06

means that if my I yeah you I do my cool

25:08

fuzzing property based testing thing I

25:10

run some fantastically expensive

25:11

computational search I find the bug

25:13

that's going to ruin my life and then

25:16

you know if I don't have exactly the

25:18

right logging in place if I can't just

25:20

look at the source code and oneshot the

25:22

bug I may never make it happen again and

25:25

that is very very frustrating now my

25:27

testing system has just made me feel bad

25:30

right

25:31

it's that's not that's

25:33

>> something is wrong

25:34

>> that's right Good luck.

25:35

>> You'll never know what it is until you

25:36

find out at 3:00 a.m. when your pager

25:38

goes off. Um, so that sucks. Then

25:41

there's a second problem with it, which

25:43

is that it makes the fuzzing trick of

25:47

look at what inputs have made me do

25:49

useful things so far and then try small

25:51

modifications on those inputs break down

25:53

and become much less performant. Because

25:56

if putting the same input into the

25:57

system again might not get me to the

25:59

same point in the state space, then

26:01

putting a slightly tweaked one is extra

26:04

maybe not going to get me to the same

26:05

point in the state space. And so this

26:07

like optimization loop that all of

26:10

fuzzing kind of implicitly depends on

26:13

doesn't work very well. You basically

26:14

need the fact that there's like a kind

26:16

of random input like more or less your

26:18

random number generator and like a

26:20

function from that into the behavior and

26:23

you really want that function to be a

26:24

real function. That's right. Which you

26:26

can always run and get the same answer

26:27

so that you can actually explore that

26:29

space. Whereas if like every time you

26:30

try it there's just like a new version

26:32

of the function that like is spiritually

26:34

similar but like has all the all

26:36

different behavior.

26:37

>> It makes fuzz degrade into random

26:38

guessing. That's right. That's okay. So

26:40

that brings me back to what is

26:42

deterministic simulation testing. And

26:43

the idea here is the somewhat crazy one

26:45

of like we can sidestep all these issues

26:48

if we just make all of the software

26:50

deterministic which sounds a little bit

26:53

insane and maybe like a little bit

26:54

useless like it's it's like you know

26:56

assume you had a can opener. How do you

26:57

make your software deterministic? And

26:59

that's a very fair criticism up until

27:02

the existence of antithesis which I will

27:03

get to later has kind of solved this

27:06

problem for people. But in the absence

27:08

of that, what we did at Foundation DB

27:10

was we wrote our software in such a way

27:13

that it could be run completely

27:15

deterministically. So we could simulate

27:17

an entire interacting network of

27:20

database processes within one physical

27:22

Linux process with deterministic task

27:25

scheduling and execution with fake

27:28

concurrency with mocked implementations

27:31

of communication with networks and with

27:33

disks. Right? we could cause database

27:35

processes to have simulated failures and

27:38

restart. We had to do all this with no

27:40

dependencies whatsoever, right? Because

27:42

as soon as you add a dependency on

27:45

Zookeeper or, you know, Kafka or some

27:48

some other program like you lose this

27:51

ability to run in this totally

27:53

deterministic mode. But it made us so

27:56

much more productive to be able to test

27:59

our software this way that it was worth

28:01

it to us to not have any dependencies.

28:04

>> So is it fair to say that the key

28:05

enabling technology here is dependency

28:07

injection? Like you have a bunch of APIs

28:09

that let you interact with the world.

28:11

Like most of what you write in a usual

28:12

program are in fact deterministic

28:14

components. Like you know you do some

28:16

computation, the result is

28:17

deterministic. But there are some things

28:19

that you do that aren't. Like you ask

28:21

what time it is. Mhm.

28:22

>> It's like, well, now you're really two

28:23

different pieces of hardware, right?

28:25

There's like a clock and a CPU and

28:26

they're interacting and like, who knows

28:27

what's going to happen when you ask what

28:29

time it is. You send or receive a

28:30

network packet. You ask for something

28:31

from disk. So, the thing you can do is

28:34

you can just like enumerate all of the

28:35

APIs that you have that introduce

28:37

non-determinism and just have them have

28:40

two modes. There's like the regular

28:41

production mode where it hits the real

28:43

world and is non-deterministic. And then

28:45

there's test mode where you just have

28:47

control and you can behind all of those

28:49

calls you can have a simulation that

28:51

does that gives sort of the response to

28:53

the API where you have control over it

28:55

and you can thereby force it to be

28:57

deterministic. Is that like the basic

28:58

trick

28:58

>> right? Well that's the basic trick but

29:00

it you you're left with one really

29:03

really really big problem which is

29:04

concurrency. Like if your if your

29:07

program you know even if your program

29:09

only runs on one computer you probably

29:10

have threads and then the OS is going to

29:12

schedule them in like god knows what

29:14

order and you know they also by the way

29:16

will take non-deterministic amounts of

29:19

time to execute actions you know thank

29:21

you Intel um you know and thank you

29:23

everything else running on your computer

29:24

right

29:25

>> well I mean thank you Intel because if

29:26

they didn't do that things would be way

29:27

slower

29:27

>> super true um so so that's you know that

29:31

that you know people can people can

29:33

solve that right like there are

29:34

languages is with sort of cooperative

29:37

multitasking

29:39

um models of concurrent programming uh

29:42

which you know which which you can

29:43

actually plug in a deterministiculer and

29:45

and make that all work.

29:46

>> But then if you have multiple processes

29:49

running on different computers now

29:51

you're really in trouble. Right now you

29:54

know how long did it take that network

29:55

packet to get from this computer to that

29:57

other one is something that's completely

29:58

outside of your control. And if you want

30:01

to try and run them all on the same

30:02

computer, you need to create some way of

30:06

faking

30:08

processes on different computers running

30:10

on the same computer in some sort of

30:12

cooperative multitasking runtime so that

30:14

you can make it all deterministic. And

30:17

there are people who have done that. You

30:19

know, we did it at Foundation DB. I

30:20

think you guys did it at Jane Street.

30:22

>> That's right. Yeah. One of the reasons I

30:23

sort of know the the bag of tricks is

30:25

that this is more or less exactly what

30:26

we have done and hit the exact kind of

30:28

same set of issues and uh the same basic

30:31

uh commitment to like we will write all

30:33

the code ourselves. We had kind of

30:35

weirdly fallen into by using an obscure

30:37

programming language. to like you know

30:38

we had this whole whole O camel

30:40

ecosystem where we had really deep

30:41

control over the whole thing and so yeah

30:43

a lot of our systems not all of our

30:45

systems but a bunch of our systems are

30:46

built in this way where we have this

30:48

kind of endto-end control and can do

30:50

this kind of deterministic simulation

30:52

and it's absolutely critical for all the

30:54

reasons you said it really helps you go

30:55

faster in many different ways

30:57

>> yeah like I I think I something I

31:00

haven't said yet is this all sounds like

31:02

a lot of work and it is a lot of work

31:04

but it was so gamechanging at foundation

31:06

DB like that company could not have

31:08

existed without this technology. We

31:10

built a thing that everybody thought was

31:12

impossible with a team of like 10 people

31:15

and we did it really really fast and we

31:19

did crazy things that nobody would ever

31:21

dare to do without a testing system like

31:23

this. I mean I'll give you two examples.

31:26

Um one was we deleted all of our

31:28

dependencies, right? And in particular,

31:30

we deleted Apache Zookeeper, which we

31:32

had been using as our implementation of

31:35

consensus, like of Paxos. And like

31:37

nobody writes their own Paxos

31:38

implementation. That's like a that's

31:40

like a thing that insane people do who

31:41

want to like have bugs. And we did it.

31:44

And our new one was less buggy than the

31:46

one the officially good one from

31:48

Zookeeper that everybody uses. Um, you

31:51

know, later we basically deleted and

31:54

completely rewrote from scratch our like

31:58

core database concurrency control and

32:02

conflict checking algorithm to make it

32:04

more parallelizable and more scalable

32:06

and faster which again is just like a

32:09

totally crazy thing to do. Like I don't

32:12

know of other databases that once they

32:14

have gotten that piece working have

32:15

rewritten it, let alone like rewritten

32:17

it to make it more theoretically

32:19

scalable and like cleaner, you know,

32:21

that's just like nuts. But if you have a

32:23

system that can find all the bugs really

32:25

really fast, it frees you to just do

32:27

crazy stuff like that.

32:29

>> Okay, so this is seems like a great

32:32

idea. We think it's a great idea, which

32:33

is why we've done it. Foundation DB

32:35

thought it was a great idea. It's also

32:37

like totally impractical. totally

32:39

impractical because like the whole thing

32:40

of like we'll just do everything from

32:41

scratch. It's like okay yeah maybe a

32:42

database system should do that and like

32:44

maybe some like crazy trading company

32:46

that made a decision 20 years ago to

32:48

like use a weird tech stack can do that

32:51

for all sorts of weird reasons but like

32:53

it's not like a generalizable tool,

32:55

>> right?

32:56

>> And Antithesis is trying to be a company

32:57

that sells a generalizable tool. So like

32:58

what how do you go from the good idea

33:00

that's totally impractical to like a

33:02

thing people can use,

33:03

>> right? So basically we've talked about

33:04

how there's sort of two key obstacles to

33:09

making a really really powerful

33:11

randomized testing system you know what

33:14

we call an autonomous testing system

33:16

that can find all your bugs really

33:17

really fast. One is need to, you know,

33:21

actually explore the state space

33:22

extremely quickly and find all the bugs.

33:24

And the other is this determinism issue

33:26

which both impacts the usefulness of

33:28

finding those bugs and also makes it

33:30

just harder to explore the state space.

33:32

And basically what we're trying to do is

33:35

the absolutely insane hubristic goal of

33:39

solving both those problems in full

33:41

generality for every piece of software

33:43

in existence. And so the so the

33:48

basically the the important thing is we

33:50

solve them in the reverse order.

33:51

>> Um so once you solve determinism that

33:54

actually gives you a huge leg up in

33:58

efficient state space exploration for

34:00

all the reasons we've already talked

34:01

about and I can go into more detail

34:03

about how we use that.

34:04

>> Um okay so how do we solve determinism?

34:06

That sounds kind of hard because as

34:08

we've just talked about all kinds of

34:09

things that you want to do on a computer

34:11

are nondeterministic. So there's other

34:14

people who have tried to do this. Um,

34:17

you know, there's people who use

34:19

frameworks, right? Like the one that you

34:21

guys have at Jane Street or like the one

34:23

that we built a Foundation DV. There's

34:25

since been a bunch of open source ones

34:27

built for various programming languages

34:29

and runtimes. Um, that's cool. It only

34:32

helps people who are committed to using

34:35

that framework, willing to write all of

34:36

their software with that framework, not

34:38

use any dependency. It's not in that

34:40

framework. It's not general, right? Yep.

34:42

not a general solution. Can't do it that

34:43

way. There are people who have tried to

34:46

solve this problem with record and

34:49

replay where basically like as I'm

34:51

running my program, I write down the

34:53

result of every single system call in

34:55

the exact moment at which it was

34:56

delivered. And then if I want to run my

34:59

program again, I can just replay all of

35:02

that without actually talking to the

35:03

system. And that works pretty well for a

35:06

thing running on a single node. Doesn't

35:08

work very well for distributed systems.

35:10

It's also just not very scalable. It's

35:12

>> although there's a critical idea that

35:14

you snuck in there which is where you

35:16

said the word sis call, right? So the

35:18

whole like the kind of foundation

35:19

dbjain/ whatever version of doing this

35:22

at the library level is like there are

35:24

particular function calls inside of a

35:26

language that you're going to make

35:28

swappable. But here what you're doing is

35:30

say you know what actually we're going

35:31

to do this at the OS level. Yes. Right.

35:33

at the bottom actually all the

35:34

non-determinism generally comes in from

35:36

the operating system and from

35:38

concurrency and concurrency is somewhat

35:40

mediated by the operating system. So the

35:42

SIS but system calls are anyway one huge

35:45

source of non-determinism. And so the

35:46

idea of these kind of you know record

35:48

replay things are we're just going to do

35:50

the dependency injection at that level.

35:52

And we've already now stepped up a big

35:53

level in generality. Right. I no longer

35:55

have to own your programming language.

35:57

>> Right. Right.

35:58

>> It's gotten better.

35:59

>> It's a big step.

36:00

>> We're not there yet though.

36:00

>> Okay.

36:01

>> So we're not we're not there yet for two

36:02

reasons. One is it's still not fully

36:04

general. Right. This is only going to

36:06

work for the operating systems that

36:08

you've designed this to support. And

36:09

maybe that's okay. Maybe you think it's

36:10

fine because everybody uses Linux, but

36:12

like

36:12

>> weirdly, you know, seem to be true now.

36:14

>> People people run IO write iOS apps,

36:16

man. Like people, you know, people

36:18

people write computer games that mostly

36:19

run on Windows. There's there's others

36:21

out there.

36:22

>> Um, but I think, you know, also also

36:25

doesn't work great for distributed

36:26

systems, although you can kind of hack

36:28

it and there's a few people who have.

36:29

>> Um,

36:30

>> actually, why doesn't it work great for

36:31

distributed systems? The sys call layer

36:33

gets, you know, it gets you a hook into

36:35

all the distributed like all the

36:37

distributed communication comes again

36:38

through the OS. Yeah.

36:39

>> So why why can't this generalize to

36:40

that?

36:41

>> Ba basically all of the record replay

36:43

systems out there are designed to do

36:45

this for one process.

36:46

>> Got it. So it's not so much a

36:47

fundamental question as an engineering

36:48

question.

36:48

>> Correct. Correct. It's just like the the

36:50

UX is not very good.

36:51

>> Sure.

36:51

>> Um but I think the more fundamental

36:54

limitation of these things is the

36:55

scalability problem, right? Like it is

36:57

just a vast amount of data to write down

36:59

every single SIS call that your thing

37:01

ever did. You're already doing a

37:02

computationally expensive search. You

37:04

really don't want to like hugely

37:06

increase the overhead of that. And it

37:09

doesn't actually get you true

37:10

determinism.

37:11

>> It lets you replay a non-deterministic

37:14

run.

37:14

>> Correct.

37:15

>> But it doesn't let you play a deter It

37:17

doesn't let you play things out a

37:18

deterministic way cuz every time you do

37:20

a thing you haven't previously captured.

37:22

>> Correct.

37:22

>> You just got to do it.

37:23

>> Exactly.

37:24

>> Right. Exactly.

37:24

>> So it's like it's a weird halfway house,

37:26

right?

37:26

>> Exactly. So basically what we decided to

37:29

do was just go another step beyond that

37:32

and say okay we're going to do the

37:34

dependency injection as you put it at an

37:36

even lower level. Let's just get under

37:39

the operating system and let's implement

37:42

a deterministic computer which is a

37:45

thing that you can do these days without

37:48

creating custom silicon because people

37:50

have virtual machines. Hooray. So

37:52

basically we just have to write a

37:53

hypervisor that emulates a fully

37:56

deterministic machine and then we don't

37:57

have to touch your OS at all. We don't

37:59

have to touch anything you do at all.

38:00

You can just run your stuff unmodified.

38:02

Right. And so your your like crazy hard

38:05

thing to do is possible because people

38:07

did a super weird crazy hard thing to do

38:10

years ago. And this was like part of the

38:11

historical failure of the operating

38:12

system where it's like oh we're going to

38:14

use the operating system like back in

38:16

the 60s or 70s where like it's have

38:18

these multi-user operating systems. are

38:19

going to have ways of isolating

38:21

different programs from each other and

38:23

stuff and then like some number of years

38:24

we're going to be like oh yeah none of

38:26

this works actually Unix is like very

38:27

badly designed and doesn't solve any of

38:28

these problems so instead we're going to

38:30

have a new abstraction where we are

38:31

going to like simulate things at the

38:33

level of machines the hypervisor is

38:35

basically the computer on that sort of

38:38

whose upward interface it exposes is a

38:40

fake machine and lets you run different

38:42

virtual machines on that hypervisor

38:44

>> and then once you have the hypervisor in

38:46

some sense the path is clear right

38:48

that's the layer at which like in some

38:50

sense before we said oh all the

38:52

non-determinism comes from the operating

38:53

system

38:53

>> but no it comes from the CPU

38:54

>> it comes from the well

38:55

>> it comes from the hardware from the

38:57

timers right it comes from

38:59

>> all the different pieces of hardware

39:00

introducing that so you got to be like

39:01

oh we just got that's the layer that at

39:03

which the nondeterminism comes and

39:05

that's the layer at which we can instead

39:06

do a deterministic simulation of what a

39:08

machine is

39:09

>> correct and our hypervisor is a little

39:12

bit more ambitious than just being a

39:14

deterministic hypervisor which was

39:16

already kind of hard but in order to

39:19

make this really work well, it also

39:21

needs to be really fast, right? Close to

39:24

native speed or even in some weird cases

39:26

a little faster than native speed for

39:29

most code, which is an interesting thing

39:31

that we have pulled off. But then

39:33

there's another property that's also

39:35

really important, which is we are trying

39:38

to do this huge branching exploration

39:41

through the state space of a computer

39:43

system. And so if we're running down

39:47

multiple branches on the same physical

39:50

host that is running the hypervisor,

39:52

it's really annoying if we have to like

39:55

store a separate copy of the memory for

39:57

each of the guest operating systems

39:58

that's running inside of it. That would

40:00

be a lot of RAM, right? And so what we

40:03

do instead is we dduplicate memory pages

40:07

at the host level using copy on write.

40:10

So that if you know one of the guests is

40:13

doing something and it doesn't affect

40:15

some particular page in memory it just

40:17

inherits a copy of that from its

40:19

ancestor and you know sibling VMs can

40:22

just be addressing the same underlying

40:24

memory on the host system which means

40:26

that we can do this with massive

40:28

concurrency on very big computers and

40:30

you know explore really fast.

40:32

>> Got it. Okay. So this kind of like

40:35

brings into focus like what is the thing

40:37

that antithesis is providing in the end,

40:39

right? It's trying to give like all of

40:42

the upsides you described of having this

40:44

very powerful testing system that can

40:46

efficiently explore lots of different

40:47

behavior.

40:48

>> Um but it does it in a way where the

40:52

amount of work that you have to do to

40:54

use the system is very low.

40:55

>> That's right.

40:56

>> It's just like it's like what is your

40:57

API to antithesis? It's actually what

40:59

you're doing already. like you threw a

41:00

bunch of stuff in a Docker container

41:02

before, you throw a bunch of stuff in

41:03

Docker container, now you're just like

41:05

running a VM somewhere. It's like, yeah,

41:07

you just run a VM somewhere else. You

41:10

run a VM on on antithesis's servers and

41:12

then they can get they they get to like

41:15

use all of this fancy tech to make it

41:17

efficient and be able to do all this

41:18

exploration and like you don't have to

41:19

do anything clever to make your system

41:22

testable. That's right.

41:23

>> It's just like

41:23

>> we magically find all the bugs and

41:24

they're magically reproducible. That's

41:26

right. It's very straightforward and you

41:28

know and and the key there right I said

41:30

we magically find all the bugs that's

41:32

the second really hard thing I mentioned

41:34

right once you've made the system

41:36

deterministic you still need to find all

41:38

the bugs right you still need to do this

41:40

state space exploration and you now need

41:43

to do it because you've enabled

41:45

exploration of way more complicated

41:48

computer programmers than parsers you

41:51

know and little data structures written

41:53

in hasll and so on you now need really

41:57

really smart state space exploration,

41:59

but because we have determinism, we can

42:01

be smart about it. It doesn't degrade

42:03

the random search. And so we've also

42:05

got, you know, a whole large chunk of

42:07

our company that is like doing

42:09

fundamental research on how to like do

42:11

this data exploration faster and more

42:14

efficiently for wider and wider classes

42:15

of programs. So to jump back for a

42:18

second to like the initial framing of

42:19

like this is all kind of comes out of

42:21

like property based testing in a sense.

42:23

We spent an enormous amount of time

42:25

talking about one half of property based

42:26

testing which is essentially the random

42:29

generation of the the generation

42:31

essentially of the probability

42:32

distributions right how you explore the

42:34

space and a bunch on the mechanics of

42:36

how you run it but very little on the

42:38

properties

42:39

>> right and like you know if you want to

42:40

find all the bugs right you have to know

42:43

what the program's supposed to do in the

42:45

first place. So like how do how do

42:47

properties fit into this story?

42:49

>> Right. So so this is actually a little

42:52

bit easier than people think it is. Um

42:54

and I believe that like I think a lot of

42:56

the problem here actually is that

42:58

property based testing was invented by

43:00

like mathematicians and functional

43:02

programming people who were thinking of

43:04

it in the same you same area as like

43:07

formal methods and stuff like that. You

43:08

know, my colleague David McKver calls

43:10

this the original sin of property based

43:12

testing is that like people were coming

43:14

from this very very mathematical

43:16

background and so they they were

43:17

thinking of it as like you have to

43:18

exhaustively enumerate all of the

43:21

properties of your system. And my belief

43:24

is that you don't actually have to do

43:26

that. And the reason why I don't think

43:28

you have to do that is that computers

43:31

and computer programs are very chaotic

43:33

and they are very good at escalating any

43:37

misbehavior of your program into much

43:40

more obvious and extravagant

43:41

misbehavior. And so you can actually

43:44

catch a very very large number of bugs

43:46

with a partially specified system. So to

43:49

give you a concrete example of this,

43:51

right? Like if I have some memory

43:53

corruption in my C++ program, you know,

43:56

and I don't have asan enabled, so I'm

43:58

not going to find the memory corruption

44:00

directly, that could still manifest in a

44:01

lot of ways. It could manifest as my

44:03

program giving wrong answers. It could

44:06

manifest as like weird garbage or

44:08

glitches, you know, in in some response

44:10

I get. It could manifest as a crash. It

44:14

could manifest as an infinite loop. it

44:16

could manifest as like corruption of

44:18

some other random invariant in my

44:20

program somewhere. And so if I have a

44:22

property that's set up to catch any of

44:24

those things, there's like a decent

44:26

chance that when I shake the box enough,

44:28

I will be able to detect that bug even

44:30

though I haven't thoroughly specified

44:32

every aspect of its behavior. And that

44:34

same that same idea actually it actually

44:38

is true for much broader classes of bug

44:40

than than memory corruption. So I think

44:43

what you're saying is like true for a

44:46

part of the space, but I don't think

44:48

you're going to get all the bugs that

44:49

way, right? There are I think there are

44:51

lots of areas and I think we as as

44:53

computer scientists and really as

44:55

software engineers rely on this kind of

44:57

brittleleness property all over the

44:58

place, right? Where like you know the

45:00

fact that you can like find the bugs

45:02

that you can

45:02

>> it's actually kind of why normal

45:05

non-randomized testing works so well. I

45:07

think

45:07

>> that's right. But I I also think whether

45:09

it works depends on the kind of things

45:11

you're doing and the way that the code

45:13

is structured in important ways. Like

45:15

the most obvious exception to this is

45:17

numerical bugs

45:18

>> where like numerical bugs just don't

45:21

show up this way. Like you get the

45:22

calculation a little bit wrong and then

45:24

like you know your curve doesn't go up

45:26

into the right quite at the speed that

45:27

you want it to but it's often very hard

45:30

to get any kind of bright line

45:31

demonstration that you've done something

45:32

wrong and know where you've done

45:34

something wrong.

45:34

>> That's right.

45:35

>> Um I think there are other properties

45:36

too. I mean from archive if you're if

45:37

you're building a trading system and

45:39

like the trading system might operate

45:41

perfectly well and it never like breaks

45:43

but like it's just like more aggressive

45:45

than it should be. it sends larger

45:46

orders more often or less often or not

45:49

placed quite properly in the book. And I

45:52

think if you don't do a good job of

45:55

specifying the properties, I think those

45:58

kind of systems are very hard to test

45:59

and this kind of coarse grained well

46:01

let's kind of look for like gross

46:03

misbehavior and shake the box a lot is

46:05

just like not going to get those things

46:06

at all.

46:07

>> Yeah. So totally agree with you. What

46:09

I've said so far only covers a subset of

46:11

the bugs. Um, I think that there are a

46:14

lot of other ways to add and refine

46:16

properties incrementally. Like the key

46:18

like I, you know, I am interested in how

46:20

to do this absolutely perfectly because

46:22

I'm a testing fanatic, but I'm also like

46:24

a pragmatic business owner, right? So I

46:26

want to give customers like an easy way

46:28

on which is just, you know, add the most

46:31

basic possible properties that all

46:32

software should have. And then I want to

46:34

give them a nice gentle ramp towards

46:36

more advanced usage. And I think what

46:39

the nice gentle ramp looks like for most

46:42

people who are not sophisticated

46:45

property based testing experts is

46:48

actually others have already done it for

46:50

us. I think it looks like observability

46:51

and alerting like if you think about a

46:54

system like cloudatch or a system like

46:56

data dog or whatever they have already

47:00

built in some sense like the the second

47:03

half of a property based testing system

47:05

right you can specify what you don't

47:08

want to see and then you can define

47:10

alerts on those things hey memory of

47:12

this thing should never exceed this

47:13

number oh my gosh if you ever see this

47:16

log message I want to be alerted right

47:18

away those are all properties

47:20

like they're not very good properties,

47:22

but they're

47:25

and I think the reason the main reason

47:28

they're inadequate is that with

47:29

something like Cloudatch or something

47:31

like Data Dog, you only find out that

47:33

those properties have been violated when

47:36

your customers do. Um, right? If you

47:39

could move that experience, that UX into

47:44

the testing world into before you

47:46

deploy, I think it's actually an amazing

47:49

sort of interactive way of defining and

47:52

then refining your systems properties.

47:55

And I think it's a thing that's like

47:56

actually quite accessible and intuitive

47:58

to quote unquote normal developers.

48:01

I see why you say that, but I worry

48:03

actually that observability style

48:05

approaches will scale very poorly

48:07

because part of what as I understand it,

48:10

the antithesis approach relies on is the

48:12

ability to take the work you're doing,

48:14

the testing work you're doing and run it

48:16

at kind of massive like incomprehensible

48:19

scale. Mhm.

48:20

>> And I think observability rules tend to

48:23

rely on the fact that you know you see

48:26

the things as often as they happen in

48:28

real life and so you can get away with

48:30

like soft properties that aren't quite

48:32

the properties that you care about but

48:34

are like indicators of and predictors of

48:37

the things that you care about. So there

48:39

you know you sort of get to specify

48:41

things to flag you and the key thing is

48:43

to make them not flag you incorrectly

48:46

too often. But I feel like something

48:48

like antithesis depends pretty

48:49

critically on the violations being real

48:53

at a high rate because otherwise you're

48:55

just going to like antithesis is going

48:56

to say, "Oh, we did your run and you

48:58

have like 68 million exceptions. You

49:00

might want to look at which ones are

49:01

real."

49:01

>> Totally. Totally. You should definitely

49:03

not take every single thing that you

49:06

know that you that you would find

49:07

interesting in your observability system

49:08

or whatever and and and turn it into a

49:11

property, right? But I think taking the

49:13

ones that would paid you and turning

49:15

them into properties is a great way to

49:16

get people who have never thought about

49:18

property based testing to start thinking

49:21

about what the properties of their

49:22

system might be. I think the other thing

49:25

that can help here is like a little bit

49:27

of the Socratic method, right? Like a

49:30

thing I found when talking to customers

49:32

is often, you know, if you ask somebody

49:34

like, "What are the properties of your

49:35

system?" They get this like deer in the

49:37

headlights look and they're like, "Oh my

49:38

god, get me out of here." Um, you know,

49:41

and then if you say to them, hey, should

49:44

your system always return an answer if

49:46

two out of three replicas are up?

49:47

They're like, yeah, yeah, totally. And

49:49

if you're like, okay, cool. Like, do you

49:51

expect that answer to come within some

49:52

defined SLA? They're like, oh, yeah,

49:54

obviously. And like, okay, great. And

49:56

it's like, okay, well, should the

49:58

system, you know, should two users ever

49:59

be able to stomp on each other's data?

50:01

Like, no, no, definitely not. Like, and

50:03

so you can kind of tease it out of

50:04

people. And I think that one thing that

50:07

we're very interested in experimenting

50:08

with is like can we automate teasing it

50:11

out of people a little bit?

50:12

>> You need to completely train an LLM to

50:13

like have the dialogue with customers to

50:15

figure out

50:16

>> or to just look at their code and to

50:18

guess at some properties and then

50:19

present them to the user being like,

50:21

hey, are these properties of your

50:23

system? Um, and by the way, even if the

50:26

user says that they're not properties of

50:28

their system, they're often pretty good

50:31

guideposts in the state space

50:32

exploration because they're often the

50:34

kinds of thing that some other developer

50:36

at that company might have mistaken as a

50:39

property of the system, which means that

50:41

if you get it to happen, it might lead

50:43

to a bug later on.

50:45

Do you want to present those

50:47

semi-propies to the like you the person

50:49

who owns the system or do you want to

50:51

present it to antithesis and see whether

50:53

it follows it and then like I feel like

50:55

you want to classify there's the

50:56

properties that are like seem to always

50:58

be followed and like maybe those are

50:59

properties and then there's the ones

51:00

that are not followed at all and like

51:02

those you discard and then there are the

51:04

ones that like are mostly followed and

51:06

maybe those are the interesting ones.

51:08

>> Yeah, this is actually this is so this

51:10

is not an original idea. Um, this is an

51:11

idea that the fuzzing people came up

51:13

with relatively recently, but they did

51:15

come up with it first. Um, I think they

51:17

call it speculative

51:21

speculative properties. I forget exactly

51:23

what the term it's. in a paper

51:24

somewhere. But basically the the fuzzing

51:26

spin on this is, you know, I look at a

51:28

function that I've executed a million

51:30

times and if I see that like one of the

51:32

parameters is positive every single time

51:35

that function is executed, I just go

51:37

ahead and add an assertion that that

51:39

parameter will always be positive. And

51:42

often like if I then like often that

51:44

just is a property and then even if it's

51:47

not right it's very likely that if every

51:50

time I execute it the thing is positive

51:52

and then I get it to be negative one

51:54

time that's going to lead to some

51:55

interesting behavior later in the system

51:58

possibly a bug because everybody else

51:59

assumed it was always positive. And so

52:01

the idea is like we can both use it to

52:03

guide exploration and use it as like a

52:05

kind of you know preemptive uh property

52:08

creation.

52:10

So I want to step back for a second.

52:12

Like I think a meta thing I'm observing

52:14

from this whole conversation is I wonder

52:16

how you sell things to customers, right?

52:19

Like there's like I feel like this this

52:21

whole conversation about like what I

52:23

think of as a really compelling and

52:24

important kind of superpower that you

52:26

can give software engineering teams, but

52:28

we've already had like a pretty long and

52:30

complicated story to like explain what's

52:33

actually going on. like to go to the the

52:37

perspective for a second of somebody

52:38

running a business like how do you think

52:40

about convincing people that this is

52:42

like a thing they should be excited

52:44

about and want to pay for?

52:45

>> How do we sell to you? So that's a good

52:47

question actually right uh how did we

52:49

actually get to antithesis um a little

52:51

randomly actually so from our

52:53

perspective like one of our engineers is

52:56

someone who just like followed the

52:57

foundation DB work and kind of knew

52:59

about it and thought it was cool and was

53:00

wondering what those people were doing

53:02

and at some point saw an antithesis web

53:04

page go up and was like oh we have

53:07

testing problems maybe this would be

53:08

good right this was someone who's

53:10

working on our kind of ultra low latency

53:11

team that does a lot of very complicated

53:14

multi-system extremely subtle behavior

53:16

kind of stuff and was like, "Oh, it'd be

53:18

nice to have like deterministic testing

53:20

for this. Maybe that would be good." And

53:22

so we reached out and set up some

53:25

conversations. Um, I got to sit in on a

53:27

couple of them. Um, not not because we

53:30

were like, "Oh, you know, we need like

53:33

the old guy who's been here for a long

53:34

time, but more because I'm like nerdly

53:35

really interested in testing stuff." So,

53:37

I thought it would be interesting. Um,

53:39

and then you know, one of our engineers,

53:43

a guy named Doug Patty, who's actually

53:44

previously been on the show, uh, decided

53:46

to try it out with Arya, which is one of

53:49

our internally developed distributed

53:51

systems that already has a ton of very

53:54

careful work on the testing. Indeed,

53:55

it's one of the one of the places where

53:57

we've done a lot of very careful work

53:58

around deterministic simulation testing.

54:00

>> Um, and yet, we thought there was some

54:02

actual real value ad from Antithesis's

54:05

version of this. Uh and that's basically

54:07

how how we found you guys. Um but it's

54:10

like a very like expert oriented people

54:14

who are already in the tank kind of

54:16

customer acquisition story. It's like

54:18

the people who already built their own

54:19

deterministic simulation framework are

54:21

like you know we'd like a better one.

54:23

>> Yeah. Well, we had a

54:24

>> we're not I don't think we're a big

54:26

audience. No, we we we actually had a

54:28

debate internally um in the early days

54:30

which was would people who have already

54:32

are already doing fuzzing or PBT or

54:34

deterministic simulation would they be

54:37

better customers right because they are

54:39

into this stuff or would they be worse

54:40

customers because they already have one

54:42

and they're not going to pay a lot of

54:43

money for another one and it turned out

54:45

that they're very conclusively better

54:47

customers but as you say there's not

54:49

that many of people like you and so

54:51

you're right we do have to make it we do

54:53

have to make it broader there's there's

54:55

a few arguments that we use and then I

54:57

think there's like a few trends that are

54:59

really really acting in our favor. Um

55:02

and that are giving us actually

55:04

considerable success in selling this to

55:06

normal people. Not saying you're

55:07

abnormal. Um

55:10

>> all good. I wouldn't object if you had.

55:12

>> Um basically that the two main

55:14

arguments, right, are like safety and

55:16

speed, right? And and you can think of

55:17

those things as being on a frontier,

55:19

right? for a given level of programming

55:22

technology and skill and architecture

55:25

and language choices and problem domain

55:28

and whatever there's some like efficient

55:30

frontier between safety like how sure

55:33

you are that your program has no bugs

55:34

and speed which is like how fast can you

55:37

add new features and and solve business

55:39

problems and I think of a tool like

55:42

antithesis as just pushing that frontier

55:45

outward and you can decide to reap the

55:47

benefits in more safety Right? You can

55:50

keep going at the same speed but be

55:52

really really really sure you don't have

55:54

any bugs which might be the right call

55:56

if you're making pacemakers or like

55:58

airplane guidance software or something

56:00

like that. Or you can just keep the same

56:04

level of quality but do everything a lot

56:07

faster because you're not writing as

56:09

many tests because when you do hit bugs

56:11

you're hitting them in your tests rather

56:13

than in production. you're not doing

56:14

some really long slow boring triage

56:17

process with a badly written bug report

56:19

from a customer while you're not

56:21

sleeping and you know so on right you so

56:23

you can just go faster with the same

56:25

level of quality or you can kind of get

56:26

a little bit of both and you know I

56:29

think we have we sort of have all three

56:31

kinds of customers I would say and

56:32

they're all you know they're all getting

56:34

some real benefit from it somewhere on

56:36

that frontier um I think the trend that

56:41

has there's sort of two trends that have

56:43

helped us a lot. One is just that all

56:47

kinds of software is now

56:50

either responsible for very very

56:52

critical stuff that needs to keep

56:55

working or responsible for making lots

56:57

and lots of money and needs to keep

56:59

working and keep getting better at

57:01

making money. And so, you know, a lot of

57:04

people a lot more people care relative

57:07

to 10 or 20 years ago that their

57:08

software works correctly and that

57:09

they're able to continue developing.

57:11

>> There are more critical systems. That's

57:13

right. Um the other trend that I think

57:15

has been really good for us is like AI

57:16

code generation which hugely increases

57:19

the salience of all these issues and you

57:23

know I think moreover just has like

57:26

made everybody realize the liked doll's

57:30

law nature of

57:32

being able to verify that your software

57:34

works correctly like being a critical

57:36

limiting factor in how much software you

57:38

can write. And I believe this was always

57:41

true, right? And it just like wasn't

57:43

obvious enough to people. But now it's

57:45

like really, really, really obvious to

57:47

people, right? I can have 10 clawed

57:49

codes all writing code and it doesn't

57:51

matter. I'm not going to go any faster

57:53

if I can't merge those PRs after

57:55

reviewing them and actually deciding

57:56

they work,

57:58

>> right? And like the two paths towards

57:59

this is one is figure out ways of making

58:02

your software less critical,

58:04

>> right? Right? And if you can find a

58:05

domain where you can like do a lot of

58:06

stuff where the state where you can get

58:07

value out of it but correctness isn't

58:09

incredibly important, you can move at

58:10

lightning speed and that's great. And

58:12

there's all sorts of cases where this is

58:13

true. Like if I am like a software

58:15

developer who wants an analysis tool to

58:18

understand what's going on in my

58:19

program. It's like you know it doesn't

58:21

have to be all that right. It can like

58:23

help me mo some of the time and not

58:25

succeed other time and it's kind of

58:26

fine. It's kind of a throwaway tool that

58:28

I just make and use and like that's

58:30

super great. You can just like vibe code

58:31

that and it's awesome. And by the way, I

58:32

think there will be many successful

58:34

companies built to make it easier to

58:36

have that kind of software. Things like

58:38

zerorust hosting, you know, things like

58:41

very powerful security guarantees around

58:43

a piece of software so it can't do any

58:45

damage. You know, things

58:47

>> security guarantees and I think also

58:48

just like picking the right abstraction

58:50

boundaries. Yes. Figuring out if I want

58:52

to make this whole thing useful, what

58:54

pieces have to be reliable and what

58:55

pieces don't have to be reliable. So

58:57

it's like there's a whole new kind of

58:58

software engineering challenge of how do

59:00

you build these architectures that let

59:01

you leverage less reliable code. So

59:04

that's like one direction to go and the

59:05

other direction is just getting much

59:06

better at verifying.

59:07

>> That's right. That's right. And and

59:08

right now I think that has suddenly

59:10

become suddenly become interesting. It's

59:12

very hot all of a sudden which is kind

59:13

of fun because you know this was like a

59:15

backwater in some ways a deliberately

59:17

chosen backwater for for a long time and

59:21

uh and now there's all this now there's

59:22

all this interest.

59:23

>> What do you mean by a deliberately

59:24

chosen backwater? Oh, if you are like I

59:29

sure so basically if you're trying to

59:31

decide what to make a career in, right?

59:33

I talked before about how you know

59:36

there's a lot of careers where you're

59:38

not going to have worldclass success

59:40

unless you are at an extreme of the of

59:43

the distribution of people

59:44

>> like being a violinist or a

59:45

mathematician.

59:46

>> Correct. One good way to be at the

59:49

extreme of the distribution is to pick

59:51

something where nobody else who is very

59:54

talented is interested in it. And then

59:57

it just is actually much much much

60:00

easier. And it's, you know, it's um

60:04

it's, you know, you have to you you

60:06

can't pick like, you know, making paper

60:08

airplanes or whatever, right? Like

60:09

nobody super talented is interested in

60:11

that because there's not a ton of

60:12

economic benefit in that, not a lot of

60:14

benefit to the world in that. But if you

60:15

can find a sweet spot where something is

60:18

like both really really really important

60:22

but for some reason nobody else has

60:24

noticed it's really really really

60:25

important or people know it but they

60:28

don't want to do it anyway

60:29

>> cuz it's painful

60:30

>> because it's painful or because it sucks

60:31

or because it's low status right like

60:33

that's just like that's actually an

60:35

>> testing is boring an incredible

60:37

arbitrage opportunity and so that was

60:40

actually a lot of why I got interested

60:42

in testing is this is like Jan antorial

60:45

work. All developers hate it. Like you

60:48

know the number of smart people who have

60:50

thought about software testing is very

60:52

low because

60:53

>> although I have to say like so when I

60:56

started at Jane Street like I was like

60:59

super incompetent like I you know what

61:00

did I have? I had a PhD in computer

61:02

science which is like doesn't tell you

61:04

how to be a software engineer and I was

61:05

like not super good at it and I didn't

61:07

know anything about testing. Um, but

61:09

just like over time, over the many years

61:11

of thinking about these systems and

61:13

building them, I've

61:14

>> come to realize that not just testing is

61:15

important, but it's like super

61:17

interesting and fun, like when you do it

61:19

well, right? There's a lot of

61:20

engineering work that it's one of these

61:22

things that if you don't do the work to

61:23

build good systems for it, it's horrible

61:25

and nobody likes to do it. You like, you

61:27

know, there are lots of companies that

61:28

solve this problem by hiring a whole

61:30

different tier of people to be like the

61:32

testers because it's like so low status

61:34

that you can't get like the real

61:36

software engineers to do it. So you get

61:37

like other classes of people to do it

61:39

and you just like make it a different

61:40

class job and it seems like a terrible

61:42

way to run a business.

61:44

>> Yeah. I I I actually believe that the

61:46

world is like fractally full of things

61:48

that are incredibly interesting and

61:50

incredibly ignored by the entire world.

61:53

I believe that there is tremendous

61:54

lowhanging fruit here. It's not just

61:56

software testing like things that are

61:57

super economically valuable, super

61:59

interesting and that nobody is doing.

62:01

The the the key though is even if you

62:03

find such a thing, your problems have

62:05

not ended because now you need to

62:08

convince other people that it is

62:11

actually super exciting and cool, which

62:13

you might be able to do like kind of

62:15

one-on-one, but like if you want to

62:17

start a successful company, you need to

62:19

somehow make yourself legible to capital

62:21

in the in the words of of somebody who I

62:23

like. Um so, you know, that's a whole

62:27

another challenge. We got a little bit

62:28

lucky there, right? as soon as you know

62:31

as as our company was growing and

62:32

scaling we'd kind of laid all the

62:33

groundwork and the foundations here and

62:35

then suddenly this giant thing happened

62:38

in the world that made what we were

62:39

doing super legible to capital and you

62:42

know that that was just like a nice

62:44

stroke of luck I think we would have

62:45

succeeded anyway but it it's always nice

62:48

when things break in your

62:49

>> right so what you should ideally do is

62:50

pick like a neglected area of the world

62:52

operate in stealth for a while get a

62:53

head start and then cause the area to

62:55

suddenly not be neglected

62:57

>> that's right

62:57

>> but only after you have done a lot of

62:59

pre-work

62:59

>> that's right That's what we somehow

63:00

managed to do.

63:02

>> Actually, the capital thing is a thing

63:03

that may be a good thing to talk about

63:04

for a second. So, like one one thing

63:06

that that that we got involved with. So,

63:08

we started using antithesis as a product

63:10

and we're like excited about the actual

63:12

results. I guess a thing I didn't say

63:14

before which was one one thing that made

63:16

us really happy about it is it like

63:19

actually found bugs that we didn't find

63:20

before. It allowed us to do more

63:23

aggressive, larger scale kinds of

63:25

simulations even though we already had a

63:26

deterministic simulation.

63:27

>> And your systems were really well

63:28

tested,

63:29

>> right? Really well tested and had a

63:31

really good record of a low number of

63:33

bugs in production. But the curse of a

63:36

system that has a really good level, a

63:37

really good record of very few bugs in

63:39

production is people start relying on it

63:42

having a very good record of very few

63:44

bugs in production in the future. And so

63:47

the stakes go higher and you end up

63:49

using it for more and you want to put

63:50

more engineering into making it yet more

63:52

reliable.

63:53

>> That's a super interesting testing

63:54

problem in its own right. By the way,

63:56

like if your if your system performs

63:59

better than its SLA, all everybody who

64:02

depends on you will start to assume in

64:04

code and otherwise that it will always

64:06

perform better than SLA. And then if you

64:08

ever merely meet your SLA, everything

64:11

will go down and crash.

64:13

Yeah, that's basically right. I've often

64:16

wondered what are SLAs's for. I have not

64:19

found like the whole like form of an SLA

64:21

to be a particularly useful like

64:23

engineering mechanism like in practice.

64:25

We at at foundation DB we actually

64:27

invented a technique didn't I mean we

64:29

invented it but others have invented it

64:31

too but like we call the technique

64:33

bugification and the basic idea is if

64:35

you have a piece of code that you have

64:37

written well such that it 99.99% of the

64:41

time does way better than its promise

64:43

right like you know it returns an

64:44

optional value but it always returns a

64:47

value um you should when running in test

64:50

sometimes just make it do the

64:51

pathological thing with some low but

64:54

real probability

64:55

So that depends on that code, all the

64:58

callers get used to the fact that it can

65:00

like exercise its full its full spectrum

65:02

of behavior.

65:03

>> Right. And I guess famously Netflix was

65:05

like, actually, we'll do this in

65:06

production,

65:07

>> right? That's the whole chaos monkey

65:08

idea.

65:09

>> I'm not such a fan of that.

65:10

>> Yeah, we we've there's there have been

65:12

spots where we've used it. I I have

65:14

complicated feelings about it. I do feel

65:16

like it degrades the quality of your

65:19

overall service in a way that is often

65:22

just economically inefficient and you

65:23

just don't want

65:24

>> I think Amazon actually might offer a an

65:28

entire region where like you deploy your

65:30

code there and their services will

65:32

region

65:32

>> they'll just like return 500 sometimes

65:35

you know whenever you know yeah it

65:37

actually a pretty good idea

65:39

>> yeah certainly seems good as like a mode

65:40

of testing

65:41

>> yeah sorry you were saying I yeah so I

65:45

guess we're talking I guess we're

65:45

talking capital. So we got involved as

65:48

customers. We found it like useful for

65:51

finding real bugs. We found that again

65:53

like in in in the way that you might

65:54

expect increased the ambition of the

65:57

kinds of things that we would try to do,

65:58

right? There are like things that we are

66:00

willing to experiment with that are

66:02

riskier but we feel like more of that

66:04

risk is tamped down by the better

66:06

testing story. So it's been like a very

66:08

positive experience for the places that

66:09

we've applied it. Uh, and then we got

66:12

involved actually in leading antithesis

66:15

series A. It was pretty cool,

66:16

>> which was like,

66:17

>> yeah, which I think I think you guys

66:19

found to be a little bit of an

66:20

interesting and weird experience and we

66:21

found to be a kind of novel experience,

66:23

too. And I'm kind of curious how it felt

66:25

from your perspective.

66:26

>> Yeah. Well, you guys haven't invested in

66:27

very many companies. So, it was not a

66:29

thing that we thought was even on the

66:31

table or likely to happen. I think it

66:33

basically happened as like a happy

66:35

coincidence. you heard through the

66:36

grapevine that we were raising money and

66:38

then I think your one of your corp dev

66:40

people came and and chatted with us and

66:43

we were initially like oh yeah whatever

66:45

like they want to do some small

66:46

strategic investment and then I think he

66:48

was like you know we would consider

66:49

leading the round and we were like what

66:51

like that's completely unheard of um but

66:56

it was actually an incredible experience

66:59

um you know Silicon Valley VCs are great

67:03

and they have many forms of knowledge

67:06

and they have many form like they have

67:08

deep networks and they have deep

67:11

experience from working with many many

67:13

companies that lets them give you all

67:15

kinds of good advice but they are

67:18

generally not super active users of your

67:20

product. Um and and

67:23

>> certainly not of this product.

67:25

>> Certainly not of this product. That's

67:26

right. That's right. Maybe Carta like

67:27

had an easier time with that. Um, and so

67:30

like I think, you know, I think one of

67:33

the amazingly cool things about having

67:34

Jane Street as an investor is just that

67:37

like I feel so very aligned in terms of

67:41

like you understand our vision and what

67:42

we want to do, you know, like you guys

67:46

give us like constant good ideas about

67:48

the product and like strategic

67:51

perspective on the world informed by

67:53

your use of it as a customer. It's like

67:55

a very different kind of advice than

67:57

most investors can give and like we've

68:00

already got the Silicon Valley VCs,

68:01

right? Like we we have that and so

68:03

having you guys as well just feels like

68:05

an incredible superpower.

68:07

>> Yeah. And I do think this lines up. I

68:09

mean we're you know we are we are not

68:11

certainly not primarily and not majorly

68:12

like a VC. That's not the primary at all

68:14

the primary thing we do. But we have

68:16

been doing more and more investments

68:17

over the year over the years and and

68:20

those investments have mostly been in

68:22

the form of companies where we are

68:26

connected to the underlying work where

68:27

we care about it are customers or want

68:29

to be customers of it and we feel like

68:31

we have direct kind of subject matter

68:34

expertise on the area in question and

68:36

that we really believe in the product

68:38

and think it's great and want to use it

68:39

ourselves and I think that's the kind of

68:41

thing where we think we actually have

68:43

some meaningful leverage.

68:44

>> Yeah. Yeah. And I think you guys have

68:45

done quite well. I think, you know,

68:48

you're not VCs, but I think you've

68:50

you've done a pretty good job of

68:51

spotting opportunities like, and I think

68:53

you've you've got a track record of

68:55

spotting them sort of before they become

68:56

quote unquote legible to capital. Like,

68:59

I think you guys were very early in

69:01

anthropic, if I don't if I don't

69:02

misremember. And I think you guys

69:04

invested in anthropic at a time when

69:06

they actually had trouble raising money,

69:08

hard as that might be to believe. um you

69:10

know because you saw something that

69:12

others didn't you know and I think with

69:15

us right like you invested in I mean

69:18

like three months is like a very long

69:19

time in in tech these days like I think

69:23

you know today

69:25

every single investor is probably lining

69:27

up to invest in testing companies

69:29

because it feels so salient you know

69:30

with with with AI codegen but like three

69:34

months ago when you guys made this

69:36

investment

69:37

no investor had ever heard of software

69:39

testing or cared like to a first

69:41

approximation and you know I talked to a

69:43

lot of people who were like nobody has

69:45

ever made a software testing company

69:47

that has made any money like why do you

69:48

think you'll be any different and who

69:50

like really needed to hear arguments

69:52

right whereas I think you guys sort of

69:54

spotted that opportunity before the

69:56

professionals did and it's worth saying

69:59

I think we were interested in and

70:02

excited about antithesis and the value

70:04

it provided independent of the AI angle

70:07

right I think the AI angle added a lot

70:08

more but I to some degree I think we

70:10

share some of your basic intuition that

70:12

this stuff has always been important.

70:14

>> Um but it definitely as a kind of you

70:16

know market hypothesis makes a lot more

70:19

sense in the present world where this

70:21

stuff is becoming more salient because

70:23

of of the challenge of verifying AI

70:25

generated code. I I'm actually curious

70:29

how you think about the product really

70:32

working in this context because in some

70:34

ways I think it's a really good fit

70:37

>> and in some ways it's not quite perfect

70:40

right because one of the critical things

70:42

that you want

70:43

>> both when you're thinking about RL right

70:45

you want to like

70:46

>> provide feedback to agents as you are

70:48

training them and then also when you

70:50

actually try and use this stuff is you

70:51

want reliable feedback on whether the

70:54

thing that they did is good but you also

70:55

want fast feedback Anti antithesis is

70:58

good at a lot of things, but it's not

71:00

like super fast, right? When you send

71:02

kick off an antithesis run to find your

71:04

bugs, you know, you know, you might come

71:05

back tomorrow to look at the results.

71:07

>> So, I I actually think that last I I do

71:09

think that there are ways it doesn't fit

71:11

well, but I think that last thing you

71:12

said is a unfortunate current limitation

71:15

that is like highly contingent and will

71:17

not be for long. Um, basically

71:20

antithesis began like its bread and

71:23

butter was like very very large

71:25

distributed systems and those very large

71:27

distributed systems tend to just kind of

71:30

be expensive to run period. And so there

71:34

was not tremendous

71:37

like pressure on us to make all of the

71:40

constant factors of running our software

71:43

like really zippy and snappy. And you

71:47

know

71:49

basically people who were who were

71:50

testing this stuff were just okay with

71:52

getting a relatively slow answer and so

71:54

we weren't under a lot of pressure to to

71:55

do otherwise. Um as we move beyond

71:58

distributed systems which we are doing

72:00

this year you know that equation changes

72:04

and I think you are going to see that

72:07

antithesis gets way faster at giving

72:10

results and we have a lot of really

72:12

really cool projects underway that are

72:14

that are going to enable that and make

72:15

that possible. And by the way I think

72:17

that even for distributed systems you

72:19

might be able to start getting pretty

72:20

fast results. Like I don't think there's

72:21

a law of the universe which says you

72:23

can't test a distributed system fast.

72:25

Um, at Foundation DB, we often got good

72:28

quality answers within minutes or tens

72:30

of minutes. Um, very thorough answers.

72:34

Sometimes we'd even find the first bug

72:35

in less than a minute. And I think that

72:37

that is totally a thing that you will be

72:39

getting from antithesis, you know, in

72:41

the next year or so.

72:42

>> So, what are ways in which beyond the

72:44

kind of time time scale issues, what are

72:46

ways in which you think maybe it doesn't

72:48

solve all the problems for

72:49

>> Oh, for AI in particular?

72:51

>> Yeah.

72:51

>> Yeah. Well, so okay, there's a few

72:53

things.

72:54

Let's let's talk first about what I

72:58

consider the most fundamental one and I

73:00

think the most interesting one and I

73:03

don't think that this is like

73:04

catastrophic but I think it's like an

73:06

interesting challenge that everybody

73:08

who's doing any kind of you know

73:11

autonomous software verification whether

73:13

that's property based testing or formal

73:15

methods or code review or whatever is in

73:18

my opinion not thinking about. Okay. So

73:21

code generation tools, code synthesis

73:23

tools, specificationdriven

73:26

tools like that have existed since way

73:29

before chat GPT existed, right? These

73:31

have existed for 20, 30 years. And

73:34

nobody ever used them because they suck,

73:36

>> right?

73:36

>> And why did they suck? Basically because

73:39

they all acted like evil genies. you

73:42

would say exactly what you wanted the

73:44

program to do and the, you know, nonLM

73:48

program synthesis machine would crank

73:50

out a program that exactly matched your

73:53

specification and totally did not do

73:55

what you wanted to do.

73:56

>> Yep.

73:57

>> Right. You you've had experience with

73:58

this.

73:59

>> Yeah. Yeah, I mean I've been sort of

74:00

paying attention to like the program

74:01

synthesis literature for a long time and

74:03

like it is there's a lot of really great

74:05

research and a lot of great researchers

74:07

doing interesting stuff but remarkably

74:10

little practical applications in it and

74:12

all the things that people work on end

74:14

up looking mostly like toys like I think

74:16

maybe like the single most successful

74:19

like program synthesis style thing is

74:21

like Microsoft flashfill in Excel which

74:23

is like you know pretty good but like I

74:27

feel like for all the like smart work

74:28

that's gone into this stuff, you would

74:30

expect like in some ways to have more

74:32

practical impact. But like the problem

74:34

is just really hard to do well. And I

74:37

think in some ways one of the reasons

74:38

why LLMs are better than classic program

74:42

synthesis and is that there are less

74:44

evil genies. Yes.

74:45

>> And like they're not really

74:47

specification driven. They're like vibes

74:49

driven. like you say and it makes some

74:51

inferences and there's a lot of like

74:53

leaning on the priors of the thing it's

74:54

seen in the past and what it generates

74:56

and it's just optimizing less.

74:59

>> Exactly. Exactly. Exactly. Exactly.

75:01

>> And of course the RL process makes it

75:03

optimize more. Right. So this whole

75:06

thing where you have basically uh like

75:09

eval hacking where it like does kind of

75:11

whatever it can do to try and get like

75:13

the light to turn green, right? This is

75:15

a problem with like LLM. It's a problem

75:17

with people, right? Sometimes like you

75:18

have some system where you have some

75:19

checks in place and like a thing we talk

75:21

about internally is like don't just play

75:23

the video game, right? You don't just

75:25

try and like score. You want to actually

75:26

like do the right thing and use the

75:28

alerting as a way of understanding

75:30

what's going wrong. But if you turn the

75:32

alerting into the the thing that you're

75:33

actually optimizing for, very bad stuff

75:36

happens.

75:36

>> Goodart's law. Yeah.

75:37

>> Yes.

75:37

>> Yeah. So that's exactly right.

75:40

Basically, basically the reason I

75:43

personally thought that AI code

75:45

generation wouldn't go anywhere like a

75:47

year or two ago because of exactly this.

75:49

I had experience with program synthesis

75:50

tools. I was like, "Oh, they're all evil

75:51

genies. They suck." You know, I think a

75:53

lot of people who had experience with

75:55

these tools had the same kind of

75:56

reaction. And what we all missed was

75:58

exactly the thing you just said. LLMs

76:00

are not like they actually want to make

76:03

you happy, right? Like like

76:06

>> for good and ill.

76:07

>> Exa Exactly. They're like the the

76:08

sycophency thing is like there's

76:10

actually a nice a nice flip side to it.

76:12

Like they've been trained on like

76:14

zillions and zillions of examples of

76:15

people on Reddit and Stack Overflow

76:17

being helpful and then they've been RLHF

76:20

by people who reward it for being

76:22

helpful and and so it actually is kind

76:24

of trying to write the code that you're

76:26

asking for as opposed to like write code

76:28

that fits the specification that you

76:29

asked for in the least amount of work or

76:32

whatever. And what happens when you put

76:35

these things in a loop with something

76:38

that's like, eh, no, try again. Ah, no,

76:41

try. Right. Like, it kind of shifts it

76:44

back into being an evil genie a little

76:45

bit.

76:46

>> That's right. Although, to be clear, I

76:47

think that the people who are doing the

76:48

training are no fools. And that, you

76:51

know, you've talked to some people who

76:52

do this kind of training work and they

76:54

pull they pull the system simultaneously

76:56

in multiple directions, right? There are

76:58

things that you do to pull it in the

76:59

direction of trying to just satisfy the

77:03

immediate feedback goal and also trying

77:05

to pull it in the direction of like

77:06

fitting more the general distribution

77:08

and not just kind of totally getting

77:10

completely twisted out.

77:11

>> Yes. But the problem is that when you're

77:13

done training, when you're actually

77:14

running this thing, if you run it in a

77:16

loop, it's it's still pushing it back

77:19

towards being an evil genie. Not in

77:21

terms of like shifting its weights and

77:22

and so on, but just in terms of its

77:24

behavior and what it tries next. Like

77:26

I've seen this happen even with just

77:28

very very very

77:30

not sophistic like not property based

77:31

testing right like I have clog code and

77:33

I'm like hey do this thing for me and

77:35

make sure the tests all pass and like if

77:37

the thing is hard and it can't do it

77:39

correctly eventually it deletes the

77:41

tests or like or eventually it like

77:43

makes the test pass in some trivial way

77:45

or in some way that is totally not what

77:47

I want and

77:48

>> I do think this is getting a little

77:50

better but the phenomenon is still very

77:52

strong.

77:52

>> Yes. And I think I basically think that

77:54

the more powerful and unyielding the

77:58

validation step is probably the worse

78:01

this overall effect gets.

78:03

>> Yeah. And another I think general

78:05

problem with these issues, we talked

78:06

before about the kind of functional

78:08

properties of the software that you're

78:09

optimizing for and then the

78:10

non-functional properties like all these

78:12

kind of architectural and clarity and

78:14

extensibility properties

78:15

>> and those probably get worse. Yeah.

78:17

>> Right. Because if you look at the agents

78:19

in their efficacy depends a lot on those

78:22

non-functional properties. They just do

78:24

better in context where things are

78:26

tighter and more extensible and easier

78:29

to understand and where the systems are

78:31

fundamentally simpler, but they're super

78:33

bad at maintaining those properties.

78:35

>> Um, I feel like the the the thing that

78:38

Anthropic came out with of like the C

78:40

compiler that they built was a really

78:41

interesting example where the they got

78:44

really far. They built like a pretty

78:46

good compiler. I mean, not actually a

78:47

good compiler. You wouldn't want to use

78:48

it for anything. But like an impressive,

78:50

it was an impressive technical feat.

78:52

>> You know, it's a little bit like the

78:53

talking dog. It's, you know, it's not

78:55

it's not that what it says is so great.

78:57

It's that it talks at all. Like that

78:58

they got a compiler that got to that

78:59

level is is is impressive.

79:01

>> But the thing a lot of people have

79:03

focused on like, oh, you know, it didn't

79:04

do any typeing and it didn't do this and

79:06

it didn't do that. And that's like a

79:08

little interesting, but the thing I was

79:09

more struck by was the way in which it

79:11

ended. And they were unable to make

79:13

future progress to make more progress

79:15

with this like team of agents approach

79:17

because it just started to be the case

79:19

that as the agents started to make

79:21

improvements they would break other

79:23

stuff at such a rate that they couldn't

79:25

actually net

79:25

>> which is an experience that every junior

79:27

engineer has had too right like and it's

79:30

why things like architecture matter and

79:32

it's why things like you know

79:34

>> making your system like actually fit

79:36

together in a minimal and clean way and

79:37

have concerns be orthogonal and well

79:39

factored and all that stuff. Yeah, it's

79:41

just like a bringing to life the like

79:44

deconstruction of the non-functional

79:45

properties of your software, right?

79:47

>> Uh and that's I think that's one of the

79:48

reasons why, you know, it seems to me

79:51

like testing while still important just

79:54

isn't enough, right? You still need to

79:56

think about architecture. You still

79:57

think need to think about the

79:58

cleanliness of the code and all of that.

80:01

Like it's not I think it's it's you just

80:02

you just have to maintain those

80:03

nonfunctional properties. And and it's

80:05

possible that if you put an LLM or an

80:07

agent swarm or something in a loop with

80:09

a really strong test or a really strong

80:11

formal verification system or something,

80:14

it's just going to make the architecture

80:17

worse and worse in order to get the test

80:19

to pass. Like that seems like a very

80:21

plausible failure mode.

80:22

>> Yep.

80:24

So how do you think about

80:28

kind of the completeness of antithesis

80:30

as an approach, right? Like to what

80:32

degree are you like an antithesis

80:34

maximalist? I mean I don't so much mean

80:36

antithesis the product but the approach

80:37

right the approach is like we are going

80:39

to have a kind of ability to do these

80:42

high-powered endto-end randomized tests

80:46

of our systems in a way that like are

80:47

very crosscutting and can test check

80:49

lots of different properties. That's not

80:52

the only way to write tests, right? You

80:53

know, you know, there's like the classic

80:55

I'm going to like at a small scale write

80:57

a unit test which like sticks an example

80:59

in there and see whether the example

81:01

behaves in the way that I want. Like

81:03

>> to what degree do you think this the

81:04

antithesis approach is really the

81:05

approach that people should be doubling

81:06

down on and to what degree do you think,

81:08

you know, we should be throwing many

81:10

things at the wall?

81:10

>> Yep. So, I will first say that I I I

81:14

want to dispute the idea that there's an

81:16

antithesis approach. Um Okay. So the

81:17

thing the thing that we've told people

81:19

including all of our investors from the

81:20

start is that this is not a

81:22

solutionsbased company. It's a problem

81:23

based company. Like our goal is to make

81:26

software validation incredibly cheap and

81:28

easy and and like running water and find

81:32

all the bugs in all the software by any

81:34

means necessary. And it just so happens

81:36

that we thought that the lowest hanging

81:38

fruit, the best way to like start making

81:39

money and really start making a dent was

81:41

to do this deterministic simulation

81:43

thing and to make that cheap and easy

81:45

for people to to adopt. Um but you know

81:49

that is not the full extent of our

81:51

ambitions. If we someday you like you

81:53

know I I kind of dream of a day where

81:57

software engineers don't need to know

81:59

what deterministic simulation or unit

82:03

testing or formal methods or you know

82:07

concolic solving or or any of these

82:10

things are they just hand their software

82:13

to a box and you know and get back like

82:15

it worked or it didn't. And obviously

82:18

there's going to have to be a lot of

82:19

very complicated things that happen in

82:20

order to enable that vision. But like I

82:23

kind of yeah that that's the dream.

82:25

Okay. That said there's a reason we

82:26

started where we did and it's that I

82:28

think we do believe that this technique

82:29

is uniquely high leverage and a little

82:32

bit uniquely low adopted for how high

82:36

leverage it is. And you know I've seen

82:41

I have seen I have seen both situ. Okay.

82:44

So like our team right is always dog

82:46

fooding our own product which is a thing

82:49

that every team that's making a

82:50

developer tool should should do or

82:51

really any kind of tool. Um

82:53

>> it can be harder if it's not a tool that

82:55

you use right developer tools where it's

82:56

easiest.

82:56

>> Yes. And so we we you know that's both

83:01

fun. And I think I feel like that both

83:03

shows the power and the limitations of

83:05

the current basket of tools that we

83:07

offer to our customers. Like we have

83:09

gotten ridiculously far with just doing

83:12

antithesis style deterministic PBT on

83:15

everything that we write. Um including

83:17

like UI components, browserbased stuff,

83:21

you know, including like very low-level

83:23

things, just like everything. Um, we

83:28

have entire extremely complex systems

83:30

that are literally only tested with

83:32

antithesis and nothing else where like

83:34

nobody has written a unit test and we're

83:36

like one of the policies of that area of

83:37

the codebase is that people don't write

83:39

unit tests. You just add more, you know,

83:41

more sophistication to the property

83:43

based tests to cover whatever you need

83:45

to cover. And then there's some parts of

83:47

our code where I'm like, man, there

83:49

should just be a unit test here, you

83:51

know, and and that would make this a lot

83:53

more straightforward. And so I feel like

83:55

this is like kind of a wimpy answer to

83:57

your question, but I kind of feel like

83:59

there is a line, right? There is there

84:01

is a place at which you should just

84:03

write the stupid unit test or you should

84:05

not use testing at all. You should be

84:08

using something like proof-based

84:10

techniques because of the nature of your

84:12

problem domain or you should be using

84:14

exhaustive testing, right? Like if your

84:16

function takes an int32, you can just

84:18

try all of them.

84:19

>> Yep.

84:20

>> Won't take that long.

84:20

>> Definitely done that. Um, so like I I

84:23

think that that line does exist. I think

84:25

it is a lot farther away than most

84:28

people realize. Like I think more things

84:30

are amendable to property based testing

84:31

than people think and that if we can

84:33

make it easier and more powerful, people

84:35

will use it in more situations where

84:37

they don't currently use it.

84:38

>> Yeah, I think I I think that's right. I

84:42

and I think your point about it being

84:44

neglected essentially feels right to me

84:46

as well of like if you're going to see

84:47

where you can add a new thing and make a

84:49

big a big change. I feel like that's a

84:51

natural thing to work on. Um I do think

84:54

the other kind of testing is like really

84:55

important. I think there's a a kind of

84:57

like unreasonable effectiveness of

84:59

example based testing, right? Like I

85:02

think it's in some ways it's almost kind

85:04

of sounds like a comically bad idea of

85:05

like I'm going to have a big complicated

85:07

program and then I'm going to test it by

85:09

like writing six examples. Um, but like

85:13

to a surprising degree for like modest

85:15

complexity things, it actually like

85:17

works super well.

85:18

>> Um, and I think works especially well in

85:22

code bases that have other good

85:24

non-functional properties. Like a thing

85:25

I've long been struck by is the degree

85:28

to which having a really good and

85:30

expressive type system that like

85:32

captures a lot of useful properties of

85:34

your program and tests together kind of

85:37

there's a kind of multiplicative effect

85:39

where it has this very strong property

85:41

to kind of snap in place. Like you just

85:42

kind of put your finger on a couple of

85:44

spots and make sure that the behavior is

85:45

what you expect it is and like the kind

85:48

of analytic continuation of your

85:49

program. the rest of the behavior is

85:51

kind of smooth enough that there's kind

85:54

of like only one natural thing for it to

85:55

do and it kind of just like clicks in

85:57

and does that one thing.

85:58

>> Yes. I I think as a thing I've said

86:01

before is like, you know, there's this

86:03

funny thing about impossibility results

86:05

where they often are actually cluing you

86:07

into like a thing that you should really

86:09

try and do. And and the reason is that a

86:11

lot of impossibility results, this is

86:13

true in mathematics, true in computer

86:14

science, true everywhere, kind of rely

86:17

on this like anti-inductive property,

86:20

right? It's like it's like I'm going to

86:23

prove that the thing that you're trying

86:24

to do is impossible by constructing a

86:26

really fishly awful example and like ha

86:29

your technique fails here and I'm going

86:30

to adapt it based on the technique that

86:32

you're bringing, right? And like that's

86:34

kind of that's kind of how you know

86:36

impossibility results in mathematics

86:38

often work like diagonalization

86:39

arguments. It's also true in many famous

86:42

impossibility results in computer

86:44

science. And I think what's significant

86:46

about this is like we're not trying to

86:49

like we're not trying to find bugs in

86:53

every random touring machine or even in

86:56

a random touring machine drawn from the

86:58

space of all touring machines, right?

87:00

We're trying to find bugs in software

87:02

that people write to accomplish business

87:04

purposes. And that is a very very very

87:07

infiniteely small subset in the space of

87:09

all possible programs. And it's like a

87:11

really nice one, right? It's like, you

87:14

know, it's like it's like smooth

87:16

functions or functions that are

87:17

everywhere differentiable or something.

87:19

It's like, you know, it's like the these

87:21

are programs that people have built for

87:23

a reason and have built so that they can

87:26

like come back and modify them and

87:27

extend them someday. And I think it just

87:30

turns out that in that space of

87:32

programs, testing is actually way more

87:35

tractable than it would be in a

87:38

completely random, you know, random

87:39

program.

87:40

>> Yeah, there are tons of things like

87:41

this. Another fun example from our world

87:43

is uh type checking in Okamel and any

87:47

language in that ecosystem or in that

87:49

kind of rough space of languages is like

87:51

doubly exponential. Like you can write,

87:53

you know, an 18line program that will

87:55

not finish typeeing until the heat death

87:56

of the universe,

87:58

>> but nobody does. It turns out those

88:00

programs don't make any sense, right?

88:02

And you can find that like if you think

88:03

really hard, you can figure out what

88:05

those programs are, but they're not

88:06

actually a practical part of the actual

88:09

things that you run into when you when

88:10

you actually do the real work. And

88:11

again, I think this behavior of like

88:13

real world programs being a much

88:15

smoother, tamer, better behaved subspace

88:18

is a really important one for lots of

88:20

engineering questions.

88:21

>> It's true. Although we do trollishly

88:23

inside of our company have the like

88:25

inside joke like at our last company we

88:27

violated the cap theorem and at this one

88:28

we're violating the turn halting

88:30

theorem. So you we're just like moving

88:31

up the hierarchy of theorems.

88:34

>> Yeah. What's next? What's the next

88:36

theorem to violate?

88:37

>> I don't know. That's a good question.

88:38

>> It's a good it's a good company

88:39

formation question.

88:40

>> Yeah.

88:40

>> Um so we've talked a bunch about kind of

88:45

the kind of engineering practices you're

88:46

trying to create in the outside. um and

88:48

a little bit on your engineering

88:50

practices internally, but I'd like to

88:51

hear a little bit more about that like

88:52

what how does antithesis operate

88:54

internally.

88:56

>> Uh and I'm kind of curious how that how

88:57

that differs from what you guys see in

88:59

the outside.

89:00

>> Sure. So, I think um

89:03

I learned a a useful trick from somebody

89:05

recently, which is when you're talking

89:07

about your company's culture, like

89:09

culture is always a set of trade-offs,

89:11

right? There's no like purely good

89:13

cultural attribute. Yep.

89:15

>> They're all just like choices on a

89:16

spectrum and being one thing implies

89:18

that you are not the good things about

89:20

the opposite. And so I'm going to try

89:22

and phrase this in like the most edgy

89:25

way possible maybe. Um so

89:30

I think that we generally believe a

89:34

couple important premises that have led

89:36

us to pick a pretty pretty weird by by

89:40

outside standards place on a lot of

89:41

these culture spectrums. I think we

89:44

believe that

89:46

for many kinds of projects the overall

89:50

cost of the project is dominated by the

89:53

number of mistakes you make. Like like

89:57

big architectural mistakes early on in a

89:59

project can just have like an

90:01

exponential effect on the amount of work

90:03

that it takes to get the project done. I

90:06

think we also believe

90:09

that one of the biggest scalability

90:11

barriers to

90:14

human organizations is communication and

90:18

that one of the things that is worst for

90:21

communication is like lack of trust. Um

90:25

and

90:28

yeah, let's just start there. So, so

90:30

given that you believe these things

90:31

about the world, like what would you

90:33

want your engineering culture to look

90:34

like? Well, basically we try really,

90:39

really, really hard to talk a lot about

90:44

what we're going to do before we do it

90:46

and to debate multiple possibilities for

90:50

how we could accomplish some important

90:52

objective before we like go all in on

90:55

one. And that doesn't mean that we don't

90:57

prototype. Like often these discussions

90:59

do involve people bringing prototypes

91:01

and showing them to each other and and

91:02

debating the merits of them. But like it

91:05

is it is basically considered like

91:08

uncuthter

91:15

and then explain why you picked this one

91:17

over that other one and then explain why

91:19

you don't think there's a great third

91:20

alter alternative, right? Like and that

91:23

I think drives some people completely

91:24

insane. Like like there's there's a lot

91:27

of people who are just like, "Man, I

91:28

want to put on my headphones. I want to

91:29

write my code. Leave me alone." and like

91:32

they just won't have a great time at

91:34

antithesis where people are going to

91:36

walk by and be like and there you know

91:38

we all work in a big open room exactly

91:40

like you guys do here and people will

91:41

just come look at your screen and be

91:43

like hey why are you doing that you know

91:44

which is like not a thing that would

91:46

happen at some other companies I've

91:48

worked at um so we're highly

91:50

collaborative highly deliberative

91:54

you know collaborative does mean that

91:55

we're all in a physical office together

91:58

for the most part because it's you know

92:00

adding any friction to communication

92:02

just means that you get a whole lot less

92:04

of it.

92:05

>> Sure.

92:05

>> Um it means that we don't really care

92:09

about hierarchy very much. Like there is

92:11

hierarchy. Every human society and

92:13

organization has hierarchy. Um

92:16

>> I've heard you're the CEO.

92:17

>> That's right. But like everybody's

92:21

opinions can be questioned and debated

92:24

and like you know just because somebody

92:26

is the big boss of some particular part

92:28

of our software architecture does not

92:31

mean that they get to sort of be

92:32

dictatorial or rule by fiat. Like people

92:34

people can just come and be like I think

92:36

you're making a stupid decision and

92:38

that's like a very normal thing and we

92:40

try to praise people for sticking their

92:41

necks out and making statements like

92:43

that.

92:45

>> Yeah. A lot of this feels very familiar.

92:47

I think we've taken it like a pretty

92:49

similar role. It's not like

92:50

>> like the whole like big tech thing of

92:52

like, you know, you're an L8, you know,

92:54

sergeant, second class, something

92:56

something. It's just like we just don't

92:58

think makes a lot of sense for us. And,

93:00

you know, people have functional titles

93:01

as like someone who's like responsible

93:03

for a given area or whatever. Uh, but

93:06

there's no kind of general notion of

93:08

title that like shows up somewhere.

93:10

>> We're the same way. And I'm we're

93:11

actually debating whether we need to

93:12

change this at some point, but basically

93:14

every single person on our engineering

93:16

team has the same title on their job

93:18

offer. It's senior engineer.

93:19

>> Yeah. For a while. Yeah. For a while, I

93:21

think for weird legal reasons, we

93:23

thought we needed like two different

93:24

ones and like for the first two years

93:26

you were a software engineer and then

93:27

afterwards you were but like with no

93:28

internal like reference or anyone paying

93:31

attention to that kind of stuff.

93:32

>> Yeah. So the the the thing which I

93:34

should probably not be saying but it's

93:35

true is is we um we we sort of treat

93:38

titles as like a as tools right like so

93:42

when we're interacting with the outside

93:43

world people can adopt any title they

93:45

wish pretty much so it's like if

93:47

somebody really needs to get into a

93:48

conference like suddenly they're a

93:50

senior staff engineer third class or

93:52

whatever like whatever our marketing

93:53

people decided would be the correct

93:54

title for you to get into that

93:55

conference and you know people you know

93:58

can use sort of whatever titles in their

93:59

by lines that they think would be most

94:01

useful or put on LinkedIn like this is

94:02

like a form of compensation like please

94:04

pick your title but internally there are

94:07

no titles

94:08

>> right and I think part of that is we

94:09

very much want a culture where the thing

94:11

that matters is the idea

94:14

>> and like what's the actual thing you're

94:15

trying to do and not like the particular

94:17

position and rank and like no culture is

94:20

perfect our culture is certainly far

94:21

from perfect and I don't think this

94:22

ideal

94:24

>> 100% works out in all the cases but I

94:26

think it's definitely like directionally

94:28

much more this way here than I think in

94:30

lots of other places and I think it's a

94:32

little disorienting actually sometimes

94:33

for like like a you know a strong

94:35

experienced person who comes from

94:37

somewhere else and lands at Jane Street.

94:39

It's like you know doesn't have like

94:41

like a rank that helps them navigate and

94:44

we have to actually be much more

94:45

intentional about like trying to get

94:47

them into the right spot and make sure

94:48

that like people quickly realize that

94:50

like oh this actually is a person who's

94:52

like substantively worth including in

94:54

and listening to in a bunch of different

94:56

contexts because we like sort of just

94:58

don't have the title tool as a way of

95:00

making that happen. And so you have to

95:01

use other methods to get people in the

95:03

right spot.

95:03

>> So how do you guys think about

95:05

maintaining that as you grow? Because I

95:07

think like this kind of organization is

95:10

really really effective and also really

95:13

hard to preserve if you grow quickly.

95:15

>> Right? So I think one of the things is

95:17

even though it feels kind of quick, we

95:19

just kind of haven't grown quickly.

95:21

We've been relatively disciplined about

95:22

growing at I don't know what feels like

95:24

a fast pace between 10 and 30%, you

95:27

know, depending on the year, usually

95:29

south of 30. Um and and when we've been

95:32

on the upper range of that, we're like,

95:33

"Wow, this is like really

95:34

uncomfortable." Like we kind of maybe

95:36

want to slow down a little bit and and

95:37

we really feel like it's important to be

95:40

able to take the time to absorb people

95:42

into the organization.

95:43

>> Um I don't I don't know how to run a

95:45

company where you need to double every

95:47

year for a few years. It seems

95:48

terrifying

95:49

>> and and it's just not how we've how

95:51

we've operated.

95:52

>> Um so that's one thing.

95:54

Uh we've also just been very rigorous

95:57

about interviewing just trying to make

95:59

sure we're bringing in people who are

96:02

very good technically like that's really

96:04

important. Um, but also who fit in

96:06

culturally, who are like nice and humble

96:09

and yep,

96:10

>> have good second order knowledge and

96:12

aren't made super uncomfortable about

96:14

being wrong because like we're all wrong

96:16

a lot. Like you make a lot of mistakes

96:18

and you want people who are comfortable

96:20

owning up to those mistakes and

96:21

>> Yeah, we design we actually deliberately

96:23

design our interview to try and assess

96:25

these qualities. Um, that's like a

96:27

significant part of why it's set up the

96:28

way it is.

96:29

>> Yep. Yeah. Know, we have similar things

96:32

from our side. It's we think it's it's

96:35

after some early mistakes based on not

96:38

understanding this, we realize that like

96:39

you really don't just want to solve the

96:41

people who are like best at solving the

96:43

puzzles. Like being good at solving

96:45

puzzles is really good, right? Being

96:46

just like having like, you know, high

96:48

wattage and just being really smart at

96:49

stuff is good. Um, but you really want

96:52

to make sure that whoever you're

96:53

interviewing, you see how they operate

96:55

under challenge. Yep. because like

96:57

you're going to take everyone and you

97:00

know there's more they can do and you're

97:01

going to keep on asking them to do more

97:02

until the job is hard and there's you

97:04

know there's no end of hard problems to

97:06

solve and so you want to see how people

97:08

operate in that context. the the thing

97:10

you mentioned of like niceness and and

97:12

being good to work with and so on that I

97:14

think we fully agree with that and that

97:16

comes from a another sort of like

97:18

fundamental observation about the world

97:20

which is most problems are hard enough

97:23

that one person alone cannot solve them

97:25

and even if they were like your

97:29

individual value that you bring just by

97:32

like the stuff that you do in almost

97:34

every case is dwarfed by the positive

97:37

and negative externalities that you

97:39

cause on the team like you know you are

97:41

going to be chatting with your friend or

97:43

your colleague at lunch and like have

97:45

some good idea that makes their job

97:46

easier or you're going to be mentoring

97:48

some junior engineer and teaching them

97:49

some trick that's going to make them

97:50

more valuable for the rest of their

97:52

career or you know on conversely you're

97:54

going to be like being really mean to

97:56

somebody and then they're in a bad mood

97:57

for the rest of the day and and aren't

97:59

as productive and also just make the

98:00

place like a less fun place to work. Um,

98:03

and so like that stuff just kind of

98:06

dominates actually when you get to a

98:08

sufficiently large organization size.

98:10

And it's not to say that you can be

98:12

ineffectual and really nice and and have

98:15

a job. Like, you know, we there you have

98:17

to get things done.

98:18

>> There is still a bar. That's right. Not

98:20

least because having people around like

98:21

that is is terrible for morale. Um,

98:23

>> right. Lowers the intellectual density.

98:25

>> That's exactly right. But but it's sort

98:27

of like you just need both and and we're

98:30

just not going to accept you unless you

98:31

are both really great on your own and

98:34

also really great and magnify the

98:36

abilities of the people around you.

98:37

>> Yep. Yeah. I think that's totally true.

98:39

One one point about the like you know

98:42

the externalities really matter. I think

98:44

that's true. I feel like you could take

98:46

that kind of thinking in the direction

98:47

of thinking that like what really

98:48

matters is like organizational stuff and

98:50

how things are put together and teams

98:52

and all that. And I think that's that

98:53

stuff is all really important. I also

98:55

feel like the shape of this business

98:58

makes very clear to us how amazingly

99:01

valuable strong individual contributors

99:03

are and like a lot of that value is like

99:05

the externalities that they have. But

99:07

like like individuals in both a kind of

99:09

trading and a technology and various

99:11

other contexts who are just like super

99:12

good at their job and like not kind of

99:15

built to be large scale leaders can

99:17

still be just like enormously valuable

99:18

and enormously well paid because that

99:21

kind of individual contribution can just

99:23

move the needle in a huge way. So like

99:25

you know it's both like this kind of

99:27

collective stuff that really matters but

99:29

also people's just individual power to

99:31

do amazing things is super important and

99:33

it's really important to like recognize

99:35

and compensate people for that kind of

99:37

stuff.

99:37

>> Yeah, I totally agree. I think you know

99:39

another thing another thing that helps

99:41

with keeping that kind of environment as

99:42

you grow is just having strong espree

99:44

and a strong like sense of yourself as

99:46

an organization. And I think you know I

99:49

think quirky cultural choices and quirky

99:51

technology choices actually help with

99:53

that. Like I think it makes people hold

99:56

their heads a little bit higher. It's

99:57

like yeah, I work at Jane Street. I work

99:58

at Antithesis. It's like a slightly

100:00

weird place. Like people who don't work

100:02

here definitely don't work here. You

100:03

know, it's not just like another

100:05

interchangeable company. And I think

100:06

that actually makes all these cultural

100:08

problems a little bit easier to solve on

100:11

every dimension.

100:13

>> Yeah, certainly. I like to think so

100:14

since I think I'm deeply culable for our

100:16

weird choice of programming language. So

100:17

I hope that has some positive

100:19

externalities. There's actually a really

100:20

interesting paper I read recently um

100:22

that that talks about this in the

100:25

context of hidic Jewish merchants in the

100:27

New York diamond district.

100:29

>> So

100:29

>> amazing.

100:30

>> Have you have you have are you familiar

100:31

with this? The researcher named Beric

100:33

Richmond.

100:34

>> I have I mean I I am familiar with like

100:36

the stores like I have seen those guys

100:38

and been in this but I have not heard

100:39

about the research.

100:40

>> So they have incredibly low transaction

100:42

costs with each other. They lend on

100:45

credit. they uh you know they they they

100:48

don't require huge amounts of

100:49

collateral. They don't sue each other.

100:53

They are very very very low transaction

100:55

cost. And that is a big part of why they

100:58

are so successful. And Richmond studies

101:01

them and basically concludes that a lot

101:03

of why they have such low transaction

101:04

costs is because they are clearly not

101:08

the world, right? They're clearly an

101:10

insular group of people who all know

101:13

each other, who all trust each other.

101:15

and you know and and and where leaving

101:18

that group or joining that group is very

101:20

expensive and and he basically thinks

101:23

that that kind of makes all of their

101:25

economic dealings more efficient and

101:27

smoother and it it's it's actually super

101:29

interesting paper.

101:29

>> Yeah, that's interesting. I do think the

101:31

high trust thing matters a lot for us. I

101:33

do think it reduces the kind of internal

101:35

transaction cost. It's kind of easier to

101:37

get things done. A thing that I'm kind

101:39

of always worried about but still

101:41

delighted seems to be still in place is

101:44

that the the place it's still a place

101:46

that can like pivot quickly. Like when

101:48

something different needs to happen, you

101:50

realize there's a new emergence and we

101:51

have to change things and move people

101:52

around and like focus less on this and

101:54

more on that. Like we're able to do it

101:56

in a way that feels generally pretty

101:58

positive. Um people who come from other

102:00

organizations are sometimes like we say,

102:02

"Oh, we're reorganizing this area."

102:03

people like there's a reorg and they you

102:06

know they stiffen up in their chair and

102:07

it's like what are you worried like

102:09

what's what's wrong about re like we

102:11

reorganize stuff all the time we change

102:12

where the seats are we move it's all

102:14

happens kind of routinely and like I

102:15

real I then I hear stories about what

102:17

reorgs are like at various big tech

102:19

firms I'm like oh now I see what you're

102:21

scared of

102:22

>> we made we've made two huge pivots in

102:25

the last two years that I'm actually

102:26

just tremendously proud of our team for

102:28

doing because they both required

102:31

>> astonishing levels of like intellectual

102:33

humility and like dealing with reality

102:35

which is a thing that organizations are

102:36

usually pretty bad at. Um the first was

102:40

basically you know we had been in

102:42

stealth mode doing R&D like deep

102:44

research for 5 years and then we came

102:46

out and started selling it and at some

102:49

point we kind of realized that we were

102:52

still thinking of the world in a very

102:55

R&D way and then in particular we just

102:58

were not listening to our customers and

102:59

did not have the like customer service

103:02

mindset at all and were really really

103:05

bad at listening to their feedback and

103:07

were really really bad at like doing

103:09

what our customers wanted and that maybe

103:11

this is like not a great property for a

103:13

company trying to have more customers to

103:15

have.

103:16

>> That makes sense. And so like this like

103:18

kind of like sense dawned on us and

103:21

eventually we were just like oh we have

103:23

to change how we think about everything

103:24

and how we do everything and you know

103:27

the company just like all pulled

103:28

together and we're like okay we're going

103:30

to be different now and and we did and

103:32

we like turned on a dime and I think it

103:34

went really well and it's like not 100%

103:36

done but it's like notably and

103:38

distinctly different. Um, and the second

103:41

one was AI where basically for like most

103:44

of the last few years we were kind of

103:46

like AI coding is dumb. It like doesn't

103:48

work. It's like not not like mostly a

103:50

waste of time. Like you shouldn't do

103:52

that. And then like you know Opus 4.5

103:54

came out and everybody played with it at

103:56

home and we were like ah crap this

103:58

actually works now. And and it was just

104:01

like again this like like a lot of

104:03

places I think would have trouble

104:07

admitting that they had been that wrong

104:10

about something that important. And

104:12

instead the technical leaders at our

104:14

company who I respect tremendously not

104:18

least for this were sort of like okay we

104:21

were wrong like let's let's let's deal

104:23

with the world now time time to change

104:25

you know and like and like very quickly

104:28

everything got reoriented and

104:29

recalibrated and like I just I think

104:32

that's what it looks like for an

104:34

organization to be able to like adapt to

104:36

a changing environment. I do by the way

104:38

think that was like in some sense the

104:41

right pivot point. I kind of feel like

104:42

we've actually been spending an an

104:44

enormous amount of energy building tools

104:46

and trying to get agentic coding working

104:48

effectively for a few years now. Um, and

104:52

I think up until now it's kind of been

104:56

bad. Like there are a bunch of things

104:58

for which it's great. There are defin

105:00

but but like for the majority of work

105:01

you're doing doing like critical

105:04

software. I think it's more has been

105:06

more likely to slow you down than speed

105:08

you up. And it it sort of they had this

105:10

feeling of like

105:11

>> you know spending a bunch of time

105:13

building a boat and having a sail there

105:15

and like holding the sail up and like

105:16

there's no wind coming. Um and you know

105:18

we get some utility out of it. People

105:20

use it for some things, the tools get

105:21

better, but like with a recent round of

105:23

models both from from like all the

105:25

vendors actually at this point, like

105:26

there are the models are much better.

105:28

Yeah.

105:28

>> Uh and suddenly it feels like there's

105:30

wind in the sales and now it feels like

105:32

we're pretty well prepared

105:33

>> and have you know a good team in place

105:34

and are like, you know, being able to

105:36

deliver a lot of value based on this

105:38

stuff. But there was an awkward period

105:39

of like

105:40

>> I mean these things are miraculous but

105:42

also not super useful. Um and now they

105:45

seem both miraculous and useful.

105:46

>> Yep. Yep. Yep. Yeah. So I don't know it

105:49

is I think I think also like on all of

105:52

this cultural stuff one of the most

105:54

important things is just having senior

105:56

people modeling good behavior like we

105:59

all take great pains the senior people

106:01

at the company take great pains to like

106:05

give credit to others right to like to

106:08

to to loudly proclaim when they were

106:11

wrong or did something dumb just like

106:13

showing that that is what we do.

106:15

Everybody is always looking at the

106:17

implicit like we all have the same

106:18

title, but you're look you're looking at

106:19

the implicit leaders and seeing how they

106:21

act and so having them act the way that

106:23

you want everybody to act is like kind

106:25

of step one.

106:26

>> Yeah. And I I I just want to say like

106:28

don't give it up. Like it is possible to

106:30

maintain at larger scale. I you know I I

106:32

don't want to say we've done all of this

106:33

perfectly but it echoes a lot with the

106:34

kind of things that you're talking

106:35

about. I think we really have been able

106:37

to keep up with it. Um by the one other

106:39

thing that has been I think important is

106:41

the place is designed for long tenurs.

106:44

like we just have people who have been

106:45

around here for a long time. Like the

106:46

turnover rate is pretty low and I think

106:48

that affects a lot of things about the

106:50

culture. It keeps a lot of institutional

106:52

knowledge around and it helps maintain

106:54

the culture. I think one of the things

106:55

about cultures is they're kind of

106:57

mysterious. You don't actually know

106:58

which parts of it are the ones that are

107:00

loadbearing and so you want to be very

107:02

careful about preserving it in a

107:04

somewhat conservative way. There's a lot

107:05

of like Chesterton's tent fence kind of

107:08

thinking going along.

107:09

>> You know, that's why we're in DC.

107:10

Everybody always asks me, why on earth

107:12

did you put a ambitious deep tech

107:14

company in DC and not the Bay Area? And

107:16

it's basically 100% so that we can

107:18

actually keep people and invest in them

107:20

for the long term. It's not just the Bay

107:22

Area has tons and tons of competition.

107:24

It's actually just that the Bay Area has

107:26

a meta culture of job hopping every 9

107:29

months to get slightly more RSUs. And

107:31

basically once every company is in that

107:34

equilibrium, nobody invests in anybody.

107:36

and it's like very hard to to be the one

107:40

that stands out and doesn't act that

107:41

way. Whereas in DC, you know, people are

107:44

used to working for the government and

107:46

working there for like 30 years. And so

107:48

the like kind of ambient expectation in

107:51

the water is like, yeah, you're going to

107:52

go work somewhere and work there for 30

107:54

years. And so we have ridiculously good

107:57

tenure among our engineers and are able

108:00

to invest in them. And it's just like a

108:02

way nicer in my opinion.

108:04

>> That's okay. That's amazing. Okay, that

108:06

seems like a great note to end on. Thank

108:08

you so much. This has been really fun.

108:09

>> This was awesome. Thank you so much for

108:10

having me.

108:12

>> You'll find a complete transcript of the

108:13

episode along with show notes and links

108:15

at signalsandthreads.com.

Interactive Summary

This episode features a conversation with Will Wilson, co-founder and CEO of Antithesis, about the evolution of software testing and development. Wilson shares his journey from studying mathematics to working in distributed databases and eventually founding Antithesis, a company focused on revolutionizing software testing. The discussion delves into the challenges of traditional software development, the intricacies of property-based testing and fuzzing, and the innovative approach of deterministic simulation testing employed by Antithesis. They explore how Antithesis tackles non-determinism in software, the importance of architectural design, and the role of AI in code generation and testing. The conversation also touches upon company culture, the value of long-term employee retention, and the strategic importance of investing in neglected but high-impact areas of technology.

Suggested questions

7 ready-made prompts