I was laid off by Atlassian


0:00

I was recently affected by the layoffs at Atlassian, and I wanted to take some time to reflect on the years I spent working there. I worked at Atlassian for about eight years. During that time I built a lot of things, and I want to talk about what I built, mainly the things that I personally found interesting or that I'm proud of.

0:21

I hope this video will be useful to someone who is, or was, in the same situation as me. Maybe it'll give them some inspiration for tackling the same things I did, or something similar, and help them avoid some of the mistakes I made. I might also talk about non-technical parts of my experience at Atlassian, although most of it will be technical. The video is split into chapters, so you can skip to the sections that interest you rather than watching it from start to finish.

0:56

To start with, I'll talk about when I was first hired. Even though it was eight years ago, I still remember the interview process, which is different nowadays. I'll also cover why I was hired, or at least why I think I was hired, and the things I started working on at the beginning.

1:15

So let's start with the interview process. I was interviewed by some people that I now consider friends, and I remember having the impression during the interviews that these individuals were quite intelligent, which was exciting for me. The process began with a coding quiz on HackerRank, which I aced with full marks. The first technical interview was with two interviewers. They gave me a white paper and asked me to read it while they left the room for about 10 minutes. When they came back, they asked me questions about it and asked me to articulate what was in it. The white paper was about custom domains, and it was by Cloudflare.

2:01

They then asked me a few questions about microservices, containers, and architectural topics like that. I don't remember the rest, but they were happy enough with me at that stage, so I continued to the second technical interview. That was a troubleshooting exercise: I had to prompt the interviewer for information in order to troubleshoot a real incident that had occurred at Atlassian. It was an application problem that led to a denial of service, so that was fun.

2:36

Then I think I was asked how latency-based DNS works. My answer was not accurate, but perhaps acceptable. I reasoned about it from first principles. For example, I thought Route 53 triangulated based on the client's actual latency, but it's more likely that they use something like a geolocation database to do latency-based routing of DNS answers. After that came a values interview.

3:12

To be honest, I don't remember most of the questions from the values portion, but I do remember one thing. I asked the interviewers to imagine it was 12 months from now and to look back: what would I have had to achieve for them to say it was a good decision to hire me? They told me about an application that needed to be built for the platform within Atlassian. The application would facilitate self-service load balancers, similar to Amazon Application Load Balancers or the equivalent in any cloud provider, but for Atlassian's internal developers. It was based on a framework that I personally was not familiar with. I said I could build it anyway, because I was confident building web apps with Python at that time. They accepted my level of confidence and decided to hire me. That's the interview portion done.

4:14

So I joined Atlassian. There's a classic saying there that when you join, you're drinking from the fire hose, because there's so much information to absorb in the first few weeks and months just to get going. My very first task, or at least the task I gave myself, was to build the application they had told me they wanted. Let me open a browser so we can look at what I mean exactly, and move my face a bit so there's more room on screen. Obviously, Excalidraw isn't what I want to show right now.

4:55

They wanted me to build an Open Service Broker. This is a web app with an API that facilitates the provisioning of resources for a platform. As you can see here, it's built to operate in a Kubernetes-style world where you submit provisioning requests as things come up and go down, and it binds a resource to your pod, your cloud instance, or whatever it is. It sits in front of the real resources. For example, you might provision a database and get MySQL, or at least something SQL-compatible, while that detail stays abstracted away from your internal developers.

5:42

The spec is on GitHub if you want to take a look at it. It covers, for example, the catalog endpoint. The catalog lists all of the services and plans available on the broker, along with metadata about them. You might query the service broker and display some of that metadata in a console, like the AWS console, if you have something like that internally where developers can click to provision things. In Atlassian's case, it was all done through configuration files committed to version control, which were uploaded from a build server when a service was deployed. The spec also describes other APIs, such as provisioning.
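For readers who haven't seen the spec, here is a minimal sketch of the catalog object a broker like this might return from `GET /v2/catalog`. The field names (`services`, `id`, `name`, `description`, `bindable`, `plans`) follow the Open Service Broker API. The service name, plan, and IDs are hypothetical. The lookup helper is just one way a broker might resolve the `service_id`/`plan_id` pair that arrives with a provision request.

```python
import json

# Hypothetical catalog for an internal load-balancing broker. Field names
# follow the Open Service Broker API catalog object; the IDs and names are
# invented for illustration.
CATALOG = {
    "services": [
        {
            "id": "6b2c0e9a-0000-4000-8000-000000000001",
            "name": "edge-load-balancer",
            "description": "Self-service load balancing at the edge",
            "bindable": False,
            "plans": [
                {
                    "id": "6b2c0e9a-0000-4000-8000-000000000002",
                    "name": "standard",
                    "description": "Public HTTPS routing to a service",
                }
            ],
        }
    ]
}


def get_catalog() -> str:
    """Body a broker would return for GET /v2/catalog."""
    return json.dumps(CATALOG)


def find_plan(service_id: str, plan_id: str) -> dict:
    """Resolve the (service_id, plan_id) pair from a provision request, or raise."""
    for service in CATALOG["services"]:
        if service["id"] == service_id:
            for plan in service["plans"]:
                if plan["id"] == plan_id:
                    return plan
    raise KeyError(f"unknown service/plan: {service_id}/{plan_id}")
```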

6:35

There's PUT to provision, PATCH to update, DELETE to deprovision, and so on. If you wanted to build your own broker, you would basically go ahead and implement this, and that is essentially what I did. There's also an OpenAPI document that describes the endpoints.
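The endpoints in that OpenAPI document map naturally onto a route table. The sketch below is a stdlib-only toy dispatcher, not any framework's API. It pairs a few Open Service Broker paths with hypothetical handler functions and uses regular expressions to match `{instance_id}` path parameters. Returning `202 Accepted` models an asynchronous broker.

```python
import re
from typing import Callable, Dict, Tuple

Handler = Callable[[Dict[str, str]], Tuple[int, dict]]


# Hypothetical handlers; a real broker would validate bodies and enqueue work.
def catalog(params):
    return 200, {"services": []}


def provision(params):
    return 202, {"operation": f"provision-{params['instance_id']}"}


def update(params):
    return 202, {"operation": f"update-{params['instance_id']}"}


def deprovision(params):
    return 202, {"operation": f"deprovision-{params['instance_id']}"}


def last_operation(params):
    return 200, {"state": "in progress"}


# (method, path template) -> handler, mirroring a few OSB paths.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("GET", "/v2/catalog"): catalog,
    ("PUT", "/v2/service_instances/{instance_id}"): provision,
    ("PATCH", "/v2/service_instances/{instance_id}"): update,
    ("DELETE", "/v2/service_instances/{instance_id}"): deprovision,
    ("GET", "/v2/service_instances/{instance_id}/last_operation"): last_operation,
}


def _compile(template: str) -> re.Pattern:
    # Turn "{name}" placeholders into named groups matching one path segment.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


COMPILED = [(method, _compile(path), handler) for (method, path), handler in ROUTES.items()]


def dispatch(method: str, path: str) -> Tuple[int, dict]:
    """Find the first route whose method and path match, and call its handler."""
    for route_method, regex, handler in COMPILED:
        match = regex.match(path)
        if match and route_method == method:
            return handler(match.groupdict())
    return 404, {"description": "no such route"}
```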

6:52

I chose to build this in Python using Flask. Actually, no: I first built it with a library called Connexion. This is a Python library that takes an OpenAPI document and creates the API handlers for the routes defined in that document. That's cool, but I eventually migrated it to plain Flask, and later to FastAPI, which I believe is what it still uses.

7:25

So it's the first two weeks, and my primary focus is building what I promised in the interview: a web app that acts as a broker for the platform and allows self-service provisioning of load balancing. As I mentioned, I started with Connexion, which turned an OpenAPI document into routes, but I'm going to describe what it ended up as, which is a FastAPI app. So we've got FastAPI here, then a worker, and then a database, which was DynamoDB.

8:09

A client makes requests to the FastAPI app, saying, "Hey, please provision something for me." The web server doesn't do the work itself. Instead, it drops the task details into SQS, and the worker picks them up and handles them.
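Here is a minimal sketch of that split, under some assumptions worth stating plainly. A `queue.Queue` stands in for SQS. A dict stands in for DynamoDB. `provision` is a hypothetical placeholder for the real work. The web tier records the request and returns immediately, and the worker thread does the slow part and writes the outcome back.

```python
import queue
import threading
import uuid

# Stand-ins: queue.Queue plays the role of SQS, a dict plays DynamoDB.
task_queue: "queue.Queue" = queue.Queue()
state_table: dict = {}
table_lock = threading.Lock()


def handle_provision_request(instance_id: str, parameters: dict) -> str:
    """Web tier: record the request, enqueue it, and return immediately."""
    operation_id = str(uuid.uuid4())
    with table_lock:
        state_table[instance_id] = {"operation": operation_id, "state": "in progress"}
    task_queue.put({"instance_id": instance_id, "parameters": parameters})
    return operation_id


def provision(parameters: dict) -> None:
    """Hypothetical provisioning step (DNS records, CDN config, API calls...)."""
    if "hostname" not in parameters:
        raise ValueError("hostname is required")


def worker() -> None:
    """Worker tier: pull tasks off the queue and write the outcome back."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel used to stop the worker
            task_queue.task_done()
            break
        try:
            provision(task["parameters"])
            outcome = {"state": "succeeded"}
        except Exception as exc:
            outcome = {"state": "failed", "description": str(exc)}
        with table_lock:
            state_table[task["instance_id"]].update(outcome)
        task_queue.task_done()


if __name__ == "__main__":
    t = threading.Thread(target=worker)
    t.start()
    handle_provision_request("svc-a", {"hostname": "a.example.com"})
    handle_provision_request("svc-b", {})
    task_queue.put(None)
    t.join()
    print(state_table["svc-a"]["state"], state_table["svc-b"]["state"])  # succeeded failed
```

Keeping the web tier free of slow calls means a provisioning request returns quickly even when the underlying AWS operations take minutes.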

8:31

So, what does a provisioning task actually look like? It's something like creating DNS records somewhere, maybe creating a CloudFront distribution, maybe making some other API calls. The worker performs the provisioning task asynchronously while the client waits for it to complete. The client polls continuously: "Is it ready? Is it ready?" When the task completes, the worker writes the result to the database. The web server checks the status and responds, "Yes, it's finished," or reports that something went wrong and there was an error.
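The client side of that exchange can be sketched as a polling loop. The state strings `in progress`, `succeeded`, and `failed` are the ones the Open Service Broker API uses for `last_operation`. `fake_broker` is a hypothetical stand-in for the HTTP call, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Terminal states of an OSB last_operation response.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_until_done(get_last_operation: Callable[[], dict],
                    interval: float = 1.0, timeout: float = 300.0) -> dict:
    """Poll until the operation reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        response = get_last_operation()
        if response["state"] in TERMINAL_STATES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError("operation still in progress")
        time.sleep(interval)


def fake_broker(polls_before_done: int, final_state: str = "succeeded") -> Callable[[], dict]:
    """Hypothetical stand-in for GET /v2/service_instances/{id}/last_operation."""
    calls = {"n": 0}

    def last_operation() -> dict:
        calls["n"] += 1
        if calls["n"] <= polls_before_done:
            return {"state": "in progress"}
        return {"state": final_state}

    return last_operation
```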

9:16

We can encapsulate all of this as the Open Service Broker that I built. Pretty straightforward. To be honest, there's not much more to it, but we'll get to some of the more complicated bits in a second, and I'll link to it directly as well. So we've got this client requesting, let's say, "Please provision load balancing." That's essentially what they were asking for: some kind of load balancing somewhere in Atlassian's edge infrastructure that lets traffic reach their service. That's a good point to start talking about the next thing I built.

9:59

I built it out of necessity, because I came to understand and untangle the requirements as I went along. One of the architects had the idea to replace Atlassian's load balancers, which were enterprise load balancers with licensing costs, with an open-source, cloud-native, commodity proxy. The technology we chose was Envoy proxy. If you're not familiar with Envoy, it's very similar to something like Nginx, but perhaps more modern. You can read about what's great about it, why you'd choose Envoy, and so on, if you want. Essentially, we wanted to replace our enterprise load balancers and make them self-service, so that developers didn't have to talk to us to set up their load balancing.

10:57

Envoy has an API that lets you configure it dynamically. Being able to reload configuration at runtime means you can deploy a whole bunch of proxies and leave them running all the time.
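The core idea is a long-running process whose configuration can be replaced without a restart, and it fits in a few lines. This is a toy model only. Envoy's real dynamic configuration goes through its xDS APIs and a much richer data model. The `Proxy` class, its `host -> cluster` table, and the version strings are all hypothetical.

```python
import threading
from typing import Dict, Optional


class Proxy:
    """Toy long-running proxy whose routing table can be swapped at runtime."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version: Optional[str] = None
        self._routes: Dict[str, str] = {}  # host -> upstream cluster name

    def apply(self, version: str, routes: Dict[str, str]) -> bool:
        """Install a new config if its version differs; return True if applied."""
        with self._lock:
            if version == self._version:
                return False
            # Copy the incoming dict so later mutations by the caller can't leak in.
            self._routes = dict(routes)
            self._version = version
            return True

    def upstream_for(self, host: str) -> Optional[str]:
        """Look up which cluster a request for `host` would be sent to."""
        with self._lock:
            return self._routes.get(host)

    @property
    def version(self) -> Optional[str]:
        return self._version
```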

11:12

Then, when someone needs different configuration for their particular service, they push out a change through the provisioning task I described, and those changes need to flow to the proxy somehow. That brings us to the Envoy management server I built, which we called the Envoy control plane. It's quite similar to the broker: yet again, we used a FastAPI app. There are some differences, though, so let's go into a bit of detail. I'm going to wing this, since I know it quite well. I actually open-sourced this software under the name Sovereign. You can find it on Bitbucket. It's a public repo, at least for now. I don't know if that will always be the case.

12:04

Essentially, Sovereign runs a FastAPI app, and the configuration it takes in includes templates and context, which the app polls. The templates correspond to particular resource types. In Envoy you have things like clusters, routes, and listeners, so let's leave it at that for the moment. When the management server starts up, it reads in these templates and the context and makes them available as APIs for the proxies. So imagine we've got an Envoy here. It requests these resources, and Sovereign responds by putting the context into the templates and rendering different content as the context changes. Now, where does the context come from? This is the dynamic part of the management server.
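Here is a sketch of what "putting the context into the templates" could look like, with each resource type rendered by a plain Python function of the context. The talk doesn't show Sovereign's template format, so the function names, the context shape, and the `render` helper are assumptions. The output dicts follow the shape of Envoy v3 clusters and route configurations.

```python
from typing import Callable, Dict, List

# Hypothetical context shape: {"services": [{name, hostname, upstream_host, upstream_port}]}
Context = Dict[str, list]


def clusters_template(context: Context) -> List[dict]:
    """Render one Envoy v3 cluster per service in the context."""
    return [
        {
            "name": svc["name"],
            "connect_timeout": "5s",
            "type": "STRICT_DNS",
            "load_assignment": {
                "cluster_name": svc["name"],
                "endpoints": [{
                    "lb_endpoints": [{
                        "endpoint": {"address": {"socket_address": {
                            "address": svc["upstream_host"],
                            "port_value": svc["upstream_port"],
                        }}}
                    }]
                }],
            },
        }
        for svc in context["services"]
    ]


def routes_template(context: Context) -> List[dict]:
    """Render a single route configuration with one virtual host per service."""
    return [{
        "name": "edge_routes",
        "virtual_hosts": [
            {
                "name": svc["name"],
                "domains": [svc["hostname"]],
                "routes": [{"match": {"prefix": "/"}, "route": {"cluster": svc["name"]}}],
            }
            for svc in context["services"]
        ],
    }]


TEMPLATES: Dict[str, Callable[[Context], List[dict]]] = {
    "clusters": clusters_template,
    "routes": routes_template,
}


def render(resource_type: str, context: Context) -> List[dict]:
    """What the server does when a proxy asks for `resource_type`."""
    return TEMPLATES[resource_type](context)
```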

13:26

The context actually comes from the broker's database, but we request it through the broker. In fact, we request data from the broker and from other sources too. Let's add another source here: say we have a little S3 bucket with some data, and that data changes over time. We take that dynamic data and feed it into the templates. The templates have logic that produces particular Envoy configuration, so the proxy's configuration changes over time.

13:57

So here's what happens. A client makes a provisioning request to our broker. The worker performs some provisioning tasks and writes the new data to the database. The management server polls that data from various places and generates new configuration. That configuration reaches the proxy, and the proxy starts doing different things. That's essentially the second part of what I built. So we've got a broker, a management server, the client, and the proxy.
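The polling loop on the management-server side might look roughly like this: refresh every context source, render, and hash the result so that an unchanged configuration isn't sent again. The sources, the `render_config` template, and the use of HTTP 304 for "no change" are hypothetical choices for illustration. `version_info` mirrors the field name that Envoy's discovery responses use.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple

Source = Callable[[], dict]


def refresh_context(sources: Dict[str, Source]) -> dict:
    """Poll every source (broker API, S3 object, ...) and key results by source name."""
    return {name: fetch() for name, fetch in sources.items()}


def render_config(context: dict) -> dict:
    """Hypothetical template: route each hostname to its service's cluster."""
    return {
        "routes": {svc["hostname"]: svc["name"] for svc in context["broker"]["services"]},
        "blocked_ips": sorted(context["s3"].get("blocked_ips", [])),
    }


def version_of(config: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal configs hash equally.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def discovery_response(sources: Dict[str, Source],
                       known_version: Optional[str]) -> Tuple[int, Optional[dict]]:
    """Return (304, None) if the proxy already has this config, else (200, payload)."""
    config = render_config(refresh_context(sources))
    version = version_of(config)
    if version == known_version:
        return 304, None
    return 200, {"version_info": version, "resources": config}
```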

14:35

So now we've figured this stuff out, all at a very high level. With that in place, we can think about more infrastructure-type questions. We've got this proxy, but how do we end up with it? How does it actually get provisioned? What is it, and where does it live? Let's start with one thing: there are many, many of these proxies, as you would expect, and they are provisioned by a CloudFormation template. CloudFormation is AWS's infrastructure-as-code service. It lets you create the AWS resources that, as a basic user, you would otherwise create through the console. So what kind of resources do we create in it? If we were doing this from scratch, we'd probably have a VPC, a subnet inside that VPC, and maybe an internet gateway, a security group, a key pair, and an IAM role.

15:54

Of course, we also need an Auto Scaling group, since that's what creates these EC2 instances. And the Auto Scaling group needs an AMI. The IAM role must be attached to all of these instances, and the key pair goes on them as well. The security group is probably applied through the Auto Scaling group, to be fair, so the EC2 instances inherit it from the ASG. So we've got all these building blocks of resources. Put them together and we've got a template that creates these proxies in many different regions.

16:48

We might also have an NLB in here, which is a layer-4 load balancer, and maybe a bit of ACM. These acronyms might mean nothing to some people, but anyone who has used AWS will know what they are and that they're not really that complicated. They're pretty basic building blocks, and this is what we created: something like 2,000 proxies across roughly 13 regions. We also had some Route 53 records for other things. Now, the AMI isn't really provisioned by the template. It's referenced by the template.
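As a rough sketch of such a template, here is a Python function that emits a minimal CloudFormation document with the AMI passed in as a parameter rather than created. The logical names, instance type, and sizes are hypothetical. It also omits pieces mentioned above, such as the IAM role, key pair, NLB, and ACM certificate.

```python
import json


def proxy_stack_template(min_size: int, max_size: int, instance_type: str = "c5.large") -> dict:
    """Build a minimal, hypothetical CloudFormation template for a proxy fleet.

    The AMI is a parameter (referenced, not created), since the image is built
    separately and passed in.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            "ProxySecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Edge proxy ingress",
                    "VpcId": {"Ref": "VpcId"},
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "CidrIp": "0.0.0.0/0"}
                    ],
                },
            },
            "ProxyLaunchTemplate": {
                "Type": "AWS::EC2::LaunchTemplate",
                "Properties": {
                    "LaunchTemplateData": {
                        "ImageId": {"Ref": "ImageId"},
                        "InstanceType": instance_type,
                        "SecurityGroupIds": [{"Fn::GetAtt": ["ProxySecurityGroup", "GroupId"]}],
                    }
                },
            },
            "ProxyGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "ProxyLaunchTemplate"},
                        "Version": {"Fn::GetAtt": ["ProxyLaunchTemplate", "LatestVersionNumber"]},
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(proxy_stack_template(2, 10), indent=2))
```

Deploying the same template once per region, with a region-specific AMI ID passed as `ImageId`, is one way to get a fleet spread across many regions.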

17:30

That brings us to the next piece I built: we need to produce an AMI, a standard image for these proxies that includes all the important components. How do we create this image? In this case, we had a repository that used HashiCorp Packer along with SaltStack configuration. We used Packer's Amazon EC2 builder to create an EC2 instance in a dev account, and then uploaded all of our SaltStack configuration to it. SaltStack, by the way, is very similar to Puppet, Ansible, and Chef, in case you're not aware of those. They're configuration management tools, which is a fancy way of saying: I want to install packages, put files in place, and run services on a machine in a particular way and in a particular order. The tool automates that process and helps you make it declarative. So we create a live, running EC2 instance, put the config on it, and run a provisioning step. Then we shut it down, snapshot it, and turn it into an image. That produces the AMI. Now, what was included in it?
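"Declarative" here means you describe the end state and the tool works out what to change. The toy below is not Salt's API (Salt states are written as SLS/YAML files). It only illustrates the idea, using hypothetical package, file, and service names. `plan` computes the difference between the current and desired state, and applying the same state twice does nothing the second time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MachineState:
    packages: Set[str] = field(default_factory=set)
    files: Dict[str, str] = field(default_factory=dict)  # path -> contents
    running: Set[str] = field(default_factory=set)


def plan(current: MachineState, desired: MachineState) -> List[str]:
    """List only the changes needed, ordered packages -> files -> services."""
    actions = [f"install {p}" for p in sorted(desired.packages - current.packages)]
    actions += [f"write {path}" for path, body in sorted(desired.files.items())
                if current.files.get(path) != body]
    actions += [f"start {s}" for s in sorted(desired.running - current.running)]
    return actions


def apply(current: MachineState, desired: MachineState) -> List[str]:
    """Converge `current` toward `desired`; a second run is a no-op."""
    actions = plan(current, desired)
    current.packages |= desired.packages
    current.files.update(desired.files)
    current.running |= desired.running
    return actions
```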

19:00

Let's list a few things here. We had Salt states to install and configure Envoy, plus states for security hardening, network tuning, containers, and an observability agent covering logging, tracing, and metrics. So that's essentially what's going on: Packer produces the AMI, and the CloudFormation template takes this AMI and provisions the EC2 instances, which run with all of this installed. One thing I forgot to draw is parameters. At runtime, the parameters pass in secrets, keys, and so on. The proxies then fetch their resources, get configured, and start running and accepting traffic. Boom, that's it: everything's done and working.
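Ordering between states like these is usually expressed as dependencies. Salt, for instance, has requisites such as `require`. The sketch below uses Python's `graphlib` (3.9+) to derive a valid run order. The state names come from the list above, but the dependency edges are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# State -> the states it requires. Names from the talk; edges are invented.
REQUIRES = {
    "hardening": set(),
    "network_tuning": {"hardening"},
    "containers": {"hardening"},
    "envoy": {"network_tuning"},
    "observability_agent": {"containers"},
}


def run_order(requires: dict) -> list:
    """Return an order in which every state runs after its requirements.

    Raises graphlib.CycleError if the requirements are circular.
    """
    return list(TopologicalSorter(requires).static_order())
```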

20:11

This was essentially my first two years working at Atlassian. Now, when a developer says, "I want to run my service and make it accessible on the internet with all the fancy bells and whistles, routing, and advanced features," we say, "Yes, no problem. Let me get that provisioned for you." We send off the provisioning task, write something to the database, and tell them it's ready. Then the management server asks the broker, "What's the current state of things?" It takes that data plus other data, puts it into the templates, creates resources from those templates, and gives them to Envoy when Envoy requests them. The proxies themselves are pre-provisioned, long-lived infrastructure managed with CloudFormation, and CloudFormation relies on an AMI to provision those machines.

21:03

So that's roughly the first 24 months. What came next? This was the foundation of our team and essentially the product we went forward with: centralized load balancing managed by our team, where all of the features we provided to our customers lived in logic defined in these templates. With that foundation laid, we had proxy infrastructure that reacted dynamically as services were deployed with different configurations over time. The next big phase had two parts. The first was making it possible for some of the larger products to use this platform component. The second was migrating all of the microservices within Atlassian onto it, which was relatively easier because we could enforce it through the platform.

21:58

What that means is that the platform previously provided very basic load balancing to every service. The platform team forced a switch so that you could no longer expose your service publicly through their load balancer, which was too basic. Instead, you had to go through our centralized load balancing infrastructure and configure it explicitly, as a way of signaling your intention for the service to be publicly accessible. Previously, a service could have been public by accident and not very well protected. That was the big push. We got products like Jira, Confluence, Bitbucket, Statuspage, and many others behind this edge infrastructure.

22:41

What came after that? Let's talk more about the Envoy-based product. We had the groundwork for taking basic inputs from a developer and turning them into templated configuration. Envoy has a lot of configuration, and there's a lot you can configure. Take routes, for example, and specifically the virtual host: you can configure which domains to accept traffic on, which is pretty basic. You can also do routing.
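Envoy picks a virtual host by matching the request's host against each virtual host's `domains`, in a documented order. Exact names come first, then suffix wildcards such as `*.example.com`, then prefix wildcards such as `api.*`, then the catch-all `*`. Within a wildcard category, the longer wildcard is preferred. The function below is a simplified model of that order. It ignores ports and other details, and the virtual host names are hypothetical.

```python
from typing import List, Optional, Tuple


def _rank(pattern: str, host: str) -> Optional[Tuple[int, int]]:
    """Rank how `pattern` matches `host` (lower is better); None means no match."""
    if pattern == host:
        return (0, 0)                      # exact match
    if pattern == "*":
        return (3, 0)                      # catch-all
    if pattern.startswith("*"):
        suffix = pattern[1:]
        # The wildcard must match at least one character.
        if host.endswith(suffix) and len(host) > len(suffix):
            return (1, -len(pattern))      # suffix wildcard, longer wins
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        if host.startswith(prefix) and len(host) > len(prefix):
            return (2, -len(pattern))      # prefix wildcard, longer wins
    return None


def select_virtual_host(virtual_hosts: List[dict], host: str) -> Optional[str]:
    """Return the name of the best-matching virtual host, or None."""
    host = host.lower()
    best = None
    for vh in virtual_hosts:
        for pattern in vh["domains"]:
            rank = _rank(pattern.lower(), host)
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, vh["name"])
    return best[1] if best else None
```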

23:19

Routing sounds basic, but once you delve into how you can do it, it gets complicated pretty quickly. You can match on different things, route in different ways, return direct responses, do redirects, and so on. You can add and remove headers, which you could say is pretty standard. But when you configure a route action, you can also choose to send traffic to any cluster on the proxy. So if I have a thousand developers, or a thousand services, each with their own cluster, and any route can send to any cluster, then the input data needs to be validated and abstracted. A lot of the development work concentrated on this logic: validating the parameters so that, when they were run through the logic in these templates, they would produce valid resources. Pretty standard, I suppose. Then again, maybe I have the curse of knowledge.
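Here is one concrete check of the kind described above: a route submitted by one service must not target a cluster owned by another. This sketch assumes a hypothetical ownership map from cluster name to owning service, and a simplified route shape with just `prefix` and `cluster`. It collects every error instead of stopping at the first, which gives developers better feedback.

```python
from typing import Dict, List


def validate_routes(service: str, routes: List[dict], owners: Dict[str, str]) -> List[str]:
    """Return validation errors; an empty list means the routes are acceptable.

    `owners` maps cluster name -> owning service (hypothetical ownership model).
    """
    errors = []
    for i, route in enumerate(routes):
        prefix = route.get("prefix")
        if not isinstance(prefix, str) or not prefix.startswith("/"):
            errors.append(f"routes[{i}].prefix must be a string starting with '/'")
        cluster = route.get("cluster")
        if cluster not in owners:
            errors.append(f"routes[{i}].cluster {cluster!r} does not exist")
        elif owners[cluster] != service:
            errors.append(f"routes[{i}].cluster {cluster!r} belongs to another service")
    return errors
```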

24:29

This stuff seems easier to me now because I've done so much with it, but there's a lot in here. Take extensions, for example: there are many extensions that can be applied to a listener or a cluster. There are all kinds of network filters, and one big one that we obviously used was the HTTP connection manager, where you configure routing, how to handle proxying and WebSockets, and so on. Before that in the chain, there are also things like external processing and external authorization.

25:11

That brings us to what happened next. I briefly mentioned that a big part of the work after building this was migrating the big products onto it. Let's assume that's all finished. It took a couple of years, because many features had to be built, here and elsewhere, to support the larger products and their special cases on what is effectively a generic, multi-tenanted platform. Once they were all migrated, we had more features we wanted to add. I alluded to the fact that we had this groundwork of dynamic configuration. What I'm trying to say is that we created an opportunity: to centralize logic and to handle concerns early in the chain of requests.

26:01

What I mean by that is this. Say we have a customer, someone using Atlassian's cloud products. They hit an NLB first, which proxies to these Envoys. Behind the Envoys sits a backend service, so requests flow from the customer to the proxies and on to the service. Pretty standard stuff. If we can deal with certain concerns at the proxy before a request reaches the backend, we save a lot of time and some money, and it saves the customer time too. It's great for everyone, really.
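The "handle it at the proxy" idea is essentially a filter chain. Each filter either lets the request continue or answers it immediately, so rejected requests never reach the backend. Envoy's HTTP filters can stop iteration and send a local reply. The toy below only models that behavior. Its filters, `authenticate` and a fixed-count rate limiter that never resets, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    client_ip: str = "203.0.113.1"


@dataclass
class Response:
    status: int
    body: str = ""


# A filter returns None to continue, or a Response to answer immediately.
Filter = Callable[[Request], Optional[Response]]


def authenticate(req: Request) -> Optional[Response]:
    if "authorization" not in req.headers:
        return Response(401, "missing credentials")
    return None


def make_rate_limiter(limit: int) -> Filter:
    counts: Dict[str, int] = {}  # toy: counts never reset

    def rate_limit(req: Request) -> Optional[Response]:
        counts[req.client_ip] = counts.get(req.client_ip, 0) + 1
        if counts[req.client_ip] > limit:
            return Response(429, "too many requests")
        return None

    return rate_limit


def handle(req: Request, filters: List[Filter],
           backend: Callable[[Request], Response]) -> Response:
    """Run filters in order; the backend only sees requests that pass them all."""
    for f in filters:
        early = f(req)
        if early is not None:
            return early
    return backend(req)
```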

26:47

and one of those things Now, this is

26:49

where the the diagram becomes

26:51

complicated, so let's move off to the

26:53

side. Let's just copy a few of these.

26:55

Let's grab

26:56

three things

26:58

move over to the side. We've got the

26:59

customer talking to the proxy and the

27:01

proxy is talking to back end. Of course,

27:04

the request comes back up and back out

27:06

to the customer. Fine and dandy. Yes,

27:08

this is a this is a proxy. Whatever,

27:10

there's no surprises here. Now, without

27:12

with with the products that Elysium

27:15

runs, there's all kinds of stuff that

27:17

needs to happen like authentication for

27:19

example or authorization or

27:23

DDoS protection or rate limiting or

27:26

access logs. All this kinds of stuff

27:28

that needs to happen and it's just turns

27:31

out that we can deal with them here

27:32

instead of on a bazillion bazillion back

27:37

end services. Just imagine there are a

27:40

bazillion bazillion of these. Just

27:43

zillions upon zillions. See Daisy. Just

27:47

zillions and zillions. They're like

27:48

gazillion. Now, can you imagine if a

27:50

thousand dev teams needed to deal with

27:53

all this stuff plus more on their own

27:56

service? It would be a tremendous waste

27:59

of money for the company. It would slow

28:01

down features. The customer wouldn't get

28:03

their features when they need them and

28:05

stuff is already hard enough to deliver

28:07

as it is. Thus, the platform and

28:10

centralized management of resources and

28:12

centralized

28:13

implementation of these features. So,

28:16

how how were some of these things

28:17

implemented? Well, DDoS protection was

28:19

really provided by

28:22

CloudFront. That was

28:24

that was

28:25

spearheaded by a colleague of mine who

28:27

is very smart and conscientious. And

28:30

essentially, let's make this a bit more

28:33

accurate. Let's just say let's get rid

28:34

of these. There's an NLB here. Oh, blah

28:37

blah blah blah blah. And of course, it's

28:39

two-way. So, that's one way that we can

28:41

take care of that concern for these back

28:43

end services. Great, we've solved solved

28:45

the concern for that. Fantastic. These

28:47

others, well, access logs, what we can

28:50

do is use these

28:53

network filters. Yes, we use the network

28:55

filters. For example, in the HCM (Envoy's HTTP connection manager), we

28:58

have

29:03

Where are the access logs? Access log.
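To make the filter idea concrete: cross-cutting checks such as authentication, rate limiting and access logging run once at the proxy, in front of every back end, instead of being reimplemented by each team. Below is a minimal Python sketch of that pattern, assuming a simple chain where any filter can reject a request before it reaches the back end. It is not Envoy's filter API or the code described in the video; `Request`, `Proxy`, the filter factories and the naive per-IP counter (no time window) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Request:
    path: str
    headers: Dict[str, str]
    client_ip: str


@dataclass
class Response:
    status: int
    body: str = ""


# A filter returns a Response to stop the request at the proxy,
# or None to let it continue down the chain.
Filter = Callable[[Request], Optional[Response]]


def make_auth_filter(valid_tokens: Set[str]) -> Filter:
    """Hypothetical bearer-token check standing in for real authentication."""
    def auth(req: Request) -> Optional[Response]:
        header = req.headers.get("authorization", "")
        token = header[len("Bearer "):] if header.startswith("Bearer ") else ""
        if token not in valid_tokens:
            return Response(401, "unauthenticated")
        return None
    return auth


def make_rate_limit_filter(max_requests: int) -> Filter:
    # Deliberately naive: a per-IP counter with no time window or reset.
    counts: Dict[str, int] = {}

    def rate_limit(req: Request) -> Optional[Response]:
        counts[req.client_ip] = counts.get(req.client_ip, 0) + 1
        if counts[req.client_ip] > max_requests:
            return Response(429, "too many requests")
        return None
    return rate_limit


class Proxy:
    """Runs the shared filters once, in front of every back end."""

    def __init__(self, filters: List[Filter], backend: Callable[[Request], Response]):
        self.filters = filters
        self.backend = backend
        self.access_log: List[str] = []

    def handle(self, req: Request) -> Response:
        response: Optional[Response] = None
        for f in self.filters:
            response = f(req)
            if response is not None:
                break  # short-circuit: the back end never sees this request
        if response is None:
            response = self.backend(req)
        # Every request is logged, including ones rejected by a filter.
        self.access_log.append(f"{req.client_ip} {req.path} {response.status}")
        return response


if __name__ == "__main__":
    proxy = Proxy(
        [make_auth_filter({"secret"}), make_rate_limit_filter(max_requests=2)],
        backend=lambda req: Response(200, f"hello from {req.path}"),
    )
    ok = Request("/rest/api/issue", {"authorization": "Bearer secret"}, "10.0.0.1")
    for _ in range(3):
        print(proxy.handle(ok).status)  # prints 200, 200, 429
    print(proxy.handle(Request("/rest/api/issue", {}, "10.0.0.2")).status)  # 401
    print("\n".join(proxy.access_log))
```

Ordering matters in a chain like this: because authentication runs first, unauthenticated requests are rejected before they count against anyone's rate limit.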

29:05

Now, remember, all of this configuration

29:08

is dynamic. It's all dynamic and it is

29:11

created by templates which abstract away

29:15

the resource configuration from the

29:17

developer who wants to configure it.

29:19

They provide simple parameters. Those

29:21

parameters are then validated and then

29:22

they are fed into the template as

29:24

context so that we produce the correct

29:27

template. That means that

29:29

they send us a little bit of JSON and we

29:32

set up this whole thing for them with

29:34

all the access logging and blah blah

29:36

blah, whatever. So, that is done, in

29:38

fact, inside the proxy, natively.
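A minimal sketch of the "send us a little bit of JSON" flow, assuming the platform validates a few developer-facing parameters and then uses them as the context for rendering an Envoy HTTP connection manager filter with access logging switched on. The parameter names (`service`, `timeout_seconds`, `access_log`), the naming rule, the `-upstream` cluster suffix, and building the config as a Python dict instead of with a template engine are all assumptions for illustration. The Envoy field names and type URLs come from Envoy's v3 API, but this is not the configuration the platform in the video actually generated.

```python
import json
import re

# Hypothetical naming rule for the developer-supplied service name.
SERVICE_NAME = re.compile(r"[a-z][a-z0-9-]{1,40}")
ALLOWED_KEYS = {"service", "timeout_seconds", "access_log"}

# Envoy v3 type URLs for the pieces this sketch emits.
HCM_TYPE = ("type.googleapis.com/envoy.extensions.filters.network."
            "http_connection_manager.v3.HttpConnectionManager")
STDOUT_LOG_TYPE = "type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog"
ROUTER_TYPE = "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"


def validate(params: dict) -> dict:
    """Check the developer's small JSON payload and fill in defaults."""
    unknown = set(params) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    service = params.get("service")
    if not isinstance(service, str) or not SERVICE_NAME.fullmatch(service):
        raise ValueError("service must be a short lowercase name like 'issue-api'")
    timeout = params.get("timeout_seconds", 15)
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 300:
        raise ValueError("timeout_seconds must be an integer between 1 and 300")
    access_log = params.get("access_log", True)
    if not isinstance(access_log, bool):
        raise ValueError("access_log must be true or false")
    return {"service": service, "timeout_seconds": timeout, "access_log": access_log}


def render_hcm(ctx: dict) -> dict:
    """Render the validated context into an Envoy HTTP connection manager filter."""
    service = ctx["service"]
    hcm = {
        "@type": HCM_TYPE,
        "stat_prefix": service,
        "route_config": {
            "name": f"{service}-routes",
            "virtual_hosts": [{
                "name": service,
                "domains": ["*"],
                "routes": [{
                    "match": {"prefix": "/"},
                    "route": {
                        "cluster": f"{service}-upstream",  # cluster defined elsewhere
                        "timeout": f"{ctx['timeout_seconds']}s",
                    },
                }],
            }],
        },
        "http_filters": [
            {"name": "envoy.filters.http.router", "typed_config": {"@type": ROUTER_TYPE}},
        ],
    }
    if ctx["access_log"]:
        hcm["access_log"] = [
            {"name": "envoy.access_loggers.stdout", "typed_config": {"@type": STDOUT_LOG_TYPE}},
        ]
    return {"name": "envoy.filters.network.http_connection_manager", "typed_config": hcm}


if __name__ == "__main__":
    payload = '{"service": "issue-api", "timeout_seconds": 30}'
    print(json.dumps(render_hcm(validate(json.loads(payload))), indent=2))
```

Keeping validation separate from rendering means a bad payload is rejected with a readable error before anything is pushed to a proxy.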

29:41

Fantastic. Some of these things,

29:42

however, are a little bit more complicated.

29:44

These things, we need to use a sidecar

29:47

model, where Envoy talks out to the

29:48

side and then these are their own

29:51

services running locally on the

29:53

proxy. So, these would be like

29:56

containers, essentially. We've got this

29:58

sidecar model and those sidecars, some

30:01

of them were contributed by other teams

30:03

and some of them were created by me and

30:05

our team. The

30:07

authentication sidecar was created by

30:09

me, written, of course, in the Lord's

30:12

language, Rust. Authorization was done

30:15

by another team and rate limiting was

30:16

done by another team. And so, they were

30:18

able to contribute these sidecars,

30:20

which, by the way, were set up and

30:23

were downloaded and configured onto the

30:26

AMI by the AMI

30:28

provisioning flow. Great. So, now we

30:31

have a programmable proxy with sidecars

30:35

that have their own separate logic from

30:37

the proxy and they, too, can actually

30:40

receive configuration dynamically

30:46

over the wire, locally, which

30:48

makes it even more programmable. So, we're

30:48

solving all these concerns before they

30:50

hit these

30:51

back ends, and in very, very little

30:54

time. So, that was

30:55

essentially some of the stuff I

30:58

worked on after migrations and blah blah

31:00

blah. Yay. What did I do after that? Let's get

31:03

rid of this big blob, this mess. So, then

31:05

we had some non-technical requirements

31:08

come through. More compliance

31:11

and things like that. And that effort

31:13

was very tedious and boring for me

31:15

personally. It didn't involve building

31:17

new stuff. It involved taking all of

31:19

this and making sure that it was compliant

31:22

in certain ways. Very boring

31:26

checklist ticking work. Blah blah blah.
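Going back to the sidecar model described earlier, where Envoy talks out to separate services running locally next to it: the sketch below is a toy authentication sidecar in Python (the one in the video was written in Rust). It assumes Envoy's `ext_authz` HTTP filter is pointed at the sidecar, in which case a 200 response lets the request through and any other status denies it. The bearer-token check is a placeholder for real authentication, and `TokenStore`, `start_sidecar` and `probe` are hypothetical names; `TokenStore.replace` stands in for configuration pushed to the sidecar at runtime.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class TokenStore:
    """Accepted tokens; replace() swaps them at runtime, standing in for
    configuration pushed to the sidecar over the wire."""

    def __init__(self, tokens):
        self._lock = threading.Lock()
        self._tokens = frozenset(tokens)

    def replace(self, tokens):
        with self._lock:
            self._tokens = frozenset(tokens)

    def allows(self, token):
        with self._lock:
            return token in self._tokens


def make_handler(store):
    class AuthHandler(BaseHTTPRequestHandler):
        def _check(self):
            header = self.headers.get("Authorization", "")
            ok = header.startswith("Bearer ") and store.allows(header[len("Bearer "):])
            # 200 means "let the request through"; anything else is a denial.
            self.send_response(200 if ok else 401)
            self.send_header("Content-Length", "0")
            self.end_headers()

        # The proxy forwards the original method, so answer all common ones.
        do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = do_HEAD = _check

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; the proxy writes the access logs

    return AuthHandler


def start_sidecar(store, port=0):
    """Serve on localhost; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(store))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port, token):
    """Stand-in for the proxy's call to the sidecar; returns the status code."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/check", headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    store = TokenStore({"token-a"})
    server = start_sidecar(store)
    port = server.server_address[1]
    print(probe(port, "token-a"), probe(port, "token-b"))  # 200 401
    store.replace({"token-b"})  # new config arrives; no restart needed
    print(probe(port, "token-a"), probe(port, "token-b"))  # 401 200
    server.shutdown()
    server.server_close()
```

Because the sidecar is its own process, a team can change its logic or its configuration without touching the proxy itself.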

31:29

So, I said earlier that I would also go

31:30

over some of the non-technical things

31:32

that I had to go through while working

31:34

at Atlassian. Obviously, all of that

31:35

stuff is sort of high-level technical

31:37

stuff that I just showed. What was some

31:40

of the other stuff that I went through

31:42

during my eight-year slog at Atlassian?

31:45

The first few things that come

31:47

to mind are that I have grown

31:50

tremendously in my diplomacy skills,

31:54

conflict avoidance, probably conflict

31:57

resolution as well. Being able to

31:59

persuade, propose ideas, being able to

32:02

teach, educate, and mentor. These are

32:05

the non-technical things that you

32:06

probably don't hear a lot about.

32:09

Another thing is the ability

32:11

to maintain things, maintain software

32:15

and systems, to see where the cracks

32:17

show up and to build things so that

32:21

those cracks don't show up, or at

32:23

least so that they show up as late as

32:25

possible. That's definitely something

32:26

that I picked up. Let's just talk about

32:28

that maintenance for a sec. I noticed

32:30

over the eight years that I was there,

32:32

when I built these apps, these

32:35

services, that at

32:37

the very start there's the requirement

32:40

to onboard people and write

32:41

documentation and train people so that

32:44

they understand how things work, know

32:46

how to contribute to them, and debug

32:48

them, so that when they

32:50

go on call, they know where to look,

32:52

what could go wrong, where do things

32:54

break, essentially. That could be

32:57

knowing what

33:00

particular log messages mean, what sort

33:02

of metrics to check when something is

33:05

going wrong and what those metrics could

33:07

point to, how to resolve those

33:10

particular, expected

33:12

problems if they're not automated away.

33:14

For example,

33:18

Amazon could have an outage and the

33:20

database isn't accessible. What

33:20

do you do in that case? What if SQS

33:22

stops working and you can't do any

33:24

provisioning tasks? What impact

33:26

does that have on the services that need

33:28

to provision the resources? And how do

33:29

you resolve that?

33:31

What happens if a proxy receives

33:33

bad configuration? What if it receives

33:35

configuration that's valid, but that

33:37

destroys the traffic that's flowing

33:39

through? How do you pick up on those?

33:41

What do you check, etc. etc. So there's

33:43

obviously a lot of that at the start

33:45

when you build something. But the thing

33:48

that's more difficult is over time

33:50

people come and go. People get hired.

33:53

People leave for other jobs

33:55

and whatnot. And so you have to

33:57

do that onboarding again obviously. But

33:59

you should have more people that are

34:01

able to do that onboarding collectively.

34:02

But then you sort of bring in new

34:04

opinions. People look at an existing

34:06

codebase and they want to change things.

34:08

They want to make it better, and so on

34:10

and so forth. And so they do that.

34:12

I suppose there's

34:15

this concept of churn in the codebase.
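Churn here means how often the same part of a codebase keeps changing. One rough way to see where it concentrates is to count how many commits touched each file. A minimal sketch, assuming the input is the text printed by `git log --numstat --format=`; the function name and the ranking rule (commit count first, then lines changed) are illustrative choices, not something described in the video.

```python
from collections import Counter
from typing import List, Tuple


def churn_by_file(numstat_output: str, top: int = 5) -> List[Tuple[str, int, int]]:
    """Rank files by how many commits touched them, then by lines changed.

    Expects the text printed by `git log --numstat --format=`: one
    "added<TAB>deleted<TAB>path" line per file per commit.
    """
    commits: Counter = Counter()
    lines_changed: Counter = Counter()
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip the blank lines between commits
        added, deleted, path = parts
        commits[path] += 1
        if added.isdigit() and deleted.isdigit():  # binary files report "-"
            lines_changed[path] += int(added) + int(deleted)
    ranked = sorted(commits, key=lambda p: (-commits[p], -lines_changed[p], p))
    return [(p, commits[p], lines_changed[p]) for p in ranked[:top]]
```

For example, the output of `git log --numstat --format= --since="6 months ago"` could be passed in. Renamed files may show up under a combined `old => new` style path and are not merged with their earlier history in this sketch.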

34:18

At a certain stage, it

34:20

becomes predictable where all the churn

34:22

is going to be.

34:24

And once you notice

34:26

that there is some churn, it's sort of

34:29

a smell. It's an indication that

34:32

that part of the service or project is

34:35

going to keep increasing in size or

34:38

complexity. And something there needs to

34:41

happen. Something needs to be done to

34:43

avoid that mess. It's just how

34:45

software goes, I suppose. It'll be

34:47

interesting with all these

34:49

vibe-coded apps and AI-assisted apps to

34:52

see how we handle that, when we have

34:54

people that are not really familiar with

34:56

what they've created and the

34:58

maintenance burdens appear. They don't

35:00

appear at the beginning. There's just

35:01

not enough going through. It hasn't been

35:03

around for long enough. There hasn't

35:05

been enough changes. Building something

35:06

is easy. Changing it and making sure

35:08

that you can still change it over

35:09

time is difficult. Because as you change

35:12

things, it slowly becomes harder to

35:14

change. Things start to get coupled, and

35:16

all of a sudden when you change

35:18

something in one area, it affects

35:19

another, and you have to deal with the

35:20

task of detangling something. And you

35:22

might be able to find these areas quite

35:24

quickly and get an LLM to perform the

35:26

detangling for you. If we

35:28

can do that, that's fantastic. But I

35:30

don't want to be too optimistic just in

35:32

case. So, that's my

35:35

opinion on the maintenance side of

35:36

things. The next thing I want to talk

35:38

about is when I mentioned diplomacy,

35:41

what I'm really trying to say is that I

35:43

was exposed to different types of

35:46

managers and colleagues over time. And

35:48

everyone has different personalities and

35:50

styles of working. And because I was

35:52

exposed to so many different types, I

35:54

experienced conflicts with certain

35:56

people. And even though I had conflicts,

35:59

they're still people that I respect.

36:00

It's just something that happens when

36:03

your personality doesn't mix

36:05

with their personality. And that's just

36:06

something that's a bit inevitable. And I

36:08

think that the only thing you can really

36:10

do in those situations is to try to have

36:13

the self-awareness and the awareness of

36:15

the other person and the, I suppose,

36:18

understanding of psychology and how

36:20

people work to an extent, so that you

36:23

can be responsible for that difference

36:25

and the potential for conflict, and to

36:28

handle it effectively, to anticipate the

36:31

conflict that's going to arise and

36:33

to do something to make the relationship

36:36

work. And maybe it's impossible. I don't

36:37

know. But that was definitely a source

36:39

of great stress and at [clears throat]

36:41

times it affected my performance. And so

36:45

I do think that because it affected my

36:48

performance, I took it quite

36:50

seriously and I learned and changed as a

36:53

result. So the next time that those

36:55

situations come around, I do firmly

36:58

believe that I'll be able to handle them

37:00

quite a lot better. And then some of the

37:02

other things, in fact one of the things

37:04

that I found quite challenging was

37:08

mentoring. I find it easy to help

37:11

people, to point out areas where they

37:15

need understanding and to deliver that

37:18

understanding to them, to break down

37:20

complex things into simple terms so that

37:23

they can build a mental model of the

37:26

system that they're working on. I have

37:27

that ability. I'm quite good at that.

37:29

But mentoring is distinct from that. I

37:31

had an intern in the last year and I

37:34

will first say that the result of their

37:36

internship was that they got the highest

37:38

rating possible and it essentially

37:41

guarantees

37:42

an offer to work at Atlassian. The

37:44

project that they worked on was very

37:46

impressive and how they approached it

37:49

and built it was very impressive.

37:51

And so that's why they got that

37:52

excellent rating. What I found

37:54

personally difficult was striking the

37:57

balance between how much

38:02

time I give to the mentee and what that

38:06

time would consist of. You

38:08

know, I don't want to give them

38:09

answers to problems, but I don't want

38:11

them to get so stuck that they become

38:13

frustrated. I have no idea if I reached

38:16

that balance, but I suppose the results

38:18

speak for themselves. But

38:20

I'm not sure if I can attribute

38:22

the results to me necessarily. The

38:24

intern was helped by some of my other

38:26

colleagues in areas that I'm much

38:28

weaker in. So, they effectively got

38:31

subject matter experts in a few

38:34

different areas to contribute to their

38:36

success. But then they did the majority

38:38

of the legwork to actually build the

38:41

thing and to test it and to make design

38:44

decisions and stuff like that. And it

38:45

was very successful. But I still have

38:46

this lingering impression of feeling

38:48

that mentoring is difficult for me and

38:51

that I don't have a good way of

38:55

figuring that out because I've never

38:56

been mentored myself. So, I don't really

38:58

know what to expect or what a mentor does.

39:01

But I want to emphasize that that's a

39:03

very specific type of mentoring that I'm

39:05

not too sure about. Whereas training my

39:07

colleagues, getting them to understand,

39:09

you know, working through

39:11

problems with my colleagues, that was

39:12

that was essentially my bread and butter

39:14

during the last half of my employment.

39:16

You know, jumping on a call and

39:19

going through stuff. Feedback that I got

39:21

from my colleagues all the time was that

39:22

I was always available to help and that

39:24

I could boil down hard topics into

39:26

something that was understandable, which

39:29

I'm pretty proud of. And I've been

39:31

yapping for a while. I think that covers

39:32

quite a lot. If I remember more, I'll

39:34

probably just make a second video.

39:36

Maybe, if people are interested, I

39:38

could actually go through and build some

39:40

of these things from scratch, either on

39:41

stream or in an uploaded video, to kind of show, I guess,

39:47

essentially what I made and maybe

39:48

recreate and sharpen my skills a little

39:50

bit more. Maybe. I've got a lot of

39:52

stuff on my to-do list, so maybe

39:54

not. It depends on the demand. Anyway,

39:55

I'm going to cut the video from here. If

39:57

you listened all the way through or to

39:59

portions, then thank you very much. I

40:01

hope it was interesting and enlightening

40:03

and whatever else. I'll catch you

40:05

around.

Summary

A former Atlassian software engineer reflects on their eight-year tenure, detailing the technical architecture they built for centralized load balancing using Envoy proxy, and sharing personal insights into professional growth, conflict resolution, and the challenges of mentoring.