I was laid off by Atlassian

Transcript

0:00

I was recently affected by the layoffs

0:03

made by Atlassian and I wanted to take

0:06

some time out to reflect on the time

0:09

that I spent working for Atlassian. I

0:11

worked there for about eight years.

0:13

During that time I built a lot of things

0:16

and I wanted to talk about what I built,

0:18

mainly the things that I personally

0:19

found interesting or that I'm proud of.

0:21

I hope that this video will be useful or

0:24

helpful to someone who perhaps is or was

0:28

in the same situation as me and maybe

0:31

it'll give them some inspiration in

0:32

terms of how they can tackle the same

0:35

things that I did or something similar

0:37

and perhaps avoid some of the mistakes

0:39

that I've made. I also might talk about

0:41

non-technical parts of my experience at

0:43

Atlassian, although most of it will be

0:45

technical and this video will be split

0:48

into chapters so that you can skip to

0:51

sections that are more interesting to

0:54

you rather than watching the

0:56

video from start to finish. So I suppose

0:58

to start with I'll talk about when I was

1:00

first hired, and even though it was

1:02

eight years ago, I still remember the

1:04

interview process, which is different

1:07

nowadays, and the reason why I was hired

1:10

or at least from my perspective the

1:11

reason why I was hired and the things

1:13

that I started working on during the

1:15

start. So yeah, let's just start at the

1:18

interview process. So I was interviewed

1:20

by some people that I now consider

1:23

friends and I remember having the

1:25

impression while being interviewed that

1:28

these individuals were quite intelligent

1:30

and that was something that was exciting

1:32

for me. The interview process consisted

1:33

of a coding quiz on HackerRank, which I

1:36

aced with full marks. Then the first

1:39

technical interview was with two

1:42

interviewers and they gave me a white

1:44

paper and asked me to read it while they

1:46

sat out of the room for about 10

1:48

minutes. They came back in and then

1:50

asked me questions about the white

1:52

paper, asked me to basically articulate

1:54

what was in the white paper, and the white

1:56

paper was actually about custom domains,

1:59

and the white paper was by Cloudflare.

2:01

They then asked me a few questions about

2:03

things like microservices and

2:05

architectural things like that, um

2:07

containers and whatnot. And they

2:09

were happy enough. I don't remember the

2:11

rest, but they were happy enough with

2:13

me during that stage, so I

2:15

continued to the uh second technical

2:18

interview, which was a troubleshooting

2:19

exercise where I was asked to

2:22

essentially prompt the interviewer for

2:24

information in order to troubleshoot a

2:27

real incident that occurred in

2:28

Atlassian. And it was an

2:30

application problem that led to a

2:34

denial of service. Uh so that was fun.

2:36

And then I think I was asked something

2:38

about how um latency-based DNS works,

2:42

and my answer was not accurate, but

2:45

perhaps acceptable. I thought about

2:48

it from first principles, and I thought

2:49

for example,

2:52

that Route 53 did a triangulation based

2:54

on the actual latency of the client, but

2:58

it is more like they

3:01

probably use a geolocation database in

3:04

order to do latency-based routing of DNS

3:07

answers. Then

3:10

after that was a values uh interview.

3:12

And to be honest, I don't really

3:14

remember most of the questions for

3:16

the values portion, but

3:18

I do remember one thing, which was when

3:21

I asked

3:24

the interviewers to think about 12

3:26

months from now and to look back

3:28

retrospectively, what is the thing that

3:30

I would have had to achieve in order

3:33

for you to say it was a good

3:35

decision hiring this person. And then

3:36

they told me about

3:38

a [clears throat]

3:39

an application that they needed to be

3:41

built for the platform within Atlassian,

3:44

and the application would facilitate

3:47

self-service load balancers. Sort of

3:48

similar to if you were using Amazon

3:52

Application Load Balancers or the

3:54

equivalent in any cloud provider. But

3:55

for the internal developers of

3:58

Atlassian. And it was essentially just a

4:00

framework that I personally was not

4:03

familiar with. And I said I could build

4:05

it because I had confidence in building

4:07

web apps with Python at that time. And

4:10

they accepted my level of confidence and

4:13

decided to hire me. So that's the

4:14

interview portion completed. So I joined

4:17

Atlassian and they have this classic

4:20

saying or impression that when you join

4:23

Atlassian that you are drinking from the

4:26

fire hose because there's so much

4:28

information that you have to absorb in

4:30

the first few weeks and months in order

4:32

to just sort of get going. My

4:35

very first task, or at least the task that

4:38

I gave myself was to build the

4:40

application that they had told me that

4:43

they wanted. Let me just open a browser

4:45

and we'll take a little looky at what I

4:48

mean exactly. Let me just uh move my

4:51

face a little bit. There's more real

4:52

estate. Now, obviously Excalidraw

4:55

is not what I care about. So they wanted

4:57

me to build an Open Service Broker. This

4:59

is

5:00

a web app with an API which

5:02

facilitates the provisioning of

5:04

resources for a platform essentially. So

5:07

you can you can see here

5:09

it's sort of built to operate in a

5:11

Kubernetes world where you're submitting

5:13

these provisioning requests as things

5:17

come up and down. And it's going to bind

5:20

a resource to your pod or your cloud

5:23

instance or whatever it is as you can

5:25

see here. And it sort of sits in between

5:28

these real resources. So you might

5:31

provision something like a database and

5:34

then you'll get MySQL. So you'll get

5:36

something that's SQL compatible but

5:38

that's abstracted away for your internal

5:40

developers. Anyhow,

5:42

the spec is uh, is here on GitHub. You

5:46

can take a look at it. It

5:48

goes into, like, for example, the

5:51

catalog endpoint. And the catalog lists

5:54

all of the services and plans that are

5:56

available on the OSB, and, uh, just

6:00

metadata about them. And you might say

6:03

query the service broker and then

6:05

display some of the metadata in your

6:08

Maybe you've got a

6:10

console, like, the Amazon console, but

6:12

maybe you've got something like that

6:13

internally. Where developers can click

6:16

and provision things. In the Atlassian

6:18

case, it was all through configuration

6:20

files that were committed to, um,

6:23

version control, and then those would be

6:25

uploaded from a

6:27

build server to deploy a service. Um,

6:30

but, yeah. So, you might have, you know,

6:32

other APIs like, uh, provisioning here.

6:35

So, put and patch for updating and

6:38

deletes and blah blah blah. So, you

6:40

would just basically go ahead and

6:41

implement this. Or

6:43

I mean, if you wanted to build your own,

6:45

and that is essentially what I did. You

6:47

can see also there's an OpenAPI

6:49

document here that has the endpoints.
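
A rough sketch of the catalog, provisioning, and delete endpoints just described, written as plain Python functions rather than a real web framework. The service and plan names, the in-memory `INSTANCES` store, and the function names are all hypothetical; only the paths and the general shape of the responses follow the Open Service Broker spec as I understand it.

```python
from http import HTTPStatus

# Hypothetical catalog: one load-balancing service with two plans.
CATALOG = {
    "services": [
        {
            "id": "svc-load-balancer",
            "name": "load-balancer",
            "description": "Self-service load balancing",
            "bindable": False,
            "plans": [
                {"id": "plan-internal", "name": "internal",
                 "description": "Internal traffic only"},
                {"id": "plan-public", "name": "public",
                 "description": "Internet-facing"},
            ],
        }
    ]
}

# instance_id -> provisioning request body; stands in for a real database.
INSTANCES = {}


def get_catalog():
    """GET /v2/catalog: list the services and plans the broker offers."""
    return HTTPStatus.OK, CATALOG


def provision(instance_id, body):
    """PUT /v2/service_instances/{instance_id}: record a provisioning request."""
    known_plans = {
        (service["id"], plan["id"])
        for service in CATALOG["services"]
        for plan in service["plans"]
    }
    if (body.get("service_id"), body.get("plan_id")) not in known_plans:
        return HTTPStatus.BAD_REQUEST, {"description": "unknown service or plan"}
    if instance_id in INSTANCES:
        # Repeating an identical request is fine; a different one is a conflict.
        if INSTANCES[instance_id] == body:
            return HTTPStatus.OK, {}
        return HTTPStatus.CONFLICT, {}
    INSTANCES[instance_id] = body
    return HTTPStatus.CREATED, {}


def deprovision(instance_id):
    """DELETE /v2/service_instances/{instance_id}: remove an instance."""
    if INSTANCES.pop(instance_id, None) is None:
        return HTTPStatus.GONE, {}
    return HTTPStatus.OK, {}
```

A library like the one mentioned next would generate the route wiring from the OpenAPI document, so only handler functions like these would need to be written by hand.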

6:52

So, I chose to build this in in Python

6:54

using Flask. Oh, no. In fact, what I

6:57

built it with first was a

7:00

library called Connexion. This is a

7:02

Python library which takes an OpenAPI

7:05

document and then creates the API

7:09

handlers for

7:12

the API routes that are in that

7:13

document. Which is cool, but then we

7:15

eventually migrated that to

7:18

just pure Flask. And then eventually

7:21

migrated that to FastAPI, which I

7:24

believe is what it still is at the

7:25

moment. Um, okay. So, it's the first 2

7:29

weeks, and my primary focus is to build

7:32

sort of what I promised in the

7:34

interview, which is this web app that's

7:36

going to be a broker for the platform,

7:39

and is going to allow self-service

7:41

provisioning of load balancing in

7:44

Atlassian. So, like I mentioned earlier, I

7:46

started with this library called

7:48

Connexion, which took an OpenAPI

7:49

document, turned that into routes. But,

7:52

I'm going to just go with

7:53

what it ended up as, which is a FastAPI

7:56

app. Let's just say we've got FastAPI

7:58

here, and then we've got

8:02

a worker, and then we have a database,

8:05

which was DynamoDB. Oh, that's annoying.

8:09

And we would have a client making

8:11

requests to the FastAPI app.

8:14

The client would say, "Hey, please

8:16

provision something for me." And the web

8:19

worker wouldn't do it itself. It would

8:21

actually send that over SQS. It would

8:25

drop the task details into SQS.

8:28

And the worker would then handle that.

8:31

So, what does a provisioning task

8:33

actually look like?

8:35

It's something like creating

8:38

DNS records somewhere, maybe creating a

8:42

CloudFront distribution, maybe creating

8:47

some API calls. And this would be the

8:51

provisioning task that the worker would

8:53

do asynchronously, while the web and

8:57

client would wait for it to be completed

8:59

essentially. So, the client's polling

9:01

continuously to say, "Is it ready? Is it

9:03

ready?" And when it is completed, the

9:06

worker writes it to the database, the

9:08

web server checks the status, and then

9:10

responds saying, "Yes, it's finished."

9:12

Or it'll say that something went wrong

9:15

and there was an error.
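
The request, queue, worker, and polling flow above can be sketched with the standard library. Here `queue.Queue` stands in for SQS and a dict stands in for the DynamoDB table; the function names and the `run_provisioning_steps` placeholder are hypothetical, not the real implementation.

```python
import queue
import threading

task_queue = queue.Queue()  # stands in for SQS
operations = {}             # stands in for the DynamoDB table: instance_id -> state


def handle_provision_request(instance_id, params):
    """Web side: record the request, enqueue the work, and return immediately."""
    operations[instance_id] = {"state": "in progress"}
    task_queue.put({"instance_id": instance_id, "params": params})
    return 202, {"operation": instance_id}


def handle_last_operation(instance_id):
    """Web side: the client polls this until the state leaves 'in progress'."""
    return 200, operations[instance_id]


def run_provisioning_steps(params):
    """Hypothetical placeholder for the real work (DNS records, CDN, API calls)."""
    if not params.get("domain"):
        raise ValueError("a domain is required")


def worker():
    """Worker side: take tasks off the queue and write the outcome to the store."""
    while True:
        task = task_queue.get()
        try:
            if task is None:  # sentinel used to stop the worker
                return
            try:
                run_provisioning_steps(task["params"])
                operations[task["instance_id"]] = {"state": "succeeded"}
            except Exception as exc:
                operations[task["instance_id"]] = {
                    "state": "failed",
                    "description": str(exc),
                }
        finally:
            task_queue.task_done()
```

The point of the split is that the web process never blocks on slow provisioning work: it only writes "in progress", enqueues, and later reads back whatever the worker recorded.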

9:16

So, then we can sort of encapsulate this

9:18

as the open service broker that I built.

9:23

Pretty straightforward.

9:25

Um to be honest, there's not much more

9:27

to this, but we're going to go and talk

9:30

about some of the more complicated bits

9:33

in just a second, and I will directly

9:35

link to this as well. So, we got this

9:37

client requesting

9:39

uh let's say "Please provision a load

9:41

balancer." And that is essentially what

9:43

they were asking for was some kind of

9:46

load balancing somewhere in the edge

9:48

infrastructure of Atlassian to allow

9:50

traffic to go to their service. So,

9:53

that's a good uh demarcation point to

9:56

start talking about the next thing that

9:58

I sort of built.

9:59

And I built it through necessity

10:02

Essentially, I began to understand

10:06

and unravel the requirements more as I

10:09

went along. One of the architects had

10:11

this idea to replace the load balancers

10:15

at Atlassian, which were enterprise load

10:18

balancers that had licensing costs, with

10:21

an open-source cloud-native sort of

10:24

commodity proxy. And the tech that we

10:26

chose for that was Envoy proxy. You may

10:29

be familiar with Envoy proxy. If you're

10:31

not, then it's very similar to something

10:33

like Nginx, but perhaps more modern than

10:37

Nginx. Um you can take a look at its,

10:40

you know, uh what's what's great about

10:42

it if you want. You can just read

10:44

through like why why choose Envoy, blah

10:46

blah blah. But essentially, we wanted

10:47

to replace the

10:48

enterprise load balancers we had, make

10:50

them self-service, so that devs

10:52

effectively didn't have to talk to us to

10:55

go set up their load balancing. So,

10:57

Envoy has an API that allows you to

11:02

configure it dynamically. Being able to

11:04

reload the configuration at run time

11:06

means that you can deploy a whole bunch

11:08

of proxies and have them sit there

11:11

running all the time.

11:12

And then when someone needs different

11:14

configuration for their particular uh

11:16

service, then they can push out a change

11:19

through the provisioning task detailed

11:21

here. And those changes should flow to

11:25

the proxy somehow. And so now, that's a

11:27

good time to talk about the Envoy

11:30

management server that I built, which we

11:32

called the Envoy control plane. And this

11:35

was it's essentially quite similar

11:40

uh to this.

11:41

Yet again, we used a FastAPI app. But

11:44

this was slightly different actually.

11:46

Let's go into a little bit of detail

11:47

here. I'm just going to wing this

11:49

because I should be able to wing it

11:50

because I know it quite well. I

11:52

actually open sourced this

11:55

software and I called it Sovereign. You

11:56

can actually go find that on Bitbucket.

11:58

It's a public repo at least for

12:01

now. I don't know if that's going to be

12:03

the case always.

12:04

But essentially Sovereign runs a FastAPI

12:06

app. And some of the things that it

12:09

takes in as

12:11

uh configuration are templates

12:15

and context. And so the app uh polls

12:20

these. Uh it's obviously got like uh say

12:24

let's just say this is the

12:25

configuration. Okay, let's stick Now,

12:28

so the templates might be particular

12:32

resource types. And in Envoy, you've got

12:35

stuff like clusters, routes, listeners.

12:39

And let's just leave it at that for the

12:40

moment. You'd have these kind of

12:43

templates. And so when this

12:47

management server loads up, it'll read

12:49

in these templates and the context and

12:50

make these available as APIs for the uh

12:55

proxies. So then you can imagine let's

12:58

just say we've got uh an Envoy here.

13:02

It is going to request these things and

13:05

Sovereign is going to respond by taking

13:08

the context, putting it into the

13:10

templates, and rendering out different

13:12

content uh as the context changes. Now,

13:15

where does the context come from? Well,

13:17

this is part of this management server

13:19

that is dynamic. So well, let's just uh

13:22

let's just do a bit of flip around here.

13:26

Put the context

13:27

the context actually comes from this

13:30

database, but we are requesting it from

13:34

the broker. So, we're

13:36

requesting data from the broker and

13:38

other sources, in fact. Let's just add

13:40

another source here. Let's just say we

13:42

have a little S3 bucket with some data,

13:45

and maybe that data is changing over

13:46

time. So, we take that data, it's

13:48

dynamic, we feed that into the

13:49

templates. The The templates have logic

13:53

that spits out particular Envoy

13:56

configuration, and then the proxy

13:57

changes over time. So, what happens is

14:00

we've got a client that's making a

14:02

provisioning request to our broker. The

14:04

worker is doing some provisioning tasks,

14:08

and then writing the new data to the

14:10

database. Then the management

14:13

server, let's say. Stop this. Let's

14:15

encapsulate this a little bit. The

14:16

management server is then polling that

14:18

data from various places and generating

14:21

new configuration. That configuration

14:23

hits the proxy, and then it starts doing

14:26

different stuff. That is essentially the

14:28

second part of what I built. So, we've

14:30

got a broker, we've got a management

14:31

server, we've got the client, we've got

14:33

the proxy. Uh why did this detach?

14:35

Anyway, so now we've sort of figured

14:37

this stuff out.

14:39

This is all at a very high level.
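
As a minimal sketch of the "context goes into templates, configuration comes out" idea: the function below plays the role of a cluster template, and a content hash is used as the version so a proxy can tell whether anything changed. The context shape, the function names, and the hash-as-version choice are assumptions for illustration and not necessarily how Sovereign does it; the cluster fields follow Envoy's v3 cluster schema.

```python
import hashlib
import json


def render_clusters(context):
    """Template logic: turn simple per-service context into Envoy cluster resources."""
    clusters = []
    for service in sorted(context["services"], key=lambda s: s["name"]):
        clusters.append({
            "name": service["name"],
            "type": "STRICT_DNS",
            "connect_timeout": "5s",
            "load_assignment": {
                "cluster_name": service["name"],
                "endpoints": [{
                    "lb_endpoints": [{
                        "endpoint": {
                            "address": {
                                "socket_address": {
                                    "address": service["host"],
                                    "port_value": service["port"],
                                }
                            }
                        }
                    }]
                }],
            },
        })
    return clusters


def discovery_response(context, known_version=None):
    """Render resources; return None when the proxy already has this version."""
    resources = render_clusters(context)
    payload = json.dumps(resources, sort_keys=True).encode()
    version = hashlib.sha256(payload).hexdigest()[:12]
    if version == known_version:
        return None  # nothing changed since the proxy last asked
    return {"version_info": version, "resources": resources}
```

When the broker's database or another source changes the context, the rendered output changes, the version changes, and the next poll from the proxy picks up the new configuration.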

14:41

So, we've got this created. Now we can

14:44

sort of think more about more

14:47

infrastructure type things. We've got

14:49

this proxy, but how do we end up with

14:51

this proxy? How does that actually get

14:53

provisioned? What is it? Where does it

14:55

live? Well, let's start with one thing,

14:58

which is that these proxies,

15:02

there's many many many of them, as you

15:04

would expect, and they are provisioned

15:06

by

15:07

um

15:08

they are provisioned by a CloudFormation

15:11

template. This is an AWS thing that

15:13

allows you to essentially do

15:15

infrastructure as code, and it allows

15:17

you to create resources in in AWS that

15:21

you would normally create via the

15:23

console if you were just uh uh let's say

15:26

uh basic user. So, what kind of stuff do

15:29

we create in here? Well, if we were to

15:31

do this stuff from scratch, we'd

15:33

probably have like a VPC and then we'd

15:36

have uh you know, a subnet inside that

15:40

VPC and maybe we'd have

15:43

an internet gateway, maybe we'd have

15:47

uh

15:47

a security group, maybe we'd have

15:50

a key pair, maybe an IAM role.

15:54

Um oh, of course we need to have the

15:56

auto scaling group.

15:58

Of course, that's what's going to be

16:01

creating these

16:03

EC2 instances.

16:05

And well, the auto scaling group needs

16:10

an AMI, doesn't it? Well, indeed it does

16:12

need an AMI. IAM role has to be attached

16:15

to

16:16

uh must be attached to all these. The

16:18

key pair goes on these. Security

16:21

group is attached to to the

16:24

this Well, it's probably attached to the

16:26

auto scaling group to be fair. Well, the

16:28

EC2 instances would inherit it from the

16:30

from the ASG, blah blah blah. So, we've

16:33

got all these like uh blocks of

16:35

resources and stuff like that. Cool.

16:37

Let's let's put these up put these up

16:38

together, blah blah blah. Cool. So,

16:41

yeah, we've kind of got like a little

16:42

template going on here and it's creating

16:44

these proxies in many different regions.

16:48

Uh we might have we might have like

16:50

uh an NLB in here, a layer four proxy.

16:54

Maybe we'd have uh bit of maybe a bit of

16:57

ACM. Of course, these acronyms might

16:59

mean nothing to some people, but for

17:01

people that have used AWS, they would

17:03

know what these things are. And they

17:04

know it's not really that complicated.

17:06

It's like pretty basic building blocks

17:08

and this is what we created

17:11

uh

17:12

say 2,000 proxies,

17:14

uh something like 13 regions, blah blah

17:16

blah. Um and we also had a little bit of

17:19

Route 53 records for other stuff. Now,

17:23

the AMI, it's not really provisioned by

17:26

the template. It's more like it's

17:28

referenced by the template, isn't it?
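
To make the "AMI is referenced, not provisioned" point concrete, here is a small sketch that builds a CloudFormation template as a Python dict and takes the AMI ID as a parameter. The logical names (`ProxyAmiId`, `ProxyLaunchTemplate`, and so on), the instance type, and the sizes are made up for illustration; the real template would have many more resources (IAM role, NLB, Route 53 records).

```python
import json


def proxy_stack_template():
    """Return a minimal CloudFormation template for a fleet of proxy instances."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: security group, launch template, auto scaling group",
        "Parameters": {
            # The AMI is built elsewhere and only referenced here.
            "ProxyAmiId": {"Type": "AWS::EC2::Image::Id"},
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            "ProxySecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow HTTPS to the proxies",
                    "VpcId": {"Ref": "VpcId"},
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 443,
                         "ToPort": 443, "CidrIp": "0.0.0.0/0"}
                    ],
                },
            },
            "ProxyLaunchTemplate": {
                "Type": "AWS::EC2::LaunchTemplate",
                "Properties": {
                    "LaunchTemplateData": {
                        "ImageId": {"Ref": "ProxyAmiId"},
                        "InstanceType": "c5.large",
                        "SecurityGroupIds": [
                            {"Fn::GetAtt": ["ProxySecurityGroup", "GroupId"]}
                        ],
                    }
                },
            },
            "ProxyAutoScalingGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": "2",
                    "MaxSize": "10",
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "ProxyLaunchTemplate"},
                        "Version": {"Fn::GetAtt": [
                            "ProxyLaunchTemplate", "LatestVersionNumber"]},
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(proxy_stack_template(), indent=2))
```

In this sketch the security group and the AMI are set on the launch template, and the auto scaling group points at that launch template, which is how the instances end up with both.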

17:30

So, that would bring us on to the next

17:32

piece of this thing that I built, which

17:34

is, well, we need to produce an AMI. We

17:37

need to produce a standard image for

17:39

these proxies, and it's going to include

17:41

all the important stuff in there. So,

17:43

how do we create this image for the

17:45

proxy? Well, in this case, we had a

17:48

repository that was using HashiCorp

17:50

Packer, and

17:52

uh we had um

17:55

a Salt Stack

17:56

uh let's call it configuration.

17:58

And so, we would use Packer to um let's

18:02

say we'd have the EC Oh, we'd use the

18:05

EC2 provisioner. And so, we would create

18:08

an EC2 in like a dev account. We'd then

18:11

upload all of our Salt Stack

18:12

configuration. Salt Stack, by the way,

18:14

is very similar to Puppet, Ansible, and

18:16

Chef, in case you're not aware of what

18:18

those are. They are configuration

18:19

management tools, and that's a fancy way

18:22

of saying that I want to

18:25

install packages, put files, and run

18:28

services on a machine in a particular

18:30

way, in a particular order, and it

18:32

automates that process, makes that

18:34

process declarative for you. Well, not

18:37

for you, but it helps you to make it

18:38

declarative. So, we created a little

18:41

live running EC2

18:44

here. We dump the config on there, we do

18:47

a provisioning step, and then we take

18:49

the

18:50

uh essentially turn this into an image,

18:52

like shut it down, uh whatever, snapshot

18:54

it, and turn it into an image. So, that

18:56

essentially would just produce this uh

18:58

AMI. Now, what was included in here?

19:00

Let's Let's say we can just uh we can

19:02

include a few things here. Let's just

19:04

say we had states for

19:07

Envoy. So, like let's say install,

19:10

configure,

19:11

uh let's say just install and configure

19:13

Envoy Uh

19:15

logging agents, security,

19:18

let's say slash hardening, network

19:19

tuning,

19:21

containers,

19:23

tracing. Oh, let's just say let's just

19:25

say observability agent there. And that

19:28

can cover logging, tracing,

19:30

metrics. So, that's essentially what's

19:32

going on here. Produces the AMI,

19:35

CloudFormation template takes this AMI

19:37

provisions these EC2s

19:40

EC2. And they're running with all this

19:42

stuff. And then when they get

19:44

provisioned, that's something that we

19:46

forgot here. There's parameters.

19:48

Parameters, bump that up, bump this up,

19:52

make it neat. So, we've got the

19:53

parameters and these at

19:57

runtime would pass in secrets and keys

20:00

and blah blah blah. And then these

20:03

proxies would grab the resources, be

20:05

configured, and then they would be

20:07

running and accepting traffic. Boom,

20:09

that's it. Everything's done, working.

20:11

This was essentially the first two years

20:14

of working at Atlassian. So, now when a

20:16

developer says, "I want to run my

20:19

service and I want it to be

20:21

accessible on the

20:22

internet with all the fancy bells and

20:24

whistles and routing and advanced

20:25

stuff." We'd say, "Yes, no problem. Let

20:28

me just get that provisioned for you."

20:30

We

20:31

send off the provisioning task, we write

20:33

something to the database, we tell them

20:34

it's ready, then the management server

20:37

asks the

20:39

broker, "What's the current state

20:42

of things?" Takes that data, plus other

20:44

data, puts it into the templates,

20:46

creates resources out of those

20:48

templates, gives them to Envoy when

20:50

it requests them. And this was all

20:53

pre-provisioned.

20:55

That's long-lived infrastructure with

20:57

CloudFormation. And the CloudFormation

20:58

is relying on an AMI that it can use to

21:01

provision those images, those machines.

21:03

So, yeah, that is probably the first 24

21:06

months. So, what was next after this?

21:08

So, this was the foundation of our team,

21:10

essentially the product that we were

21:12

going forward with, um which is uh

21:14

centralized load balancing managed by

21:17

our team, and all of the features that

21:20

we provided to our customers would live

21:22

in logic defined in these templates.

21:25

We've now laid the foundation for the

21:27

team. We've got proxy infrastructure

21:29

that's reacting dynamically to services

21:32

that are being deployed with different

21:34

configurations over time. What was next

21:36

after that point? The big thing after

21:38

that was taking some of the larger

21:40

products and making it possible for them

21:43

to use this platform component. That was

21:46

one big part, and the second big part

21:48

was migrating all of the microservices

21:51

within Atlassian to use this. And that

21:53

was relatively easier because we could

21:56

enforce that through the platform.

21:58

Essentially, what that means is that the

22:00

platform was previously providing very

22:03

basic load balancing to every service.

22:05

And they forced a switch to where you

22:08

could no longer expose your service

22:10

publicly through their load balancer,

22:12

which is too basic, and you had to go

22:14

through our centralized load balancing

22:16

infrastructure and to explicitly

22:18

configure it as a way of signaling your

22:21

intention for that service to be

22:23

publicly accessible. Whereas previously,

22:26

it could have just been maybe accidental

22:28

that your service was public and not

22:30

very well protected. So, that was the

22:32

big major push. We got products like

22:35

Jira, Confluence, Bitbucket, Statuspage,

22:38

and many others behind this edge

22:41

infrastructure. And then, what was after

22:43

that? Well, now we can sort of talk more

22:45

about Let's say we can talk more about

22:48

the uh the Envoy-based product that we

22:51

had here. So, this particular thing,

22:54

we've got this groundwork of being able

22:57

to take basic inputs from a developer

23:00

and to turn that into templated

23:03

configuration. Now, Envoy has a lot of

23:06

configuration. It has a lot of stuff you

23:09

can configure. Let's just look at the

23:11

routes, for example. Let's look at the

23:13

virtual host, for example. You can

23:15

configure what domains to accept traffic

23:17

on. Pretty basic. You can do routing.

23:19

Sort of basic, but once you delve into

23:21

how you can do this, it gets pretty

23:23

complicated pretty quickly. You can

23:25

match on different things. You can route

23:27

it in different ways. You can do direct

23:29

responses, do redirects, blah, blah,

23:31

blah. You can add and remove headers. I

23:34

guess you could say that's

23:35

pretty standard, but you can also

23:37

choose, for example, when you're

23:39

configuring a route action, you can also

23:40

choose to send to any cluster that's on

23:44

the proxy. So, then if I have a thousand

23:47

devs, or a thousand services, and they

23:50

each have their own cluster, and any

23:52

route can send to any cluster, well, it

23:54

sort of brings up this point of well,

23:56

this

23:57

data here needs validating

24:01

and abstracting, and so on and so

24:05

forth. So, there was definitely a

24:07

concentration of a lot of the

24:09

development work around this logic here,

24:11

making sure it was validated here in

24:13

terms of the parameters were validated

24:16

such that when those parameters were run

24:18

through the logic in these

24:21

resources, that it would produce valid

24:23

resources. Pretty standard, I suppose

24:25

you could say. I don't know. Maybe I do

24:27

feel like I have the curse of knowledge.

24:29

Um and that this stuff seems easier to

24:32

me now because I've done so much with

24:34

it. Uh but there's a lot. There's a lot

24:36

in here. And if we go into, uh for

24:38

example, extensions, there's a lot of

24:40

extensions that can be applied to a

24:43

listener or a cluster. For example,

24:46

you might have, uh where is it? You've

24:48

got, uh network filters here. You've got

24:51

all kinds of network filters. And a big

24:53

one that we obviously used was the HTTP

24:56

connection manager, where you could

24:58

configure routing and how to handle

25:01

proxies and WebSockets and all this

25:04

stuff. And then, if we go a little bit

25:06

before that, there's also things like

25:08

external processing and external

25:11

authorization. And this sort of brings

25:13

us to Oh, let's say something that

25:16

happened next. So, I did briefly mention

25:18

that some of the big parts after

25:19

building this was to migrate big

25:21

products onto Let's assume that's all

25:23

finished. It took some time. It

25:25

took a couple years because there were

25:27

many features that needed to be built

25:29

out here and wherever else in order

25:32

to support the larger products and their

25:34

special cases to work on

25:37

what's effectively a generic

25:39

multi-tenanted platform. So, let's just

25:41

assume that they're all migrated. Then,

25:42

we have more features that we want to

25:45

add. I did sort of allude to like we

25:47

have this groundwork. We

25:49

have this dynamic configuration. What

25:52

I'm trying to say is that we created

25:54

opportunity. We created opportunity to

25:57

centralize logic and to handle concerns

26:01

early in the chain of requests. What I

26:04

mean by that is a customer, let's just

26:07

make a smiley customer. And customer is

26:10

someone that's using our cloud products

26:12

or Atlassian cloud products. They are

26:14

hitting the

26:16

Let's just say they're hitting an NLB

26:17

first and that's then being proxied to

26:19

these boys. Yes? If we can deal with the

26:22

problems here before they reach a

26:24

service,

26:26

let's say and let's give it a square.

26:28

Let's call it a back-end service, you

26:30

know. So, the requests are flowing in

26:31

from the customer to the proxies and to

26:34

the Pretty standard stuff. If we can

26:35

deal with certain concerns here before

26:39

it reaches here, we save a lot of time,

26:41

we save some money, and it

26:44

saves the customer time. It's great for

26:45

everyone, really. Um

26:47

and one of those things Now, this is

26:49

where the diagram becomes

26:51

complicated, so let's move off to the

26:53

side. Let's just copy a few of these.

26:55

Let's grab

26:56

three things

26:58

move over to the side. We've got the

26:59

customer talking to the proxy and the

27:01

proxy is talking to back end. Of course,

27:04

the request comes back up and back out

27:06

to the customer. Fine and dandy. Yes,

27:08

this is a proxy. Whatever,

27:10

there's no surprises here. Now,

27:12

with the products that Atlassian

27:15

runs, there's all kinds of stuff that

27:17

needs to happen like authentication for

27:19

example or authorization or

27:23

DDoS protection or rate limiting or

27:26

access logs. All these kinds of things

27:28

that need to happen, and it just turns

27:31

out that we can deal with them here

27:32

instead of on a bazillion bazillion back

27:37

end services. Just imagine there are a

27:40

bazillion bazillion of these. Just

27:43

zillions upon zillions. Oopsie-daisy. Just

27:47

zillions and zillions. They're like

27:48

gazillion. Now, can you imagine if a

27:50

thousand dev teams needed to deal with

27:53

all this stuff plus more on their own

27:56

service? It would be a tremendous waste

27:59

of money for the company. It would slow

28:01

down features. The customer wouldn't get

28:03

their features when they need them and

28:05

stuff is already hard enough to deliver

28:07

as it is. Thus, the platform and

28:10

centralized management of resources and

28:12

centralized

28:13

implementation of these features. So,

28:16

how were some of these things

28:17

implemented? Well, DDoS protection was

28:19

really provided by

28:22

CloudFront. That was

28:24

that was

28:25

spearheaded by a colleague of mine who

28:27

is very smart and conscientious. And

28:30

essentially, let's make this a bit more

28:33

accurate. Let's just say let's get rid

28:34

of these. There's an NLB here. Oh, blah

28:37

blah blah blah blah. And of course, it's

28:39

two-way. So, that's one way that we can

28:41

take care of that concern for these back

28:43

end services. Great, we've solved

28:45

the concern for that. Fantastic. These

28:47

others, well, access logs, what we can

28:50

do is something like we can use these

28:53

network filters. Yes, we use the network

28:55

filters. For example, in the HCM, we

28:58

have

29:03

Where are the access logs? Access log.

29:05

Now, remember, all of this configuration

29:08

is dynamic. It's all dynamic and it is

29:11

created by templates which abstract away

29:15

the resource configuration from the

29:17

developer who wants to configure it.

29:19

They provide simple parameters. Those

29:21

parameters are then validated and then

29:22

they are fed into the template as

29:24

context so that we produce the correct

29:27

template. That means that

29:29

they send us a little bit of JSON and we

29:32

set up this whole thing for them with

29:34

all the access logging and blah blah

29:36

blah, whatever. So, that is done, in

29:38

fact, inside the proxy, natively.
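
The flow just described (simple parameters in, validation, then a rendered template out) can be sketched roughly as follows. This is a minimal illustration and not the actual Atlassian implementation: the `RouteParams` fields, the validation rules, the log path, and the output layout are all assumptions made up for the example, and the rendered text is a simplified stand-in rather than real Envoy configuration.

```rust
/// Hypothetical parameters a developer might submit
/// (in the talk these arrive as a small JSON document).
#[derive(Debug, Clone)]
struct RouteParams {
    service: String,
    upstream_host: String,
    upstream_port: u16,
    access_log: bool,
}

/// Validate the parameters before they are used as template context.
/// The rules here are illustrative assumptions.
fn validate(p: &RouteParams) -> Result<(), String> {
    let name_ok = !p.service.is_empty()
        && p.service.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    if !name_ok {
        return Err(format!("invalid service name: {:?}", p.service));
    }
    if p.upstream_host.is_empty() {
        return Err("upstream_host must not be empty".to_string());
    }
    if p.upstream_port == 0 {
        return Err("upstream_port must be non-zero".to_string());
    }
    Ok(())
}

/// Render a simplified, made-up config document from validated parameters.
/// The developer never writes this output by hand; they only supply RouteParams.
fn render(p: &RouteParams) -> Result<String, String> {
    validate(p)?;
    let mut out = String::new();
    out.push_str(&format!("listener: {}-listener\n", p.service));
    out.push_str(&format!("cluster: {}:{}\n", p.upstream_host, p.upstream_port));
    if p.access_log {
        // Access logging is switched on by the template, not by the developer's own config.
        out.push_str(&format!("access_log: /var/log/proxy/{}.log\n", p.service));
    }
    Ok(out)
}

fn main() {
    let params = RouteParams {
        service: "billing".to_string(),
        upstream_host: "billing.internal".to_string(),
        upstream_port: 8080,
        access_log: true,
    };
    match render(&params) {
        Ok(cfg) => print!("{}", cfg),
        Err(e) => eprintln!("rejected: {}", e),
    }
}
```

The point of the split is that bad input is rejected before any configuration is produced, so a template only ever sees parameters that already passed validation.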

29:41

Fantastic. Some of these things,

29:42

however, are a little bit more complicated.

29:44

These things, we need to use a sidecar

29:47

model where Envoy is talking out the

29:48

side and then these are their own

29:51

services running locally on the

29:53

proxy. So, these would be like

29:56

containers, essentially. We've got this

29:58

sidecar model and those sidecars, some

30:01

of them were contributed by other teams

30:03

and some of them were created by me and

30:05

our team. The

30:07

authentication sidecar was created by

30:09

me, written, of course, in the Lord's

30:12

language, Rust. Authorization was done

30:15

by another team and rate limiting was

30:16

done by another team. And so, they were

30:18

able to contribute these sidecars,

30:20

which, by the way, were set up and

30:23

downloaded and configured onto the

30:26

AMI by this AMI

30:28

provisioning flow. Great. So, now we

30:31

have a programmable proxy with sidecars

30:35

that have their own separate logic from

30:37

the proxy and they, too, can actually

30:40

receive configuration, which is dynamic

30:43

over the wire locally and

30:46

make it even more programmable. So, we're

30:48

solving all these concerns before they

30:50

hit these

30:51

back ends and in very little

30:54

time. So, that was

30:55

essentially some of the stuff I

30:58

worked on after migrations and blah blah

31:00

blah. Yay. What did I do after that? Let's get

31:03

rid of this big blob, this mess. So, then

31:05

we had some non-technical requirements

31:08

come through. More compliance

31:11

and things like that. And that effort

31:13

was very tedious and boring for me

31:15

personally. It didn't involve building

31:17

new stuff. It involved taking all of

31:19

this, making sure that it was compliant

31:22

in certain ways. Very boring

31:26

checklist ticking work. Blah blah blah.
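
Before moving on, the sidecar model described a moment ago can be sketched like this. It is a rough illustration under stated assumptions, not the real design: the `Sidecar` trait, the `Authn` and `RateLimit` types, the status codes, and the in-process calls are all hypothetical stand-ins. In the setup described in the talk, the proxy calls out to separate local services, whereas here each sidecar is just an object so the example stays self-contained.

```rust
use std::collections::HashMap;

/// A minimal request as seen by the proxy (fields are illustrative).
struct Request {
    client: String,
    token: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(u16), // HTTP-style status code returned to the customer
}

/// Stand-in for a call from the proxy "out the side" to a local sidecar.
trait Sidecar {
    fn check(&mut self, req: &Request) -> Decision;
}

/// Hypothetical authentication sidecar: accepts one fixed token.
struct Authn {
    valid_token: String,
}

impl Sidecar for Authn {
    fn check(&mut self, req: &Request) -> Decision {
        match &req.token {
            Some(t) if *t == self.valid_token => Decision::Allow,
            _ => Decision::Deny(401),
        }
    }
}

/// Hypothetical rate-limiting sidecar: a per-client counter that never
/// resets (a real limiter would use time windows).
struct RateLimit {
    limit: u32,
    seen: HashMap<String, u32>,
}

impl Sidecar for RateLimit {
    fn check(&mut self, req: &Request) -> Decision {
        let count = self.seen.entry(req.client.clone()).or_insert(0);
        *count += 1;
        if *count > self.limit {
            Decision::Deny(429)
        } else {
            Decision::Allow
        }
    }
}

/// The proxy consults each sidecar in order and stops at the first denial,
/// so a rejected request never reaches the back end.
fn handle(sidecars: &mut [Box<dyn Sidecar>], req: &Request) -> Decision {
    for s in sidecars.iter_mut() {
        let d = s.check(req);
        if d != Decision::Allow {
            return d;
        }
    }
    Decision::Allow
}

fn main() {
    let mut sidecars: Vec<Box<dyn Sidecar>> = vec![
        Box::new(Authn { valid_token: "secret".to_string() }),
        Box::new(RateLimit { limit: 2, seen: HashMap::new() }),
    ];
    let req = Request { client: "team-a".to_string(), token: Some("secret".to_string()) };
    for _ in 0..3 {
        println!("{:?}", handle(&mut sidecars, &req));
    }
}
```

Because each concern sits behind the same small interface, a different team can own each sidecar without touching the proxy itself, which is the contribution model described above.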

31:29

So, I said earlier that I would also go

31:30

over some of the non-technical things

31:32

that I had to go through while working

31:34

at Atlassian. Obviously, all of that

31:35

stuff is sort of high-level technical

31:37

stuff that I just showed. What was some

31:40

of the other stuff that I went through

31:42

during my eight-year slog at Atlassian?

31:45

The first few things that come

31:47

to mind is that I have grown

31:50

tremendously in my diplomacy skills,

31:54

conflict avoidance, probably conflict

31:57

resolution as well. Being able to

31:59

persuade, propose ideas, being able to

32:02

teach, educate, and mentor. These are

32:05

the non-technical things that you

32:06

probably don't hear a lot about.

32:09

Another thing is the ability

32:11

to maintain things, maintain software

32:15

and systems, to see where the cracks

32:17

show up and to build things so that

32:21

those cracks don't show up, or at

32:23

least to make them show up as late as

32:25

possible. That's definitely something

32:26

that I picked up. Let's just talk about

32:28

that maintenance for a sec. I noticed

32:30

over the eight years that I was there,

32:32

when I built these apps, these

32:35

services, that obviously at

32:37

the very start there's the requirement

32:40

to onboard people and write

32:41

documentation and train people so that

32:44

they understand how things work, know

32:46

how to contribute to them, and debug

32:48

them, so that when they

32:50

go on call, they know where to look,

32:52

what could go wrong, where do things

32:54

break essentially. So, you know,

32:57

whether that's knowing what

33:00

particular log messages mean, what sort

33:02

of metrics to check when something is

33:05

going wrong and what those metrics could

33:07

allude to, how to resolve those

33:10

you know, particular expected

33:12

problems if they're not automated away.

33:14

And this could be like, you know,

33:16

Amazon could have an outage and the

33:18

database isn't accessible, for example. What

33:20

do you do in that case? What if SQS

33:22

stops working and you can't do any

33:24

provisioning tasks? What impact

33:26

does that have on the services that need

33:28

to provision the resources? And how do

33:29

you resolve it?

33:31

What happens if a proxy receives

33:33

bad configuration? What if it receives

33:35

configuration that's valid, but that

33:37

destroys the traffic that's flowing

33:39

through? How do you pick up on those?

33:41

What do you check, etc. etc. So there's

33:43

obviously a lot of that at the start

33:45

when you build something. There's a lot

33:46

of that at the start. But the thing

33:48

that's more difficult is over time

33:50

people come and go. People get hired.

33:53

People leave for other jobs

33:55

and whatnot. And so you have to

33:57

do that onboarding again obviously. But

33:59

you should have more people that are

34:01

able to do that onboarding collectively.

34:02

But then you sort of bring in new

34:04

opinions. People look at an existing

34:06

codebase and they want to change things.

34:08

They want to make it better, and so on

34:10

and so forth. And so they do that. And

34:12

I suppose there's

34:15

this concept of churn in the codebase.

34:18

The area that churns, it sort of

34:20

becomes predictable where all the churn

34:22

is going to be

34:24

at a certain stage. And once you notice

34:26

that there is some churn, it's sort of a

34:29

smell. It's an indication that

34:32

that part of the service or project is

34:35

going to keep increasing in size or

34:38

complexity. And something there needs to

34:41

happen. Something needs to be done to

34:43

avoid that mess. It's just how

34:45

software goes, I suppose. It'll be

34:47

interesting with all these

34:49

vibe coded apps and AI assisted apps to

34:52

see how we handle that. When we have

34:54

people that are not really familiar with

34:56

what they've created, and the

34:58

maintenance burdens appear. They don't

35:00

appear at the beginning. There's just

35:01

not enough going through. It hasn't been

35:03

around for long enough. There hasn't

35:05

been enough changes. Building something

35:06

is easy. Changing it and making sure

35:08

that you can still change it over

35:09

time is difficult. Because as you change

35:12

things, it slowly becomes harder to

35:14

change. Things start to get coupled, and

35:16

all of a sudden when you change

35:18

something in one area, it affects

35:19

another, and you have to deal with the

35:20

task of detangling something. And you

35:22

might be able to find these areas quite

35:24

quickly, get an LLM to perform the

35:26

detangling for you. If we

35:28

can do that, that's fantastic. But I

35:30

don't want to be too optimistic just in

35:32

case. So, that's my

35:35

opinion on the maintenance side of

35:36

things. The next thing I want to talk

35:38

about is when I mentioned diplomacy,

35:41

what I'm really trying to say is that I

35:43

was exposed to different types of

35:46

managers and colleagues over time. And

35:48

everyone has different personalities and

35:50

styles of working. And because I was

35:52

exposed to so many different types, I

35:54

experienced conflicts with certain

35:56

people. And even though I had conflicts,

35:59

they're still people that I respect.

36:00

It's just something that happens when

36:03

your personality doesn't mix

36:05

with their personality. And that's just

36:06

something that's a bit inevitable. And I

36:08

think that the only thing you can really

36:10

do in those situations is to try to have

36:13

the self-awareness and the awareness of

36:15

the other person and the, I suppose,

36:18

understanding of psychology and how

36:20

people work to an extent, so that you

36:23

can be responsible for that difference

36:25

and the potential for conflict, and to

36:28

handle it effectively, to anticipate the

36:31

conflict that's going to arise and

36:33

to do something to make the relationship

36:36

work. And maybe it's impossible. I don't

36:37

know. But that was definitely a source

36:39

of great stress and at

36:41

times it affected my performance. And so

36:45

I do think that because it affected my

36:48

performance that I took it quite

36:50

seriously and I learned and changed as a

36:53

result. So the next time that those

36:55

situations come around, I do firmly

36:58

believe that I'll be able to handle them

37:00

quite a lot better. And then some of the

37:02

other things, in fact one of the things

37:04

that I found quite challenging was

37:08

mentoring. And so I find it easy to help

37:11

people to point out areas where they

37:15

need understanding and to deliver that

37:18

understanding to them, to break down

37:20

complex things into simple terms so that

37:23

they can build a mental model of the

37:26

system that they're working on. I have

37:27

that ability. I'm quite good at that.

37:29

But mentoring is distinct from that. I

37:31

had an intern in the last year and I

37:34

will first say that the result of their

37:36

internship was that they got the highest

37:38

rating possible and it essentially

37:41

guarantees

37:42

an offer to work at Atlassian. The

37:44

project that they worked on was very

37:46

impressive and how they approached it

37:49

and built it was very impressive.

37:51

And so that's why they got that

37:52

excellent rating. What I found

37:54

personally difficult was striking the

37:57

balance. It was essentially

38:00

striking the balance between how much

38:02

time I give to the mentee and what that

38:06

time would consist of, whether it's, you

38:08

know, I don't want to give them

38:09

answers to problems, but I don't want

38:11

them to get so stuck that they become

38:13

frustrated. I have no idea if I reached

38:16

that balance, but I suppose the results

38:18

speak for themselves. But

38:20

I'm not sure if I can attribute

38:22

the results to me necessarily. The

38:24

intern was helped by some of my other

38:26

colleagues in areas that I'm much

38:28

weaker in. So, they effectively got

38:31

subject matter experts in a few

38:34

different areas to contribute to their

38:36

success. But then they did the majority

38:38

of the legwork to actually build the

38:41

thing and to test it and to make design

38:44

decisions and stuff like that. And it

38:45

was very successful. But I still have

38:46

this lingering impression of feeling

38:48

that mentoring is difficult for me and

38:51

that I don't have a good way of

38:55

figuring that out because I've never

38:56

been mentored myself. So, I don't really

38:58

know what to expect and what they do.

39:01

But I want to emphasize that that's a

39:03

very specific type of mentoring that I'm

39:05

not too sure about. Whereas training my

39:07

colleagues, getting them to understand,

39:09

working, you know, working through

39:11

problems with my colleagues, that was

39:12

essentially my bread and butter

39:14

during the last half of my employment.

39:16

You know, jumping on a call and

39:19

going through stuff. Feedback that I got

39:21

from my colleagues all the time was that

39:22

I was always available to help and that

39:24

I could boil down hard topics into

39:26

something that was understandable, which

39:29

I'm pretty proud of. And I've been

39:31

yapping for a while. I think that covers

39:32

quite a lot. If I remember more, I'll

39:34

probably just make a second video.

39:36

Maybe if people are interested, I

39:38

could actually go through and build some

39:40

of these things

39:42

from scratch on stream or just a video

39:45

uploaded to kind of show, I guess,

39:47

essentially what I made and maybe

39:48

recreate and sharpen my skills a little

39:50

bit more. Maybe. I've got a lot of

39:52

stuff on my to-do list, so maybe

39:54

not. It depends on the demand. Anyway,

39:55

I'm going to cut the video from here. If

39:57

you listened all the way through or to

39:59

portions, then thank you very much. I

40:01

hope it was interesting and enlightening

40:03

and whatever else. I'll catch you

40:05

around.

Interactive Summary

A former Atlassian software engineer reflects on their eight-year tenure, detailing the technical architecture they built for centralized load balancing using Envoy proxy, and sharing personal insights into professional growth, conflict resolution, and the challenges of mentoring.
