I37: A. M. Borodin | PostgreSQL | Yandex Cloud | Open source in Russia
Andrey, hi!
Hi!
Thanks for coming. We will talk about programming, open source, and PostgreSQL.
And right away a question, which is correct, PostgreSQL or Postgres?
Now a short historical note for 30 seconds.
I try to say specifically "Postgre" or "PostgreSQL" to annoy the "purists."
In general, engineers speak in a way that is understandable to each other.
That is, on one hand, terms should not be mixed.
A thingamajig and a whatsit should be different things when you design a system.
Everyone who installs and debugs what you do should understand you.
On the other hand, if you are understood clearly, without misinterpretations, say whatever you want.
So "Postgre," "Postgres," "PostgreSQL," whatever.
For some reason, I say "PostgreSQL."
Me too. When I know that my listeners are not just "purists," I alternate pronunciations of "PostgreSQL," moving the stress back and forth.
Idiomatically, yes, "PostgreSQL" or "Postgres."
Well, as in "post-Ingres."
How long have you been in this system, in this project?
Well, I wrote my first patches in 2016.
Right. I became a user in 2007, somewhere around that time.
So, less than 20 years, actually.
I became a user before you, but I never became a committer.
A committer is not someone who wrote patches. I also never became a committer.
I switched from MySQL to PostgreSQL a long time ago, and I don't regret it.
Tell me, what is better in MySQL? We recently discussed and argued about it in the office and couldn't come to a conclusion.
For some reason, I have the feeling that MySQL is not really open source, and PostgreSQL surpassed it precisely because it was open source.
Look, I again adhere to an engineering point of view.
Any tools that solve your tasks.
The only thing you, as an engineer, should understand is that some of your tasks go beyond the horizon of decades.
Or go beyond the horizon of petabytes.
Then you need to think.
Well, my advice somewhere in the video - it’s not really an argument.
And from the point of view of what to use for the system that I am talking about today,
use everything! Try both SQLite and MySQL. Both are great databases. But I like MySQL a little more architecturally.
PostgreSQL has better tools and is truly a free database where you can really make a contribution and provide utility.
Well, bringing a patch to MySQL will be difficult now, and I wanted to develop databases. Well, without working at Oracle.
Why is it difficult?
Well, for starters, I'm just less familiar with the process. Moreover, I will have to negotiate with the team,
which has its own vision and seems to have more limited resources right now.
Listen, your patches are accepted through mailing lists, if I'm not mistaken.
And it seems to me that this is some kind of prehistoric process, long gone into oblivion, and you still work this way.
Great word, in short, "prehistoric." The point is that the development of PostgreSQL took shape before Git appeared.
Before we started keeping history the way it is kept now, we had already written PostgreSQL,
which actually worked well and developed well.
Yes, these are not the best tools we have for development now, but almost never are tools the problem.
That is, I like it; I'm just the frog that was slowly boiled in this mailing list.
When you read it every day and then end up in Bitbucket, you think: how messy and wrong all this is, in short.
You need to write a letter, you need to attach a patch or a series of patches.
I spent half a day yesterday reading your mailing list, pgsql-hackers, and it was a complete nightmare.
I liked it.
Well, how did I like it? These are emails, you need to unfold threads, you need to search for where someone said something.
It's all unformatted, all in plain text, nothing is highlighted, syntax is not highlighted.
I felt like I was in FidoNet. That's how I felt.
Listen, well, letters of different colors are, of course, a very, very useful technology, I agree in terms of code.
And when I apply a patch, I look at it in an interface that highlights letters in different colors.
We haven't learned to highlight human thoughts yet, to be honest.
It would be cool, but the most interesting thing is the commit message and the letter.
Code is a fairly trivial thing that anyone can write; what matters is that it is meaningful.
And it's interesting, you know, even just for learning English.
In the first commit message of my patch, Tom wrote something like "wack it by me."
I had never encountered such a verb before, "wack it." What does it mean?
I went to figure it out, and it wasn't in the dictionary. I started asking around.
Yes, this is an example of how other people think, how other people write their thoughts. It's interesting.
Well, again, it seems to me, I am the boiled frog.
Listen, you said you apply patches. So you receive a letter from the mailing list.
And then what happens with the file that is attached, with the patch?
Well, you go into the Git repository and choose the point in history where you will apply it.
If you are applying a fresh patch, you just run git am, "apply mail."
And it is applied on top of the commit. You push it to your GitHub, look at it there, or look locally, well, locally in the console.
I can show you if you want. I have plenty of PostgreSQL patches applied scattered around.
Mostly it's console development.
So, wait, you applied the patch in your local Git, you made a commit, and then where do you push it?
Well, to GitHub.
To yourself. And how does it get into the main repository?
At some point, when we all agree that it's okay and it seems we haven't broken anything
(we definitely broke something, but that's another story),
a committer will do a git push to git.postgresql.org.
And how do you know if you broke something or not? There must be some Git CI, right?
Oh, yes, we have CI. We have tons of different CIs.
Well, first of all, we have a basic CI on GitHub that runs make check-world on five platforms.
All PostgreSQL tests run in 8 minutes.
On different build systems, on different operating systems, there, in several configurations.
But it's not very complicated.
Usually the author of the patch or the reviewer checks that this CI does not fail.
And after that, we have a Build Farm.
This is a system where any contributor who wants can register their build agent
on their operating system, on which they set up the build.
And there, different configurations will be checked.
Well, for example, you can say that you want to test without caches.
Cache-clobber, which, like, resets the caches at any convenient opportunity.
So that you can check that you can still warm them up.
Or, for example, a configuration in which shared invalidation works differently.
Or additional extra tests are run, which can allocate a bit more memory.
There.
And they often ask the question, like, what Linux do you support?
And the short answer is, like, we support those operating system platforms that are connected to the Build Farm.
Want your Tumba-Yumba Linux to be supported?
Set up a Build Farm agent; it's called an animal.
Connect it to the Build Farm, and from time to time new commits will be pushed to you, and you run them.
And if something breaks on your platform, it will most likely be fixed.
Well, it varies, of course, but most likely.
There.
I personally broke the DragonFly BSD kernel with tests.
I wrote a test that passes everywhere, but does not pass on DragonFly BSD.
We wrote to the maintainer of that kernel, and they were like, oops, indeed.
They fixed something in their little program.
We rebuilt the build agents that were running on DragonFly.
We contacted the owner of that Build Farm animal.
He updated the operating system kernel, and the tests went smoothly.
There are currently, I think, several hundred different configurations being tested.
We support everything that is connected to the Build Farm.
Well, and actually, the job of a committer is that when they push someone else's patch,
they check if the Build Farm is turning red.
There is also a long tail of unexplained failures.
We have a wiki page where we investigate, like, there was such a crash,
and we need to understand if this crash is related to cosmic rays or if the stars aligned in such a way
that the computer broke.
Or if it was our program that broke it.
There are about fifty unresolved cases over the last few years.
But does "broke" mean "the master broke" or "something broke within the branch"?
I'm still trying to understand. You somehow...
We test five stable branches and all upgrades between them.
So, for example, I sent... Let's go through the whole cycle.
I sent a patch to the mailing list, decided to change one letter.
You, for example, are the reviewer, you looked at it visually, and it seems to you that everything is okay.
But you don't run the whole build on your computer, or will you run it?
I do. A full check-world on this Mac, this is a MacBook Air,
will also take about ten minutes for all the tests.
But if you don't do this, it could very well be your mistake, and no one will notice.
If I don't do it, it will come back to me somehow; in the worst case, people will think poorly of me.
So, okay. Let's say you forgot to do it.
I might not have been there at all. The committer just came and decided they could push.
Ultimately, the committer is responsible for... Everything, basically. The committer is extremely...
The committer is not you. The committer is someone like me.
- A key committer. There are a few committers there? - Thirty people right now.
I want to become a committer someday.
The reason we are meeting here now is that I was added to the list of major contributors.
It's not important. But it's like a badge of honor, right?
And it's not really that important to me.
But I want responsibilities, basically.
To become a committer.
Yes, responsibilities, not just for committers.
So, thirty people can actually push to master?
Yes. Well, not just to master, to all branches.
Nightmare. But that's a huge probability of error. Each of them is a person.
It's not a probability of error, it's a guarantee of error.
I mean, when your patch is pushed, you know that something will break. You just don't know where.
And if you are a responsible author, you watch the BuildFarm, you watch the mailing lists for messages.
The most unpleasant story of mine happened in 2018.
It's actually the story of how I ended up at Yandex.
I mean, it all ended well, as you can see. I haven't been fired yet.
But it happened with the case when we found a bug after the release.
Guys from Mail came to me, Yandex Mail.
I wasn't working at Yandex as a developer then; I was a teacher at SHAD, in the next room.
Basically, I was teaching algorithms to students.
They gather really amazing students who need to be taught well, and you can learn yourself.
But that's not important. Anyway, the guys said, we have a GIN index that is somehow used in Mail.
And it sometimes slows down, hangs on locks, which sounds great as an algorithmic problem.
I will rearrange the locks now, it will be fine. I rearranged the locks, and it became great.
I wrote a scientific article for a journal.
That I rearranged the locks, look how well it works.
It went into release; it was committed, I think, by Teodor.
It went into release.
And in... 11.3, I think, three minor releases went by before some guys from Microsoft wrote,
"Hey, guys, we caught a deadlock in your cool system."
They wrote, attached three backtraces when three processes deadlocked.
I was like, oops, indeed, it seems my theoretical proof is not a proof, just letters, basically, in the journal.
Well, we rolled the patch back then, but not entirely. We left some useful part of it; there were still some useful optimizations.
But we returned the locking model in the gin index to where it was before 2018.
I still have it on my to-do list to rewrite it properly. I rewrote it, but now I'm scared to change it because something might depend on it somewhere.
The database might stop working, and people might start suffering, which would be unpleasant.
And did you add tests for those three requests?
No.
Then I guess you didn't add them.
No, no, no.
Back then, we didn't have the infrastructure to reproduce that deadlock.
Now we have injection points for complex deadlocks, and we are adding them.
But this started literally in 2023.
Before that, we wrote probabilistic tests for deadlocks.
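The difference between a probabilistic and a non-probabilistic deadlock test can be shown with a minimal sketch. This is a Python analogy, not PostgreSQL's injection-point API: the two barriers stand in for injection points, and every name here (`run_lock_order_test`, `w1`, `w2`) is hypothetical. Each worker is paused right after taking its first lock, so the dangerous interleaving happens on every run rather than once in a million requests.

```python
import threading


def run_lock_order_test():
    """Force two workers into the classic opposite-lock-order interleaving."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    # Stand-ins for injection points: places where a worker waits on purpose.
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    would_block = {}

    def worker(name, first, second):
        with first:
            # Pause here until the other worker also holds its first lock.
            both_hold_first.wait(timeout=5)
            # A non-blocking attempt reports the conflict instead of hanging.
            got = second.acquire(blocking=False)
            would_block[name] = not got
            if got:
                second.release()
            # Keep the first lock until the other worker has made its attempt.
            both_tried_second.wait(timeout=5)

    t1 = threading.Thread(target=worker, args=("w1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("w2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return would_block


if __name__ == "__main__":
    # Both workers report that their second lock is taken: a real system
    # using blocking acquires would be deadlocked at this point.
    print(run_lock_order_test())
```

No load generation is involved: the test only controls where each worker stops, which is why it can finish in well under a millisecond of actual work.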
So, like, in 2021, I spent almost the entire year on a bug that reproduced in production once a quarter, on one request out of a million requests total.
On an S3 database, there was a good load, and sometimes a request would hang.
Well, according to the investigation, it didn't hang; it was worse, it returned the wrong result.
So, we had been investigating this since 2019; it was brought to me about once a quarter, and at some point they said, "That's it, Andrey, stop, please fix this for us."
The first three months I was just staring at the computer like "it should work."
Then in the summer, I wrote a test that reproduced it on my laptop in a few hours.
I reported this to the hackers list; no one could reproduce it.
Then I shortened it to tens of seconds, and after that emails started coming in saying "yes, I see this on my laptop too."
A test that takes tens of seconds cannot be committed.
I spent several months shortening the test to 50 milliseconds, and as a 50-millisecond test it was accepted.
Modern deadlock tests are usually sub-millisecond, and they are non-probabilistic.
You don't generate any specific load; you just pause one process at the right place, position another process, pause it too, and drive the nail in, basically, coming at it from the side.
If needed, I can send you code examples later; they're interesting to read.
Interesting. And I will latch onto the phrase "you can't commit tests that take tens of seconds."
Do you have a ban on long tests?
Yes. Well, listen, you have, say, several hundred or several thousand developers in the world who run these tests at every sneeze.
You take away 10 seconds from each of them, for fun: you had one request that failed out of billions, and now you are taking 10 seconds from each of your developers.
It adds up to years.
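The arithmetic behind "it adds up to years" is easy to check. The sketch below is only an illustration: the 10 seconds come from the conversation, while the number of developers, the runs per day, and the working days per year are assumptions picked to show the scale.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def added_wait_in_years(developers, runs_per_day, extra_seconds, working_days=250):
    """Total time all developers spend waiting on one slow test, per calendar year."""
    total_seconds = developers * runs_per_day * working_days * extra_seconds
    return total_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed: 1000 developers, 20 test runs a day each, 250 working days.
    # The 10 extra seconds per run is the figure from the conversation.
    years = added_wait_in_years(developers=1000, runs_per_day=20, extra_seconds=10)
    print(f"{years:.2f} years of waiting per year")
```

Under these assumptions a single 10-second test costs roughly a year and a half of combined waiting every year, which is the argument for pushing tests down to milliseconds.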
Yes, I am all for it, but we just don't do it, and I haven't heard of such a strict limitation.
Listen, if you have a couple of dozen developers, it makes sense to have tests that take four hours, maybe a week.
There are many databases where tests run for a week. I have worked on some of them.
It's unpleasant, but it's manageable. But when you scale up, it hurts.
That is, I will send such a test to you in the mailing list, and someone will check it not just for correctness, but also for duration, and will say it's too long, sorry?
Absolutely, of course. I think that you will most likely check it yourself.
I mean, you will run the other tests yourself beforehand.
You will feel that the fun of this iterative development is that you can do any nonsense and find out in 30 seconds that it's nonsense.
Even your LLM that writes code will be happy: it doesn't heat up your laptop, you send a request, get a response, and realize that, oh, everything is broken.
Listen, some things cannot be tested in just a few milliseconds.
For example, load tests on your own database. I need to add several billion rows to the table.
There is no way this will take 50 milliseconds.
And why do you need several billion?
Well, to check how it works with a billion records in the table.
Does it cache them, does it index them, does it even work with such volumes? That's why I need that amount.
In general, you can use a fairly common trick where you leave a billion empty pages, not writing them to disk at all, and write only the page that comes after them.
I mean, you create edge cases without actually materializing the generated empty data. This happens.
Various edge cases are tested this way, like overflow. You artificially create a state in the system that is at the boundary.
But usually, you want something else.
A similar story, for example, is an uncommitted test, actually, for vacuuming the tree.
You have a file that contains pages, which are linked in a tree structure.
You can traverse this tree to search for garbage in logical order, that is, descending from the root to the lower level, either DFS or BFS.
But if your database runs on hard drives, that's not great, because it's almost random reading.
Postgres reads this file in physical order to search for garbage, but if it detects a concurrent split, when a page beneath you has split and part of it could have jumped back, the vacuum also jumps back, rescans that page, and jumps forward again.
For this situation to arise, you first need a large index, and second, you need a load that will cause something to jump behind the vacuum.
You can emulate this with queries.
If you want to do it manually, it will take you several hours and a normal load.
But you can carefully slow everything down so that it occurs on three pages.
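The three-page scenario can be modeled in a few lines. This is a simplified sketch in Python, not the actual Postgres vacuum code: the `Page` class, the `split_during_vacuum` flag, and the `before_block` hook are hypothetical names, and the hook plays the role of the "carefully slowed down" concurrent split. The scan walks blocks in physical order and jumps back only when a page that split during this scan has a right sibling at a lower block number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    items: List[int]
    right: Optional[int] = None        # block number of the right sibling
    split_during_vacuum: bool = False  # set by a split that happens mid-scan


def vacuum_physical_order(pages, is_dead, before_block=None):
    """Clean pages in block order, jumping back after a concurrent split."""
    visited = []
    for blkno in range(len(pages)):
        if before_block is not None:
            before_block(blkno, pages)  # hook simulating concurrent activity
        target = blkno
        while target is not None:
            page = pages[target]
            page.items = [x for x in page.items if not is_dead(x)]
            visited.append(target)
            jumped_behind = (page.split_during_vacuum
                             and page.right is not None
                             and page.right < blkno)
            # Items moved to a block the scan already passed: go back for them.
            target = page.right if jumped_behind else None
    return visited


def demo():
    # Block 0 is a free page; negative numbers are dead items.
    pages = [Page([]), Page([1, -2]), Page([3, -4, 5, -6])]

    def split_page_2(blkno, all_pages):
        if blkno == 2:  # the split lands just before the scan reaches block 2
            all_pages[0].items = all_pages[2].items[2:]
            all_pages[2].items = all_pages[2].items[:2]
            all_pages[2].right = 0
            all_pages[2].split_during_vacuum = True

    visited = vacuum_physical_order(pages, lambda x: x < 0, split_page_2)
    return visited, [p.items for p in pages]


if __name__ == "__main__":
    print(demo())
```

In the demo the scan visits block 0 twice: once in physical order while it is still empty, and once after the jump back, which is the only visit that removes the dead item moved there by the split.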
So, you have to spend more time developing the tests and be more attentive so that they run quickly later, right?
Yes, unfortunately or fortunately, we spend a lot of time on the tests.
First of all, we try not to create tests that check something already covered by other tests, testing the same thing multiple times.
So they are not redundant; we know they don't test everything.
The coverage is good, but the question is not about coverage.
The question is how the stars align so that all the edge cases, the extremes, are considered.
It's not just about going through all the branches, but about reaching different states of the system.
Mostly, these tests come from known bugs, and therefore they are emulated quite effectively.
We do have slow tests. Recovery tests are very slow.
A somewhat resource-intensive approach is also used there: there are tests that we run on the primary, we record the entire trace that these tests generated, and we repeat all read requests on the replica.
This approach is not super common.
There are quite resource-intensive consistency tests for the log, which, after applying each record, go to the master, take a page, and compare it byte by byte with the page that was produced on the replica.
This is also a long test, and it is enabled with PG_TEST_EXTRA and a special configuration on the Build Farm agent.
So it is run only by those who want to focus specifically on log development.
It takes more time, and it is executed on some Build Farm machines.
A developer can run it on their own, but it will take some time.
Is the log officially part of the Postgres ecosystem, or is it still a third-party plugin?
The log, the write-ahead log, is the part that is responsible for fault tolerance.
This is literally one of the lowest levels on which the database is built.
WAL-G is my side project, well, sort of mine.
First of all, other people started it, and secondly, other people continued it.
I say "mine" because at some point I was its maintainer, and I still find people who haven't managed to stop working on it.
So WAL-G is a backup system, not the most popular one; judging by the questions people ask, probably the second or third most popular for Postgres.
It is much simpler. And to commit something there... for example, it has 220 contributors, most of whom are students.
Everything there is built around four-hour tests, not like in Postgres.
It's ordinary, normal: there are GitHub pull requests and all that.
How does this discipline not fall apart in Postgres?
With so many committers, 30 people, they should have long ago given up on speed control and accepted tests as tests, like all normal people.
It not only doesn't fall apart, it improves.
There's a story that commit messages are becoming more and more systematized, more and more uniform.
Recently there was a study: someone took the trouble to go through the git log and said that some committers put a period at the end of the first line of their commit messages, and some don't.
A commit message is now almost always a page of text, so you have a first line.
You can put a period at the end of it, or you can choose not to.
Those who put it, always do. Those who don't, never do. Over the last ten years.
At the start, there was also student-style development; all sorts of things happened.
So, what's the conclusion?
The conclusion is that the level of organization among individual committers is increasing, and it is becoming more difficult to become a committer.
The people who become committers now go through a somewhat longer path than people once did, when the project wasn't as popular.
So the entry threshold is continuously rising, and that seems to be a good thing.
So, will they let you become a committer, coming from Russia?
That's a good question.
They probably think you'll put some kind of Trojan in there right away, don't they?
An even better question.
Look, first of all, they let me attend the leadership meeting where questions, well, roughly like that are discussed.
I mean, questions of that kind are also discussed.
And the general consensus is that the commits are visible, and the people who commit something to master, first of all, everyone knows them personally, and secondly, they would be risking their careers for the sake of introducing a vulnerability that might never even be exploited and has little chance of surviving until the release, as was the case with XZ-gate.
Do you know the story with XZ Utils, which the community calls XZ-gate, where a would-be maintainer spent three years getting involved in XZ Utils and then planted a backdoor that spread across many servers?
Of course, the community is aware of the possibility of such things.
It's not that the developers are horrified by it; there is some track record of attempts to do such things.
We have always been maximally collaborative in our approach: let's work together.
So there is no resistance against you there, right?
For example, you come to the leadership meeting, and someone stands up and leaves because a person from Russia has come. Is there anything like that?
Well, you see, just as you described it, there isn't.
I mean, I could talk about some emotions I picked up, something else. But that's nonsense.
It practically doesn't exist. There's nothing that could be named. That's how it is, yes.
On the other hand, we have a hacker mentorship program.
I actually wanted to get into it. They didn't take me.
I don't know, maybe it will affect something, maybe not.
What kind of program is it? It is supposed to teach whom? Hackers?
Yes, but committers pick the people they will teach.
There has to be a specific committer who says: from those who submitted applications, I want this person, I will teach him.
Maybe there are just not enough mentors, or maybe there really is some bias, but it's not a community issue; rather, there just wasn't a specific person who could teach me.
Well, all these people don't earn anything, am I right? Neither the committers nor the maintainers; it's all for free.
All these people work in companies that are willing to fund them, so to speak.
At Yandex, the question of money is never really on the table, to put it simply.
I'll be honest, I earn more than I need, and I don't know what to do with it.
And Yandex is not against you working on Postgres.
Yes. My spouse is in charge of the question of where to spend money, because I'm not good at it.
So I'm not an interesting candidate, sort of; something to think about.
So, like, Amazon or Microsoft or EnterpriseDB are ready to pay people just to keep doing what they do. Good money.
And, I mean, you can open any presentation by an Amazon committer, right?
Go to levels.fyi if you're interested: a person at Amazon is required to use the corporate template, in which he has to state his position, and that clearly reveals his income.
So you can figure out how much they earn.
They make decent money; well, not huge amounts, it's not millions of dollars a year.
At least not multiple millions of dollars a year.
Listen, this sounds like a dream life, or a dream job.
You sit at Yandex, receive a salary, and at the same time contribute to open source, write to mailing lists, communicate with committers, live a free life.
Every one of our listeners would like to have the same.
But they won't get it tomorrow; they can't just come to Yandex and say,
"You know, I will receive a salary from you, and at the same time I will
contribute to some Open Source."
How to get there?
I don't know, but I need to somehow answer this question.
Some guys from Open Source Jam came to me.
It's an event that...
You can advertise it, maybe publish it if you want.
It will be in a couple of weeks.
They say we need you to share something career-related.
Damn, I don't know what to talk about.
They say, let's announce you now.
I asked GPT what to name my talk, when
I don't even know what it's about.
It says, you know, there are success marathons.
Successful success, Tony Robbins.
It wrote me something about Blinovskaya.
It says, you need to make a talk like that so people
understand that you’re kind of being ironic.
Because no one expects that from you.
So I made a presentation.
I will give a talk called...
From zero to success...
A marathon from zero to success.
Let's think about what to talk about.
I can tell my story.
It's true that individual stories are not reproducible.
That's how the stars aligned.
Let me ask you more specifically.
Are you doing anything for Yandex?
Yes, of course.
Or just for Postgres?
I mean, Yandex uses Postgres.
Yes, but they can use it without you.
They can take an open-source solution,
They can report bugs there if they have any, and everything will work.
But you do something for them, some local development, right?
No, I don't do local development.
I only do upstream development.
And all my guys, I have 15 engineers,
do 100% open development.
We don't write a single line of closed code.
Let me clarify.
We write systems.
We create systems, and we publish all of them.
On GitHub or somewhere else.
But there is a control plane part.
We need to deploy our system.
And there are pinned package versions.
Like, for WAL-G versions, there is 307...
So, my guys commit to the control plane.
From the perspective of bumping the package.
From the perspective of reverting the package.
From the perspective of writing, like,
maintenance tasks that update this...
In short, there is a small part of development,
about getting our packages to our Yandex clients,
that is closed.
Sometimes there are interesting things there.
For example, monitoring.
Because this monitoring of backup integrity
is tailored for the Yandex infrastructure.
It doesn't need to be pushed outside.
So, when I say that my guys don't write
a single line of closed code,
Technically, that's not entirely true.
You could nitpick, but in spirit, it is true.
All our systems are accessible to anyone who wishes.
And many clouds are built in their image and likeness.
Well, I mean, many PostgreSQL services are built
in their image and likeness.
Functionally equivalent to a Yandex PostgreSQL cluster.
Maybe somewhere they lag behind in some monitoring,
some Rock'n'Sauce, CIS features are not used.
Or, on the contrary, they use things that we are afraid to use.
There you go.
Well, we are just happy about that.
And we are even ready to appoint external people
from other clouds as committers in our projects.
Let them develop together with us.
And they fix something.
There you go.
I personally take on tasks guided by the agreement
I made with Volodya Borodin in 2017,
in the very first week when I joined Yandex.
He said that, like, what you do
should somehow be useful to Yandex.
Not necessarily to someone specific.
It's great to contribute. You can contribute like that to Linux.
It will definitely be useful to everyone.
And there is a team that deals with this.
And you know how to contribute to Linux.
The Linux community.
And we won't go into this topic, in short.
There are, in short, special people who handle this.
And this process doesn't go as smoothly as in databases.
Right.
Of course, I don't know, I probably won't reveal any secrets
about the fact that, like, Yandex lives on its own kernel,
which is adapted somewhere,
mainly for the needs of search, probably,
but maybe not just for search, but in general,
there are people who work on kernel development.
Right now, Roma is sitting somewhere,
well, probably sitting far away.
At that time, he also worked on the kernel,
but he approached the kernel from the perspective
of kernel operation in the cloud.
Well, now he's working with databases.
In general, these are very similar things.
People move back and forth.
Some are interested in committing to the kernel,
while others prefer to work with databases.
Right.
Now let's turn to the question of how a person
who is listening to us can get...
Okay, can join your team.
Let's say he can start from this, right?
He might want to happily
join your team of 15 people,
become the 16th and work on open source
instead of sitting somewhere
in some commercial company
and developing a purely closed product.
Right? This is every developer's dream,
it seems to me.
Or not?
Or is it not every developer's dream?
I don't think it's every developer's dream.
Let me describe the difficulties,
that you yourself have already started talking about.
The most...
Well, I mean, you started with the difficulties that
are about the process, like we don't have GitHub,
and the dream turns out to be
not so beautifully syntax-highlighted,
you know.
It's not like a dream in a commercial company,
where you use
the best tools that can be bought
for money,
or somehow obtained
by fair means or foul, you know.
And they are the newest, the most relevant,
and they look really cool.
There you go. If we don't talk about processes,
many people don't understand
how to build a career when
you do something, and it
gets committed after three years.
For me, as the leader of this team,
it's also completely unclear. Like,
for example, Roma wrote a
patch that changes something
in archiving, and it takes a year to review it,
and I need it for the performance review,
to decide whether there is a reason to give a bonus.
Do we give Roma a bonus or not?
That's it.
Well, because the bonus is now.
And open source doesn't really want to
synchronize
with our review cycles.
And, like,
it would be awkward if, like, two weeks
after the
decision on the bonuses
has been made,
meaning everyone has already had what they are due transferred to their card,
the community says:
"Damn, great patch, everything is approved,
it's going in, it will be in the release."
There you go.
Another aspect
of this wonderful dream
job: you use
tests in Perl,
the C89 language,
and so on;
all your development is in the console.
It's like, apply
mail, git am, that's your
best friend; git bisect,
git blame, that's everything
that happens to you. You fix bugs
that Tom Lane made in 2006,
and, most likely, no one in the world
except for you and Tom Lane
will understand what it's about.
Here. It's like,
well, I mean, you use something
that you don't need in other places. Often people
come to Yandex and say: look,
we have mastered the Yandex infrastructure,
but smaller companies
don't need this knowledge of
MapReduce systems. Where else can I apply it, besides
Yandex? Well, maybe,
say, Google, Facebook, yes.
I support this
from the perspective that, like,
you need to understand the most complex
stuff. Everything that
is available, you need to figure it out. And this kind of
approach eventually leads to the fact that,
that, in general, you like what
you do, and you are satisfied with what
you are engaged in.
Well, and the second aspect is that the community
highly recommends to new contributors,
as they put it, scratch
your own itch.
Like, scratch where it itches, in short.
If you have some problem with the database,
start solving it. If you don't have
problems with the database,
don't try to solve what isn't a problem.
So, in this regard,
to start developing
Postgres, you need to begin with developing systems
that use Postgres.
And, perhaps, you might want to
develop Git,
or develop Linux, or develop,
I don't know, some Jenkins.
By the way, the projects I just named
are all
projects whose
mentors I have met
in the Google Summer of Code program,
people who were somewhere here, in Yekaterinburg,
or over in Moscow, which is nearby.
So people are doing this;
people around us are
engaged in it.
And life there is no
better than life in Postgres, in the sense that
there might be newer tools somewhere.
In Git, it's the same.
That's it.
Open-source development is not only Postgres;
there are many teams that
contribute to other useful systems.
The general approach is that, like, you can
get involved in... That is, like,
if you maintain
a popular project, there will be many
companies that would at least want to pay
for your work. That's true.
On the other hand, money, like,
with a certain mindset, is never enough,
and you always want more.
And all these companies will want to pay you
specific amounts of money.
Let's take a random example, like Amazon...
I'm not claiming
that it's true, but say it pays a million dollars
a year.
In the moment, you can say,
oh, what good money, oh, that's a lot.
But then, after living a few years
on a million dollars a year, you might say...
I want two.
Yes. Well, okay,
not two, I'm not a greedy person.
So what?
One million one hundred fifty.
And then life will take on new colors.
That's when it will be normal.
But right now, I don't know...
I'm doing some old things
for some million, which is really nothing.
That's it.
In other words...
developers usually have enough money.
The question is purely about the attitude towards money.
Well, maybe it seems to me...
Maybe my wife wouldn't agree with me,
that I wouldn't mind earning more for myself.
I'm almost sure she would say that.
So, what are you planning to do to
earn more?
Share with me. No, no, no. I'm an engineer.
My job is to increase the amount of
happiness in the world.
I will be doing that. There you go.
I have... a lot of things that are not real.
You know, in life.
For example, like...
I have a black belt in aikido.
I'm not a real aikidoka and I haven't
been to training in a long time.
Or I'm a Candidate of Sciences, right?
I'm also a scientist, not a real one. I defended my
dissertation just for fun. And actually...
I did some research, well, just a little bit.
But I'm a real engineer.
I really create systems.
It's an important part of me, and I believe
I will be doing this no matter what happens.
That is, like, it doesn't matter
what will happen,
I will be making systems.
Even if I find myself in conditions where,
like, it's really difficult to make systems right now.
So the question of recognition
doesn't concern you?
Or does recognition not come to you through money?
You know, there's this image in orphanages,
like,
"Beauty doesn't matter," and there is
a picture of Emma Watson saying it,
right? Or something like,
"Money doesn't matter," and that's something Bill Gates
said. After that,
once the newspapers have written about you, saying that
recognition doesn't matter is already meaningless;
it would be nonsense.
I just can't say that now.
Just because it exists, and I have no right to make such
a statement.
So...
I don't know, I can't answer your question.
I would like to say that, but I can't
say it. You know? For some reason, you want
to become a committer,
even though it won't affect the amount of
code you write. Moreover,
it seems to me that you will actually write less of it. I will
review more. I...
In general, I had this thought,
that if I have commit rights,
will they revert some of their
patches for me? I don't fully
trust them. That's it.
So you want recognition?
Why is that necessary? No, it's necessary for...
It's a multiplier.
It's a multiplier of the
useful things you can do.
Well, roughly speaking,
when you are the author
of patches, you can help
a certain number of users,
those for whom
the problem is in the part of the system
that you understand well.
When you are a committer, you can
help all users
of Postgres. Just, like, almost everyone.
Improve their lives. That's it.
I mainly
guide myself by the thought that, like,
well, there are things that, like,
you just want to do, that's it. Well, damn.
You have to want something.
Some technical
authority, I guess. You have power over
a large volume, over a large collection,
it turns out.
You can help a greater number of people,
which means you have more functionality in
your hands. Right now, you are only responsible for your own patches,
for your narrow functionality,
but then you will be responsible for the whole thing,
for the entire product. You could put it
that way, yes. But overall,
in general, why
rationalize desire?
A person is made up of desires. If
you don't want anything, if you have just
calculated everything that you are
striving for,
you probably aren't a human at all.
That doesn't happen. Human
desire is not rational, and,
like, you can just want, in short.
Wanting is not harmful; it's harmful not
to want.
I formulated here in
the tactics pack, yes, that I want
this system to become the number one mining system
in the world.
The guys looked at me like, well...
And then
they just said that wanting is not harmful.
That's it.
Well, I want to. Well, maybe it won't work out.
Well, let's try. Why not?
You lead a team
of 15 people, and you also
program.
I'm a terrible leader.
So they somehow program on their own, right,
and you also program with them. You're probably more of a technical mentor than a manager.
I'm bad at delegating tasks, and my employees help me with this.
They help each other with the process side, because the company still wants to see
what's being done. For this, we need the work represented in the tracker, and our task tracker often lacks it:
it often lags behind and often isn't up to date. I would like to say that we are fighting this,
but really we are not doing anything about it. In general, this approach is very common at Yandex:
say it as it is, in short, even if it might somehow discredit you. Perhaps
from the outside it sounds strange when a leader says, "Yeah, I'm a terrible leader." Well, that's how we are.
It's accepted, in short: even if something is only probable rather than known for certain,
you can freely talk about it. For example, I can say that I am good at motivating
people. It almost never happens that a person works in my team because they were told
to work on something. Usually, people want to do it. The story is exactly the same as with an order; I just replace the order with a sale.
I usually sell the task by saying that this task is very useful. Many people will stop suffering or
will start to feel happy just from two lines being rearranged. Sometimes it's psychologically quite
a difficult project. I have one intern, Stepa, who spent six months hunting a corruption problem in restoring
backups in Greenplum. After six months, he found it and rearranged two lines. That's all he
wrote. Here is a person who came from university to learn something at Yandex, and he rearranged
two lines. On one hand, it's a kind of success, because after that we can sleep peacefully: our
monitoring stopped turning red. We know that our restores are reliable. On the other hand, the person
probably wanted to learn something more than just rearranging two lines. He got to those two
lines through dozens or hundreds of tests. He kept checking. He learned to go far and to be sufficiently
persistent. Probably, yes, persistence is the third thing I would recommend. But about persistence, on the other
hand, we often say that banging your head against the wall is not the most productive thing you can
engage in. Therefore, some balance is needed between persistence and using your head for its intended purpose.
And who sells you the tasks? You sell them to your guys, but who sells them to you? Or did you learn to do it yourself?
Do you not need a salesperson anymore to buy? Last time, my manager Dima Smal sold me the task.
He says, listen, we have a number of clusters that came back up in an incomplete state after the drills.
There was a gap in the log history. It's very important, you see: here is the on-call engineer, and here are his hands, with which he,
going in over SSH, reassembled those clusters, about 80 of them. Here are three more people who, in the chat,
volunteered to help him because they saw that he was struggling. You can see that he is struggling, right? Take the task. That's how Dima
sold this task. But on the other hand, there are formal goals that we strive for.
Well, for example, there's something like Zero Corruption. But how is this goal implemented? I look at the mailing list,
I look at the monitoring, and I try to relate one to the other. I see a message about a problem on the list,
I see something similar in production, and I try to find a way to connect the dots, as they say, and then work out where to
send the test. Just like I mentioned before: back to Tom Lane in 2006. In the summer, the stars aligned for me:
there was one internal report that queries were getting stuck, and there was one email on the hackers list about
a deadlock. I connected them, and it turned out that this was actually corruption that had always been there. I knew about it...
By the way, I later gave a talk in St. Petersburg at PGConf, where I spent 40 minutes on how it happened
that there is a corner case that has been present in the codebase for 19 years, and no one has...
But for some reason, within a gap of just three months, it occurred almost simultaneously at Yandex
and externally: Dima Yuryevich also ran into it. I don't know which company he is with, because he hasn't
said where he is from. By the way, many people on the hackers list and on pgsql-bugs do not mention their companies. For me,
it's just... Maybe they're afraid. What are they afraid of? I would respect their companies much more
if I found out that they don't hinder people from doing this work. I don't know where Dima Yuryevich works, but he is
specifically engaged in work that, damn, will save a lot of bytes. Well, he... You know, the attitude in...
The attitude in the industry towards open-source is not always positive. Many consider open-source to be toys, while real work
is proprietary software. Working on something internal, private, exclusive: that's real work. I don't know about you...
Is it okay to say in an interview that this seems like some kind of madness? I mean, the whole system is made out of it,
in short. Well, "open-source is just toys," you know how people feel about it. This laptop consists
of open-source. This program that we are communicating through is basically running in an open-source browser.
Along the way the bytes travel from me to you, there is so much open-source, with all the risks and shortcomings of open-source,
that it's just mind-blowing. On the other hand, I can understand companies that are not eager to
do something for the whole world. Well, like, if your business is based on doing something
ultra-scalable, like Yandex, right? There they expect of any developer that you write
a product that works for a billion people. People who come from university say,
but I have 30 people in my group. Let's do something for 30. And that's a normal approach: we are doing
something for a small number of people. Then we don't have that multiplier, but we are still
increasing people's happiness. It's fine not to spread ourselves thin on open-source products, and to do something for those
30 people that we are working for. Maybe we even have just one user, because they really need our
program. Or maybe I'm writing a program just for myself, and I'll pay for it myself, in short. That's a normal
approach. Anyway, okay, I'm not crazy, I understand them too. And what if a person works somewhere in
a company that, I don't know, automates the work of a specific supermarket or a chain of supermarkets,
and he wants to do open-source? He's not at Yandex, he's not in your team, he works in automation,
builds an ACS (Automated Control System) for supermarkets, and he wants to do open-source. What should he do? An ACS is really not about supermarkets. I once
commissioned a rolling mill, in short, at the AmurMetal plant. There
were zero supermarkets involved in that ACS. Okay, he is working on the ACS at the plant,
there is no talk of open-source, he is making a specific product for a specific rolling mill, like you
said. So, at this rolling mill, there are some components that he would like to improve. In that,
I don't know what it's called, in the SCADA system that he uses, if it's not some kind of
Siemens, yes, there is something open-source, there is something to improve. Postgres is not the only system to work on;
there are plenty of other useful systems in the world. After all, he has, I don't know, some kind of
browser along the way, which is also open-source: Chromium, V8 can be improved. You can
get started with that too, just by building it, right, to see what problems there are with it.
First, you need to work on your own tools, on the tools that you consider to be yours.
It may not be something that is used super often, but you need it. When you start
doing something that you don't need, I don't think that's the right way. I mean, just going to
GitHub, finding the most popular project, and starting to commit there without really understanding why. No, consider
your own system and find the components in it that need attention. All systems are
lacking attention. Maybe you will need to do a little refining first,
to gain specific skills there, yes, to see the problems. But still, we are all in
a severe deficit of attention. This attention cannot be replaced by neural networks in any way. That's how I feel. Maybe
I'm a skeptic. Their attention is not the same yet. Find systems among
the ones that are used in what you do. Ultimately, everything you do
should benefit people. Specific people. Not just any people, but living people. Made of flesh
and, you know, with hair. With hair or not, it doesn't matter. But people. Or animals. Or bacteria.
But these bacteria must be inside people. Right. If you don't use a system but have always wanted
to build something based on that system, you will find flaws there, so focus on those flaws.
There's no point in starting from what you want to build there, drawing a great picture out of nowhere.
You should start from the flaws of the real world. There are plenty of them. Everything here consists of human
errors. The entire history of engineering is made up of locomotives that don't run, airplanes
that don't fly, power plants that don't supply anything, and browsers that don't display
the right images, and all that. Engineering systems always have room for improvement. Don't you think
that we are improving an American system by making contributions to Postgres or Linux or Docker or
anything else, or MySQL? We are helping them this way. It doesn't even matter whom we are helping. Right now, there is a Russian
flag across the way. In fact, we have a lot of flags here. Why not take Postgres,
fork it, rename it, and say that now this is our database under our Russian name, and then
continue your own development? Or even create a new database from scratch? Look, the American
corporations that supplied us with technology have isolated themselves from us. It's true. The Open Source
community hasn't really isolated itself from us. I mean, there are some individuals who sometimes
hold quite high positions in the communities. Nevertheless, I would say that, just as the internet is designed to be
indivisible, the community still remains united and hasn't cracked yet. I don't see
the need for sovereign Open Source. I could say that I don't see the opportunities either, but there are certainly opportunities;
there are always opportunities. In any cave, with a piece of wire and a soldering iron, you can make a processor and happily
set up your own system, which will have some limitations, but sooner or later
you will bring it up to the necessary level. Of course, you can build Open Source in a very small community,
not to mention that in such a large country you can create a system. We know very well
that solving individual tasks of the space race is quite within reach for our engineers. But people will be
happier knowing that the world is united. All of us, the people around, will live a fuller,
better life, because engineers around the world treat each other in a friendly manner. I
hold this point of view, and I am often in America, where I share ideas with pleasure and
listen in on their wonderful ideas, and they share the same point of view. We are creating
common systems, and there is no need to divide them. There are no threats in this, at least from a security perspective.
What other threats could there be? We are given the code. Why shouldn't we share code? I don't impose my opinion on anyone;
maybe someone wants to have their own doomsday system, which should be built as if inside one
monolithic entity. The Chinese really want to do this; they create their own products, and very often these are
forks of existing products, but over time they drift far from the original. Here you have lumped one billion people together
as "the Chinese," as if all Chinese were the same, which is not true. There are many of them, they are different and interesting, and sometimes they can be difficult
to understand. But for example, we are working on Apache Cloudberry, and this project, to be honest,
is a project of the Chinese company HashData. We approached them because Greenplum was closed, de-open-sourced:
the American company Broadcom closed the development of the Greenplum project. We wanted to create something Russian,
but we couldn't come to an agreement. We never wanted to be alone, like "we can do it ourselves." In general, we
made our fork available to everyone, but we never talk about it unnecessarily; those who need it can use it.
We don't promote it, we don't advertise it anywhere, but we understand that
development is about collaboration. So we contacted the Chinese company HashData,
and they said, okay, fine, and appointed several of us as committers to their
project, which they had been developing for two years before that. Later they said, you know, we want
not just you, well, we want you too, but we would also like the rest of the
world, please. And then they went under Apache. Apache has very ambiguous
relationships, I understand that perfectly, but it is a way to unite with a large number of
people, to help a large number of people. And this is a Chinese company. So,
in short, there are different kinds of Chinese people. There are Chinese people who, in this regard, are no different from
Russians and no different from Americans. And there are Americans who
want something that belongs to the USA and will be the property of the US Government.
I do not judge them. They are far from being interested in my opinion,
so there's no point in expressing it unnecessarily. Do you have any influence on the community?
Well, for example, you say there is no tracker in Postgres, but it seems you would like to
have one, and maybe many others like you would too. Can you somehow
change the situation in the community so that a tracker appears? Or, for example, move all
development to GitHub? I understand that this won't happen, but nevertheless. Or are you
just a passive contributor there, and only your code is needed, while your opinion is of no interest to anyone?
You can go to the leadership meeting and say, "Guys, that's it, we need a tracker, in short." But this
position needs to be prepared in advance: you need to come with strong arguments, with a good
understanding of the counterarguments, and prepare the ground beforehand. I think this is a feasible project.
Well, this is not an engineering project at all, it's not about me at all. I mean, I like the code.
There is Alster Turner from Percona. He is not among the contributors,
nor in the leadership meeting. Nevertheless, he has taken up the issue, and he wants us to consider
other paths. To promote this idea, he gave a presentation at PGConf.dev in Montreal
last year. Personally, he didn't convince me; I didn't hear a single strong argument
that I would agree with. Nevertheless, it is clear that the person is working in this direction, and those who are interested
can join Alster. They find it difficult to pronounce our names, and we find it difficult to pronounce
their names. I don't know how to pronounce his name correctly. In general, in short, you can join him,
and together there will be more of you than just one Alster. In the field of coding, you write code in one
way, while other people write code in a different way. Surely you have some technical
conflicts regarding the patterns that can and cannot be used, regarding code style, regarding approaches,
regarding error handling, well, some internal conventions. Do you have the opportunity to influence them?
Anyone has the opportunity to convince everyone in the thread. Everyone is ready to change their position;
convincing Tom Lane is not as difficult as it seems. The only thing you need is to use reasonable
arguments. Quite often, the community has some kind of bias that can be easily overcome with reasonable
arguments. This happens all the time. And again, as I mentioned, attention is in short supply.
Any person can direct their attention to a certain place and make something
happen there. You turn your attention towards, I don't know, something like this. At some point, Kolya Samokhvalov
asked me to pay attention to the fact that there is no Transaction Timeout. I turned my attention to it, and it happened.
Before that, we had Idle in Transaction Timeout and Idle Timeout. There was just no Transaction Timeout.
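As an aside, the gap between these timeouts can be sketched in code. The Python model below is a simplification for illustration only and is not how PostgreSQL implements anything; the `Session` and `Timeouts` classes and the `violated_timeouts` function are hypothetical names. It assumes the documented meaning of the settings `statement_timeout`, `idle_in_transaction_session_timeout`, `idle_session_timeout`, and `transaction_timeout`, each of which is disabled when set to 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical snapshot of one session; all durations are in ms."""
    in_transaction: bool      # is a transaction currently open?
    running_statement: bool   # is a statement executing right now?
    statement_ms: int = 0     # how long the current statement has run
    idle_ms: int = 0          # how long the session has waited for the client
    transaction_ms: int = 0   # how long the current transaction has been open


@dataclass
class Timeouts:
    """Limits in ms; 0 means the timeout is disabled."""
    statement_timeout: int = 0
    idle_in_transaction_session_timeout: int = 0
    idle_session_timeout: int = 0
    transaction_timeout: int = 0


def exceeded(elapsed_ms: int, limit_ms: int) -> bool:
    """A limit is hit only if it is enabled and the elapsed time reached it."""
    return limit_ms > 0 and elapsed_ms >= limit_ms


def violated_timeouts(s: Session, t: Timeouts) -> List[str]:
    """Return the names of all limits this session has exceeded."""
    hits: List[str] = []
    if s.running_statement and exceeded(s.statement_ms, t.statement_timeout):
        hits.append("statement_timeout")
    if not s.running_statement:
        if s.in_transaction:
            if exceeded(s.idle_ms, t.idle_in_transaction_session_timeout):
                hits.append("idle_in_transaction_session_timeout")
        elif exceeded(s.idle_ms, t.idle_session_timeout):
            hits.append("idle_session_timeout")
    if s.in_transaction and exceeded(s.transaction_ms, t.transaction_timeout):
        hits.append("transaction_timeout")
    return hits


# A transaction made of many short statements with short pauses:
# each statement is fast, the session is never idle for long,
# yet the transaction as a whole has been open for an hour.
long_tx = Session(in_transaction=True, running_statement=True,
                  statement_ms=200, transaction_ms=3_600_000)

limits_before = Timeouts(statement_timeout=1_000,
                         idle_in_transaction_session_timeout=1_000,
                         idle_session_timeout=1_000)
limits_after = Timeouts(statement_timeout=1_000,
                        idle_in_transaction_session_timeout=1_000,
                        idle_session_timeout=1_000,
                        transaction_timeout=60_000)

print(violated_timeouts(long_tx, limits_before))
print(violated_timeouts(long_tx, limits_after))
```

In this model the hour-long transaction slips past the first three limits and is caught only once a limit on the whole transaction exists, which is the gap described above.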
There was Statement Timeout. There were many different ones, but there was no such thing. I paid attention to it, and it happened,
and it all works pretty much like that. You have your currency: your attention. It is
worth as much as how deeply it penetrates the essence of changes in the database. And this
currency can be directed somewhere. You can say, I will invest it in this feature, and the feature will happen,
or it won't happen if your investment is insufficient. For some major things, my
attention is clearly insufficient. Just yesterday, I wrote an article on Habr about where I
invested my attention, but it wasn't enough. Some guys from Yandex.Cloud came to me and said,
since you ended up on the list, why don't you write an article about how to get there? I said,
the article will consist of my achievements, and it will be published under my name. I can't write it like that,
like, look how great I am. Let me write about what I didn't succeed in. They said,
okay, fine, write about what didn't work out. And that list is bigger than the list of what did work.
Here, I wrote down the first things that came to mind. And there are things written there
where my attention was really insufficient. And not just mine: many large companies
also invested there. As an example, I can mention the story of logical replication of
DDL: you create a table, and its DDL is replicated to a logical replica, a very simple feature. Amazon
invested a dozen developers in it over several years. Maybe they weren't
focused only on this full-time, but these were very serious people who worked on it for a long time. And in the end, the
funniest part is that I came there as a commit fest manager in 2023 and saw that nothing had
been happening for six months, so I closed the patch. Well, rather, I asked whether they were going to do anything, got two
months without a response, and then closed it. Well, I wouldn't say that I was abusing my authority.
Indeed, there are many such things that have been in development for many years, and then it becomes clear that we've hit a dead end:
nothing is happening, and the authors clearly don't want to give up. Well, someone has to say that
it seems that nothing is happening right now. Is there an architect for the project? A person
who makes the final decision? Unfortunately, each committer makes the final
decision. Almost always, if a person made the wrong decision, they revert it themselves. There is something
similar to an architect, like a sculptor, in short: an entity called the Release Management Team. There is such a thing.
The joke is that Postgres development consists of five Commit Fests, a Commit Fest being where new patches
are accepted, and one revert fest. That is the month after the feature freeze, when the Release Management Team
comes to a committer and says, here's the feature that you committed, it often turns red on the
Buildfarm. Either you resolve this issue, or we resolve this issue. Almost always, the committer goes
and reverts what they committed themselves. For such things, there is a special list called Open Items, for cases where
a decision has not yet been made about what to do. I think that if a committer does not resolve an issue for a long time, they may
lose their commit bit, and it seems that such stories have happened. We also have a Core Team, which officially states
that it does nothing. I misphrased that. The Core Team says that it will intervene if the committers are unable to
reach a consensus. And they say that it's good that this has never happened. Committers
can always come to an agreement with each other without the intervention of the Core Team. So suppose you submitted a patch,
and in this patch you are changing the design; you are not just fixing a bug, you are changing the design of some block
of code, doing it differently. The committer tells you, well, it's strange, something doesn't sit right with me, I won't accept this
thing, it doesn't suit me. But you believe that you are doing the right thing. What next? Can you somehow
bring additional people over to your side, gather some kind of committee that you need to
convince? Or is the committer the final point? Other committers. Any committer can accept it. So you invite
more people to the thread and start complaining about your first committer? Listen, the reality
doesn't look like that. It looks like this: you go to a conference, there's a bar with the right person in it,
you meet, things arrange themselves, and over a beer you kind of sell the idea that
this is actually the right solution. You convince them that in these functions
it needs to be done differently, in principle. We have 10 of them, they all do it this way. Let's say, I don't know,
in case of an error, instead of returning minus one, we should throw an exception. Well, for example. We call it elog,
when your seek handler catches your... Alright, let's not go into the technical part. You need to convince someone.
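The example mentioned here, returning minus one versus throwing an exception, can be illustrated with a small sketch. PostgreSQL itself is written in C and reports errors through `elog` and `ereport`; the Python below only illustrates the general contrast between the two styles, and `find_index_code` and `find_index_raise` are hypothetical names.

```python
from typing import Sequence


def find_index_code(items: Sequence[str], wanted: str) -> int:
    """Error-code style: return -1 when the item is missing."""
    for i, item in enumerate(items):
        if item == wanted:
            return i
    return -1


def find_index_raise(items: Sequence[str], wanted: str) -> int:
    """Exception style: raise instead of returning a sentinel value."""
    for i, item in enumerate(items):
        if item == wanted:
            return i
    raise LookupError(f"{wanted!r} not found")


names = ["alpha", "beta"]

# With an error code, a forgotten check silently misbehaves:
# -1 is a valid index in Python and selects the last element.
unchecked = names[find_index_code(names, "gamma")]

# With an exception, the failure cannot be ignored by accident.
try:
    find_index_raise(names, "gamma")
    outcome = "no error"
except LookupError as exc:
    outcome = f"caught: {exc}"

print(unchecked)
print(outcome)
```

The trade-off is the usual one: a sentinel keeps control flow local but relies on every caller checking it, while a raised error unwinds to whichever handler is prepared to deal with it.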
You convinced them at the bar, but then another committer will come and say, "No, I think differently." Ask them not to shut it down, but to
support the patch or at least take a look. If those two committers say that something is wrong, then it's not worth
pushing any further. So it turns out to be a kind of democracy, a sort of majority or group
decision. There is no single person you can point to and say, "what he said is how it will be."
There is a person who will ultimately be responsible. So if you find a person who will be
responsible for making this development happen, then it will happen. Moreover, after that, it doesn't matter
whether you support it yourself or not. If the committer has decided that they want this to happen,
they have every right to take your patch. Afterwards you might be completely against
it being committed, but they decided that it's time to commit. And they are responsible for it.
There are architects, there are just 30 of them, you have 30 architects.
Yes, this is a problem of scaling the project, that we cannot divide the area of responsibility.
Each committer is responsible for everything, absolutely everything. There are gentleman's agreements, where
gentlemen take each other at their word. We have an arrangement that if you committed a feature,
then for the first few years you are responsible for it, and then it gradually becomes a collective responsibility. If something was committed
a long time ago, then you are no longer the one ultimately responsible; we all share that responsibility together. But this is just so that people do not
fear committing. Robert Haas has a great talk where he has two pictures.
The first is what everyone thinks a committer looks like: a person sitting on a throne, saying "I decide
whose code to accept." And the next picture shows what he actually looks like: a very stressed person
with a lot of questions, who at the same time carries some part of a responsibility that nobody needs
except him. That's the story. And doesn't it bother you to work in a format where
you always have to convince someone for your code to eventually make it to production? After all,
if it were your project, even if it were smaller in size, you would be king and god there. You would
make all the decisions. Well, that's pretty much how it happens. We have quite a lot of open-source
projects like that. In general, Postgres is designed to have extensions. We have dozens of extensions
that are not part of the core, and there we decide for ourselves what to commit. And often in projects like
Cloudberry, there are times when I bring something that is still a draft, and I forget to press the draft button.
The committer of this project comes and clicks merge. I'm like, damn, guys, wait, I'm not used to things
working this quickly. I thought I would think it over some more. And everything is already in production. Let's go. So, does it stress me out...
the need to discuss with someone? Right now, no one commits their code, committers included, without
discussing it. A committer can commit something without a review, but not without a discussion. It's a normal
story when a committer posts a patch to the list, saying they want to move something between the configs,
something non-critical. No one replied for a month, so he moved it. Sometimes no one responds for
a week, and he moves it. A couple of days ago Heikki Linnakangas, one of the
committers, brought a patch: "I want to move the SLRU setting from one file to another." And
four hours later he said, well, I didn't hear anything, so I went ahead and committed it. After that there was a long thread
of comments, because people were outraged precisely by the fact that he committed too quickly: "We didn't have
time to respond to you." And because of this, people looked at the change much more closely, and
objections arose. If you had a proper GitHub, you would have made it mandatory to have
two reviewers for each pull request, and such a situation would never have happened. Not necessarily. A committer
can commit whatever they want. Well, yes, but if it were GitHub with such controls configured, they wouldn't be able to
commit whatever they wanted. Then we would have an unnecessary restriction, one that we do not actually
have. Well, this situation wouldn't have happened. I mean, it wouldn't have happened, but overall
it's designed that way. A committer can commit whatever they want. Waiting for
reactions is not a written rule. So, let me return to your question. It might have been more comfortable for you to work
in a smaller project, with fewer users, but one that is your personal project,
with you and 10 contributors involved. You would control the architecture yourself, you would make
the decisions, you would be the main committer. So, I have dozens of such projects. I mean, there is
WAL-G, which has 220 contributors, where I appointed 30 committers of my own, and which is used...
well, I don't know, among the major users, I guess I've heard about almost all the Russian clouds,
for example, using it. The guys from AHD, I'm not sure; it seems they mentioned at a conference that they use it.
Western companies use it too. I think Metro Cash & Carry uses it. I've heard that
GitLab seems to use it as well. Well, Microsoft seems to use it too. So, on one hand, it's
useful; on the other hand, it's completely mine. I can create anarchy there because, well, I have
the admin bit: I can assign committers, remove committers, do whatever I want. So, where would you feel more comfortable, which
do you prefer? A large project, where you are in a secondary position, or a smaller one,
not tiny, but smaller, where you are the most important person? Well, you see, I want to learn. I need
there to be many people around who understand systems better than I do. I learn much more in Postgres. Here
I feel that I'm growing, yes. I've been working with databases for 20 years, and there is a lot I don't know. Very
often I learn something new from Tom Lane's commit messages, just by reading what
he wrote about someone else's code. So, of course, it's more interesting in Postgres. It's a peculiar place,
with old tools, with complex tasks, where some things take years, but it's more interesting there,
much more. The other thing is that this tool is useful right now for my
employer. At Yandex, 25 petabytes of Postgres data are transferred from VM disks
to S3 every night. So a tool that constantly moves that many bytes is tailored to our production. Well,
not only ours, I mean, it can also work in Google Cloud and in Microsoft Azure, but
since we are the majority there, we did everything to make it work well at home. I don't think we need to
limit ourselves to just one project. We need to, how to put it, try everything.
Lastly, I wanted to ask you about higher education. Do you teach? Yes. Right now I teach at
the Decentralized School of the Ural Federal University. I teach at SHAD, our educational initiative, the School
of Data Analysis. As for the course I teach, my mentor told me back in 2008:
teach what you are doing. In short, teach what is relevant. It doesn't matter
what it's called. We will then adjust the educational program to what you are teaching.
It can be with a lag of 1-2 years. Right now my discipline is called "fault-tolerant systems,"
and I teach what happens when a database fails, using Postgres as an example. We look at
various crashes, failovers, switchovers, and extreme replication issues. On databases in general,
I'm not a very good teacher, because... Well, you see, I see how databases break.
I currently have 25 thousand databases in production. Strange coincidence. Well, okay. In short, it may be
some other number of clusters, but still tens of thousands of clusters. And my attention
is drawn only to those that have problems. So I say, like, there is corruption in Postgres,
it does exist. There are bugs that lead to data loss. But a normal person
should never in their life find out about this. Right. If I start teaching people a course on databases,
I will start with the fact that databases don't work, that they don't do their job. My students will ask me,
why are you telling us about this? That's an extreme case. Talk about the normal cases. And I've forgotten
the normal cases already. I don't know, like, how to name a sequence. Like, when you create a table, how should you name
the sequence? Do you need to include the table name or not? Right. Or like, how do I decide how many
indexes to create? Who knows how many indexes you need. Well, just touch it in production somehow,
and you'll figure out how many indexes you need. Right. I can't give a general course right now, and I adhere to the point
of view that I can only teach what I am an expert in. My area of specialization changes over the years,
so the course gets renamed too. At one time, this course was called computer graphics.
It just gradually mutated. After computer graphics, I got interested in mobile
development. I got into it and wrote for Windows Phone 7.5. Right. In C#. Then somehow I
started to shift a bit towards web development. From web development, it shifted
closer to databases. From databases, it went into the database core. From the database core, it went
into failovers and outages, into monitoring and all that. Right. But it's all one course. We just kind of tailored it
to what I actually teach. Right. I don't even know what the connection between computer graphics and this is.
It's a long journey. You've come a long way. Well, I hope this is not the end. We will continue to mutate further.
How do you assess the quality of students now? As they say, the grass was greener before.
Normal students. I don't like the hype. Yes. That is, developers... Many talk about the problem,
I'll tell you what I'm asking about. Many talk about the way education is being adapted to students,
which is happening now. We as teachers, and universities in general, try to give
the student what he wants to receive. And he wants practical knowledge, instructions. He
literally wants to get a manual on Python and finally go to work. And higher education
is turning into programming courses. That's the problem. I have... First of all,
when I started out, I had the feeling that those who know how to program when they enter the university
are the only ones who can program when they graduate. That is, we won't change anything in a person in five years. Hence the idea
that you can't teach programming; programming can only be learned. My
main task as a teacher, in short, is not to get in the way. Then my friend Sergey Mirovada told me that
pressure makes diamonds. In short, we need to find people and put pressure on them. By the way, this matches the idea of
SHAD, the School of Data Analysis. Their main slogan is "It will be difficult, you will like it." So it's a somewhat
sadistic idea. Well, damn. It is what it is, I guess. Still, I hold the belief that education
should be about overcoming difficulty. If a student asks for Python, we won't give him Python; we will make him
write in C. If a student wants C, we will make him write in Python. The point is not to...
you know that joke. Two soldiers are sitting on the parade ground, and a sergeant walks by. "What are you
doing? Well, that's it, free time is over. Grab two crowbars and sweep the parade ground."
"Why crowbars, comrade sergeant? Just to get it clean?" "I don't need it clean, I need you to get tired." There you go.
Of course, it's a bit of a cliché joke. But the essence is roughly the same. The brain is like a muscle; it needs to be trained.
It has to be challenged. And the things that seem difficult, perhaps even useless,
precisely the challenging ones, the things that are difficult right now, are the ones to take on. Leaving out
the emotional component, of course. There's no need to suffer just for the sake of suffering. But
the tasks that I find difficult right now are the ones I need to work on. So I do try to challenge
my students and offer them something that will require mental effort. What exactly will that be?
Well, I know about database failures. Whether it's math, or writing poetry,
or, I don't know, working through some chemical reactions, it doesn't matter. A person
must teach their brain to adapt to the load, and then apply that mental strength
to building systems and improving people's lives. That's my thought. In short, students
come with brains, and we train them as best we can. The delta is small; they even get somewhat duller
while studying with us. But we still try to resist that and leave them with, besides
knowledge that is more or less objective, something humanity has learned about systems, also
the mental strength to solve problems. So, do you hire your students? Does that happen?
My whole department came to me as interns, except for two people. They all grew up,
and almost all of them teach. Because you can't absorb knowledge if you don't share it. You
don't really understand something until you've explained it. So, well, I don't tell anyone that it's mandatory. I
just do it myself. And many do the same. Gaining knowledge, creating knowledge, and sharing knowledge. This
is an important component of engineering work. Without it, you end up with crappy systems that nobody needs.
Let's finish with some kind of call to action. Let's sell something. Maybe you need students, maybe
you need employees. Give some kind of advertisement. What do you need? Contributors. I invite everyone
to review my patches. There are plenty in the Commitfest that lack attention: almost all the patches
that I give talks about. On one hand, I've talked about them in detail. On the other hand,
few people look at them. Right now, if anyone is in the mood, I would be very happy
if someone reviewed my patches about tree pages and about WAL compression. Right now, PostgreSQL
writes the log to disk in uncompressed form. As Alexey Milovidov says, any byte that goes into IO
deserves compression. Everything that is processed in the CPU and leaves it needs to be compressed. Please review
my patch about WAL compression, because there is a serious lack of people looking at it. I wrote about this in January.
A few people reached out to me with general questions, but no one looked at the code. So, you just need
to go to the mailing list and there, without being a committer, contributor, or maintainer,
just write: hello, I think this is a bad patch. There are two options. You have backends,
there are many of them. 100 backends are concurrently executing write queries. They write their WAL records and compress them
in parallel, that is, each on its own processor. I chose this path, where each backend
compresses separately, because it scales better. There is another way: you can compress when they have all
serialized their records and the WAL writer writes the WAL to disk. It’s easier to compress there because you already have
a sequence, and you don’t have to think about which earlier bytes you can reference. So, maybe it’s worth
considering: if you think the simpler approach is the better one, try it and tell me, "Andrey,
your architecture is rubbish, we need to make one serialized compressor." This is the key
decision point that we need to agree on. Either we compress in each backend in parallel, with
low coupling but high parallelism, where low coupling means we will have a lower
compression ratio. Or we do it in a single thread, but with a good compression ratio. Or, if you suddenly
think that it is a good idea, let’s do both options. That is also welcome, but figure out
what is happening in the system. For this, I will send a link to the presentation, if it is published; if not,
I will at least send the slides. And a link to the discussion on the pgsql-hackers list. Just open your email
client and send a message to that thread. And why do you need this discussion? What will it add? Will it give weight to
your patch? Or are you just interested in an outside opinion? First of all, I would like to build
a mental model in my head. We lack introspection tools in our minds, tools
of reflection. Even figuring out which tooth hurts is difficult for a person. Good ideas are born between minds. That, too,
is one of those fundamental things at Yandex: if you want to think, think by bouncing ideas off someone else. Good
ideas are born in communication. Even if that person is just sitting there like a yellow duck, in reality they
are internally building this model. And later they will tell you where you messed up, where you don't understand something. Come on over,
let’s discuss. For now, people are not very engaged. I even have a poster hanging up about it, a reminder to
discuss things with someone. But everyone has their own tasks; a person's attention is a very valuable resource.
You can't just take it for granted. By the way, don't give your attention to TikTok or YouTube or, I don't know, any content.
Save your attention, give your attention to me. There you go. The most important advice for today. And you yourself...
Do you use TikTok, scroll through Instagram feeds? The community is on social networks. Because of this, I
use social networks. I see where everyone is planning to go and which conferences they will attend. I have a classmate...
I post pictures of my son and me building sandcastles a few times a year. There you go. Overall, I maintain
a social profile. One third of it is running, photos from runs and that kind of thing, and
vacations. Everything else is technical content, mainly aimed at the community I
communicate with. There you go. So, I'm not on TikTok, I can't dance. It's a pity. Right now
I would really like to learn how to draw, dance, and play the piano. Well, I play the piano a bit.
Wow, that's impressive ahead. Yes. Alright, let's provide links to your social media. Let them subscribe.
They are watching you. Subscribe, add as friends. What’s there to subscribe to? I don’t... Let them come to
the conference. I think we will publish the video before the conference you mentioned. Maybe
they will come to listen to you. Yes, I'm constantly meeting people at conferences. So come,
we can discuss something. Thank you. I appreciate it. Thank you very much. Let's create systems. There you go.
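
Editor's illustration of the WAL compression trade-off discussed above (compressing in each backend versus one serialized compressor). This is only a toy sketch and not the code of the patch: the record format, the `make_records` helper, the choice to compress every record on its own, and the use of Python's `zlib` are all assumptions made for illustration. It shows the general effect Andrey describes: independent compressors cannot reference bytes seen by other compressors, so the total output tends to be larger than when one compressor sees the whole sequence.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_records(n_backends: int, per_backend: int) -> list[list[bytes]]:
    """Build synthetic, repetitive 'WAL-like' records (hypothetical format)."""
    return [
        [
            f"INSERT rel=accounts backend={b} lsn={b * per_backend + i:08d} "
            f"payload=balance_update".encode()
            for i in range(per_backend)
        ]
        for b in range(n_backends)
    ]


def compress_per_backend(records: list[list[bytes]]) -> int:
    """Option 1: each backend compresses its own records independently, in parallel."""
    def compress_own(backend_records: list[bytes]) -> int:
        # No shared history: every record starts from an empty dictionary.
        return sum(len(zlib.compress(r)) for r in backend_records)

    with ThreadPoolExecutor() as pool:
        return sum(pool.map(compress_own, records))


def compress_single_stream(records: list[list[bytes]]) -> int:
    """Option 2: one compressor sees the whole serialized sequence."""
    stream = zlib.compressobj()
    total = 0
    for backend_records in records:
        for r in backend_records:
            # Later records can reference bytes of earlier ones.
            total += len(stream.compress(r))
    total += len(stream.flush())
    return total


records = make_records(n_backends=100, per_backend=20)
raw_size = sum(len(r) for backend in records for r in backend)
per_backend_size = compress_per_backend(records)
single_stream_size = compress_single_stream(records)
print(f"raw={raw_size} per_backend={per_backend_size} single_stream={single_stream_size}")
```

The sketch measures only output size. It says nothing about the other side of the trade-off, namely that the single compressor is a serialization point while the per-backend variant scales with the number of processors.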
This video discusses the intricacies of open-source development, focusing on PostgreSQL. It covers topics such as the naming conventions for PostgreSQL, the history of its development, and the process of contributing patches. The discussion highlights the challenges and rewards of working with mailing lists, the importance of commit messages, and the continuous efforts to improve testing infrastructure, including CI and build farms. A significant portion is dedicated to real-world scenarios like bug detection, deadlock issues, and the rigorous process of test case development to ensure system stability. The conversation also touches upon career paths in open-source, the role of committers, the influence of companies in open-source projects, and the personal motivations driving contributions. Finally, it explores the educational aspects of open-source, emphasizing the development of critical thinking and problem-solving skills through challenging tasks and collaboration.