И36: V.I. Khorikov | Unit Testing Best Practices
- Vladimir, good day. It's morning for you. - Yes, good day.
We have a huge number of questions for you. We read you and love your books on unit testing.
We want to know your current thoughts on unit testing and testing in general at this moment.
I'll start with a general, simple question that, nevertheless, concerns many.
Why should a programmer write tests if he is not a tester?
If he is just a programmer who knows how to write good code, his code works, he deploys it to production, and everything is okay.
Why should he engage in creating some unit and integration tests?
Yes, that's a great question. On every project (well, maybe not every one, but often) I have to
go over, so to speak, this very question with people.
The most important reason to write tests is that the speed at which you write code
for your project varies depending on the stage of that project.
At the beginning of the project, when it is just starting, the speed of writing code is usually at its maximum,
because there is hardly any code in the project yet.
You, so to speak, have no problems editing or refactoring the code because it has just been written.
But over time, as you work on this project, the speed of development decreases.
That is, you start to have some workarounds in the code, and the so-called technical debt begins to accumulate.
Perhaps somewhere you didn't build the architecture as needed from the start, and therefore new requirements
don't quite fit into your architecture, and you need to adjust the architecture itself somehow.
But, again, there isn't always time for that, so you have to create some workarounds.
So, this baggage accumulates over time with the existing code, and it requires constant attention.
In order for your development speed not to decrease, you need to constantly dedicate time and attention to the code
and continuously refactor it. That is, reduce technical debt, reduce its so-called entropy,
in order to maintain the speed of your project development.
So, constant refactoring is necessary to maintain development speed,
and for the refactoring itself to be successful, you need to have some kind of safety net.
And tests are precisely that safety net: with them you can safely refactor,
change your architecture and parts of the code to meet new requirements,
without the existing code, so to speak, holding you back,
and thus maintain the development speed of the project.
That is a somewhat lengthy answer, but in short, tests serve as a safety net
that allows you to maintain the development speed of your project over time.
This sounds like common sense, at least to me, but a huge number of teams do not write tests at all.
The question is, how would you advise convincing the team that this is necessary?
Well, again, this comes more with experience, because in small projects the need is less obvious.
If you were to build a graph of how necessary tests are depending on the length or complexity of the project,
it would look something like this: with the lifetime of the project on the X-axis,
the complexity of the project increases, as a rule, exponentially.
At some point it exceeds your ability to reason about the project, to hold it in your head.
And in order not to cross that limit, you need to keep refactoring it.
Otherwise, if you do not do this, you will have more and more bugs in the project with each new release,
or your development speed for this project will continuously decrease because you are spending more and more time on manual testing.
Another way to think about unit tests is as a way to replace manual testing with automated testing,
so that over time you have fewer regressions, meaning fewer bugs with each new release.
Initially, as a rule, unit testing is not that important for the project,
simply because the project itself is small in size and any bugs that arise in it,
you can quickly estimate where exactly they occur, in which part of the project, and you can fix them quite quickly.
But over time, again, a situation arises where you can no longer keep all the code of the project in your memory,
and you need help from the code itself, from unit tests,
in order to maintain your code, your project in such a good state that it has no bugs.
And if we have never written tests, and our project is already several years old,
and no one has ever dealt with it, would you still recommend starting this process?
It depends on the project, because I would say that not every project requires tests,
well, at least different types of tests are suitable for different types of projects,
and not every type of project, so to speak, is well-suited for unit testing,
because there are certain domains, certain areas, that are just quite difficult to test.
For example, if you have a project involving calculations,
where there are a lot of computations, such as accounting calculations,
then that logic is quite simple and testable.
You can isolate it into functions, methods, and it’s quite easy to test.
But if you have a project, for example, related to converting HTML to PDF,
that is already much more difficult to test, at least in terms of unit testing,
because in order to understand whether the HTML was correctly converted to PDF,
you need to open that PDF and look at it, meaning there is a large manual component to the testing,
so there are still parts that could be tested,
but there are fewer of those parts than in some accounting software,
where everything is quite simple and obvious.
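As a rough illustration of the kind of calculation logic that is easy to isolate and unit test, here is a minimal Java sketch (hypothetical names, assuming JUnit 5):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical example: an accounting-style calculation isolated into a pure,
// side-effect-free method. Amounts are kept in cents so the arithmetic stays exact.
class InvoiceCalculator {
    static long totalCents(long unitPriceCents, int quantity, int discountPercent) {
        long gross = unitPriceCents * quantity;
        long discount = gross * discountPercent / 100;
        return gross - discount;
    }
}

class InvoiceCalculatorTest {
    @Test
    void appliesPercentageDiscountToTheGrossAmount() {
        // 10000 cents * 3 = 30000 cents, minus 10% = 27000 cents
        assertEquals(27_000, InvoiceCalculator.totalCents(10_000, 3, 10));
    }
}
```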
And how do we know if we have enough tests? Have we written them well, is everything OK,
or do we need to continue demanding more from the team?
I know your attitude towards Code Coverage as a metric,
But what would you then recommend using to understand when to stop?
Well, Code Coverage is indeed a good metric in the sense that
it at least allows you to assess
what part of your project is covered by tests.
Here, again, it depends on the specifics of the project.
If the project is quite standard, like a line-of-business application,
and testing it is straightforward, then you should aim for a higher percentage of coverage
than, for example, in some other software which, again, generates a PDF from HTML.
So that is one metric, Code Coverage.
It will not allow you to assess the quality of your tests,
but at least it will give you a rough estimate
of how adequately your project is covered by tests.
Here’s another good metric that is more related to business —
it’s simply how many bugs you encounter with each release.
If there are quite a few bugs, or if you spend a lot of time
on manual testing of each release to identify and prevent these bugs,
then this is also a good indicator that, most likely,
your test coverage in the project is insufficient and needs improvement.
And what do you think about artificial intelligence in this regard?
Is it not helpful for us in the long run, or maybe already for increasing coverage?
I am not a specialist.
I had to ask about this; it's on everyone's mind.
Of course, I am not an expert in AI, in artificial intelligence, unfortunately.
But I can say how I use it in my projects.
And it is a good aid to speed up work,
to accelerate the writing process, including tests.
It also helps to identify some gaps in coverage.
That’s quite good, especially if the function has...
The method underTest has a clear set of input and output,
Then the LLM model can suggest to you which of these inputs
and combinations of these inputs you have not tested enough
and may even write tests for you for that.
So, as a rule, I use it when I have...
If it’s a simple unit test, for example, I don’t know, again, a calculator,
where there are, say, several inputs, several numbers
and some sets of inputs for the output,
then you can just hand it over to ai-ai... Sorry, I will call it ai-ai out of habit,
because my wife and I discussed this topic,
and she jokingly called it ai-ai, and since then I’ve also started calling it ai-ai.
You can hand over the writing of all these unit tests to ai-ai,
and it will write them quite well for this function.
That is, it won’t miss a single one of the combinations.
Everything will be great. For more complex, for example, integration tests,
I usually write one or two examples, then say, "do something similar for me,"
and it also does quite well. As a rule, there’s no need to edit anything.
If needed, just a little bit of editing is required.
So it works very well in the sense that if you give it an example,
it will write it excellently. From scratch, as a rule, it only manages well
in simple cases, when there is some small function.
In more complex cases, I don't really like how it writes tests from scratch.
But again, if you give it examples, it works a lot better.
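As an illustration of the kind of test this works well for, here is a hedged sketch of a parameterized JUnit 5 test that enumerates input combinations; it reuses the hypothetical InvoiceCalculator from the earlier sketch:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

// Mechanical but exhaustive coverage of input combinations: exactly the kind of
// test that is easy to delegate to an LLM once the inputs and outputs are clear.
class InvoiceCalculatorCombinationsTest {

    @ParameterizedTest
    @CsvSource({
            // unitPriceCents, quantity, discountPercent, expectedTotalCents
            "10000, 1, 0, 10000",
            "10000, 1, 10, 9000",
            "10000, 3, 10, 27000",
            "0, 5, 10, 0",
            "10000, 0, 50, 0",
            "10000, 2, 100, 0"
    })
    void coversTheInputCombinations(long unitPriceCents, int quantity,
                                    int discountPercent, long expectedTotalCents) {
        assertEquals(expectedTotalCents,
                InvoiceCalculator.totalCents(unitPriceCents, quantity, discountPercent));
    }
}
```

Whether the cases come from a person or from an LLM, reviewing the table of combinations is usually easier than reviewing a pile of copy-pasted test methods.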
What do you see as the prospects in five years, in ten?
Maybe it will completely eliminate programmers from the process?
Unlikely. I believe that ai-ai serves more as an amplifier for us.
That is, it multiplies your output as a programmer.
If your output isn’t very good, then multiplying that output will also result in
more code, but that code may not be very good.
But if you pay attention to code quality and try to maintain it at a high level,
then ai-ai will allow you, again, to be more productive.
and write more high-quality code in a shorter amount of time.
So I don’t think that ai-ai will be able to completely replace people.
At least, I hope so.
But again, I don’t think that will actually happen.
It may be necessary to have fewer programmers.
Again, that’s not a certainty, because the more...
And what does ai-ai lead to? It essentially leads to a decrease in the cost of programmers.
A decrease in the cost of something usually leads to an increase in demand for that product,
that is, for programmers over time.
So it’s not a fact that the demand for programmers will decrease,
but it is certain that the productivity of programmers will increase.
Another point is that, again, even with the use of LLMs, you need to pay very close attention
to technical debt,
because, again, if you are writing some project from scratch,
or a small project that you already have, which needs to be completed,
AI can help you finish it quite well,
but AI does not monitor technical debt, meaning it does not keep track of the maintainability of that project,
and it may insert some workarounds on its own that work now,
but could negatively impact the maintainability of the project later on.
And you need to be very careful about what it writes.
Just as you need to pay close attention to what you yourself write, or to what
the junior developers on your team write, you also need to pay very close attention to what AI writes,
because it has no measure of entropy, and over time what it writes
may simply become very difficult to maintain, including for the AI itself.
Because AI (I almost said it is also a person, though of course it is not)
also, like any worker, finds it difficult to work with code that has a lot of workarounds.
So it is your responsibility, so to speak, to keep the project's entropy at a low level.
What difference does it make how many workarounds we have in the tests?
Well, tests are tests, let it write them poorly, let there be many of them,
let them be convoluted, as long as they cover the functionality, right?
I would say that the quality of the test code may not be as important
as the quality of the production code, but it is nevertheless quite important, because it is still
code that you will have to deal with over time. If, for example, you have some bugs
that are identified in the project, meaning the test will reveal some bugs, or simply fail for some reason,
you will need to understand why the tests failed, meaning you will need to go through the tests,
look into it, figure out why they failed, and in order to spend less time maintaining these tests,
you need to maintain the quality of these tests, the quality of the test code at a good level,
just as you do with the production code of your project.
So here you need to keep an eye on the quality of any code, including tests,
otherwise, again, it will slow down your work, slow down the process of separating
false positives from true positives, so the quality of any of your code is quite important.
I will quote your book, there is a phrase that says, "the end result for projects with bad tests or no tests is the same."
You put it more radically there than you just did. Why is that?
I would say that the end result is indeed the same, but I think, as far as I remember,
there was also a caveat, maybe before this phrase, that the end result will be the same,
but a project with bad tests will last a little longer than a project without tests,
By the same result I mean that both types of projects
will become unmaintainable over time; it is just that the point in time
at which the project ceases to be maintainable comes a bit later with bad tests than with no tests at all.
Right now, with the increase in speed, the improving quality of containerization, and the growing number of frameworks,
there seems to be a trend towards more integration tests and fewer unit tests.
Programmers seem to say that unit tests are not necessary,
because we have everything covered by one big integration test or several,
which spin up our entire application in a container or a group of containers, where full integration of all functionality occurs,
and why should we test the details of these internal algorithms, which you refer to as accounting,
that calculate the sum of two numbers, if we are checking everything globally, and everything matches,
and in the end, transactions go through, and the total balance for the user is correct.
Yes, that's a great question, and it can actually be broken down into several themes.
First of all, answering the question of why not just use integration tests for everything,
integration tests are complex, more complex to write than unit tests, and they are more difficult to maintain,
they run slower than unit tests, and therefore you won't be able to, most likely,
cover all permutations of your project with only integration tests,
because the higher we go up the stack, the more code there is in the layers
covered by integration tests, and the more possible permutations there are within that code.
There is a kind of combinatorial explosion happening there,
so you will never have enough integration tests
to cover all these possible permutations.
- What do you mean by permutation? - Well, for example...
- For example, give one. - Yes, well, let's take an example that...
I'm just trying to recall some example from my project, well, okay,
let's take the same accounting software, then.
Let's say there is some clever calculator inside it that needs to be tested,
and this calculator is called from, say, two other systems or subsystems,
two other modules.
To test all of this with integration tests, it turns out we need to test the calculator
together with one subsystem that uses it,
and then together with the other subsystem that uses the same calculator.
And it may happen that the first subsystem passes some of its parameters to this calculator,
and we need to test in different combinations of these parameters.
And do the same with different combinations of the second subsystem.
At the same time, these subsystems using the calculator may have their own internal state,
which we also need to take into account in these integration tests.
So all of this adds up to various use cases,
that is, to different states that your system can take on,
and all of this needs to be considered in these integration tests.
I'm not sure if my answer makes sense, but in general, the more code you have,
the more possible states your system can be in, and those states need to be covered by your tests.
And if we take these states from different parts of your system,
they multiply, meaning that if, for example, the calculator itself can have 10 possible states,
and you only need to test those 10 states, then if the first subsystem also has another 10 states,
then you need to multiply 10 by 10, and you will already have 100 possible states that need to be tested against each other.
And this is very difficult to do with integration tests, especially
because, as a rule, if you are testing the entire system integrated together,
your test may involve API calls over the network
and may even access the file system or the database.
Covering all of these combinations with integration tests alone is practically impossible.
You need to isolate certain parts of your system that will contain pure logic,
meaning they will contain pure logic without side effects,
which you can test with unit tests. As a rule, this should be complex logic,
meaning there is no point in isolating some simple logic, I don’t know, if we are talking about C#,
then some simple properties with getters and setters are not worth unit testing or isolating into separate methods.
If you are testing the logic of a calculation, then this method for calculating something,
makes sense to isolate. In the book I referred to this (I don't remember what I called it in the Russian version,
but in the English version it was like this): your code can be either tall or wide,
literally either tall or wide, and I used this as a metaphor.
If the code is tall, it has high complexity; if it is wide, it has many dependencies.
And you need to ensure that your code is either tall or wide,
but not both at the same time, because otherwise it will be difficult to test,
because it is hard to test code that has many dependencies, but at the same time, code with high complexity,
with significant complexity in calculations, needs to be tested,
because, well, this code has the highest probability of bugs occurring,
meaning you need to separate this code into complex code and code with a large number of dependencies,
and this complex code, provided it has a small number of dependencies,
will be easy to test with unit tests. As a guideline that I recommend,
it is to have a large number of unit tests that test these pure functions,
but also to have a set of integration tests that will cover
the entire flow, but without permutations, because, again, testing permutations
is quite difficult with integration tests. However, it is still important
to see that your system as a whole works in conjunction with its subsystems.
So, again, I think many are familiar with the concept of the test pyramid,
this is exactly how it works, meaning you should have a large number of unit tests,
that test small parts, and again, these parts should be meaningful,
meaning you shouldn't test some simple things that have a low probability of errors occurring,
you should also have a smaller number of integration tests,
that check your entire system together, to understand that not only do the parts work,
but the whole system as a whole also works correctly.
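A minimal Java sketch of the "tall or wide, but not both" idea (all names are hypothetical): the pure calculator carries the complexity and gets most of the unit tests, while the wide orchestrator only wires up dependencies and is covered by a few integration tests.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// "Tall" code (hypothetical): complex, pure, no out-of-process dependencies.
// This is where most of the unit tests go.
class SalaryCalculator {
    BigDecimal monthlyPay(BigDecimal baseSalary, BigDecimal bonusPercent, int unpaidDays) {
        BigDecimal bonus = baseSalary.multiply(bonusPercent)
                .divide(BigDecimal.valueOf(100), 2, RoundingMode.HALF_UP);
        BigDecimal dailyRate = baseSalary.divide(BigDecimal.valueOf(22), 2, RoundingMode.HALF_UP);
        return baseSalary.add(bonus).subtract(dailyRate.multiply(BigDecimal.valueOf(unpaidDays)));
    }
}

// "Wide" code (hypothetical): many dependencies, almost no logic of its own.
// This is covered by a small number of integration tests.
class PayrollService {
    private final EmployeeRepository repository; // database access (managed dependency)
    private final PayslipSender sender;          // message bus or e-mail (unmanaged dependency)
    private final SalaryCalculator calculator = new SalaryCalculator();

    PayrollService(EmployeeRepository repository, PayslipSender sender) {
        this.repository = repository;
        this.sender = sender;
    }

    void runPayroll(long employeeId) {
        Employee e = repository.findById(employeeId);
        BigDecimal pay = calculator.monthlyPay(e.baseSalary(), e.bonusPercent(), e.unpaidDays());
        repository.savePayment(employeeId, pay);
        sender.send(employeeId, pay);
    }
}

// Supporting types assumed for the sketch.
interface EmployeeRepository {
    Employee findById(long id);
    void savePayment(long employeeId, BigDecimal amount);
}

interface PayslipSender {
    void send(long employeeId, BigDecimal amount);
}

record Employee(BigDecimal baseSalary, BigDecimal bonusPercent, int unpaidDays) {}
```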
Programmers often complain that with a large number of tests,
especially with a combination of integration and unit tests, as you just described,
every bug, every error leads to the failure of several tests at once,
some of which are unit tests, some are integration tests, and it seems to complicate the work,
because it's unclear what to tackle and which of these tests to fix,
and which of them turned red fairly, and which turned red by mistake.
And programmers say, it's better for me to have one test fail, one turn red,
I will understand where to fix it, than to have tests covering the same thing two, three, or even fifteen times over,
as you just described, if you write microtests for microfunctionality,
then the next level is slightly more complex, and then another, and so they become integration tests, higher and higher up the pyramid,
it turns out you cover that very simple function of adding two numbers in accounting software,
covering it fifteen times with many levels, and when there's a defect in it, as I mentioned, ten tests fail at once.
What should we say to such programmers?
Yes, well, some duplication of coverage is inevitable, especially if you are using both unit and integration tests,
which you should, in fact, because it's good practice. It's impossible to avoid this,
and on one hand it seems like a downside, but on the other hand, if your functionality is important,
and if it affects a large number of other subsystems, then it is actually a good signal
that when an error occurs in this functionality, a large number of tests start to fail.
It's a good signal in the sense that you are catching an error in a critically important part of your software.
So, I would rather evaluate it as a signal rather than a drawback.
There's also the question of what exactly we mean by integration tests,
because here we are already delving into the difference between the London testing school and the classical one,
because this argument was also made by proponents of the London school,
which... well, a little background on what the difference is between classical and London testing.
The London school is a school that advocates for using more mocks,
including to break dependencies
between classes, even if those dependencies are in-process.
That is, not just replacing databases or some external API systems
with mocks, but also the classes of your domain model, meaning the classes that
live fully in memory, have no side effects, and do not relate
to any out-of-process dependencies.
On the other hand, the classical school says that
mocks can be used, but only for such external dependencies.
That is, out-of-process dependencies, as I called them in the book.
So, the proponents of the London school say that, OK,
if we do it this way, then even if we introduce an error in one component,
and our unit tests are testing, for example, two classes, the tests
covering the second class, which depends on the first class, will also fail.
So, my answer to this question is the same: it's a good signal,
we need to assess it as a good signal that this code is better not to introduce errors into,
because a lot of other code depends on this code.
So, again, I return to the topic of the difference between these schools,
I am more of a supporter of the classical school, meaning I consider unit tests to be tests
that do not involve external systems, meaning they have no I/O dependencies.
So, in the London school, such tests will also be considered integration tests,
at least some of them,
even if they do not involve any databases or external systems:
even if you are just testing two plain in-memory classes together, that will be considered an integration test.
Well, again, I am a supporter of the classical definition of unit tests.
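A small sketch of the difference between the two schools, with hypothetical classes and assuming JUnit 5 plus Mockito: the classical test uses both real in-memory classes, while the London-style test mocks the in-process collaborator.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

// Two hypothetical in-memory classes that collaborate with each other.
class DiscountPolicy {
    double discountFor(int itemCount) {
        return itemCount >= 10 ? 0.1 : 0.0;
    }
}

class Order {
    private final DiscountPolicy policy;

    Order(DiscountPolicy policy) {
        this.policy = policy;
    }

    double total(int itemCount, double unitPrice) {
        double gross = itemCount * unitPrice;
        return gross * (1 - policy.discountFor(itemCount));
    }
}

class OrderTest {
    // Classical style: both collaborators are real, because neither touches an
    // out-of-process dependency. A bug in DiscountPolicy would fail this test too,
    // which, as argued above, is a signal rather than a flaw.
    @Test
    void classicalStyleUsesTheRealCollaborator() {
        Order order = new Order(new DiscountPolicy());
        assertEquals(90.0, order.total(10, 10.0), 0.0001);
    }

    // London style: the in-process collaborator is replaced with a mock, isolating
    // Order from DiscountPolicy even though both live purely in memory.
    @Test
    void londonStyleMocksTheCollaborator() {
        DiscountPolicy policy = mock(DiscountPolicy.class);
        when(policy.discountFor(10)).thenReturn(0.1);

        Order order = new Order(policy);
        assertEquals(90.0, order.total(10, 10.0), 0.0001);
    }
}
```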
However, or maybe it's not "however," you have a quote in your book,
you say, I will translate, "tests that work directly with the file system
do not fall under the definition of unit tests." So you are essentially suggesting
to mock the file system, including file opening, closing, and reading operations.
It's not exactly a recommendation; we just need to separate the categorization of such tests
from how I would recommend dealing with them.
So yes, categorization-wise, if we are talking about categorization, such tests will not be considered unit tests,
they will be considered integration tests.
However, I would still recommend not using mocks for the file system.
I would recommend using, well, accessing the file system directly.
That is, the tests will call code that will directly interact with the file system.
So, why is that? In the book, I made a distinction between different out-of-process dependencies.
There are, I called the first type of these dependencies managed dependencies and unmanaged dependencies.
Managed dependencies are those dependencies that are only accessible to your code and are not accessible to external systems.
This usually refers to the file system and databases, meaning that external systems do not have access to the file system,
with which your code works or to the database with which your code works.
In this case, the database or file system can be considered part of your system,
because it becomes a black box for external clients.
And integration tests should also cover these components.
That is, if you are writing an integration test, it should work with a real database,
with a real file system in order to check how this entire black box works as a whole.
Otherwise, if we mock the file system or the database, it usually leads,
first, to worse coverage, because you won't be able to see
how successfully your code integrates with the database.
And the second reason is that it degrades one of the key properties of good tests: resistance to refactoring.
That is, you will have more false positives, because you might, for example, change
the pattern of working with the database, refactor a table,
or change the SQL that you send to the database,
and you can do all of that without changing the observable behavior of your system.
But if your tests are using mocks, you will need to change the mocks that work with the database
every time you change the communication pattern between your code and that database.
Well, returning to the original question, I would say that
if we are talking about managed dependencies, such as the file system or databases,
I would recommend not using mocks for them.
I would recommend using mocks only for those systems that are accessible not only by your system but also by other systems.
For example, a typical example is some kind of message bus or external API systems, like payment systems, Stripe, and others.
But not databases and not file systems.
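A minimal declaration-level sketch of that split, with hypothetical names: the database connection is a managed dependency and would stay real in integration tests, while the payment gateway and message bus are unmanaged and would be mocked.

```java
import java.sql.Connection;

// Hypothetical classification of a service's dependencies. In integration tests the
// database connection (managed: only this application talks to it) stays real, while
// the payment gateway and the message bus (unmanaged: other systems observe those
// interactions) are replaced with mocks.
interface PaymentGateway {            // unmanaged dependency -> mock it in tests
    void charge(long orderId, long amountCents);
}

interface MessageBus {                // unmanaged dependency -> mock it in tests
    void publish(String topic, String payload);
}

class CheckoutService {
    private final Connection database;    // managed dependency -> use the real database in tests
    private final PaymentGateway gateway;
    private final MessageBus bus;

    CheckoutService(Connection database, PaymentGateway gateway, MessageBus bus) {
        this.database = database;
        this.gateway = gateway;
        this.bus = bus;
    }

    // placeOrder(...) would write to the database, call gateway.charge(...)
    // and publish an "order placed" event on the bus.
}
```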
Then, in fact, all tests or the overwhelming majority will fall into the category of integration tests.
Yes, that's right. And that's good because it increases coverage.
Again, you need to check not only how your application works in isolation from external systems,
but also how it works in integration with those systems. This is essentially what the name of integration tests refers to, in my opinion.
The London school holds a different opinion.
But I adhere to the opinion that integration tests should check
how your application works in integration with other systems, including external ones.
Yes, most of your integration tests will work directly with the database,
because, as a rule, the biggest external dependency for your system is the database.
Other external systems are generally not as prevalent,
meaning there aren't as many of them, and the interactions with them are not as frequent as with the database.
So yes, I agree.
And in the book, you do not recommend using such auxiliary in-memory databases,
which replace real databases, but are not mocks,
but rather fully functional SQL databases.
Yes, exactly. And that's because any substitutes for your database degrade the coverage of your tests.
Again, because if you switch, for example, from a real database like Oracle, Postgres, or MySQL,
to some SQLite, and test your code with SQLite instead of Oracle, Postgres, or MySQL,
then this will lead to your tests potentially producing either false positives or false negatives,
because, well, the functionality of MySQL is not equivalent to the functionality of Postgres on a one-to-one basis,
and some things that work with Postgres will not work with MySQL,
And perhaps even the other way around, some things that work with MySQL will not work with Postgres.
That is, the coverage will not be one-to-one, and in any case, even if you write tests that work with MySQL,
you will still need to check how your system works with Postgres, perhaps manually,
just to ensure that the new release works properly with the real database.
Just to avoid the extra work of manual testing,
I recommend writing your tests in such a way that they work with the actual version of your database,
that is, the same version that you are using in production.
Very often, what we see in real projects is a locally installed database,
which is used for testing and is accessed by all programmers.
Each programmer has a login and password, and sometimes they even share the same credentials for access.
They all work with this database, all their tests are tied to the database,
and when the product goes to production, it has its own database.
When you join this team, you are immediately given access to their specific local database,
and you have to test with it. How do you like that setup? Seems right to you?
Just to make sure I understand: it is shared among all programmers, right?
There is some common one on the server in the office, yes, and it has the same or almost the same setup as in production.
Yes, I have often found myself in such a situation too. It's not a very good situation, let's say.
Because running tests automatically on such a database won't work,
simply because you will interfere with other programmers or testers
who are also running their tests against it.
In other words, for your integration tests to work successfully,
you need to have one database, well, one server for each programmer.
Nowadays, it's quite easy to do.
Many databases you can just download and set up, run locally through Docker,
and work with that database, so there shouldn't be any problems with this in modern times.
A shared database, as you described, can be OK for manual testing.
Well, usually, QA and Dev environments also have one shared database, which is normal.
But for local runs, it's still necessary for each programmer to have their own copy of the database,
otherwise, you simply won't be able to run your integration tests properly.
It should either be a static copy, or it should be created from scratch with each build,
filled with data, and deleted at the end of the build?
Yes, that's a somewhat controversial question.
In my opinion, creating a new database instance every time you run it
would be ideal, but it works quite slowly even in Docker.
Therefore, I would recommend maintaining the same local database.
That is, you can just download the image once and run it in Docker,
so that there is one instance that you always use for your integration tests
and do not recreate from scratch every time.
Another reason is that you not only need to create the database,
but also need to keep its schema up to date.
This is another big topic on how best to do this.
But it's easier to do when you have just one instance, when you need to apply scripts
to upgrade your schema on the same instance of that database.
Then it will be easier to keep this database up to date,
and you won't have to wait for it to be initialized.
I've especially seen some teams that initialize a new database for every test run.
They even teardown the old database and create a new one between tests.
It works very slowly. I wouldn't recommend such an approach.
Tests start to run very slowly.
My advice is to just use the same database,
your own database running locally.
It won't be a problem if you always reuse the same instance.
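A hedged sketch of that setup using the Testcontainers library (assuming Docker is available and container reuse is enabled in the local Testcontainers configuration); the version tag and class names are illustrative.

```java
import org.testcontainers.containers.PostgreSQLContainer;

// One shared PostgreSQL instance for the whole integration-test suite, pinned to the
// same version as production and reused across runs instead of being recreated.
final class TestDatabase {

    private static final PostgreSQLContainer<?> POSTGRES =
            new PostgreSQLContainer<>("postgres:16.3") // keep in sync with the production version
                    .withReuse(true);                  // requires reuse to be enabled in ~/.testcontainers.properties

    static synchronized PostgreSQLContainer<?> instance() {
        if (!POSTGRES.isRunning()) {
            POSTGRES.start();
            // Apply schema migrations here (for example, run your migration tool against
            // POSTGRES.getJdbcUrl()) so the single instance stays up to date.
        }
        return POSTGRES;
    }

    private TestDatabase() {
    }
}
```

Each integration test then asks TestDatabase.instance() for a connection URL instead of spinning up and tearing down its own database.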
Since we're talking about speed, is it really important for tests to run quickly?
And what does "quickly" mean? What are the criteria for test speed?
Should it be seconds, minutes, what do you think?
Yes, it's important because the faster the tests run, the quicker you get feedback,
if something goes wrong with your code, and the more often you can run the tests.
If your tests, including integration tests, run very slowly,
For example, if the entire test suite takes half an hour, then naturally you won't run the entire test suite every time you make a change.
At best, you will run it when you plan to create a PR or make some significant change.
This is not a very good situation, because the earlier and more often you run the tests,
the more often you get feedback, and the less time you waste moving in the wrong direction.
If you see that a test has failed, you can just stop and start fixing the problem right away,
whereas piling more and more code on top of that bug only makes things more difficult later on:
you have to revert and change everything, and so on. So, yes, the first problem is that
you spend more time moving in the wrong direction. The second problem, again,
is that you simply won't run these tests as often as you should.
In the ideal case you run the tests locally, and some tests are also executed automatically,
for example when creating a new PR. But if they only run on the PR instead of locally,
which is really still part of the first problem, you can write a lot of code
and find out that some things are written incorrectly only after you've already created the PR,
instead of identifying the problem earlier, while you are still programming.
Last week we worked with the Apache Kafka source code, and we managed to run all the tests
in about two hours on a machine with 30 processors; it was a very large machine, and everything ran in parallel.
Compilation plus testing took us about two hours for around 900,000 lines of Java code. That's how it was.
And you would have to live with this. Imagine you came to such a team, invited as a consultant.
I have, yes.
But again, I'll tell you: these are Java files, and of course there are many modules inside.
There are modules that contain 2000 or 3000 Java files, and tests to match.
That adds up to simply tens of thousands of tests, of test methods, and they are mostly unit tests,
not counting integration tests. Well, we didn't dig into it that deeply, but I assume these are unit tests,
because they are run with the command `gradle test`, and there are no integration tests, nothing like that.
I suspect there are also tests of the next level in there. It's strange that they take so long to run, because
if these are unit tests, then even if there are thousands of them, they should run very quickly.
I have a suspicion that they are probably setting up some kind of infrastructure, maybe even
spinning up something like Docker, which is why everything runs so slowly. But of course, I haven't looked into the project, so I can't say for sure.
Docker wasn't needed, though; there was no Docker on the server at all.
No containers, no other infrastructure dependencies, just a clean server,
with Java installed on it. We cloned Kafka onto it; maybe it somehow spins things up itself,
and the tests invoke that themselves. Is that possible? Although I'd think that if Docker had been needed for anything,
it would have had to be on the server; the tests would have tried to call it and told us otherwise.
So, yes, it's strange. But what do you do with such a huge amount of code, 900 thousand lines, can you imagine?
How should one deal with it, how should one test it? In such a situation, I wouldn't say that
I have some good solution here, because my experience is mostly in business
line applications, that whole class of applications with a standard API that
talks to a database and so on. Here we are dealing more with infrastructure code. Take, for example,
the same PostgreSQL database as a project: I'm sure it also has a lot of
tests, and it's quite possible that they also run slowly, and it's not a fact that there is a way
to improve the situation, simply because the project itself is large. And yet, again, it's strange
that unit tests run for so long. Take an ORM, for instance,
the same Hibernate: I know that it has many tests that work directly with databases,
simply because there's no point in mocking the database in tests that test the ORM itself,
since the essence of an ORM is to work with a database, so almost all of its tests are integration tests.
Again, it's hard to suggest anything here other than the
standard advice: break the tests into several suites, run the more
critical tests regularly, including locally by developers, and run the less critical ones,
say, only on each PR. That is about the only
option I could suggest here. But again, enterprise development is
quite different from developing components like libraries or large
components such as databases or Kafka, so the set of advice is also quite different.
Overall, how do you feel about modularization, breaking a project into smaller
pieces? For example, Kafka could be split into 20 repositories, each of which could be built in 5
minutes.
Yes, that might help here.
And in enterprise, is this good or bad? There are advocates
for both approaches. There are people who support monolithic repositories, insisting that everything should be built in one
place, even if it takes three hours, because they want the build to come from a single point. There are other
opinions against this that say, let's break it down into 20 microservices
or 20 libraries, each with its own lifecycle, its own configuration, its own
unit tests and its own build, and then there will be some central
integration test suite that assembles the 20 microservices and verifies, in just a few minutes,
that they work together.
Yes, this is another big debate.
In general, we have the monolithic approach on one end of the spectrum, and on the other end we have
microservices, really micro microservices, which are just small pieces of code
that all integrate with each other over the network via API calls. And here
we need to maintain a balance, just like in many other aspects of development;
going to either end of the spectrum, I believe, is not very correct.
In my opinion, you should start with a monolith, because until you begin development, while
you still have little code, it will be unclear to you where exactly it is better to draw the boundaries
in your project. To understand these boundaries at all, you need to start the project as a single whole.
At the same time, this monolith does not have to be poorly structured; you can
maintain a modular monolith, as they say. That is, within this project, even if it is a monolith,
it can still be well structured, meaning it can be divided into projects, and compilation, in theory,
can also be quite fast if you have divided the project properly. For example, if you
are working with some component, say a core that you have isolated
into one package or project, and the parts that use the core are also placed into separate
packages or projects. If you are working in only one package, in one project, and are not
touching the core, then compilation will also be quite fast, because only
that one package gets compiled, not all the others. So even if the project itself
is large, if you approach the boundaries within the project correctly, there
shouldn't be any issues with compilation, up to a certain limit. Because if your project
keeps growing and growing, then after a while it simply becomes impractical to manage everything in the same
repository, or even in the same project, and at that point, yes, I would rather
divide it into separate repositories. So if you are starting to transition
from a monolith to microservices, I would recommend having each microservice in a separate
Git repository, simply because otherwise there will be too many overlaps between these microservices.
So my recommendation is to start with a modular monolith and transition to microservices only if
the size of this modular monolith starts to become too large, and if clear boundaries
start to emerge, for example, if you have one team that is always working on one part of the
project and another team that is always working on a different part, and there is simply no point
in them all working as one team with one repository.
Then it can be divided up this way. But again, even in this case, I would recommend making these microservices
quite large, meaning they won't exactly be micro, but rather just services. So
I would recommend an approach where each microservice contains one bounded context.
If you have not just some small service inside each microservice, but an entire bounded
context, then in your company's project there won't be hundreds or even dozens of these microservices;
there may be 3 or 5 of these large microservices, which would be more appropriate to call services,
that communicate with each other. Otherwise, if we adhere to the standard
microservices architecture, we run into another pattern called the distributed monolith,
where you essentially still have a monolith, but you have broken it down into several microservices and simply
replaced the internal method calls that used to happen within the same process:
you have split it into small pieces, and instead of just calling those methods internally,
you have separated them and are now calling them over the network, which is very wrong. So here
a good indicator of whether you have divided your system into microservices correctly is
whether you have to deploy these microservices together or can deploy them individually. If you
can deploy your service separately from all the others, that is a good sign that most likely
you have defined the boundaries correctly and there is not a high level of coupling between your
services. If, on the other hand, you have to deploy one service together with other services, that
indicates that the coupling is high and your boundaries are wrong, and most likely several of those
services should have been one service.
And have you ever had to create test services,
not web services for production, but additional services, whole additional modules, with which
you would test the main ones?
Maybe if you give an example, I can give a more precise answer.
Let's say you have a web service that performs accounting tasks, for example calculating
a salary upon request, and we deployed it. We wrote an integration test for it, but it only
ran for us locally; one way or another, integration tests run in a container,
against a test instance of the service. Then, when we deployed it to production, we don't know
what we got: does it really perform the functionality it should? And how do we
test it there? The tests don't run there, so again we have to send people to check it manually, and we don't expect that clients
will test it for us. Maybe we could run some kind of robot that will go through
the production system and click around there.
In this case, I would recommend just writing some
smoke tests that cover the most important parts of your system, meaning they
will emulate users and use the same interface that your users use.
So again, it's like manual testing, but through some kind of robot.
There are also other options; I've seen people create some additional endpoints within the system
so that checks can be done not only through the UI but also through API calls.
For example, part of the test could go through the UI,
and then, to understand the internal state of the system at the end of the test,
we send API calls to check whether the internal state of the system is correct
after executing the test steps.
This is a less clean approach; if possible, I would still advise against doing it this way
and would rely on the UI alone, because that is the most stable and reliable option:
the UI is what the end user actually interacts with.
So if there are any issues, most test failures
that arise this way will be relevant, meaning your test failures
will likely indicate that something has really happened either within the system or with the UI itself.
Additional APIs, such additional holes
in your application that you create just to check the state of the system,
are less clean, let's say, in the sense that ideally you should be able
to check that state through the UI itself. Sometimes that can be quite a complicated task,
simply because this state is scattered across different components of the UI,
and sometimes you do have to add these, so to speak, technical API endpoints
to quickly understand the internal state of your system.
So here you can reach a compromise, if checking it purely through the UI is not reasonably simple.
But, on the other hand, the number of these smoke tests should not be too large,
because your smoke tests should simply check that the system is working;
everything else should have been verified by your unit and integration tests beforehand.
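A minimal smoke-test sketch along those lines, assuming Selenium WebDriver and a hypothetical deployment URL and page structure:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Drives the same UI a real user would, checks only that the key flow is alive,
// and leaves everything finer-grained to the unit and integration tests.
class PayrollSmokeTest {

    @Test
    void salaryPageLoadsAndShowsAResult() {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://payroll.example.com/login");                      // hypothetical URL
            driver.findElement(By.name("username")).sendKeys("smoke-test-user");  // hypothetical form fields
            driver.findElement(By.name("password")).sendKeys("not-a-real-password");
            driver.findElement(By.id("login-button")).click();

            driver.get("https://payroll.example.com/salary/42");
            // A coarse "the system works" check; anything finer belongs lower in the pyramid.
            assertTrue(driver.getPageSource().contains("Monthly salary"));
        } finally {
            driver.quit();
        }
    }
}
```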
Ideally, that's how it should be, but you know for sure that no matter how much you test, bugs still appear,
regardless of the coverage, no matter how beautifully you write the tests
and no matter how much they seem to guarantee stability; things turn out differently in production.
And many people criticize tests in principle for this reason.
They say, guys, you spend time on testing, you write your integration tests for hours or even weeks,
but when it all goes to production, the reality is that there are still errors.
How do you convince them otherwise?
Yes, I've heard such feedback mainly about unit tests.
That is, people say that we write a large number of unit tests here,
but they essentially check nothing, because yes, the tests may be green,
but when it all goes into production, you still have a large number of bugs.
I would respond to this by saying that the unit tests are simply written incorrectly,
because this usually indicates that your unit tests check too little code,
meaning they have low coverage and give you little protection against bugs.
Another common piece of feedback people give about unit tests
is that they fail too often, which is the opposite situation.
The first situation is when they do not detect bugs in the system, meaning the tests are green
but there are still bugs in the system. The second, opposite situation is when the tests fail, but there are no bugs in the system.
So, the first situation is false negatives, and the second is false positives.
False positives are also quite a common problem, where your tests fail
but the system is working well, and this leads to you simply losing trust in your tests.
You just see that they fail for no reason, maybe due to flaky tests,
or perhaps because internal implementation details have changed
without any change in the external behavior of your system.
So you start to lose trust in the tests over time, and even if they later begin to detect real bugs,
you just skip them, mute them, or ignore them. That is also a bad situation.
Both situations, false negatives and false positives, are problematic.
One could say that false negatives, when your tests do not detect anything at all,
are a more serious issue than false positives,
but I would say that they are roughly equal in terms of severity.
So it's important to pay more attention to the quality of the tests, because the first situation,
false negatives, indicates that your tests do not cover enough.
I wouldn't say it's insufficient coverage as such;
I would say it's coverage that does not necessarily cover your business logic.
You might have high coverage, but it may be coverage
of parts of the system that are not critical to its operation, while what should be covered,
or at least what you should try to cover first and foremost, are the most critical parts of the system.
That usually means the business logic and certain algorithms within the system that are, firstly, complex,
and secondly, due to their complexity, tend to have the highest number of bugs.
That is how to eliminate false negatives, where your tests are green but there is a bug in the system:
you need to increase coverage of the critical parts of your system.
To eliminate false positives, which occur when your tests fail but the system is functioning,
you need to strive to write unit tests in such a way that they do not test implementation details.
So why do these false positives usually happen?
They happen because you tie your tests to implementation details,
which tend to change frequently. You can, for example, perform a refactoring,
and during this refactoring you change implementation details
without changing the external behavior of the system itself;
that is essentially the definition of refactoring: you change internal parts without altering the behavior.
So you need to look at what exactly your tests, your unit tests, are testing.
If they are testing implementation details, then with every change to those implementation details
your tests will also fail, because they are tied to them.
You need to ensure that the tests are testing the observable behavior,
that is, the external behavior of your system, and not these implementation details.
Returning to the conversation about databases, about testing with databases,
which we just discussed: working with a database is precisely such an implementation detail.
You don't need to check how exactly your code talks to the database;
you need to check the final result of this communication between your code and the database.
In other words, you should look at the state, at the data that ends up inside the database
after your system has completed its work.
Not at what queries I send to the database.
Yes, exactly. The queries can be anything.
The SQL does not need to be checked here; that is the biggest anti-pattern you can create,
unless, of course, you are testing some ORM like Hibernate.
If we're talking about enterprise-level applications, then you definitely don't need to check
what SQL you send to the database. You need to look at the state of the data in your database
after you have sent that SQL.
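A sketch of such a state-based check, with a hypothetical schema, service, and connection string: the test exercises the code under test and then asserts on the data that ended up in the database rather than on the SQL that was sent.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.junit.jupiter.api.Test;

// Hypothetical code under test: applies a percentage raise via the database.
class SalaryService {
    private final Connection db;

    SalaryService(Connection db) {
        this.db = db;
    }

    void applyRaise(long employeeId, int percent) throws SQLException {
        try (PreparedStatement update = db.prepareStatement(
                "UPDATE employees SET salary_cents = salary_cents + salary_cents * ? / 100 WHERE id = ?")) {
            update.setInt(1, percent);
            update.setLong(2, employeeId);
            update.executeUpdate();
        }
    }
}

class SalaryUpdateIntegrationTest {

    @Test
    void raiseIsPersistedForTheEmployee() throws Exception {
        // Assumes a locally running test database with an `employees` table.
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app_test", "app", "app")) {

            // Arrange: a known starting state.
            try (PreparedStatement insert = db.prepareStatement(
                    "INSERT INTO employees (id, salary_cents) VALUES (?, ?)")) {
                insert.setLong(1, 42L);
                insert.setLong(2, 100_000L);
                insert.executeUpdate();
            }

            // Act: run the code under test.
            new SalaryService(db).applyRaise(42L, 10);

            // Assert on the resulting state, not on which SQL statements were sent.
            try (PreparedStatement query = db.prepareStatement(
                    "SELECT salary_cents FROM employees WHERE id = ?")) {
                query.setLong(1, 42L);
                try (ResultSet row = query.executeQuery()) {
                    row.next();
                    assertEquals(110_000L, row.getLong("salary_cents"));
                }
            }
        }
    }
}
```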
In other words, we should essentially blame the developers, both for the bugs and for the bad tests.
A few qualifications.
Developers are always to blame. If the code doesn't work, the developers are to blame.
If the tests are poor, then the developers are to blame. There are really no other options.
And there's a question that I don't have an answer to, which is why I'm very interested.
So, a pull request comes to you. You do a code review. Everything seems to be green.
All previous tests are also green. Nothing is broken.
And some tests were added there. You see that the functionality has changed,
and the person added a few tests. How can you determine if this is sufficient?
Are these green indicators on these three new tests and on all previous tests enough
to accept the pull request? Yes, that's a question. It's actually hard to understand right away.
Because in order to assess whether the coverage by these tests is sufficient,
you need to understand the domain model itself. That is, what was changed, added, and so on in the PR.
Without this understanding, it's difficult to evaluate, firstly, the adequacy of the test coverage,
and secondly, how well you covered all the permutations that are possible in the new functionality.
One option is, of course, to rely on test coverage here.
Because test coverage, despite the fact that this metric alone is insufficient,
it nevertheless allows you to assess the trend in the project, especially if you have a breakdown in the project
into the domain model, business logic, and those parts that are more infrastructural,
which may not significantly affect the behavior of the project.
So, if you have a separate project or package with the domain model,
it is good to evaluate the test coverage specifically of that separate part of your project
and pay attention, first and foremost, to the coverage of that part of the project.
For example, if the change in the PR in your case was related to the domain model,
then you need to pay special attention to this.
You can also pay attention to test coverage; if, for instance, it has changed,
and the percentage of coverage has decreased, then you can start digging deeper here,
why, for example, certain things were not covered by tests.
Again, this metric is not a panacea because it simply assesses the number of lines
that were covered by tests, or the number of branches that were covered by tests.
It does not evaluate how well these tests were written.
However, in combination with other factors, such as just code review
of these tests, to understand how well they are written in conjunction with this test coverage metric,
they usually provide a more or less normal understanding of whether, first of all, there are enough tests
in terms of quantity, and secondly, if you look at the PR itself, at the code of these tests,
you can understand how well this code is written.
So together they give a more or less good approximate assessment, I think.
And here’s another question that I also don’t know the answer to.
We talked about databases as external dependencies, we talked about Kafka, we talked about buses,
whatever they are called, event-based systems, file systems, but there is another dependency
that is becoming very popular now, which is large language models.
They are essentially the same as databases; we also access them through some API,
but they are not deterministic in their behavior.
If we are confident that a database will respond the same way in the next run of the same test,
then an LLM will respond differently each time.
And now there is a big question among programmers about how to test them.
We can test everything else, but here we have a dependency that is, by definition, flaky.
Is the LLM local or external?
Well, it can be external, but we can mock it, set up some local version.
Well, not mock, but create a local double, which will also behave unpredictably.
Yes, actually, I honestly haven't worked with projects that directly integrate with LLMs,
so it's hard for me to say; I've worked with ai-ai as a user,
but I haven't worked as a programmer integrating with it.
So...
Let's imagine, then, because my experience here is also limited, insignificant really.
And we always just mock them strictly; we simply put a stub in their place that always responds the same.
But it seems that this is somewhat rough.
Yes, maybe you can give an example of what exactly this LLM is used for?
Well, for example, you have a service that you proposed yourself, as an example.
We take an HTML page and convert it into a PDF page.
And then, for instance, we send the PDF to the LLM, and it recognizes what is written there.
And gives us the English text.
Not just that, it doesn’t just provide a translation, not just word for word,
but it reads our 12-page document and outputs one paragraph...
A summary of sorts.
What it understood. Yes, a summary, that’s the kind of service we should...
Yes.
And every time...
You will give it the same PDF file, and it will produce a different paragraph of text each time.
Yes, I don’t think there will be the same output for that PDF.
Yes, I don't think there are any options here other than to mock it, unfortunately, or to stabilize it,
that is, just crudely configure it to respond in one way.
Yes, because it’s such a dependency that, as you said, it’s flaky, first of all.
Secondly, by definition, it’s an unmanaged dependency because it’s accessible not only by your system but also by other systems.
Right, yes, I think the only option here is to mock it.
If you need to test the integration of your system with this LLM, I don’t think there are any other options besides doing it manually.
That is, maybe the developers of this system can write an integration test to understand how well it parses these PDFs, for example.
Right, because, again, the parsing here will likely consist of two parts.
First, there’s the parsing of the text itself, and second, there’s the summarization.
That is, the summarization might not be testable, again, because it’s such a creative process.
But the parsing is more or less deterministic.
So, at least for the first part, they could write tests.
But again, as users of this service, we can’t do anything here because it’s a complete black box for us.
The only option for us is to test it manually, to make sure that, well, everything is working more or less fine.
This service responds adequately to the PDFs we send it.
And then we just have to trust this service; if any problems arise in the future,
we will again have to test it manually. I don’t think there are any other options. Unfortunately.
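In code, the "just mock it" conclusion reached here usually means hiding the LLM behind an interface that your own code defines and substituting a canned implementation in tests. A minimal C# sketch, with illustrative names (IDocumentSummarizer and StubSummarizer are invented, not taken from any real SDK):

```csharp
using System.Threading;
using System.Threading.Tasks;

// Port that the rest of the application depends on; the production adapter
// would call the external LLM over its API and is intentionally not shown here.
public interface IDocumentSummarizer
{
    Task<string> SummarizeAsync(byte[] pdf, CancellationToken ct = default);
}

// Test double: always returns the same canned summary, so tests that exercise
// the code around the LLM stay deterministic and fast.
public sealed class StubSummarizer : IDocumentSummarizer
{
    private readonly string _cannedSummary;

    public StubSummarizer(string cannedSummary) => _cannedSummary = cannedSummary;

    public Task<string> SummarizeAsync(byte[] pdf, CancellationToken ct = default) =>
        Task.FromResult(_cannedSummary);
}
```

The real integration with the LLM is then checked manually, or with a few smoke tests run outside the regular suite, exactly as described above.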
There is an opinion that the trend is generally moving towards replacing deterministic software modules with neural networks.
Roughly speaking, right now you have a module that takes HTML and quite deterministically, using Java and algorithms, converts it into PDF.
But tomorrow, it might be some kind of black box where we won’t understand what’s inside.
It will simply be able to take HTML as input and, through some magical means, output a PDF.
And such modules will start to exist throughout our system.
This won’t be some exotic resource, like in my example, where an LLM is somewhere and we occasionally refer to it.
Instead, these will be fully functional modules within our software product, and there will be more and more of them.
Ultimately, programmers will just connect these black boxes together and say: take HTML from here, send it there,
then, based on the information in that PDF file, increase the salaries of these employees.
Then, after increasing the salaries, make a backup of the database (I don't know how, just do it) and store it in Amazon S3.
Take the passwords from over here, I don't know how. And out of such unclear, fuzzy components, everything will somehow work.
And here the question arises, how...
Yes, this doesn’t actually differ much from the situation we have now.
Because even now we have such non-deterministic dependencies, for example, the random number generator.
Or, I don’t know, taking the current datetime. These are simple examples, but they are, again, quite similar, actually.
So, there will always be some parts in the application that are deterministic.
Again, take, for example, the output of the LLM as the input to our code and put it somewhere, say, as I said,
to increase the salaries of certain employees based on that output.
So here it will be possible to test the deterministic part,
which is that once we have received this output,
we need to check that we recorded it in our database and that it was recorded correctly.
This part can already be tested normally, using, so to speak, classical methods.
So, again, we simply replace the parts related to the LLM with mocks.
The test doubles return some test data to us, and we check that after we have received this output,
we work with it correctly.
So, just like now, we will actually be testing the output, the result, and how it is handled once it's received?
Yes, yes, exactly. So, again, the random number generator is a black box for us.
We cannot influence it in any way; we can only mock it.
Similarly, obtaining the current datetime is also a black box for us.
We will just stub them; our stub will return some test output, and we will simply check how we work with this output.
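To make the deterministic part concrete: once the LLM output and the current time are stubbed, the code that consumes that output tests like any other code. A hedged xUnit sketch; every name in it (ISummaryClient, IClock, SalaryService, and the "raise on a positive review" rule) is invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using Xunit;

// Non-deterministic edges hidden behind interfaces (illustrative names).
public interface ISummaryClient { string Summarize(string documentText); }
public interface IClock { DateTime UtcNow { get; } }

public sealed record SalaryDecision(string Employee, decimal NewSalary, DateTime DecidedAtUtc);

// The deterministic logic under test: consume the LLM output and record a decision.
public sealed class SalaryService
{
    private readonly ISummaryClient _summaries;
    private readonly IClock _clock;

    public List<SalaryDecision> Decisions { get; } = new();

    public SalaryService(ISummaryClient summaries, IClock clock)
    {
        _summaries = summaries;
        _clock = clock;
    }

    public void Process(string employee, string reviewDocument, decimal currentSalary)
    {
        string summary = _summaries.Summarize(reviewDocument);

        // Simplified rule: a positive summary earns a 10% raise.
        decimal newSalary = summary.Contains("exceeds expectations")
            ? currentSalary * 1.10m
            : currentSalary;

        Decisions.Add(new SalaryDecision(employee, newSalary, _clock.UtcNow));
    }
}

// Stubs: a fixed LLM answer and a frozen clock make the test fully repeatable.
public sealed class StubSummaryClient : ISummaryClient
{
    public string Summarize(string documentText) => "The employee exceeds expectations.";
}

public sealed class FrozenClock : IClock
{
    public DateTime UtcNow { get; } = new DateTime(2030, 1, 1, 12, 0, 0, DateTimeKind.Utc);
}

public class SalaryServiceTests
{
    [Fact]
    public void Raises_salary_when_the_summary_is_positive()
    {
        var service = new SalaryService(new StubSummaryClient(), new FrozenClock());

        service.Process("Alice", "twelve-page review text...", currentSalary: 100_000m);

        SalaryDecision decision = Assert.Single(service.Decisions);
        Assert.Equal(110_000m, decision.NewSalary);
        Assert.Equal(new DateTime(2030, 1, 1, 12, 0, 0, DateTimeKind.Utc), decision.DecidedAtUtc);
    }
}
```

The same shape covers the random-number-generator and datetime examples mentioned here: the non-deterministic edge is replaced by a stub, and the logic consuming its output is tested with the usual techniques.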
Yes, well, now a couple of general questions. I'm curious, do you have any plans for new books?
Because the last one I read was from 2019.
Yes, yes. Unfortunately, I don't have any plans. I have actually stepped back, so to speak, from blogging.
I wrote my last article on my blog a long time ago, maybe two or three years ago.
It's not that I lack topics; I have them, but first of all, I don't have the time, and secondly, I realized for myself that I am actually not a public person.
I had a bit of a soul-searching experience, maybe a couple of years ago,
where I realized that I never liked being a public figure, actually.
All the podcasts I participated in before, the Pluralsight courses, and the book, including the blog as well,
I did as part of my growth as a programmer, as an individual contributor, as an IC.
But I never liked being in the public eye, so to speak.
And I thought, well, why force myself, so now I have kind of stepped back from it.
I have more plans for internal projects, meaning I would like to dedicate more time to development.
Especially now that we have such a good assistant in the form of AI.
I have many plans to start developing more of my own projects.
But I probably don't plan to return to the public eye anytime soon, and I also don't plan to write any books.
I often get asked about this.
In fact, writing a book, I think you know, since you have your own book too.
Writing a book is a very labor-intensive process.
It took me almost a year, and that was quite a fast process; for many, it takes much longer.
For example, there is Roy Osherove, who also recently published his book...
Well, not too long ago, maybe a year or two ago, he released a new edition.
He started writing his book at about the same time I began writing mine.
That was in 2018-2019, I don't remember exactly.
But his book came out maybe a year or two ago, meaning he worked on it for about four years.
Long, yes, yes. I understand he might have had some struggles.
In general, this process is not quick, and usually, no one goes through it more than once,
because, well, the ratio of effort expended to output is not very high.
It can be justified the first time, because, well, when you have a book, it's like, oh, great!
Such credentials are essentially a big calling card, as they say in English, a 300-page business card.
So, writing more than one book just doesn't make sense in that regard.
There you go. If there were real passion for a topic, then maybe it would be worth writing another one.
Well, for example, I had thoughts about writing about DDD, because those are the two main topics,
that interest me — unit testing and domain-driven design.
But now there are quite a few books already written about DDD, including one by Vlad Khononov.
I haven't actually read the book myself, unfortunately, but I've heard a lot of good things about it,
so I don't think there's any point for me to go into that area, so to speak,
simply because too much has already been written by other authors of good books.
In the field of testing, I felt that even though there were also a lot of books, I had a unique perspective
that would add something new, so to speak, relative to what had already been written.
Unfortunately, I don't have that kind of confidence with DDD, so most likely I will stay at one book for the foreseeable future.
And your framework for C# that is related to functional programming? It's still alive, healthy, and developing, right?
Yes, it's a little library. It's called CSharpFunctionalExtensions, distributed as a NuGet package. Yes, it's alive. I'm trying to maintain this library,
meaning I review some new PRs. Although I haven't published anything for several months now,
I try to check in at least once every few months to see what's going on.
Do you need help with it? Many C# programmers are watching us. Maybe we can promote it and invite them over?
Yes, yes. Why not?
Or are you managing on your own? I'm not particularly active there right now either.
Help is needed in the sense that many... well, not many, but sometimes people come in and ask questions.
I don't have time to answer those questions, so if there's an opportunity and desire to help with that, then you're welcome.
There are some things on the to-do list, in the issues, that would be good to plan out.
Well, in principle, the library is already quite stable and mature, so there isn't much development happening inside it.
Well, the library itself is quite small; there's nothing particularly special about it.
It just has result classes and a few classes to simplify development, like a basic value object, and maybe a couple of other classes.
It would be good to have support if someone is willing to answer questions from others who are familiar with the library.
But again, for that, you need to have some experience with the library as well.
In any case, you're welcome to the project; it's on GitHub. Maybe we can share the link. I would be happy about that.
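For reference, the library under discussion is CSharpFunctionalExtensions (the vkhorikov/CSharpFunctionalExtensions repository on GitHub). Below is a small usage sketch of its Result type written from memory; the exact method names have varied across versions, so treat it as an approximation rather than the authoritative API:

```csharp
using System;
using CSharpFunctionalExtensions;

public static class EmailParser
{
    // Result<T> models the failure case explicitly instead of throwing exceptions.
    public static Result<string> ParseEmail(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            return Result.Failure<string>("Email must not be empty");

        if (!input.Contains('@'))
            return Result.Failure<string>("Email must contain '@'");

        return Result.Success(input.Trim());
    }

    public static void Demo()
    {
        Result<string> result = ParseEmail("  user@example.com ");

        if (result.IsSuccess)
            Console.WriteLine($"Parsed: {result.Value}");
        else
            Console.WriteLine($"Error: {result.Error}");
    }
}
```

The Maybe and ValueObject building blocks in the same package follow the same spirit: a handful of small primitives rather than a framework.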
Alright. Then that's it; I have no more questions. It was very interesting.
Yes. Thank you for the invitation. It was interesting to chat. This is my first podcast in a long time, so...
We were really looking forward to it. Many people asked me about it, and we waited and waited, and finally recorded it. I think it’s very interesting.
And most importantly, we will renew interest in your book. It’s been how many years, but...
Yes, by the way, I translated the book myself. It was important for me that the translation into Russian...
So how was it? You wrote everything in English, then it came to... through Piter, I think, yes, Piter did the Russian edition...
They bought the rights for the Russian translation, and I directly contacted them simply because it was important for me that the Russian translation was good,
because, as a rule, translations into Russian are all bad, because they are usually done by non-programmers.
At best, they are people who are somewhat familiar with the technical part, but still not programmers,
so they don’t know how to translate correctly. I remember, when I read some books in Russian,
you open a book in Russian, and it’s just poorly written. So I had to take the English original and read it,
which was not very pleasant, because back then I didn’t know English very well.
So it was important for me that the book was well-written in Russian, which is why I translated it myself,
and the Russian version is even better than the English one because I corrected some mistakes that were in the English version.
That's great. It’s like you wrote the book again. Well, not from scratch, but yes. That happened.
Back then, there were no LLMs to automatically translate. Yes, yes, there was nothing like that back then.
Well, I might not have written it from scratch because I was sent a translation that a translator did.
But the quality was such that I essentially had to rewrite every page after them.
Well, it’s still better than writing it from scratch because it saves some work.
Nevertheless, there was a lot of work. On each page of that translation, I had to make edits.
Well, we really love the book. I read it, not that I read it twice, but I read it very carefully and made a lot of notes.
Thank you.
There are many books on testing; I probably know about a dozen books on testing, but yours is definitely in the top 3.
Thank you, yes. That was exactly my goal, to bring in more, so to speak, new information.
That wasn’t in other books.
It worked out.
Well, that’s it, goodbye.
Thank you.
Yes, thank you.
Goodbye.