
Episode 52 · Jan 11, 2022 · 29:31

Jan Giacomelli on the Benefits of TDD

Featuring Jan Giacomelli, Staff Software Engineer at Ren Systems

In this podcast episode, I welcome Jan Giacomelli, Staff Software Engineer at Ren Systems, previously with typless. We talk about test-driven development, unit tests, flaky tests, and much more. Jan shares some exciting war stories from the trenches of testing, explains how he was able to reduce the feedback loop from 45 minutes to only 8 minutes, and advocates for TDD. Listen to our insightful conversation or read the transcript.

Key points:

  • The benefits of TDD and good test coverage
  • TDD at typless
  • TDD: patterns and anti-patterns
  • The problem of testability
  • Unit tests
  • Flaky tests

You can also get Semaphore Uncut on Apple Podcasts, Spotify, Google Podcasts, Stitcher, and more.

Like this episode? Be sure to leave a ⭐️⭐️⭐️⭐️⭐️ review on the podcast player of your choice and share it with your friends.

Edited Transcript

Darko Fabijan (00:02):
Hello, and welcome to Semaphore Uncut, a podcast for developers about building great products. Today, I’m excited to welcome Jan Giacomelli. Jan, thank you so much for joining us.

Jan Giacomelli (00:12):
Yeah, thank you for inviting me.

Darko Fabijan (00:14):
Great. Please, just go ahead and introduce yourself.

Jan Giacomelli (00:18):
Yeah, my name is Jan, and I’m a senior software engineer. I’ve worked mostly with Python on AWS for the last seven or eight years. I’m a huge advocate for test-driven development, and in the last few years, I’ve also done a lot of work building and optimizing CI/CD pipelines on GitLab.

I’m very proud that I managed to bring our pipelines from 45 minutes from push to production, including manual testing, to around eight minutes without any manual testing. So everything just goes to production, every change, however small it is. And since then we haven’t taken down production at any point.

I’m just starting a new role, so the story is beginning one more time. So I just set up a new GitLab pipeline. We did a migration from GitHub. And now our pipelines are running around 30 minutes, but I know where to start a journey to bring them below 10 minutes.

The Benefits of TDD and Good Test Coverage

Darko Fabijan (01:26):
Great. Yeah, a fast feedback loop is super important. That has also been our obsession for almost a decade now, and it’s great that you have such wonderful results. I mean, keeping it under 10 minutes is definitely super important.

To jump to the beginning, I think the vast majority of our listeners are on board with the idea of TDD and why testing is important. Maybe you can run us through a couple of points that you see as most important in your day-to-day life and the life of your teams. What do TDD and generally good test coverage bring?

Jan Giacomelli (02:09):
One very important thing for me is to develop in very small steps, to introduce very small changes which can be integrated in later steps into a successful and working solution that will solve the problems of our customers. I’m a very busy guy, so my phone is ringing all the time, my emails are incoming all the time, Slack messages.

And yeah, I can mute them, but at the end of the day, if I need to collaborate with my team, I try to be available as much as possible. So one of the goals when I started programming for a living, as my full-time job, was to be interruptible, so to be able to switch context quickly, to not wait very long for something in order to see whether it is working or not, and to not spend a lot of time figuring out where I was before someone asked me something.

Jan Giacomelli (03:08):
So yeah, test-driven development definitely helped in this matter because if you develop in very small steps and if my work before the interruption was 10 minutes long or 15 minutes long, it’s very likely that when I go back and open my IDE, I’ll be able to wrap my head around it in a second because, “Oh yeah, I was doing that, and that test is still failing, so I need to make sure it passes.”

TDD helps you to work in smaller iterations and be able to wrap your head around what you were working on before you got interrupted

And another great motivation for me was to have a system that’s working at all times. So I was really struggling because there were some bugs to fix or some new features to add, and I did some coding, I did some manual testing. But it took quite some time because I had to spin up a local server and then the user interface and then click through the application. And then I forgot to check some of the cases which were working before. I pushed that, someone reviewed that, probably in a hurry, and then broken code went at least to staging, if not even to production.

TDD at typless

Jan Giacomelli (04:13):
And when we started typless in 2017, it was our daily job to take down production as soon as possible. Almost every change introduced something breaking, and it was super frustrating.

So I started to research and to try to find a way to prevent this from happening and to spend the least amount of time possible on a problem, because it may seem that you develop fast if you don’t write tests, but then you push that, then you need to wait for someone to check that this is working, and then it goes to production and someone realizes, “Oh, this is not working as expected.”

And then you receive a bug to fix. But yeah, that’s a new ticket. So you think, “Oh, I closed the previous one last week and now I have to deal with this one. This is actually the same.” And then you break something else because the behavior is not covered with tests: maybe there was a requirement to have 100% test coverage, but someone just added assert not None for the assertions, and you actually broke something but the tests are still green.
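To make this coverage trap concrete (an assertion that only checks that a result is not None), here is a minimal sketch in Python. The function apply_discount and both tests are hypothetical names invented for illustration; they are not taken from the typless code base. Both tests execute every line of the function, so both give 100% line coverage, but only the second one would turn red if the calculation were broken.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business rule: subtract a percentage discount from a price."""
    return price - price * percent / 100


def test_discount_coverage_only():
    # Executes the code, so coverage is 100%, but it passes for almost any result.
    assert apply_discount(200.0, 10.0) is not None


def test_discount_behavior():
    # Pins down the expected behavior, so a wrong formula makes the test fail.
    assert apply_discount(200.0, 10.0) == 180.0
```

If someone changed the formula to add the discount instead of subtracting it, the first test would stay green and only the second would fail.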

Jan Giacomelli (05:23):
So there are many situations in which you can find yourself. After five years of practicing test-driven development, it’s been a year since I can say, “Okay, I’m 90% or even more faster by doing test-driven development than without it,” because it’s just under my skin. That’s the way I see things. That’s the way I verify them. That’s the way I think about things.

I’m 90% faster by doing test-driven development than without it

-Jan Giacomelli

Everything needs to be testable, whether it’s just a requirement and I need to be able to verify that this requirement can be satisfied because there may be some logic gaps and I don’t even need to write a test. I can just think about what the test would be and I can be like, “Oh my God, I can’t do that because this would lead to a race condition,” for example, or, “This user cannot be in the status of deleting and onboarding at the same time. I either need to wait for onboarding to finish to start deleting, or I shouldn’t delete it because it’s onboarding,” and stuff like that.
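The onboarding and deleting example can be sketched as a test written before the logic is settled. Everything below is an assumption made for illustration: the Status enum, the User class, and the choice to reject deletion during onboarding (the other option Jan mentions, waiting for onboarding to finish, would need a different test).

```python
from enum import Enum


class Status(Enum):
    ONBOARDING = "onboarding"
    ACTIVE = "active"
    DELETING = "deleting"


class User:
    """Hypothetical model with a single status, so two states cannot overlap."""

    def __init__(self) -> None:
        self.status = Status.ONBOARDING

    def finish_onboarding(self) -> None:
        self.status = Status.ACTIVE

    def start_deleting(self) -> None:
        # Rule chosen for this sketch: refuse deletion while onboarding is running.
        if self.status is Status.ONBOARDING:
            raise ValueError("cannot delete a user that is still onboarding")
        self.status = Status.DELETING


def test_cannot_delete_while_onboarding():
    user = User()
    try:
        user.start_deleting()
    except ValueError:
        pass
    else:
        raise AssertionError("deletion should be rejected during onboarding")
    assert user.status is Status.ONBOARDING


def test_can_delete_after_onboarding():
    user = User()
    user.finish_onboarding()
    user.start_deleting()
    assert user.status is Status.DELETING
```

Writing the first test forces the question of what should happen in the conflicting case before any production code exists.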

Jan Giacomelli (06:29):
Yeah, it helped me a lot, but it was very frustrating, especially at the beginning, because I didn’t know how to write tests. There is also a huge mess in resources about the terms: what is a unit test, what is an integration test, what is a test double, what is a mock, and stuff like that.

So it was really hard to find a way to test behavior, to test things in a way that my tests are resistant to refactoring, so that they test just the desired behavior, not every little function or method in my system. That doesn’t make sense, because I may need to rewrite a function or split it into three, or I will find some abstraction and introduce a new class, a new function, a new module, whatever. Doesn’t matter.

I want to make sure that my business rules are still followed, that my client can still, for example, delete all of its data because it’s required by GDPR. It’s a functionality that must not be broken at any point in time because I can pay a huge fine if I don’t meet those requirements and stuff like that.

TDD: patterns and anti-patterns

Darko Fabijan (08:16):
Something that you touched upon, and that I really want to ask more about, is that you spent a decade in Python and helped a lot of engineers onboard into the world of TDD and generally writing applications that are well tested by the engineers who develop them. And you mentioned struggling with resources and terminology and all that.

So it would be great if you could give us some patterns that you have seen, and anti-patterns, with a general approach of onboarding people to the whole concept, but also maybe once they are living with that for some number of years, some patterns and anti-patterns that you have seen that work in practice.

Jan Giacomelli (09:01):
Yeah, so one thing that’s very common in Python is that when new functionality is required, only one if statement or one else statement is added to an already far too long function, which may take 400 lines of code. Because there is no static typing, basically anything works.

And as you can imagine, it’s very hard to test such a function because it may contain 25 different behaviors which are to some degree independent of each other, but at the same time, because of the code structure, they are not really independent of each other.

So that’s for sure one anti-pattern that I see a lot. I haven’t seen that many, let’s say, Java code bases, although I’ve seen all sorts of stupid things done in Java code bases as well, but I would say there is-

Darko Fabijan (09:59):
I can tell you, there are. There are awful things there, too.

Jan Giacomelli (10:03):
Yeah, yeah. The example where we required 100% test coverage and got assert not null was from one Java project I was working on. So yeah, I’ve seen many… But I’ve seen way more Python code compared to Java, for example.

Long function

Darko Fabijan (10:18):
Concretely about that: I mean, a long function, that’s a very standard thing in our practice. Have you guys over time worked with some linting tools that essentially limited the length of a function or in some way the complexity of a function?

Jan Giacomelli (10:39):
Yeah, so we use Flake8, which has… I think it’s McCabe or something like that. It’s a tool for automating complexity checks. And yeah, we did add some rules there, but it’s usually not enough because those rules can usually be gamed somehow.

You may use something other than if. I’ve seen a lot of tricks to just comply with the required metrics, such as code coverage or function complexity. Maybe one thing is abstracted away to another function, but the previously big function still calls the now abstracted one, so the dependency is still there and it is still impossible to test. So yeah, you may get rid of some if statements, but there is still a problem of testability.

The Problem of Testability

Darko Fabijan (11:34):
And better to fix the culture.

Jan Giacomelli (11:36):
Yeah, but it’s way harder to fix the culture. It’s easy to just set the max complexity to 10 or 8 or whatever, compared to actually changing the culture of your fellow engineers in a team.

So that’s one anti-pattern that I’ve seen a lot. And then another thing that I’ve seen a lot is direct coupling to the database. So the database is an almighty thing which knows everything, and at some point maybe even does everything, because why don’t you just write stored procedures and SQL functions, and then you don’t need to write Python at all.

In very simple cases, such as content management systems, which Django, for example, was designed for, there are usually no complex business rules or complex calculations of anything. And you can live there by almost omitting unit tests.

You can actually just test your endpoints because your endpoints are directly coupled through the views to your models, which are using active record. So that means one instance is one row in the database.

Jan Giacomelli (12:42):
So in such cases, there is direct coupling, but you can live with it because there is nothing complex to do. But, for example, I’ve seen a lot of direct coupling to the database when dealing with machine learning problems. People have a lot of data and they need to do something with machine learning models and they write a class or a function which ties together, let’s say, an NLTK model with the database access layer.

And when you try to test it in any way, you would rather kill yourself than to do testing. So you just push it and hope for the best, but even in the deterministic problems where you can exactly predict the output, it’s very unlikely that this approach will produce any good results. But when you add all the machine learning problems from model drifts and data changes throughout the time, because people change, the world is changing and everything. It’s just a pain in the ass to do anything with it.

Jan Giacomelli (13:42):
And you don’t know to which part the business logic is going, to which part the NLTK models are going, and to which part the database access is going. You just have one big mess, which should produce some sort of answer to, let’s say, is this review positive or negative? But if you need a database just to train your model, that can be a little bit annoying.

I think a very good thing in Python is that it tries to be as simple as possible. So when using, for example, pytest, you can reduce your boilerplate code to a minimum compared to unittest. You can also do all sorts of stupid things with fixtures, but you can do stupid things with basically anything in this world, so you can abuse any tool. If you have a hammer, everything looks like a nail.
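The difference in boilerplate can be seen by writing the same check in both styles. This is only a small sketch: is_valid_username is a hypothetical function invented for the comparison. The pytest-style test is a plain function with a plain assert and needs no import, while the unittest version needs a class, inheritance, and an assert helper method.

```python
import unittest


def is_valid_username(username: str) -> bool:
    """Hypothetical rule: a username needs at least one non-space character."""
    return bool(username.strip())


# unittest style: a TestCase subclass and an assert* helper method.
class UsernameTest(unittest.TestCase):
    def test_empty_username_is_invalid(self):
        self.assertFalse(is_valid_username(""))


# pytest style: a plain function and a plain assert statement.
def test_empty_username_is_invalid():
    assert not is_valid_username("")
```

pytest discovers both forms, so an existing unittest suite can be migrated one test at a time.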

Jan Giacomelli (14:34):
I think the good thing in Python is definitely that writing a test can be a very, very, very simple task to do, and also it can be very, very simple to learn how to do it because the boilerplate is really at a minimum when using pytest. But on the other hand, because it’s simple, it can survive 1,000 lines of spaghetti and it’s working today.

In Python, writing a test can be a very, very, very simple task to do, and also it can be very, very simple to learn how to do it because the boilerplate is really at a minimum when using pytest. But on the other hand, because it’s simple, it can survive 1,000 lines of spaghetti and it’s working today.

-Jan Giacomelli

Unit Tests

Darko Fabijan (15:41):
You touched upon a very concrete testing framework that comes built into Python as part of the standard library. Can you tell us a bit more about your experience with unit tests? You also mentioned in our prep call just the definitions of what unit tests are and how different people understand them differently. And also then other types of tests that come into play at some point: integration, acceptance, and pen tests.

Jan Giacomelli (16:09):
So when I was starting, I really struggled with the testing pyramid and what unit tests are or are not, and what integration tests are, and stuff like that. But in the end, my first and most important rule for any testing, whether it’s a part of test-driven development or it’s me writing an end-to-end test for an API, doesn’t matter, the first rule is: if this test fails, does it mean that this code shouldn’t go to production? And vice versa: if it passes, can I confidently ship that to production?

The first rule is: if this test fails, does it mean that this code shouldn’t go to production? And vice versa: if it passes, can I confidently ship that to production?

-Jan Giacomelli

If I cannot do that, if this test is flaky or if it does not give me confidence, then it’s useless, then it’s better to just remove it because I won’t take it into account at some point, because if everything is red, you stop seeing that it’s red. Or if everything is green all the time, then what’s the point? I can remove that and everything will still be green.

Jan Giacomelli (17:09):
So that was my first rule of thumb, and to this day, I still use that as a first measure for any test that I write. So if this test is here just to satisfy a coverage percentage, or just to give my test suite one shape instead of another, then it doesn’t make sense to have it there. Better to just get rid of it, because it won’t do any good. I need to maintain it. I will have to check it if it fails for some unknown reason when it should pass at this point.

This is the most important thing, I would say, when writing any test. And then when you have this confidence, you start thinking. So when you have one test and you just need to make sure that your method stores your object as a row in the database, that’s fine.

Jan Giacomelli (18:05):
But when you have 1,000 such tests, then there is the question of speed. So if I need to wait a couple of minutes just to run my tests, excluding end-to-end tests, I won’t do it. If it takes a couple of seconds, I’ll do it regularly because I want to know as soon as possible whether I did what I think I have done, so whether I met the requirements, because why would I wait if I don’t need to? So then, in this matter, you need to start to think about what you can do. I can use, for example, test doubles to get rid of database access when running all my unit tests. I can implement an in-memory repository instead of a PostgreSQL repository, which must pass my contract test for the PostgreSQL repository, but it’ll just store my objects in memory, in a list or whatever. It doesn’t really matter.

Jan Giacomelli (19:10):
And now my tests can go from a couple of minutes to a couple of seconds. And if I need to wait five seconds for 1,000 tests to run, I will run them 10,000 times per day because it just feels good to have everything green. And in this matter, clean architecture helped me a lot.

So how to draw the boundaries between different parts of the system, how to separate the data access layer from my business rules, how to separate the main business rules, such as that my user cannot have an empty username, for example, or that it must have a valid email address, or anything else that ensures the integrity of my model or object. It helped me really a lot to start seeing those boundaries. And when those boundaries are there, you see that your system is testable because it’s very easy.

Jan Giacomelli (20:08):
For example, it’s very easy to inject an in-memory repository into your use case if it has the same interface, compared to some mocking with patch, for example, in Python. So you can say that in this module, I want to mock my PostgreSQL repository and do something with the mock, but the mock will tell me what I want to hear, not what I need to hear.

So that’s why it’s better to just implement a simple test double which passes a contract test and use it for all the tests. And my tests will be fast and they will still be the same tests. If I was able to sleep well with them when using the PostgreSQL repository, I will be able to sleep well when using the in-memory repository. And I can just use an integration test, for example, to verify that my PostgreSQL repository is communicating with the database as expected and that everything is there as it should be. And that’s it.
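The idea of an in-memory test double that passes the same shared test as the real repository (a contract test) might look roughly like the sketch below. All names here (UserRepository, InMemoryUserRepository, check_repository_contract, register_user) are assumptions made for illustration, and the PostgreSQL-backed implementation is left out on purpose.

```python
from typing import Dict, Optional, Protocol


class UserRepository(Protocol):
    """Interface shared by the real repository and the test double."""

    def add(self, user_id: str, email: str) -> None: ...
    def get_email(self, user_id: str) -> Optional[str]: ...


class InMemoryUserRepository:
    """Test double: keeps users in a dict instead of a database."""

    def __init__(self) -> None:
        self._users: Dict[str, str] = {}

    def add(self, user_id: str, email: str) -> None:
        self._users[user_id] = email

    def get_email(self, user_id: str) -> Optional[str]:
        return self._users.get(user_id)


def check_repository_contract(repo: UserRepository) -> None:
    """Contract test: every implementation must pass these same assertions."""
    assert repo.get_email("missing") is None
    repo.add("u1", "jan@example.com")
    assert repo.get_email("u1") == "jan@example.com"


def register_user(repo: UserRepository, user_id: str, email: str) -> None:
    """Use case: receives its repository from the caller instead of creating one."""
    if "@" not in email:
        raise ValueError("invalid email address")
    repo.add(user_id, email)


def test_in_memory_repository_honors_contract():
    check_repository_contract(InMemoryUserRepository())


def test_register_user_stores_email():
    repo = InMemoryUserRepository()
    register_user(repo, "u2", "darko@example.com")
    assert repo.get_email("u2") == "darko@example.com"
```

In this design, a database-backed repository would be passed through the same check_repository_contract function in a slower integration test, while the use case tests run against the in-memory version without patching anything.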

So our test suite was all good and then it exploded…

Darko Fabijan (21:03):
One question connected to this, but also to something that you mentioned at the very beginning, that you came to projects that were like maybe 45 minute area to get the feedback, and then you went to 8 minutes or something like that. And through this, what you have been talking about, this is another tool to also improve that feedback loop.

Do you have experiences where it was very good and then the test suite started exploding, so you just figure out, “Okay, three months ago we had a feedback loop of 8 minutes and now suddenly it’s 12 minutes”? Maybe there are more people on the project, or tests are just getting slower because of the way they are written. And to make my question really concrete, do you have experience with doing iterations? So, “Okay guys, we now need to have an iteration to change the way that we write tests and to optimize our tests.”

So yeah, if you can shed any light on such potential experiences.

Jan Giacomelli (22:13):
Yeah, yeah. We’ve had such problems, but it wasn’t that linear. It was one day the tests were running in, let’s say, 3 minutes and the next day they were running for 15 because we added so many tests in order to make sure that everything was working. So yeah, we sat down and then we tried to eliminate all the tests that were not producing any value.

So for example, we had to check that data is extracted correctly from a bunch of invoices, but we were able to find examples which were green more or less all the time. So if anything changed in the code, they weren’t really failing. So yeah, we tried to change our code to do some stupid things, and then we saw, “Okay, we can get rid of this test and this one and this one,” and then yeah, we went from 15 minutes back to not 3, it was 4 or something like that, or maybe 5, but it was a huge difference.

But yeah, we had to experiment with and do some sort of triangulation to see where the redundancy is.

Darko Fabijan (23:27):
Yeah, that’s a great example. There are usually quite a few redundancies in a test suite, but the thing is that it’s not always easy to discover them, and then you want to stay safe and you don’t make a move.

Jan Giacomelli (23:40):
Yeah, we were lucky there that those tests were covering more or less the same part of the system, but we wanted to extend our safety net as much as possible. So that’s why it was maybe easier to find redundancies inside the tests, because we knew which part of the code was covered by them.

Flaky tests

Darko Fabijan (24:01):
Clear, clear. And maybe one last question in this area, if you have any practical advice. 10 years ago and today again, we see a lot of people struggling with different kinds of flakiness in their tests. And there are many different ways that people run into that. At some point, maybe in the prep call, you mentioned that you will have a race condition somewhere, and then if you just try it a good number of times, it will start showing itself.

So do you have any maybe practical advice in area of flakiness, detecting it, dealing with it, and maybe some war stories that you can share.

Jan Giacomelli (24:38):
Yeah, flaky tests are usually a sign that your code is not, let’s say, deterministic. So there is some part which can change under some conditions. Usually this is something dealing with dates and times, which can really quickly lead to flakiness, or with some external systems such as Elasticsearch clusters or stuff like that. At least I had the most issues there.

So usually I just try to draw on a blackboard in which cases the tests are failing. So for example, when there was no real pattern, I just wrote down on our blackboard every time that the test failed, the time of it, and the pipeline ID. And then I think at one point we realized… Yeah, one test was failing if it was running around midnight, so if it started just before and finished just after. It was a very rare occasion because we usually didn’t work at this point of the day.

Darko Fabijan (25:49):
That was my next question. Are you working at midnight?

The reasons behind flaky tests

Jan Giacomelli (25:52):
Yeah, but sometimes if you did a lot of changes, you may push, let’s say 10, 15, 20 times from, let’s say, 11:00 till 2:00 in the morning, and you may encounter this issue because this test, for example, was not that fast. I don’t remember what was the reason, but it was a good one. So yeah, it was not that hard after all to hit this break, so it may take a minute or two. At some point I realized, “Yeah, okay, this is this problem.”

But there is no rule of thumb I would have. Usually it’s just sitting down and taking a look: why is this happening? Does it look like something that you maybe could reproduce in some way? Do you rely on anything other than your code? Do you touch the file system or database? I think that flaky tests are usually a sign that there is some external dependency that is not there all the time.
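Jan does not recall the exact cause of the midnight failure, so the following is only one hypothetical way such a test can become flaky, together with a common remedy: passing the clock in as a parameter. The function build_report_name and both tests are invented for this sketch.

```python
from datetime import date, datetime
from typing import Callable


def build_report_name(now: Callable[[], datetime] = datetime.now) -> str:
    """Name a report after the current day; the clock is injectable for tests."""
    return f"report-{now().date().isoformat()}"


def flaky_test_report_name():
    # Reads the real clock twice: if midnight falls between the two reads,
    # the dates differ and the test fails even though the code is fine.
    name = build_report_name()
    assert name == f"report-{date.today().isoformat()}"


def test_report_name_with_fixed_clock():
    # Deterministic: the clock is pinned to one second before midnight.
    fixed = datetime(2022, 1, 11, 23, 59, 59)
    assert build_report_name(now=lambda: fixed) == "report-2022-01-11"
```

The first test depends on something other than the code under test (the wall clock), which is exactly the kind of external dependency the questions above are meant to uncover.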

Jan Giacomelli (26:54):
So this might be the order of tests. This might be a row in the database. So for example, when writing an end-to-end test, you should create a user, log in with it, do your stuff, log out, delete the user. If you rely on an existing user, one day one developer will come and remove it. Then: “Why is our test failing? What did you break?” “Oh, there is no user.” That’s an example of a dependency which is not stable. If you create all your resources, even for end-to-end tests, and remove them afterwards, the tests fail only in the cases where there is an actual defect. But if you rely on something like this user, which someone may remove, then it’s very likely that your test will be flaky at least on some occasions. And it’s not good if your tests are flaky because they make you start having doubts about them and whether they’re useful, and it’s tempting to say, “Eh, never mind, the end-to-end test failed. Yeah, we know that it’s failing sometimes.” And you just move to production.
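The create, use, delete cycle for test resources can be sketched with a context manager. FakeUserApi is a stand-in invented so the example runs on its own; a real end-to-end suite would call the actual system, and with pytest the same pattern is usually written as a fixture that yields the user. The helper name temporary_user is also an assumption.

```python
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator


class FakeUserApi:
    """Stand-in for the system under test; a real suite would call the real API."""

    def __init__(self) -> None:
        self.users: Dict[str, str] = {}

    def create_user(self, name: str, password: str) -> None:
        self.users[name] = password

    def login(self, name: str, password: str) -> bool:
        return self.users.get(name) == password

    def delete_user(self, name: str) -> None:
        self.users.pop(name, None)


@contextmanager
def temporary_user(api: FakeUserApi) -> Iterator[str]:
    """Create a unique user for one test and always delete it afterwards."""
    name = f"e2e-{uuid.uuid4().hex}"
    api.create_user(name, "secret")
    try:
        yield name
    finally:
        api.delete_user(name)


def test_login_with_own_user():
    api = FakeUserApi()
    with temporary_user(api) as name:
        assert api.login(name, "secret")
    # Cleanup ran, so nothing is left behind for other tests to depend on.
    assert api.users == {}
```

Because each test owns its user and the name is unique, the test does not depend on shared data that another developer or another test might remove.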

Darko Fabijan (28:05):
Yeah. From my experience and the experience of other people I talked to, yes, it boils down to discipline, essentially documenting what is happening. I spoke with quite a few people who take a spreadsheet, essentially, as you said with the blackboard, and just document all the cases. Over time a pattern will emerge, and someone will come in one day with enough energy and deal with one of those, instead of just letting it get out of control and poison the test suite. Because if there are quite a few of them, then people can really lose trust in the test suite, which is very unhealthy.

Jan Giacomelli (28:48):
Yeah, that’s the start of a decreasing number of tests, and the quality starts to decrease, and everything just goes down to a point where it’s very hard to develop new features, your feedback loops are longer and longer, and you just wait for each other to finish work and to resolve merge requests and stuff like that.

Darko Fabijan (29:09):
Well great, Jan. I mean, I think people heard quite a few very good war stories that are backed by a lot of experience in the domain. Thank you so much for sharing all this with us and good luck with your career.

Jan Giacomelli (29:26):
Yeah. Thanks.

Meet the host

Darko Fabijan

Darko, co-founder of Semaphore, enjoys breaking new ground and exploring tools and ideas that improve developer lives. He enjoys finding the best technical solutions with his engineering team at Semaphore. In his spare time, you’ll find him cooking, hiking and gardening indoors.
