Justin Searls: How to Grow Your Engineering Team Tenfold in a Year and Keep Test Suites Healthy

When your engineering team grows from 10 to 100 engineers in the course of a year, there are so many things that you need to focus on, from operations and developer tooling to testing. Maintaining the health of the application is perhaps the most difficult part of all. Where exactly do you start?

We sat down with Justin Searls, the co-founder and CTO of the Test Double agency. For many years, Justin has been advising organizations on how to handle team growth and keep good practices in place as teams grow.

We talked about how to grow engineering teams without losing your sanity, how to divide work without stepping on each other’s toes, and how to keep your test suite maintainable.

Here’s a brief overview of what we covered in our conversation:

How does software let people down?
How to scale up your test suite when your team grows tenfold
Rearchitecting things when team scales up
Conway’s law
Rapid growth of an engineering team: patterns and anti patterns

Listen now or read the edited transcript.

You can also get Semaphore Uncut on Apple Podcasts, Spotify, Google Podcasts, Stitcher, and more.

Introduction

Darko Fabijan (00:02):
Hello and welcome to Semaphore Uncut, a podcast for developers about building great products. Today, I’m excited to welcome Justin Searls. Justin, thank you so much for joining us.

Justin Searls (00:12):
Hey, thanks for having me. For anyone not familiar with me and what I do, I am the co-founder and current CTO of a software agency called Test Double.

I’m a lifelong consultant. I came out of college and worked at a big consultancy, $2 million projects and financial institutions, and slowly worked my way down to smaller teams.

Through my various experiences as a consultant, I have learned all kinds of ways that project teams fail to hit deadlines. They struggle to make high quality systems and code. They also struggle to develop and communicate shared values and normalize on conventions.

At the same time, they are underappreciated by an industry that tries to oversimplify what software development is. In a way, there’s this metaphor for some sort of industrial or manufacturing process that can be easily distilled down and automated. So I’ve been on a mission to surface all the ways that software lets people down.

How does software let people down?

Justin Searls (01:31):
If you follow me on Twitter, you just see me posting bugs left and right. It’s to say that we’re in a pretty broken industry and we need to both build awareness and start to think together about what we can do to make it better.

We’re in a pretty broken industry and we need to start thinking together what we can do to make it better.
-Justin Searls, co-founder and CTO of Test Double

I think the only way forward is to have a pragmatic and optimistic lean towards figuring out creative solutions to keep the industry moving.

That led me to focus on lots of secondary tasks that developers are responsible for, like operations, developer tooling, testing, agile process, and helping product owners best articulate the needs that they have. How do we solve secondary and tertiary concerns that are outside a particular algorithm?

That’s where I’ve spent most of my time.

How to scale up your test suite when your team grows tenfold

Darko Fabijan (02:52):
Thank you for that great introduction!

I’m going to use the opportunity to introduce ourselves through the topic. We’ve been doing CI/CD for the last 10 years. In the last couple of years, we’ve been very lucky to have some very successful clients. For instance, they start as a team of 20-30 engineers with a common understanding of how to do things.

But then, success hits the team, and they have to scale to a hundred or two hundred developers over a year or two. We love having those customers. But we also want to help them as they’re facing various challenges regarding the health of their test suite, scaling of their test suite, etc.

Flaky and brittle tests are also in that category, as is the level of expertise and technical maturity of the team as a whole. We’re tackling the hardest topic of all: how to make sure that as applications are growing, good practices stay in place. How does a team that’s growing rapidly level up the health of its test suite?

The question is where to start.

Justin Searls (04:31):
You touched on something that’s really important. In a lot of companies, especially SaaS, the marginal cost of onboarding a new customer is typically near zero. That means that if you get a sudden influx of thousands of paying customers, you’ll have to think about how to scale up things like support, etc.

Since the marginal cost of acquiring new customers is almost nothing, it led our industry to make a fallacious assumption that we can just scale up all these other things, too. That might be true of AWS resources. We can scale up the server count, the CPU count and the amount of RAM on a little slider.

However, scaling up a team from 25 people to a hundred people in a rapid way requires a lot of care and communication. How do we divide up the work in a way that we’re not stepping on each other constantly?

As a business owner or a manager, I have to figure out how to solve a thousand things at once.

Rearchitecting things when team scales up

Darko Fabijan (07:25):
When teams rapidly grow from 25 to a hundred people, architecture of the application might not be made in such a way as to support this growth. People struggle to make sure that they have a reliable CI/CD in place. And we’re here to help.

What ends up being part of the problem is the architecture of the test suite, which usually gets inverted. There are hundreds or thousands of very expensive tests that are not needed. People struggle to scale CI/CD along with their teams and their architecture.

Do you have any advice on how to scale applications made by a team five times smaller than the team which now has to run with it?

Conway’s law

Justin Searls (08:43):
Yeah, there’s two big things. First, I often find myself referring to Conway’s law, which is that our technical systems tend to reflect the human organizations that create them. If your application has lived for seven years and just 5 or 10 people were working on it, the system, from the architecture of the code on down, is going to be optimized both for a team of that size and for the individual proclivities, the dynamic, and the chemistry of those 5 to 10 people.

According to Conway’s law, our technical systems tend to reflect the human organizations that create them.
-Justin Searls, co-founder and CTO at Test Double

There’s a transition that happens as we encode our preferences and our patterns of behaviour into the things that we automate.

If a team grows from 5 or 10 to a hundred people in the course of a year, it’s a growth spurt that has no analog in biology. Conway’s law is a constraint where the speed at which the system can catch up to the organizational change is going to be limited.

That means that we have to be mindful of any mismatch between our human structure and the technical architecture that we have in place. I might have a monolithic application where every test by default just uses a browser and goes through the full stack of everything just to make sure that the feature works. Or maybe I’ve got a separate browser test for every single resource in my CRUD application.

As I scale linearly, the system gets slower and the tests get more numerous. Even tests that used to run in a second now take two seconds to run.

So the system gets bigger, everything takes longer. Going from 10 tests to a hundred, you might notice the build slowing down by a factor of 20, 30 or 40, not 10. That’s where we need to be mindful of the inflection point that we’re in. If we’re undergoing rapid growth, we need to pause and think about what does scaling with stability look like, what do we have to start and stop doing as developers.
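To make that arithmetic concrete, here is a minimal Ruby sketch. It is not from the conversation: the `build_seconds` helper and the per-test timings are assumptions chosen only to show how a tenfold increase in tests can become a thirtyfold increase in build time once each test also gets slower.

```ruby
# Illustrative numbers only: a serial build takes roughly
# (number of tests) x (average seconds per test).
def build_seconds(test_count, seconds_per_test)
  test_count * seconds_per_test
end

before = build_seconds(10, 1.0)   # 10 tests at about 1 second each
after  = build_seconds(100, 3.0)  # 10x the tests, each now about 3x slower

slowdown = after / before
puts "Build went from #{before}s to #{after}s (#{slowdown}x slower)"
```

The test count alone explains a factor of 10. The rest comes from every test paying for a bigger system.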

Rapid growth of an engineering team: patterns and anti patterns

Darko Fabijan (12:05):
When it comes to rapidly growing a monolithic application, what are some of the patterns and antipatterns that you’d recommend or notice?

Justin Searls (12:23):
One of the superpowers of a batteries-included framework like Ruby on Rails, for example, is that if you just follow the cookie-cutter objects that it gives you, you can accomplish 15 different things at once in a single controller action. You can handle, of course, the literal behavior of updating something in the database, and you can handle cross-cutting concerns like logging and role authorization. You can do error handling, and you can do additional checks.

Justin Searls (12:56):
You can just keep on adding lines to each of those controller actions as they get more complicated. And that is highly productive when you’re starting out.

But you have to be mindful that any time you combine or embed multiple responsibilities, or reasons that code could change down the road, into a very tight little function or method or unit, it’s going to create a tangle.

If you’ve got multiple people working on it, they will have reasons to go into that code at the same time. That is going to lead to tests that contradict one another in separate feature branches, or version control conflicts, or unspecified behavior where maybe each person has tested their individual corner of a piece of functionality, but never really thought about it from the perspective of a stakeholder or how the system’s actually used.

Justin Searls (13:53):
And once you get into production, you realize, oh, there’s something in the cookie in production because of our new A/B testing software and no one really thought about that.

And so of course the value that wasn’t present in either of our testing environments is now somehow interacting with these other features. And so the first advice that I give is to start aggressively detangling and chunking each of the responsibilities that any feature performs into isolated, small, well-named units that are then composed with a level of abstraction that is consistent so that you can have a 30,000 foot view of what a particular feature does.

The first advice that I give is to start aggressively detangling and chunking each of the responsibilities that any feature performs into isolated, small, well-named units that are then composed with a level of abstraction that is consistent so that you can have a 30,000 foot view of what a particular feature does.
-Justin Searls, co-founder and CTO of Test Double

Justin Searls (14:34):
And it just says on the tin I do X and then I do Y and then I do Z. Then you can dive deeper into a tree, whether those are literal files and folders of subordinate behavior that supports that feature, as opposed to just continuing to grow longer and longer functions with more and more ifs and else’s or overly clever abstractions that maybe take advantage of a lot of clever stuff like reflection and introspection to dynamically handle cases that make it harder to grip around.

So I guess the first thing that I talk about is writing simpler, more obvious single purpose code.
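As a rough illustration of "I do X, then Y, then Z", here is a minimal plain-Ruby sketch. Every class name and the order-placing domain are hypothetical, invented for this example rather than taken from the conversation. Each small unit has one responsibility, and the top-level unit only composes them.

```ruby
# Hypothetical example: all names here are invented for illustration.
# Each unit has one responsibility and one reason to change.
class ValidatesOrder
  def call(order)
    raise ArgumentError, "order has no items" if order[:items].empty?
    order
  end
end

class CalculatesTotal
  def call(order)
    total = order[:items].sum { |item| item[:price] * item[:quantity] }
    order.merge(total: total)
  end
end

class RecordsOrder
  def initialize(store)
    @store = store
  end

  def call(order)
    @store << order
    order
  end
end

# The top-level unit reads like a table of contents: X, then Y, then Z.
class PlacesOrder
  def initialize(store)
    @validates_order = ValidatesOrder.new
    @calculates_total = CalculatesTotal.new
    @records_order = RecordsOrder.new(store)
  end

  def call(order)
    valid = @validates_order.call(order)
    priced = @calculates_total.call(valid)
    @records_order.call(priced)
  end
end

store = []
order = { items: [{ price: 5, quantity: 2 }, { price: 3, quantity: 1 }] }
result = PlacesOrder.new(store).call(order)
puts result[:total]
```

Two people changing validation and pricing at the same time now touch different files, and each unit can be tested without the others.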

Darko Fabijan (15:55):
The majority of people regardless of the framework can end up in this situation, so the advice is pretty universal. Previously, we’d need to set up a lot of things, a whole environment, and then run through that controller action. That can cause a lot of headache along the way.

Justin Searls (16:36):
Last week, I wrote two new gems: one that integrates with the Pipedrive CRM, and one called maybe_later that defers tasks to the end of an HTTP request so that they don’t block the user from receiving their HTTP response.

I’m using those two gems in conjunction so that when somebody fills out the contact form at testdouble.com to reach us, they can immediately get a confirmation that an email will be sent to us.

But when I came to test that, I started thinking, well, the maybe_later gem by default is asynchronous and goes to another thread.

Speaking of me making gems, I make a lot of gems. I made a new mocking library called Mocktail for Ruby last year, which I’m really excited about.

Justin Searls (18:21):
So it’s like, oh, maybe this is an opportunity to use that, and then I realize, oh yeah, Mocktail is thread safe. And so I got to inline all of this stuff.

And so I add inline features to the maybe_later gem, and then I realize, oh right, Puma’s on yet another thread. All of that, just to answer the question: did this code get run the way that I expected, with the parameters that I passed in?

And I could have saved myself all of that pain if I had chosen to test just one layer beneath the controllers. If I had just decided, hey, instead of slamming all of this behavior into a controller action, what if I just had a single class called handles contact form submission or something?

And if the code is all there, then I am in control of just invoking it directly and calling it like a normal object.

Justin Searls (19:04):
I have all of the power in the world then to very, very carefully ascertain both the response that I get back and any sort of secondhand signals that it sends off.
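A minimal sketch of what testing "one layer beneath the controller" could look like in plain Ruby. The class name `HandlesContactFormSubmission` comes from the example in the conversation; everything else (the `Result` struct, `FakeMailer`, the `deliver` signature, the addresses) is an assumption made up for illustration. It deliberately uses neither Mocktail nor maybe_later, just a hand-rolled fake.

```ruby
# Hypothetical sketch: only the class name HandlesContactFormSubmission is
# taken from the conversation. Result, FakeMailer, and the deliver
# signature are invented for illustration.
class HandlesContactFormSubmission
  Result = Struct.new(:ok, :message)

  def initialize(mailer)
    @mailer = mailer
  end

  def call(params)
    email = params[:email].to_s
    return Result.new(false, "Email is required") if email.empty?

    @mailer.deliver(to: "hello@example.com", reply_to: email, body: params[:message].to_s)
    Result.new(true, "Thanks, we will be in touch")
  end
end

# A hand-rolled fake that records what it was asked to send.
class FakeMailer
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def deliver(**message)
    @deliveries << message
  end
end

# No browser, no HTTP, no extra threads: invoke the object directly, then
# check both the response and the secondhand signal (the delivery).
mailer = FakeMailer.new
result = HandlesContactFormSubmission.new(mailer).call({ email: "dev@example.com", message: "Hi" })

puts result.ok
puts mailer.deliveries.length
```

In a Rails app, the controller action could then shrink to roughly one line that builds this object and calls it with the request parameters.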

But I think that betrays the impulse of a lot of testers, a lot of people who are excited about testing: they want to have a maximally realistic test. They want to know that when I load that webpage, this thing is actually happening. That is valuable and it’s worth knowing, but I think that you need to calculate the overall return on investment of the activity.

If you’re able to get to a point where each of your controller actions is just a one-liner calling out to normal, plain old objects that are easy to put under test with just an invocation, then you can hand-wave each of these frameworky classes that just handle the routing of stuff and call that just configuration.

Justin Searls (19:58):
And if that layer stops working, the website won’t load. So we can probably just not have every single test redundantly exercise all of this HTTP complexity that makes everything so much more difficult for us to ascertain.

Those are the kinds of trade offs that I think a mature team can navigate. But it is really difficult when you’re growing really quickly and you don’t have that experience and you don’t know what sort of negative consequences await you if you just continue to add more naive, highly integrated tests.

How to keep tests from becoming unmaintainable

Darko Fabijan (20:38):
Just a week or two ago, I was speaking with the creator of Cucumber, a BDD testing framework, Aslak Hellesoy, and he was coming back to exactly what you just demonstrated in your example.

We want maximum reliability and we also want speed. If we also want to cover business logic through the UI and combine different things together, we will get entangled in something.

There are too many constraints or too many things that we want to solve, and that’s how we could end up with a test suite that is just unmaintainable.

Justin Searls (21:17):
Yeah. I think that comes with an interdisciplinary profession like ours, where we have a technical component and an interpersonal component to the work that we do as developers every single day. If it was all interpersonal, then we wouldn’t have a rug under which to sweep a lot of uncomfortable decisions.

But because we have that technical component and because our test suite can’t call us out in a retrospective meeting or can’t advocate for itself that like, hey, I shouldn’t be wasting my time on this redundant action, I can choose to look at all the trade offs like you just identified and say like, well, let’s just have our cake and eat it too and just continue to shove all of this stuff into our test suite.

And that can kick the can down the road on what may be a hard conversation between me and a colleague, when we value different things about the tradeoffs that we’re making.

Justin Searls (22:22):
I remember 10 years ago back in the olden days before cloud CI was a normal thing, I had a client whose build logically took like three hours to run. It was a really, really long build. And so that wasn’t fast enough feedback.

They also had just hired like hundreds of developers and so they had to decide, do we change fundamentally how we do our work?

When I was a kid, we used to rub wax paper on slides in playgrounds to make them go faster. Do we just rub wax paper on this build and make it go faster?
-Justin Searls, co-founder and CTO of Test Double

Or, when I was a kid, we used to rub wax paper on slides in playgrounds to make them go faster. Do we just rub wax paper on this build and make it go faster?

AWS was new and that’s where they were running stuff. And so what they did was they just added as many AWS instances, EC2 instances, as they had integration tests. So basically every single Cucumber test had its own instance and they just kept those spun up all the time.

Justin Searls (23:13):
And so they were able to get that logical three hour build down to whatever the slowest individual test was. And they got it down to like 20 minutes or something like that.
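The difference between the "logical" build time and what the team actually waits for can be sketched in a few lines of Ruby. The durations below are invented so that they add up to three hours with a slowest test of 20 minutes; the helper names are assumptions too.

```ruby
# Made-up per-test durations in minutes; they sum to 180 (three hours).
durations = [20, 18, 17, 16, 15, 15, 15, 14, 14, 13, 12, 11]

# Logical time: what the suite costs if every test runs one after another.
def logical_minutes(durations)
  durations.sum
end

# With one machine per test, the wait is only as long as the slowest test.
def fully_parallel_minutes(durations)
  durations.max
end

puts logical_minutes(durations)         # 180
puts fully_parallel_minutes(durations)  # 20

# Double the number of tests: the logical cost doubles, the wait does not.
doubled = durations * 2
puts logical_minutes(doubled)           # 360
puts fully_parallel_minutes(doubled)    # 20
```

The wall clock stays flat while the logical cost keeps growing, which is why the number the team watches stops reporting the real size of the suite.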

The problem then was that everyone kept working in exactly that same way, learning no lessons, and that superlinear test build problem came back.

So when I revisited a year later, if you were to actually find an ordering bug in their tests and try to run them all in order to debug it, or to bisect the test pollution causing the brittleness, I think it had gone from three hours to something like 28 logical hours end to end.

Justin Searls (23:46):
The team had sufficiently snuffed out any real feedback about what was going on under the hood. In the past, people would’ve seen a new rack of servers coming into an ops room and asked questions about why we were suddenly spending hundreds of thousands of dollars a month on CI servers.

When hard questions can be pushed into invisible spaces, we need to be very mindful that we are not doing ourselves a disservice by failing to deal with the hard question right now.

Darko Fabijan (24:32):
Clear. Absolutely. Whether or not something can be seen with our own eyes is one of the curses of our craft. And now with hidden racks of servers, that’s an additional element to it.

Trainings that Test Double offers

I want to go back. You mentioned that one of the things you do is training with teams, and you also work with many companies that people know about, GitHub being one of them. Can you share some of your experiences? What are the situations where people invite you to come in and help them level up their team in various ways?

Justin Searls (25:10):
Yeah. Thanks for asking that. Test Double, our consulting agency, we’ve been around for about 10 years now, and we’ve been super fortunate in the last few years to be trusted and brought in by engineering organizations that we truly love and respect. And they’re doing amazing things.

Companies like GitHub, Zendesk, Betterment, these organizations are doing great work. It’s not so much that they see that our engineers when they join their team have some capability that they lack or that they’re insufficient in some way. They just understand that there are so many facets to building great software that getting some outside perspective sometimes is necessary and valuable.

And so to your point, a lot of times a client will tap me specifically and ask me to do an assessment of how they are testing, or to think deeper about repeated systemic failures that the team is running into that they don’t have any creative solutions for.

Justin Searls (26:14):
And for anyone listening, what I can say is no one has this stuff figured out and I am not really sure that there is a one size fits all solution. Because the more that you grow as a company and the more that you choose to automate stuff to make the business run, the more that those automations are going to be a tight fit for the particularities of your business. If you’re just writing commodity code, it’s not differentiating. You may as well just use some off the shelf piece of software.

The more that you grow as a company and the more that you choose to automate stuff to make the business run, the more that those automations are going to be a tight fit for the particularities of your business.
-Justin Searls, co-founder and CTO of Test Double

Justin Searls (26:50):
Us as developers, every time that a product owner is asking us to do something, 99% of the time they’re asking us to invent something that’s never been done before, at least not in the context of this particular business.

And so we need to have a humility. For example, I’m in a training with a bunch of developers who are smarter than me, and they’re looking at like, hey, we’ve got all of these flaky tests and we don’t know why, but something like 2% of tests will just always fail on every given build, meaning we can never get just a clean run and a green build.

And we run them out of order and we run them in a parallelized way to try to identify test pollution, but ultimately, at the end of the day, what are the strategies that we can use to tackle that?

Justin Searls (27:34):
And because every company has a completely different blend of services under the hood and priorities for what they need to check in order to feel comfortable about deploying to production, it’s important that we recognize that you might as a developer feel pride in a particular algorithm that optimizes the core thing your business does. Maybe it’s coming up with creative ways to schedule something in logistics and that’s your special sauce.

It has to become our special sauce to think about how to tackle these meta problems like an exploding CI build duration, or the number of flaky tests. It might mean that we need to actually develop bespoke tools to categorize these tests as flaky. Once a human marks a test, it goes into an optional suite, which goes into a queue, which somebody can then take a closer look at and try to analyze for data pollution.
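One possible shape for such a bespoke tool, as a minimal Ruby sketch. The `FlakyTestQuarantine` class, its method names, and the test file names are all hypothetical; the conversation only describes the workflow (a human marks a test, it moves to an optional suite, and it lands in a queue for investigation).

```ruby
require "set"

# Hypothetical sketch of a flaky-test quarantine; all names are invented.
class FlakyTestQuarantine
  def initialize
    @quarantined = Set.new
    @investigation_queue = []
  end

  # A human decides a test is flaky and marks it, with a note for later.
  def mark_flaky(test_name, note)
    return if @quarantined.include?(test_name)

    @quarantined << test_name
    @investigation_queue << { test: test_name, note: note }
  end

  # Split the full suite: quarantined tests become optional and stop
  # blocking the build, everything else stays in the blocking suite.
  def partition(all_tests)
    optional, blocking = all_tests.partition { |name| @quarantined.include?(name) }
    { blocking: blocking, optional: optional }
  end

  # Hand the oldest quarantined test to whoever has time to investigate.
  def next_to_investigate
    @investigation_queue.shift
  end
end

quarantine = FlakyTestQuarantine.new
quarantine.mark_flaky("checkout_spec.rb", "fails intermittently after search_spec.rb")

suites = quarantine.partition(["login_spec.rb", "checkout_spec.rb", "search_spec.rb"])
puts suites[:blocking].inspect
puts suites[:optional].inspect
```

Keeping the quarantine decision with a human, rather than retrying failures automatically, is a design choice of this sketch: the flaky test stays visible in a queue instead of being silenced.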

Justin Searls (28:33):
That way you’ve taken the immediate pressure off so that somebody who’s got the clarity of mind to focus and really problem solve about each of the individual problematic tests can do that without it simply being like, oh crap, I got to release, I got to release, hair on fire. I got to push, push, push, push, push. What can I do to shut this test up so that I can get my feature out.

Justin Searls (28:54):
Responding to the particularities of your company and the urgencies that it’s under and the priorities that it has is the only way to understand what kind of solution is going to work for you in particular. That might require you to develop new processes or invent new tools or new workflows that you’re not going to find in a blog or in an off-the-shelf solution.

It’s going to take time and hard thinking and permission to invest real time in it. Even though it’s maybe not a feature that a customer’s going to see, the customers and the business will benefit from a smoother, lower-friction ability to deploy and deliver new software.

Justin Searls (29:35):
And so we just have to acknowledge that’s part of our job as developers, and also get better at making that case to explain the value of these kinds of activities to the people that we’re working with who maybe aren’t so technical or don’t feel that pain as acutely.

Darko Fabijan (29:50):
Great explanation. I think it’s also a great insight for technology leaders, the people who are leading teams, to recognize that these challenges are not something that only their team is facing. It’s an industry-wide problem.

Just an example. Fairly often people ask, “Okay, are your other clients facing the same thing? Facing the same challenges?” And yeah, of course they are. How much of that we can discuss depends on the level of transparency and so on, but yeah, all great advice and tips.

Justin, if our listeners want to learn more on these topics and related topics, what are some pointers where they can find more about you, your talks, and your consulting work in general?

More places to learn about Justin’s work

Justin Searls (30:39):
Yeah, thank you. So I tweet a lot, just top-of-mind stuff as I run into it, under my last name, Searls, on Twitter. If you’re looking for more substantive stuff, most of the things that my colleagues and I spend a lot of time thinking about, we’ve either written about at our blog, which is blog.testdouble.com, or we’ve got open source tools for them.

So you can find us on GitHub and check out all of the dozens of repos of often small libraries that just solve particular problems that we run into at clients.

Justin Searls (31:12):
And then increasingly over this year in particular, we’re planning on building more practical content to help developers on YouTube. So we’ve got a YouTube channel, also under the Test Double name. I hope you follow us and stay tuned for the stuff that we’re doing there because honestly, sometimes I view the consulting business as a way to fund our habit of helping developers for free and in public, and I’m just thrilled that we get to spend the time to share some of the things that we’re learning while we do the work.

Darko Fabijan (31:41):
Yeah. Great. Thanks for that. Thanks for all the tips and advice and yeah, good luck.

Justin Searls (31:47):
Yeah. Thanks so much. Thanks for having me.