John-Daniel Trask on Shortening Feedback Loops for Agile Development

In this episode of Semaphore Uncut, John-Daniel Trask, the co-founder and CEO of Raygun, explains how a short feedback loop enables teams to rapidly develop high-quality products.

JD Trask is a technology leader with almost 30 years of software development experience.

Key takeaways:

The shorter the loops are, the happier the customer gets
Putting the customers at the heart of what you do
Understanding the landscape of error tracking and crash reporting tools
Taking the mission to the next level
A good tool is like an extra team member

Listen to our entire conversation above, and check out my favorite parts in the episode highlights!

You can also get Semaphore Uncut on Apple Podcasts, Spotify, Google Podcasts, Stitcher, and more.

Like this episode? Be sure to leave a ⭐️⭐️⭐️⭐️⭐️ review on the podcast player of your choice and share it with your friends.

Edited transcript

Darko (00:02): Hello, and welcome to Semaphore Uncut, a podcast for developers about building great products. Today, I’m excited to welcome JD Trask. JD, thank you so much for joining us.

JD: It’s a real pleasure. My name’s JD Trask, and I am the co-founder and CEO of a company called Raygun that builds tools for software teams to build better quality products. By way of background, even though I am a co-founder and CEO today, I am a hardcore tech geek. I learned to code when I was nine, and I started selling commercial software at high school on floppy disks in the ’90s. I don’t write quite as much code today as I would like; I still do a bit at home to try and keep the skills sharp, and I love playing with new stuff. But my day-to-day is leading the company, and I still love hearing what the devs are coding on. I’m passionate about software, and I’m passionate about business, almost at equal levels.

The shorter the loops are, the happier the customer gets

Darko (03:12): Certain elements are timeless. There are those things that you must keep track of and ensure that they are working. Can you give us a brief intro around how you help customers make sure that their software is of high quality, that they can reduce bugs, and what are some patterns that you see in that area?

JD: Everybody assumes they don’t have that many bugs. I’ve never had somebody try out Raygun and go, “I don’t know if it’s working because I have no error reports.” There are always error reports! I always describe error reporting as kind of the black box flight recorder. Stuff blows up, and you need to know why it blew up so that you can fix it quickly. Ideally, it’s about the customer, giving them a better experience.

We were running CI and CD servers back in 2004. Usually just running it on your own workstation because nobody was setting up servers for that back then. I realized CI and CD feels like a superpower, because you’re like, “Oh my goodness, I can so smoothly get to production.”

But how do you really make the most out of that investment? It’s to create the feedback loop. I can get to prod fast; how quickly can prod tell me about what’s happening? We stepped in to say, “Hey, we can close the loop. You can get to production fast. Now you can iterate and cycle through production blazingly quickly, fixing bugs and all that, so that you have a higher quality product for your customer.” It’s about faster loops to prod, to ultimately make your customers happier and not be caught flat-footed on an issue that you had no idea about while customers were having a really bad time.

Putting the customers at the heart of what you do

JD: I want to touch on a point you raised earlier about enabling the teams to be in control. Everything I’ve just said actually helps with that because it demonstrates ownership end-to-end of what it is you’re producing, through to who you produce it for.

This sort of tooling, the DevOps movement, all of these things, can sometimes feel like a whole lot of responsibility landing on a dev team. But at the same time, the reason that you’re doing it is to try and say, “Okay, well, we can make sure that the customers are happier. We can empower those teams.” And when it comes to ownership, I always think it’s really important that the software developers appreciate who it is they’re building for.

I make some comments that are probably fairly insulting to software developers at times, and I make those comments because I’ve been there, and I am there sometimes. I can get in the zone and end up far more fixated on how I’m building the code – because I love elegant code! I love thinking about how the code’s going to hang together, and all of that stuff’s important for maintainability and performance, etc. But ultimately, it doesn’t really matter. What matters is the customer.

The thing about getting more ownership and control within the software teams is the more they talk about the customer, the more the teams outside of software development can also understand the software team. They’re communicating effectively in the language of business. Now they’re talking the same language rather than, “You’re talking about JavaScript, and I’m talking about NPS scores.” We can all get on the same page. We all have the same ultimate master, which is the customer.

Understanding the landscape of error tracking and crash reporting tools

Darko (15:52): How do you see adoption and change in the industry related to the service that you’re providing?

JD: Firstly, I’d say, using an exception tracking service or tool doesn’t negate the need for software tests. They go well together, and the best practice you can have is, “Hey, this exception was reported. I’m going to go and fix it.” The first thing I do is write a test that proves it fails, and then I’m going to fix it until the test passes. If you just get into that pattern, it works really well.
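A minimal sketch of that pattern, assuming a hypothetical `parse_quantity` function that crashed in production when a form field was left blank. The function, the bug, and the test names are invented for illustration and are not from the conversation.

```python
import unittest


def parse_quantity(raw):
    """Hypothetical order-form helper (illustrative only).

    Reported crash: int("") raised ValueError when the field was blank.
    The fix treats a blank field as zero.
    """
    raw = raw.strip()
    if raw == "":
        return 0  # the fix; without this branch, int("") raises ValueError
    return int(raw)


class ParseQuantityRegressionTest(unittest.TestCase):
    def test_blank_field_does_not_crash(self):
        # Written first, straight from the exception report. It failed
        # until the blank-input branch above was added.
        self.assertEqual(parse_quantity("   "), 0)

    def test_normal_input_still_works(self):
        # Guards against the fix breaking the ordinary case.
        self.assertEqual(parse_quantity(" 3 "), 3)
```

Running `python -m unittest` against a file containing this block executes both tests. The regression test stays in the suite, so the same exception cannot quietly come back.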

Before we built Raygun, we actually had a product called Lightspeed. It was an object relational mapper that talked to databases, like Active Record in Ruby or Entity Framework in .NET. It’s one of those sorts of things where it’s not that much code, but it’s a very flexible system. We ended up with close to, I think it was, 30,000 unit tests on this object relational mapper.

The second piece that made that so powerful was we could actually make changes with so much confidence because we weren’t building on sand. You weren’t worried that you fixed something way over here and the backdoor breaks. To be honest, I’ve often struggled to see software projects have enough tests that people get to the point where they can have confidence in large-scale refactorings.

When we first launched Raygun as a crash reporting service only, at the start of 2013, there weren’t too many folks out there doing it. Early on, we would see that nearly every customer that adopted Raygun was coming from nothing; it wasn’t beating a competitor, it was expanding the pie. It was actually introducing somebody to the concept.

Fast forward half a decade, a little bit more. We see a little bit more stuff in the competitor category, but I fundamentally still believe that we’re still at the very early days of tracking those areas. I mean, even in CI/CD, I still am blown away by people who aren’t using that today. It just seems so obvious as such a huge win. It doesn’t take long to see a return on that investment. I feel very strongly the same applies to crash reporting. But as I say, I wouldn’t think of it as a replacement for unit testing. I think of it as a way of actually understanding what your code is doing in the wild. Production is just the biggest test environment you’ve got, and you might as well get the details from that.

Taking the mission to the next level

Darko (23:39): What’s coming up on Raygun’s side? Is there something big you are working on?

JD: We’ve got a lot of stuff coming. We’re about to overhaul our user section. One thing that we do in our products, that’s probably a little bit different to a lot of the monitoring/DevOps category of tools, is that we try to put the customer right in the product. So, it’s not spooky weird stuff like Google does. You have to choose to opt in to identify who the customers are.

You can go, “Okay, well, here’s our VIP customer, how many errors do they have? What’s their average load time? What do they do navigating around?” And understand that. Similarly, when you go and look at things like the error dashboards, we say, “Don’t worry about the number of errors, worry about the number of affected customers for this particular error type that’s occurred, because that’s who you need to fix it for.” For example, you might have an error that’s occurred a thousand times and impacted one customer who has it in a loop, or you might have an error that’s occurred a thousand times and affected a thousand people. You need to prioritize towards helping a thousand people. So we push all of that stuff right up into Raygun, you can explore everything about your customers in there. So that’s going to be cool to come out.
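The prioritization JD describes can be sketched as follows. This is not Raygun's API or data model. The function name, the `(error_type, customer_id)` event shape, and the sample error names are assumptions made for illustration.

```python
from collections import defaultdict


def rank_by_affected_customers(events):
    """Rank error types by distinct customers affected, not raw count.

    events: iterable of (error_type, customer_id) pairs (hypothetical shape).
    Returns (error_type, affected_customers, occurrences) tuples, with the
    error that hits the most distinct customers first.
    """
    customers = defaultdict(set)
    occurrences = defaultdict(int)
    for error_type, customer_id in events:
        customers[error_type].add(customer_id)
        occurrences[error_type] += 1
    rows = [(e, len(customers[e]), occurrences[e]) for e in customers]
    # Most affected customers first; occurrences and name break ties.
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


# The example from the conversation: both errors occurred 1,000 times.
looping = [("TimeoutError", "customer-1")] * 1000
widespread = [("NullReference", f"customer-{i}") for i in range(1000)]
ranking = rank_by_affected_customers(looping + widespread)
```

Both error types have the same occurrence count here, so sorting by count alone could not separate them. Counting distinct customers puts the widespread error first.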

We just launched our APM product support for Ruby. We’re launching support for Node.js before Christmas. On the crash reporting side, we’re actually overhauling the ingestion pipeline at the moment to make it super scalable. Raygun today processes, I think we peak at around, a billion API calls an hour, and some of the world’s biggest brands run their stuff through us.

Let me bounce the question back to you. What’s coming up at Semaphore?

A good tool is like an extra team member

Darko (26:47): We are trying to develop a layer that should come in, let’s say, probably Q1 next year. It should give you insight into the test suite because we have many customers asking for it. To figure out which tests are failing, how often they’re failing, what are some brittle tests? What are the flaky tests that you have in your test suite? What is the group of tests that are holding your team back from moving faster into production? So that’s one big area.

JD: Well, I can definitely say about the tests and finding the flaky ones, that would actually be very valuable at Raygun. We’ve had a few flaky tests in recent times that I know a few team members are working on. There’s always information there, but I always think the best way for these things to work is if, in a weird way, the team almost feels like it’s a virtual team member, like it’s helping the team go faster by being somewhat proactive with those insights and helping them understand things that maybe weren’t obvious.

You know tests can be flaky, but which ones are they? Or when one starts to become flaky, come and tell me about it so that I can get on it before we realize that this test has failed every Tuesday for the last three months, but nobody noticed that pattern until now. That’s the sort of thing software can be very good at, as I’m sure is the same at Semaphore.
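One simple way to answer "which ones are they?" is sketched below. It assumes a hypothetical run history of `(test_name, commit_sha, passed)` records and treats a test as flaky when the same commit produced both a pass and a failure. This is an illustrative heuristic, not how Semaphore or Raygun detect flaky tests.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the sorted names of tests with mixed results on one commit.

    runs: iterable of (test_name, commit_sha, passed) records (hypothetical).
    The code is identical across runs of the same commit, so a pass and a
    failure there point at the test or its environment, not the change.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    flaky = {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),    # same commit, different result
    ("test_checkout", "abc123", False),
    ("test_checkout", "def456", True),  # changed by a later commit, not flaky
    ("test_search", "abc123", True),
]
flaky = find_flaky_tests(history)
```

A test that fails on one commit and passes on a later one is a normal fix, so it is not flagged. Catching a pattern like "fails every Tuesday" would need timestamps in the records, which this sketch leaves out.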

As you’re scaling these engineering teams, you sit there sometimes and you’re like, “You know what? It’s going to cost this much to add one more engineer, and that person’s going to come with management over here, they’re going to need hardware, they need software licenses, all this sort of stuff.” Let’s say you had a team of 20 engineers. You only need one piece of software to add 5% more effectiveness across that team, which is the equivalent of one engineer and replaces the need for the person. The software is probably going to be a heck of a lot easier to manage.

Now, that obviously works better and better at scale, but that’s how I think of it. I’d love people to be like, “We have Raygun because it helps our teams stay small but achieve more.” That’s part of the vision.

Darko: Thank you, JD. It was a pleasure talking to you.

JD: It’s been a real pleasure. I appreciate the opportunity to be on here and to have a chat. Thank you very much.