In this podcast episode, I welcome Kris Buytaert, consulting CTO at Inuits.eu, DevOps evangelist, and one of the organisers of DevOpsDays. We talk about the conference, how to introduce CI/CD to teams, and some patterns and antipatterns for infrastructure as code. We also discuss why teams are reluctant to spend money on testing and operations, and what happens if they don’t.
Listen to the full conversation or read the edited transcript.
Table of contents:
- The reality of conferences in COVID times
- Introducing CI/CD to teams
- Infrastructure as code: patterns and antipatterns
- Drawbacks of tech education
- Why do organizations not spend money on testing?
- How to get budget for DevOps
Like this episode? Be sure to leave a ⭐️⭐️⭐️⭐️⭐️ review on the podcast player of your choice and share it with your friends.
Hello, and welcome to Semaphore Uncut, a podcast for developers about building great products. Today, I’m excited to welcome Kris Buytaert. Please go ahead and introduce yourself.
I’m Kris, I live in Belgium. About 20-25 years ago, I started playing with Linux and realized that open source was what I wanted to do. I started out as a developer doing projects in Java and other languages.
As I progressed in my career, I started doing more operational work because I knew how to build clusters and make machines work. Some 12 years ago, I was speaking at a lot of conferences on how we did high availability, scalability, automation, etc.
I bumped into Patrick Debois at a CloudCamp in Antwerp. He had this crazy idea to bring together 70 of our best friends for a small conference and get people with operational backgrounds, high-availability backgrounds, cloud people, and developers in one room. That’s how we started DevOpsDays in 2009.
Since even before that, I’ve been helping organizations deliver software in a better way using open source tools.
The reality of conferences in COVID times
Great introduction! What’s the reality of DevOpsDays in COVID times?
My last DevOpsDays was New York, 2020. I haven’t been to one since, though I did go to OSMC in Nuremberg at the beginning of the month.
Online conferences are really not the same as in-person. The real power of DevOpsDays was the open space, getting to talk with people who have similar experiences, who can share ideas. It’s hard to mimic online.
There are still DevOpsDays popping up left and right; some of them happen in-person. One happened in Tel Aviv recently, and a lot of people were happy to attend it in person.
But there are DevOpsDays that have been postponed three or four times already. It’s not easy.
Introducing CI/CD to teams
I couldn’t agree more: the best part of any conference is when you get to talk to a bunch of people.
One of the main things that you wanted to achieve with DevOpsDays is to bridge the gap between operations, DevOps, agile, and structuring work. You gave a lot of talks on how to introduce teams – DevOps, development teams – to CI/CD as a topic.
Can you give us an overview, what’s the best practice?
A lot of the work I do is consulting, which means I end up walking into organizations that are struggling, that are failing, and that actually need help. Which does mean, in a lot of cases I only see the shit that’s around.
But people are struggling because they have a problem they need to solve. They cannot get to production at the speed they want, they have stability problems in production. Basically, they have problems delivering software.
Oftentimes, you talk with different people on the teams and realize that their whole effort to deliver software hasn’t included operational people. It has just been developers doing some testing, then building a pipeline. They’ve been doing continuous integration, but how the software is delivered is totally ignored by the rest of the team.
On the other hand, there are successful transformations. They’re the ones where operational people were involved and were willing, upfront, to automate faster and build things. They want to be able to support their developer teams.
Then you tell them that you’re going to do infrastructure as code and continuous delivery on their infrastructure. You’re going to teach concepts like how to do promotions, test coverage, all those things. When you get teams to understand that, it’s much easier for those people to support their developer teams.
When you get teams to understand concepts like infrastructure as code and do continuous delivery on their infrastructure, it’s going to be much easier for those people to support their developer teams.-Kris Buytaert, consulting CTO at Inuits.eu
The trick to getting more people into an SRE-style role is to understand and learn those things. Your CI/CD infrastructure needs to be something you can do continuous delivery on, too. You should be capable of constantly upgrading, knowing that you can release at any point in time.
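The promotion idea Kris mentions can be sketched in a few lines: an artifact only advances to the next environment once its checks pass there. This is a minimal illustration, not any particular tool; the environment names and the check are hypothetical.

```python
# Minimal sketch of environment promotion: an artifact moves from one
# environment to the next only when its checks pass. The environments
# and the check logic are hypothetical placeholders.

ENVIRONMENTS = ["dev", "staging", "production"]

def run_checks(artifact: str, env: str) -> bool:
    """Stand-in for real per-environment tests (lint, unit, smoke)."""
    return artifact != ""  # pretend any non-empty artifact passes

def promote(artifact: str) -> str:
    """Walk the artifact through each environment; stop on first failure."""
    reached = "none"
    for env in ENVIRONMENTS:
        if not run_checks(artifact, env):
            break
        reached = env  # checks passed: the artifact now lives in this env
    return reached

print(promote("app-1.4.2"))  # → production
print(promote(""))           # → none (never passes the first gate)
```

The point of the pattern is that the same artifact, and the same automated gate, applies to infrastructure code just as much as to application code.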
When people become fans of the whole ecosystem, then they start to understand, like, “Yeah, this is how it works, we can show it to other people”. That really helps organizations forward but sometimes you come into an organization way too late for that.
Infrastructure as code: patterns and antipatterns
Six or seven years ago, infrastructure as code was becoming a reality for more teams. If you compared those times with today, what are some patterns and antipatterns? How easy is it to get people to jump on this train?
There was a fun thread on Twitter a couple of weeks ago where the CNCF kind of defined what GitOps was about. A lot of the people who have been doing infrastructure as code for ages were like, “Yeah, but this is what we’ve been doing for close to a decade and a half. We do desired state, we do version control, we actually do some tests on this.”
That was even missing in the CNCF definition: “We do tests on what we do, and then it gets deployed, and it stays in that state, and we’re capable of reprovisioning things, we’re capable of doing all of this automated.”
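The desired-state idea at the heart of that definition can be sketched as a reconciliation loop: compare the versioned desired state against what’s observed, and compute the changes needed to converge. The keys and values below are hypothetical, just to show the shape of the pattern.

```python
# Minimal sketch of desired-state reconciliation, the core idea behind
# infrastructure as code: diff desired vs. current state and return the
# actions needed to converge. All resource names are hypothetical.

def reconcile(desired: dict, current: dict) -> dict:
    """Return the actions needed to bring current state to desired state."""
    actions = {}
    for key, value in desired.items():
        if current.get(key) != value:
            actions[key] = value      # create or update this resource
    for key in current:
        if key not in desired:
            actions[key] = None       # remove what should not exist
    return actions

desired = {"nginx": "installed", "ntp": "running"}
current = {"nginx": "installed", "telnet": "installed"}

print(reconcile(desired, current))
# → {'ntp': 'running', 'telnet': None}
```

Applying the plan and reconciling again yields an empty action set, which is what makes the approach idempotent and safe to run continuously.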
The honest remark there is I haven’t seen a lot of organizations who are capable of doing continuous delivery on their infrastructure as code, but they do exist.
Sadly, what I mostly see now among people who are struggling is that when they claim to do GitOps, they actually do continuous isolation, because they basically have multiple branches running all over the place, at scale. Then they have some operator deploying the state of the application in their container ecosystem.
In a way, I understand where the problem space is, where they’re coming from. But that is the absolute opposite of the goals of doing continuous delivery of your infrastructure. Because the crux, the really important thing, is testing; the other part is that we do trunk-based development.
So, if you have multiple long-running branches, you’re going to shoot yourself in the foot at some point. I think a lot of those things are a circular movement, where people bump into a problem space. They look for ways to solve it within that problem space, not yet knowing that there are people within their own community who have solved similar problems before with other technologies.
A lot of those people are going like, “So, yes, we’ve solved this problem before. If you just take this, this, and this pattern, and reuse your tools the other way, you’ve kind of solved it.”
But they go out and start building new tools, each time making some incremental improvements, but sometimes forgetting that, hey, this already exists.
Those who don’t know Unix are doomed to reinvent it – poorly.-Kris Buytaert, consulting CTO at Inuits.eu
The joke about it is that those who don’t know Unix are doomed to reinvent it, and then a lot of people add “poorly”. The only thing we should look for is: there are obviously always incremental improvements, but what are they exactly? Why are people trying to solve a problem the way they currently solve it?
That is, to me, what’s really the important part, what is actually the problem you’re trying to solve?
Yeah, I have seen that, and actually talked with a lot of guests on the podcast about those cycles that we have in the industry. Do you think it’s also related to the generations of people coming into the industry, and not knowing what was done before and what is available?
It’s definitely to do with new people joining the industry, not knowing what exists, and it’s also getting impossible to know what exists. There are so many things moving, there are so many things popping up.
There is, however, a struggle where a lot of those people come into an organization and say, “Hey, we know how to do things,” and just forget that there are other people out there who also have experience and have learned things.
That definitely is important: sometimes knowing the pattern and seeing what’s happening is much more important than knowing the specific tools and figuring out how to do such and such configuration with such and such tool. But we’ve seen over the years that a lot of what’s needed in organizations is really teaching people: we do these things because… We are going to implement this feature, because if we don’t, this is going to fail.
New and young people, in a way, don’t have that operational experience.
They haven’t seen things fail yet. That creates conflict, because they think things are not going to fail, while people with experience say, “Yes, but maybe build your solution differently, because you’re going to hurt yourself”.
It’s a challenge in both directions. On the one hand, the new people need to accept that input. On the other hand, the old people need to accept that there are obviously incremental improvements, and they could learn from each other.
But it’s still a hard problem, preventing people from failing fast.
But it’s still a hard problem, preventing people from failing fast.-Kris Buytaert, consulting CTO at Inuits.eu
Drawbacks of tech education
Here’s an analogy. I’m trying to convince developers that tests are there to help and guard you. The reality is, almost 15 years after I graduated from university, most universities are still not teaching testing. You can boast about the programming languages you know, but you’ve almost never written a single unit test.
Can you map that onto the DevOps world? How do you see education in the DevOps area, and what are good patterns?
You touched on a couple of things. People coming from university, having been taught GitOps, basically fall into this gap of, “But this is not what they told us it would look like.”
The real world is different. Yes, there are people who are doing it this way, but that’s not the majority of what they see. There’s already a gap between what the schools are teaching, because they want to do the new technology thing, and what they actually should be teaching.
The second gap is, as you said, that they don’t cover testing and things like that, which I can perfectly imagine, but they don’t even cover the operational part.
Most bachelor and master programs are focused on building things, not running things. So there is a huge gap. I really don’t know of many programs where you can actually train to become a site reliability engineer, or those kinds of roles; as far as I know, they just don’t exist.
If we’re educating people to a point where they become software developers, and their goal is to write code, and get that out there, then we’re doing something wrong with education.
The second part is, it’s not only education that’s the problem; it’s also how a lot of software projects are financed. If you ask the average business, they want functionality. If you give them a €100,000 budget, they’re going to spend €99,999 on functionality. Then they’re going to realize, “We need to keep this maintained and up and running for the next couple of years.”
Why do organizations not spend money on testing?
They realize they have one euro left to do so, and that’s going to be a problem.
Even the bigger problem is when somebody asks you, “Can you build this piece of software?”
You go explain it to them and say, “Sure, we can do this. We’re going to spend this amount of time writing it, and this amount of time testing it.” They’re going to say, “No, no, no, no, don’t spend this amount of time testing it,” because they don’t understand how software deployment works.
They don’t understand that it’s needed, so from that point of view your testing budget is always going to be under pressure; you’re never going to have a sufficient budget to test the way you want.
I haven’t even touched on security yet; that’s an even larger nonexistent budget, because it’s expected to be built in by default. That’s what the average organization expects. So it’s not only schools and universities that are failing to teach people how to build software and how to do the whole life cycle around it.
We also fail to teach the people who come out and end up in business that, hey, you need to think about more than just building the code and having the functionality there. You need to talk about those non-functionals; you need to talk about how to do testing and how to automate your testing.
If you don’t do that, those people will not understand.
Still, on a daily basis, you have customers and organizations internally struggling with budgets: “We’ve allocated this amount of money to build this, why do you need 10 times more?” That’s the daily struggle.
So if you have that problem, schools are not going to say, “Hey, we need to teach testing,” because they see it like, “Yeah, but nobody spends enough time on it; it’s not important enough for us to spend time on.”
Schools don’t teach testing because they see that nobody spends enough money and time on it.-Kris Buytaert, consulting CTO at Inuits.eu
That kind of gets you in a situation where people are not going to be taught about it, because other people are not paying for it. It’s a sad situation.
Do you have any tips and tricks of managing upwards and helping engineers in organizations to get those budgets, or reassign those budgets to things that are important?
Yes and no. The scary part is that I’ve got a couple of cases where the actual transformation only started happening after a near-death experience: senior management realized that they needed to start managing things differently, because if they kept going the way they were going, they would go out of business.
So that kind of near-death experience, having experienced real pain and having seen an almost disaster-level business impact, is a really good way to get people to think about how to deal with test coverage, resilience, availability, and quality in general.
But let’s hope that not every organization has to go through that near-death experience. The thing that does work, however, is if you have product teams where the product owners are involved in creating the backlog and so on.
You put those people on call, because those are typically the people who forget to prioritize non-functional requirements, who forget to spend time on test coverage.
Once you get those people woken up at 3:00 AM on a Sunday morning because of yet another out-of-memory issue, one that the previous group of on-call engineers has been nagging about for at least a year because they’re being paged for it.
When that happens for the third Sunday in a row, that is when those product owners start realizing, “Why am I being paged for this? This needs to be fixed.” I experimented with that concept with a couple of teams some six or seven years ago in the Netherlands.
We saw change happen, because those non-functional requirements, those bugs that are basically getting people out of bed, eventually get solved.
That improves the quality, and it means that people finally start to realize, “Well, there is a hidden cost to people being paged; there is a hidden cost to having to use far more resources than we should, because there are resource allocation issues in our code.”
I think that is a really good tip: if you can get those people to be the ones who get woken up when there’s a problem, things might change.
If you can get those people who are being woken up at 3am because of a memory issue to realize that there’s a hidden cost for people being paged, things might change.-Kris Buytaert, consulting CTO at Inuits.eu
How to get budget for DevOps
Yeah. When do people change their diet? After they get a stroke or a heart attack; that’s the wake-up moment for change.
A question in this regard. As our teams hit a milestone, we end up rediscovering and reevaluating the budget for maintenance. Just today we were talking about wanting a number of metrics regarding our databases, or how our systems are operating.
But they’ve been pushed under the carpet for a long time. So we were talking about how to allocate time, how to have those iterations every N weeks.
Every fifth week we have a budget for a week to focus on something. Then we do that over time, and then we have to reinstate that budget. What is your experience in that area?
I’ve seen the same, not with doing it every couple of weeks, but it’s really hard to say, “Hey, we’re going to spend 10, 15, or even 20% of our budget on actually doing continuous improvements”.
Because of the pressure from the product owners, they end up pushing features, features, features. You really have to have a strong commitment from your management that you should be doing this, and once you have that, it often keeps happening.
But then this big customer walks in, and they get prioritized, and it changes again. And even then things get reprioritized.
So I’ve yet to see organizations that can really do continuous improvement on their code base. As you described, in a lot of cases it goes well for a couple of months, and then it sadly disappears again.
Yeah, so reinstating it, reinstalling the practice, is just one of the ways to do it. You have such vast experience in the DevOps area. Looking forward, you mentioned SRE as a role, which, to my understanding, is a newer role that is becoming mainstream in some organizations?
So, DevOps really never was a role, and the way Google has defined site reliability engineering is pretty much what they claim is their implementation of DevOps.
But if you look at describing what somebody does, site reliability engineering is a much better description of what most senior system engineers are doing than saying they’re in a DevOps engineering role.
Which still raises the question, “What is a DevOps engineer?” Is that a Java developer who knows how to deploy code, or a Linux engineer who knows how to debug a stack trace? To me it’s neither; a DevOps engineer is not a role.
A DevOps engineer is not a role.-Kris Buytaert, consulting CTO at Inuits.eu
So, in a way, if we talk about people in this industry, then business-critical engineering, site reliability engineering, is much more of an explanation of what people actually are doing, or are supposed to be doing.
Yeah, you put it nicely and got me thinking. It’s been a struggle with a lot of people we’ve been talking to. We talk about DevOps, and people say, “Hi guys, meet the customer; this is John from the DevOps team.”
There is that struggle of understanding: what are his areas of expertise, and what is he responsible for? It ends up varying vastly between teams.
So when organizations come to me and say, “We want to do this DevOps thing,” what I typically tell them is: don’t call it the DevOps project; call it “we’re going to do engineering” or whatever. Then cherry-pick from all the things you’ve seen what you want to achieve as an organization, and put that out as a plan.
People might end up saying, “Hey, well, this is DevOps. Okay, good.” But if you say, “Hey, we’re going to do this DevOps thing,” what you’ll end up with is six months of discussion about what that really means.
Whereas if you say, “We’re going to do faster delivery,” then you have a goal set: these, these, and these are the things we want to achieve.
Those are the steps, and you’re on a journey as an organization. That is basically what DevOps is about: improving yourself, improving the quality of your software delivery.
But setting out that goal as “this is our organizational goal, and we call it this” is going to involve much less struggle than saying, “We’re going to do DevOps.”
Yeah, thank you, Kris. These are very nice closing lines and a great explanation; thank you very much for your time. We all hope that DevOpsDays and all the other live events will resume soon, but all we can do is keep our fingers crossed. Thank you again so much for your time.
Have a comment? Join the discussion on the forum