3 Mar 2022 · Software Engineering

    Introducing a Next-generation Parallel Config Management Tool

    James Shubin is a DevOps/config-management hacker and physiologist from Canada. He writes “The Technical Blog of James.” He also works on a next-generation config-management project he started called “mgmt.”

    In this interview, we discuss Shubin’s open source project and how it can contribute to the cloud-native landscape.

    You’re the lead developer behind mgmt, which is a “next-generation distributed, event-driven, parallel config management” tool. Could you tell me a bit more about the project, and why you think it stands out from other configuration automation tools available on the market?

    Historically, people have thought of infrastructure as pretty static, something that doesn’t change: you just set up four servers or ten servers, etc. The biggest difference now is that we treat infrastructure as very dynamic. Over time, every second of the day, you have constant changes in the infrastructure. You feed all those inputs into mgmt, and the output is also something that’s very dynamic, so we can change any variable inside and have a fast, event-driven, parallel configuration-management tool.

    The three principal architectural arcs of mgmt (parallel execution, an event-driven design and a distributed topology) make it extremely fast and instantly responsive, which some people might find a bit risky in production. As Julien Pivotto writes in his blog post, “with traditional configuration management, when you screw up something, you still have the time to fix your configuration before it is applied everywhere.” He implies that with mgmt this is no longer the case, and “that makes it scary.” Do you think that the config management world is ready to deal with a tool that focuses on creating complex distributed systems that are fast and fully autonomous?

    Yes, it’s very scary, but it’s also necessary. People say, “Oh it’s too fast. I don’t have time to control or see my stuff.” The truth is that even people using Puppet or Ansible or other tools use those tools to destroy their infrastructure as well. If you really want mgmt to be slower, it’s very easy to have it do something and then wait five seconds and then do something else. That’s already possible, but it’s a silly feature that realistically we don’t want to have. The way we compensate for this (and the way everyone should be doing it) is that you describe your infrastructure in code that should be as safe as possible, because we want the compiler and the tools that do our analysis of the code to take away all the error possibilities that are feasible before we even compile.

    Take a language like Puppet, for example: the equivalent of the null pointer, which they call “undef,” is possible in Puppet, and your running code could hit this bug. This kind of error scenario is not possible in mgmt, because we don’t have the concept of null. You always have to have a set value. It doesn’t mean your program can’t have bugs, but it’s one way that we remove a class of error, which makes your code safer.
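
    The point about null can be illustrated with a rough analogy in Go. This is not mgmt's language or its implementation; the types `NullableConfig` and `Config` and the helpers `unsafePort` and `NewConfig` are hypothetical names made up for this sketch. Go also cannot enforce the rule at compile time the way a language without null can, so the example only shows the difference between a value that may be absent and one that must always be set.

```go
package main

import (
	"errors"
	"fmt"
)

// NullableConfig mimics a language that has null: Port may be absent,
// and nothing stops a caller from dereferencing it anyway.
type NullableConfig struct {
	Port *int
}

// Config has no nullable field: a Port value always exists.
type Config struct {
	Port int
}

// NewConfig is a hypothetical constructor that rejects an invalid port
// up front, so the error surfaces before the value is ever used.
func NewConfig(port int) (Config, error) {
	if port < 1 || port > 65535 {
		return Config{}, errors.New("port must be between 1 and 65535")
	}
	return Config{Port: port}, nil
}

// unsafePort dereferences a possibly nil pointer. The resulting runtime
// panic is recovered and reported as an error to keep the demo running.
func unsafePort(c NullableConfig) (port int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("runtime failure: %v", r)
		}
	}()
	return *c.Port, nil
}

func main() {
	// The nullable variant only fails once the code is already running.
	_, err := unsafePort(NullableConfig{})
	fmt.Println("nullable:", err)

	// The non-nullable variant always carries a checked value.
	cfg, err := NewConfig(8080)
	fmt.Println("non-nullable:", cfg.Port, err)
}
```

    The first call only fails at run time, which is the class of error the answer above describes; the second variant never holds a missing value in the first place.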

    You’re using etcd for implementing distributed topology in your solution. In one of your blog posts, you refer to it as “a marvelous piece of engineering.” This piece of software has been accepted as an incubation-level hosted project by CNCF, and it’s a core part of Kubernetes. Could you tell us a bit more about how you use etcd in your project?

    The reason why I really use etcd is that I need an algorithm that is able to do distributed consensus. In the case of etcd, this is the Raft algorithm. It’s very difficult to implement this algorithm at a reasonable scale — it’s a huge project in itself. We need that problem solved because in the cluster of machines we want to be able to make some decisions that all the machines can agree on. Kubernetes needs this, and so does mgmt.
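
    To make "decisions that all the machines can agree on" a little more concrete, here is a minimal Go sketch of the majority rule that Raft-style consensus relies on. It is not Raft itself, and it is not code from etcd or mgmt; the function names `quorum`, `committed` and `faultTolerance` are assumptions introduced for this illustration.

```go
package main

import "fmt"

// quorum returns the smallest number of members that forms a strict
// majority of a cluster with n members (n is assumed to be at least 1).
func quorum(n int) int {
	return n/2 + 1
}

// committed reports whether acks acknowledgements are enough for a
// cluster of n members to accept a decision.
func committed(acks, n int) bool {
	return n > 0 && acks >= quorum(n)
}

// faultTolerance returns how many members can fail while the remaining
// ones can still form a majority.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), faultTolerance(n))
	}
	fmt.Println("2 of 3 acks accepted:", committed(2, 3))
	fmt.Println("2 of 4 acks accepted:", committed(2, 4))
}
```

    The hard parts of a real implementation (leader election, log replication, membership changes) are exactly what makes it "a huge project in itself," which is why reusing etcd is attractive.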

    The way we use it is a bit different. We took all of the etcd code and actually imported it into our codebase as a library, compiling it all in to make a single mgmt binary that includes all that code. The main difference between mgmt and Kubernetes is that mgmt uses that information to fundamentally work as a distributed system, whereas Kubernetes is fairly centralized in its approach. There’s a single point that makes the decisions in the Kubernetes cluster. It still uses etcd to know whether the machines are alive and where to put things, but in my opinion, it’s not truly a distributed system with multiple masters.

    With mgmt you’ve also created your own language, right?

    Unfortunately yes. It’s one of these horrible things, because for years I knew I needed to solve a particular problem. I had ideas about how to solve it. I knew that I needed to implement a small domain-specific language (DSL), a language that’s not useful for general-purpose computing, just for this problem. It enables me to write very few lines of very safe (less error-prone) code to describe infrastructure in real time.

    This is fundamentally an unsolved problem: How do I tell all the computers in this data center, or across the world, what I want them to do? How do you do it? Do you write C code, write JavaScript, write Golang? It’s not an easy question to answer. I decided that for infrastructure I could build this special language that answers those problems. I think it’s successful. There’s still a lot more to come in the language, but it does cool things that make it very well suited to building safe and fast infrastructures.

    Do you consider mgmt in the future as somehow contributing to the cloud native landscape? Do you think it has something substantial to bring to this table that other solutions haven’t?

    I think this sort of ties into your previous question. If you’re gonna use Kubernetes, for example — Kubernetes developers have written a ton of code, and the way they want you to tell that code how to behave is by picking from one of a hundred or a thousand different switches basically in a YAML file. Here’s a giant YAML file; here’s the spec of what it looks like; you can choose these values. That is how they get you to talk to those computers and tell them what to do. The problem with this is that there’s always someone who doesn’t have a feature that they want. So then there’s a “Please add a new thing to do this” situation, which is a big mistake, and the question is, “Is it really flexible enough?”

    Mgmt lets you build your own thing. Let’s say you wanted to actually build something that worked exactly like Kubernetes does. You could theoretically write your own little module in mgmt code that basically behaves how Kubernetes behaves, and that would just be one software project. Someone else could build something entirely different for their needs. I think this is going to be the big defining feature. We’ve put quite a lot of effort into the module system so that ultimately someone who isn’t me, and has some idea for a big piece of their infrastructure that’s generally usable across many different companies, will be able to implement it. I think people will eventually start writing these big modules, and that module itself can be a standalone product, which is powered by mgmt.

    You’re a big enthusiast of DevOps and Open Source. Do you think these two worlds can go hand in hand in the long run? As you write in your blog post, “I’m not against using or contributing to permissively licensed projects, but I do think there’s a danger if most of our software becomes a monoculture of non-copyleft, and I wanted to take a stand against that trend.” Does which Open Source license you choose for your project influence its adoption process?

    This question is a big one, and we’ll never get to the end of it. Fundamentally the choice of license is the constitution of your community. It’s the legal promise. That is what you tell your community: “This is how we want to be treated, and this is how we will treat you.”

    I think that diversity in licensing choices is very important. Historically, there are what we call permissive licenses such as Apache v2, the MIT license and others. Then there are stronger copyleft licenses such as the GPL v3, v2, LGPL and the AGPL, which is the very strong copyleft license. By putting mgmt under the GPL, I’m effectively telling my community, “I guarantee to you that I’m going to keep my work open and you actually benefit, because if you send a patch you’ll have a guarantee that your code will not be taken proprietary.” Otherwise, it’s basically getting free labor; and some people are okay with that, but other people aren’t. There’s even a joke: “Free software is free as in freedom. Open source is free as in labor.” A mean joke, actually.

    I want us to do the right thing. It’s unfortunately very difficult, because at the moment, I’m just living off my savings, and I’m not getting paid to do any of this work. At some point, I’ll run out of money and I’ll have to find a different plan. I’m hoping that I’ll find some funding and people who want to donate time, money and patches to the project, because if we don’t then we just end up with proprietary solutions that aren’t self-funded.

    Article originally published on The New Stack.
