Krzysztof Zalasa from Google Cloud about solving real business challenges with GCP

In this Semaphore Uncut episode, I chatted with Krzysztof Zalasa, developer and customer engineer at Google Cloud. Today, he shares with us the insider news on what’s hot in the cloud. Listen to our chat or watch it on YouTube below:

Connect with Krzysztof (@krzysztofzalasa) and me (@darkofabijan) on Twitter.

Watch this Episode on YouTube

Edited Transcript

Darko Fabijan: (00:16) Hello and welcome to Semaphore Uncut, a show where we talk about developer tools, experiences, and the people behind those products. My name is Darko, and I will be your host today. Feel free to subscribe to our channel if you are watching us on YouTube. And with me, I have Krzysztof Zalasa. Thank you so much for joining us!

Krzysztof Zalasa: No problem. It’s a pleasure.

Darko: Great. Let’s give you a chance to introduce yourself.

Krzysztof: Okay. I’m working at Google Cloud as a customer engineer. During my daily activities, I’m supporting customers of Google Cloud in terms of architecture, top components and solving technical challenges.

During my past year at Google, I saw tens or maybe even hundreds of solution architectures. I discussed them with customers and supported them in amazing challenges.

Darko: And prior to joining Google you worked as a developer?

Krzysztof: Yes. Before joining Google, I worked as a software developer. Then as a solution architect, cloud architect–things like that. But I’ve been writing software for more or less 20 years. For now, my primary language is Java, but sometimes I also use Python and Ruby. That’s everything I need at Google, and it’s also my personal opinion that developers should learn every language they can. They just need to be open to learning new tools.

Increasing demand for voice recognition software

Darko: (02:46) Our viewers might be interested to hear what patterns you are seeing in architectures. What is something that is trending this year?

Krzysztof: Yeah, I see some patterns. Something that is not obvious to some developers is voice recognition. They think about writing code in Java and Python, for example. But now, there is this new interface that is very interesting.

I’m the father of two sons, and they are too young to write something in a way a mature person can do it. But they can ask, “Hey Google, please play Optimus Prime on TV.” And it just works.

So it’s opened my mind to become more familiar with Google Assistant. I believe it’s very important to be familiar with voice recognition and have control over your software using voice commands.

Of course, it’s not so easy to implement, so many customers ask us at Google Cloud for help. It’s quite hard to write software that is able to recognize voices, especially in 100 languages. It’s a very hard topic, but it’s something where you can just use a pre-built API to build real apps. I was really amazed that developers reach out to me and ask how they can implement crazy ideas with this software, like how to blow up a balloon. You just blow into your microphone, and the balloon on the screen grows. You can implement these kinds of ideas just using voice control.

I believe it’s the future, and real businesses are interested in it. For instance, Enterprise Rent-A-Car is focused on using voice recognition in their call centers. Let’s imagine that you need to hire people speaking in 20 or 30 languages who also have a technical background. It’s very challenging to support peaks of calls in this kind of environment. Voice recognition brings a new opportunity to solve these kinds of issues.

And it answers a very important question. At Google, every single person is asking one question: Does it scale? It’s not only about Gmail or other apps. It’s about every single process we are doing. So, I believe it’s a very nice way to really scale something, and often leads to a broader audience. As I said, my kids are able to watch the exact cartoon they want, and they cannot read or write yet. So it’s really, really amazing.

Everybody is asking about Kubernetes

Darko: (12:04) Great. Let’s move on to something which is affecting most of us–Kubernetes and the cloud. Are there any patterns with Kubernetes that you are seeing?

Krzysztof: Yeah. Obviously, Kubernetes is also on my list. It’s also part of almost every single conversation with customers. So for now, you won’t be surprised when I tell you that every single app development team is thinking about microservices. Every single modern app which is created is prepared using a microservices approach, and obviously, Docker is behind it.

In general, I believe that most people will agree that it’s the industry standard. The standard case when you’re thinking about Kubernetes is to deploy the application with microservices. Some developers think that it’s hard to manage connections between them. This is where a service mesh comes into the discussion. People are experimenting with Consul or with Istio, and they bring connectivity and security. A lot of things are simplified.

I’m also working with a partner engineering team which brings software like Elastic to Kubernetes so that you can install it with just one click. You might also use Helm if you wish, and it’s quite easy to install apps. A developer might create a cluster with a single command in the command-line interface. You might deploy MySQL, Elasticsearch, or anything else with a few Helm commands. Then you have a running environment. Let’s think about how challenging it was ten years ago. We failed upgrading operating systems. We failed tens of tests.

I don’t know if you had the chance to hear about Agones. It’s our open-source extension for Kubernetes, which makes it possible to teach Kubernetes how to manage game servers. When you are building games, game servers are stateful. It’s not a stateless workload.

Of course, you have stateful sets in Kubernetes, but usually stateful things are a bit more challenging in Kubernetes. Agones is an add-on with which you can extend Kubernetes’ capabilities and teach it to support game servers if you have this special kind of workload.

The importance of site reliability engineering

Darko: (16:52) Going back to the section about Kubernetes, you mentioned Istio. I can share a bit of our war story. Last year when we were preparing a new version of Semaphore to be launched, we spent maybe a good month before deciding it cannot go into production without Istio. We said, “It will not work, we cannot manage it, and we don’t know exactly what is happening with the communication between those services.”

We had to learn Istio, and we were able to get the data from proxies that are sitting all around the place. Now we have some charts and graphs and rates and all that. Now we understand what’s happening.

Krzysztof: Yeah. So that’s one more topic which may be worth discussing: Site Reliability Engineering (SRE). It’s about keeping your application alive in production. For some developers, it’s important; for some, it isn’t. It depends on your team and your background.

But I believe that even basic background and basic knowledge about the servers is very important for every developer. It switches your mindset a bit. Not only to create very good clean code but also to create something which will be possible to maintain in production with the desired level of reliability. If you are working on a very critical part of your app, such as logging into your system and it’s not working, it’s probably not good for you.

I believe SRE is also a very good topic to at least watch some presentations on or maybe read docs provided by Google. In previous roles, I had challenges where we tried to get our management to implement more features related to reliability, but they told us, “No, no. Features, features, features.”

When it failed in production, it was annoying for both sides. Site reliability engineering provides service level objectives and an error budget. That gives you a clear numeric indicator of whether to focus on speeding up development or to pay more attention to reliability, scaling issues, backups, etc.
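To make the error budget idea concrete, here is a minimal Python sketch. It is not taken from Google's SRE material; the function name, the 99.9% target, and the request counts are all assumptions chosen for illustration. It only shows how an SLO can be turned into a number that tells a team whether to keep shipping features or to work on reliability.

```python
def error_budget_report(slo_target, total_requests, failed_requests):
    """Summarise error budget usage for one SLO window.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    All names and thresholds here are hypothetical, for illustration only.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The error budget is the number of failures the SLO still tolerates.
    allowed_failures = (1 - slo_target) * total_requests
    budget_used = failed_requests / allowed_failures

    return {
        "allowed_failures": allowed_failures,
        "budget_used": budget_used,
        # Budget left: keep shipping features. Budget spent: fix reliability.
        "focus": "reliability" if budget_used >= 1 else "features",
    }


if __name__ == "__main__":
    report = error_budget_report(0.999, 1_000_000, 250)
    print(f"Budget used: {report['budget_used']:.0%}, focus on {report['focus']}")
```

With a 99.9% target and one million requests, roughly 1,000 failures are allowed in the window, so 250 failures leave most of the budget for feature work, while 1,200 failures would point the team towards reliability.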

I believe it’s fine for developers to explore this because it’s about using development practices: not just a bunch of scripts, but real software which solves challenges. It also makes your life easier in that it defines how to handle on-call, like how to implement good monitoring so you don’t have to wake up at three a.m. for no good reason, right? No one likes it.

Testing practices and high availability for CI/CD

Darko: (24:17) I have an idea for the last question; maybe I can use you as a consultant. There is the approach of building your system using a number of microservices. So you might want to spin up your cluster, deploy everything there, and run end-to-end tests across everything.

But what I discovered over time is that it’s not always practical. You end up just deploying your canary release, getting some traffic, and deciding whether you want to push it further or not. Having worked in the field with many teams, how would you say that those two approaches compare?

Krzysztof: What I observe is that customers who are using canary releases, for instance, are developing features faster, because it’s quite complex to synchronize 20 microservices developed by field teams. I also have customers who have tens of development teams. Let’s imagine that they need to synchronize their releases and conduct tests when some of them are pushing about 20 features per day to production.
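The canary decision discussed here (send some traffic to the new version, then decide whether to push it further) can be sketched in a few lines of Python. This is an illustrative assumption, not how Semaphore or Google Cloud implement it: the function name, the 0.5 percentage point tolerance, and the minimum of 500 requests are all hypothetical.

```python
def canary_decision(baseline_errors, baseline_requests,
                    canary_errors, canary_requests,
                    max_extra_error_rate=0.005, min_requests=500):
    """Return "wait", "rollback", or "promote" for a canary release.

    Thresholds are hypothetical defaults chosen for illustration.
    """
    # Too little canary traffic means the comparison is not meaningful yet.
    if canary_requests < min_requests:
        return "wait"

    baseline_rate = (baseline_errors / baseline_requests
                     if baseline_requests else 0.0)
    canary_rate = canary_errors / canary_requests

    # Roll back when the canary is clearly worse than the stable version.
    if canary_rate - baseline_rate > max_extra_error_rate:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    print(canary_decision(50, 10_000, 4, 800))
    print(canary_decision(50, 10_000, 40, 800))
```

A real rollout would likely look at latency and business metrics as well as error rates, but the shape of the decision stays the same: compare the canary with the stable version on live traffic instead of synchronizing a full end-to-end test run across every team.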

Krzysztof: Without CI, it’s very challenging to keep your integration tests working because every single week there is a new feature added to some services.

Darko: Yeah. You could say that CI, continuous integration, is one of those things that need absolute reliability. Your CI system is pushing your tests and your builds. “Please make it reliable; features will come.”

Krzysztof: It’s a very interesting topic. Customers who are not using a CI/CD service, but are using, for instance, Jenkins or these kinds of tools, usually deploy a single instance, a single Jenkins master, in a single region with no HA, and then implement the whole deployment process on it.

Krzysztof: Then, when that single instance is down, they have no option to release. From my point of view, and from my work in previous roles, I prefer software-as-a-service pipelines. You define them, put them in a software-as-a-service solution, and just use it. This solution provides an SLA for you. It’s not your responsibility to keep it alive. You might just think of your business and how to add value to your app instead of thinking about how to add reliability to your continuous integration.

Darko: Yeah, yet another system that has to have really high availability.

Krzysztof: Business people in most companies are not asking, “How is your deployment process working? Is it reliable?” They are just thinking about features and fixing bugs. But if we cannot release because our CI/CD is broken, they start thinking about it. So I strongly encourage you to start thinking before you get into trouble.

Darko: It was a very interesting conversation, with all the topics that we touched upon. The first one, about voice and machine learning, still feels like science fiction, although it is already in your living room.

Again, thank you so much for joining us. Good luck with all the challenges and architectures that you are going to tackle.

Krzysztof: Thank you. It was a pleasure to meet with you to discuss a very interesting topic, and if you have any questions, please use the comments on YouTube.

Darko: Yeah, sure, please do. And subscribe, if you haven’t already. Thank you, Krzysztof.