In this episode of Semaphore Uncut, I talk to Michael Grinich, founder and CEO of WorkOS. We chat about how software companies fall foul of the ‘Enterprise Chasm’, and how WorkOS helps them cross it. We also learn about the engineering approach the WorkOS team takes to keep their service reliable, which is critical to their users.
- How to cross the ‘Enterprise Chasm’
- WorkOS enables focus on core product experience
- WorkOS unifies a fragmented space
- Building an infrastructure SaaS takes rigour
- Dependency on ‘black box’ APIs requires constant health monitoring
- Build personal connections as well as data connections
- CI/CD enables a fast bug response loop
- Maintenance of critical infrastructure needs a special approach
- What’s next for WorkOS
Listen to our entire conversation above, and check out my favorite parts in the episode highlights!
Like this episode? Be sure to leave a ⭐️⭐️⭐️⭐️⭐️ review on the podcast player of your choice and share it with your friends.
Darko (00:02): Hello and welcome to Semaphore Uncut, a podcast for developers about building great products. Today, I’m excited to welcome Michael Grinich, founder and CEO for WorkOS. Michael, thank you so much for joining us. Please feel free to go ahead and introduce yourself.
Michael: I’m the founder of WorkOS, which is a company that provides APIs for really easily adding enterprise features to your app. So, things like single sign-on, directory integrations, SCIM provisioning, all these integrations with IT systems that you need to close enterprise customers. They’re complicated to build, so we provide an API that makes it really, really easy to add those features to your app.
How to cross the ‘Enterprise Chasm’
Darko (05:51): There is a great talk by Michael that I discovered during preparations that I was doing for this podcast, it’s called Crossing the Enterprise Chasm.
Michael: The talk Crossing the Enterprise Chasm is all about when you build a product and you decide to go upmarket and sell to bigger customers. The chasm exists between those early adopter customers who don’t really need the security features and the more mid-market or larger customers who need things like single sign-on or SCIM integration for directory sync and provisioning. Or they need audit logs, or access control policy, or all these different features. It’s also about how companies get stuck at this stage, because these features end up being something you ultimately have to focus on to win those customers.
It turns out crossing the enterprise chasm is more complicated than just an API. What we provide saves a lot of engineering time. And it’s just SaaS, you just plug it in, you’re good to go. But there are a lot of other things you need to do when you make that transition, including changing how you price the product and changing your packaging, so you potentially have different features at different levels. You need to think about your marketing, your messaging, and your overall go-to-market strategy, and even think about hiring salespeople. WorkOS doesn’t help you hire salespeople, we just do the infrastructure, but that talk is a great overview of many of those concepts.
WorkOS enables focus on core product experience
Darko (07:51): What are some of the challenges that people might solve by using WorkOS?
Michael: WorkOS is very much like any other infrastructure SaaS company, and the way the economics fall out is a win-win on every side. So, say you’re Semaphore, you’re moving upmarket, and you have a handful of customers that are saying, “Hey, we want SSO. We need to have integration with Okta.” And some people are saying, “Hey, we need Microsoft SSO,” or some people are saying, “We need OneLogin, or Ping, or these other SAML systems.”
And so you take an engineer, and you say, “Okay, go spend a few weeks building this.” So usually, that SSO code that you write is written quickly, it’s kind of a one-off, it’s usually done for a single identity system, and you’re just trying to get it done as fast as possible so you can get back to building Semaphore, the core product experience that you have.
What that ends up resulting in is a pretty brittle integration and only with one identity system. And so you have to go back, say a few months later, and then build the Microsoft one, and then try to generalize it.
And those integrations with these enterprise identity systems or directory systems, although they don’t feel like your core product experience, they’re some of the most important code that you’re writing because all of your enterprise customers are going to interface through them. There’s this hard trade-off where it’s something you don’t really want to spend time on. You shouldn’t. You should focus on your core product experience. And that’s where WorkOS can step in and actually provide a best of class solution.
WorkOS unifies a fragmented space
Single sign-on is the first thing people need, and it’s probably what people have asked you for at Semaphore. But there are a lot of other features, like I mentioned: directory or SCIM integration, plugging into the fragmented HR systems out there, audit logging for compliance, data retention, and that whole world. These are the features that we get excited about and obsess over. So if they sound boring to developers, at least the good news is you won’t have to think about them if you use WorkOS.
Darko (12:51): Yeah, I remember just starting to dig through LDAP a couple of years back, and I guess very few people who are listening to this will want to dig into that spec.
Michael: SAML’s even worse. I feel like there are maybe five people in the world who truly understand XML canonicalization. It’s just such a hard thing. And the problem with SAML is the spec is really, I don’t want to say poorly written, but it’s under-specified, so people implement it in many different ways. There’s not a reference implementation or reference test. And so we’ve seen things like SAP’s SAML connector being totally non-standard and different from how Okta provides SAML, or Microsoft having a different implementation in ADFS. Then our job at WorkOS is to plug in and unify all these things together. The unfortunate thing with these open protocols is that they end up being defined early on with the IETF and put into some kind of spec or standard, but then it ends up being hyper fragmented, and everyone is slightly different.
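To make the "unify all these things together" idea concrete, here is a minimal Python sketch of one small piece of it: mapping differently named user attributes from different identity providers onto a single profile shape. This is not WorkOS's code. The alias table, the `Profile` type, and the `normalize` function are all hypothetical names chosen for illustration, and real providers are configured per customer, so the attribute names will vary.

```python
from dataclasses import dataclass

# Hypothetical alias table: the attribute names real providers send depend on
# how each customer configured them, so treat these as illustrative only.
ATTRIBUTE_ALIASES = {
    "email": ("email", "mail", "emailaddress", "User.Email"),
    "first_name": ("firstName", "givenname", "first_name", "User.FirstName"),
    "last_name": ("lastName", "surname", "last_name", "User.LastName"),
}


@dataclass(frozen=True)
class Profile:
    email: str
    first_name: str
    last_name: str


def _short_name(key: str) -> str:
    """Strip a URI-style claim prefix, keeping only the final path segment."""
    return key.rsplit("/", 1)[-1]


def normalize(attributes: dict[str, str]) -> Profile:
    """Map provider-specific attribute names onto one common profile shape."""
    by_short_name = {_short_name(k).lower(): v for k, v in attributes.items()}
    values = {}
    for field, aliases in ATTRIBUTE_ALIASES.items():
        for alias in aliases:
            if alias.lower() in by_short_name:
                values[field] = by_short_name[alias.lower()]
                break
        else:
            # No alias matched: surface it instead of guessing a value.
            raise KeyError(f"no attribute found for {field!r}")
    return Profile(**values)
```

With this, a provider that sends `firstName` and one that sends a URI-style claim ending in `/givenname` both produce the same `Profile`, and an assertion that matches no known alias fails loudly rather than logging a user in with missing data.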
Building an infrastructure SaaS takes rigour
Darko (14:33): What are some of the technical challenges that you are seeing in this phase of creating WorkOS?
Michael: I just find it so rewarding to build tools for developers and to see what other people can build on top of them. One of the challenges around this is that there’s just a different level of seriousness you have to have around correctness and security right out of the gate, especially with what we’re building around authentication. It’s some of the most mission-critical code that your app might interface with. There’s not really a “ship it fast and hope it doesn’t break” attitude; you have to just be really rigorous around everything. And you can just incorporate that into the way you work, but it involves a different level of maturity or seriousness around building infrastructure like that.
And that sometimes can be a challenge, especially for engineers who haven’t worked in that kind of environment or system. So that’s one thing, and we’ve done all the stuff around external compliance, like SOC 2 Type 2, and we went through a code audit and security review, and all that stuff we do great on, but there’s also the internal attitudes, and internal IT security, and how we secure our own systems, which is in some ways more important than just the external certification.
Dependency on ‘black box’ APIs requires constant health monitoring
The other thing I would say is that there’s a challenge in building a system like this that’s an integration layer. WorkOS is this integration plane that plugs into many different systems where we’re engineering against other components that we can’t really test end to end. It’s almost like a black box. When we plug into Workday, for example, which is one of our integrations, we don’t have a reference implementation of Workday, and Workday keeps changing.
We’re constantly adapting and plugging into these unknown systems. The way you develop a system like that requires a lot more just fundamental observability and monitoring into the system. So you know at any moment the health of how it’s behaving. And you can think about it as an organism that’s always evolving in some way for the market. And that’s just very different than if you build, say, a Rails app or a web app where it’s all self-contained, and you can mock everything and test everything end to end. We end up having to write a lot of code to ensure correctness and just build this robust system interfacing with third parties. That’s also a different challenge. It’s something they don’t really teach you how to do in school, and it’s pretty unique.
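As a rough illustration of knowing "at any moment the health" of each integration, the sketch below tracks recent call outcomes per third-party provider in a sliding window and flags providers whose error rate crosses a threshold. It is a simplified assumption of how such monitoring could look, not a description of WorkOS's system. The class name, window size, and threshold are all hypothetical.

```python
from collections import defaultdict, deque


class IntegrationHealth:
    """Track recent call outcomes per third-party provider (hypothetical helper)."""

    def __init__(self, window: int = 50, max_error_rate: float = 0.2):
        self.max_error_rate = max_error_rate
        # One bounded deque per provider: old outcomes fall off automatically.
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider: str, ok: bool) -> None:
        """Store the outcome of one call to the given provider."""
        self._outcomes[provider].append(ok)

    def error_rate(self, provider: str) -> float:
        """Fraction of failed calls in the current window (0.0 if no data)."""
        outcomes = self._outcomes.get(provider)
        if not outcomes:
            return 0.0
        return outcomes.count(False) / len(outcomes)

    def unhealthy(self) -> list[str]:
        """Providers whose recent error rate exceeds the threshold."""
        return sorted(
            p for p in self._outcomes if self.error_rate(p) > self.max_error_rate
        )
```

Each call to a provider would be followed by `record(provider, ok)`, and an alerting job could poll `unhealthy()`. Because the window is bounded, a provider drops off the list once enough recent calls succeed again.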
Build personal connections as well as data connections
There’s no silver bullet, it just requires a lot of understanding of that problem and focus on it. There are a couple of things that we’ve done. I think the first thing is we’ve built relationships with these companies. So we’ve talked to them and said, “Hey, we want to work with you.” It’s funny, sometimes when people see WorkOS, they ask me, “Aren’t you guys competitive with Okta?” And we’re not at all. Okta’s actually probably our top integration, the one people most often plug into through WorkOS. We know a bunch of people at Okta. I’ve talked to Todd, the CEO, a handful of times already.
And it’s funny, people at Okta actually send customers to WorkOS because Okta serves the IT admin, and WorkOS helps developers plug into these systems. I think one thing is just being open and connecting with those people in the ecosystem and saying, “Hey, we’re not competitive with you. We want to work with you.”
The second thing is we just honestly test the hell out of all this. And oftentimes, this requires creating test credentials for so many different services. A lot of these systems aren’t used to developers testing, so we provide developers with test credentials for Okta, and our customers can do a test run through there. But for something like BambooHR, for example, when we built that integration, we just kept signing up for trial BambooHR systems, and we have that in our CI/CD. I think it’s running the tests through Semaphore. And the poor engineer on our team who kept setting these up, the Bamboo sales reps kept calling him and saying, “Hey, are you buying BambooHR?” He’s like, “No, I’m just a developer. I’m plugging into your system.” But that’s the downside of actually working with all these different systems.
And we internally have a list of test credentials for all the major HR, and identity, and directory systems out there. A lot of it is just the hard work of doing that testing, which means that our customers don’t have to test it. We do that for them.
CI/CD enables a fast bug response loop
Darko (20:29): What’s the approach that you do there? Are you really doing end-to-end everything with live providers before deploying to production?
Michael: There are different strategies to take. We build unit tests, have integration tests, have smoke tests, and have UI tests that actually run on our dashboard application. We have another app called the admin portal, which is the UI that’s shown to IT admins to set up and configure stuff. So it’s like self-serve SAML setup. That has tests in it end to end. Yeah, we run tests using live credentials against these third-party systems. But the thing is, oftentimes when things ultimately end up breaking, it’s not a bug that we introduced, it’s one of these third-party systems we’ve plugged into that has changed in some way. The testing in that regard is often just alerting us to the issue. It’s not something where you can point to a line of code where the issue comes up.
Our testing philosophy or our testing methodology includes a lot of observability and bug alerting and how our team responds to those incidents. And we have a pretty fast loop around identifying an issue that has come up in terms of a broken authentication or broken provider and fixing that within minutes. But I would say the reason that we can focus on that is because we have such robust underlying CI/CD and our CI tests are so comprehensive. When we’re introducing code, we’re really confident we’re not introducing a regression there, and so it’s a really key part of how we develop. I actually don’t know the numbers, but I wouldn’t be surprised if half our codebase is tests.
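One way to picture tests that alert on a third-party change rather than pointing at a line of your own code is a contract check run against a live provider response. The Python sketch below is an assumption for illustration only: the function name, the expected fields, and the payload shape are hypothetical and do not describe any real provider's API or WorkOS's test suite.

```python
def find_contract_drift(payload: dict, expected: dict[str, type]) -> list[str]:
    """Compare a provider response against the shape our integration relies on.

    Returns human-readable findings instead of raising, so a scheduled check
    can forward them to alerting rather than failing a build.
    """
    problems = []
    for field, expected_type in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems


# Hypothetical shape of an employee record from an HR directory integration.
EXPECTED_EMPLOYEE = {"id": str, "work_email": str, "active": bool}
```

A scheduled job could fetch one record with test credentials, call `find_contract_drift(record, EXPECTED_EMPLOYEE)`, and page the team when the list is non-empty. An empty list means the provider still returns what the integration expects.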
Maintenance of critical infrastructure needs a special approach
If you think about WorkOS as an API for doing authentication, any API request that fails could be a user failing to log in, ultimately. And so we have a really, really high bar in terms of those APIs succeeding. And there’s a lot of infrastructure stuff we’ve done so that requests are never dropped even when we do deploys; there’s an intelligent routing layer that handles that. But essentially, any time there’s a 500 that gets thrown, it’s a hair-on-fire problem for the team to drop everything and go work on, because keeping that core utility and that core correctness up is the most important thing that we do for our customers.
That’s just a different modality of software engineering. You’re operating on this live beast as it’s moving, and it’s a little bit different than say if you’re working on a static analyzer or a compiler, and you can just put your headphones on, and listen to some techno, and just write code for three weeks. That doesn’t really work for building a system like this for customers.
I think the interesting thing about Semaphore is that a lot of people in the past haven’t thought about testing, or their integration system, or the deployment system as being on the critical path. But as more and more companies move towards continuous deployment, and this is just the model, if that system goes down it’s effectively an outage, because you can’t ship new code and it stops your team. And so I can definitely tell you if Semaphore went down, our team would not be happy, and it would stop us from being able to update quickly and serve customers in that way. That’s a different mode for modern companies.
What’s next for WorkOS?
Darko (26:15): What are some of the next steps for WorkOS in terms of features or integrations? What do you see as a next big thing?
Michael: It turns out the world of enterprise features is just a huge laundry list of things that companies need. And if you want to guess what WorkOS’s roadmap is, just go to an established SaaS company like Slack, go to their pricing page, look at the enterprise pricing, and look at all the feature bullet points there. Those are all the things that we’re building as APIs for customers. So there’s a tremendous amount to build.
One of the exciting things about building WorkOS, why I find it so satisfying, is we get to enable developers to spend time working on the features that are interesting to them, that are more unique. You jump to the finish line for all of your enterprise features by using WorkOS. My dream is to allow people to focus on their unique features and build more apps and get them used in more companies. And for WorkOS, just to be infrastructure that’s powering that behind the scenes.
Darko (28:55): Great. Let’s then wrap up with that. Thank you so much.
Michael: If anyone’s listening and they want to chat about this stuff, I’m on Twitter, and my DMs are open, so just feel free to shoot me a note. I’m a total geek for this enterprise feature stuff, as I’m sure you’ve heard, so reach out any time. I’d love to chat with anyone about this.
Darko (29:33): That’s great. Thank you, Michael.