The evolution of distributed systems has normalized the benefits of cloud systems. From an end-user perspective, collaborating in real time has to feel smooth and organic. Still, developing one of these applications isn’t as clear-cut. In this article, we hear from Adam Wulf, one of the developers behind the whiteboard application Muse, on the many steps and nuances of building a real-time collaboration platform and the challenges he and his team faced.
Since founding his first software company right out of college more than two decades ago, Adam Wulf has moved steadily from one project to the next, always preferring to work in small teams. He is now part of the development team behind Muse, a whiteboard application for the Apple ecosystem built for brainstorming and real-time collaboration.
In Adam’s own words, what sets Muse apart from other whiteboard applications is that “it lets you organize your thoughts spatially.” Instead of offering a single whiteboard as a tabula rasa for your content, “you can actually put in small cards that represent other whiteboards” and nest your ideas as you see fit. Adam understands that Muse “is going to be a team real-time digital office space, for a metaphor, where you can just walk in, see what people are working on, you have all of your content there, and you can easily link out to the rest of the company as well.” What’s more, it also supports integration with popular work tools such as Slack.
Muse is mainly used in small teams by product and project managers and designers. For this reason, Muse use cases include:
- Determining what the user experience looks like
- Figuring out the product pathway and strategy
- Evaluating the company strategy
- Collecting customer feedback
- Studying new features
While Muse started as a single-device, single-user application, the possibilities that distributed systems opened up for end users couldn’t be ignored. Expectations for whiteboard tools had also shifted, so Adam and the rest of the development team refactored Muse to support single-user, multi-device synchronization first and real-time collaboration between users later.
Old world and new world: Migrating the database
As the first step in the process, they had to migrate the application data to a new database format compatible with synchronization. The original Muse application used Core Data, Apple’s framework for persisting application data on the device. However, since Core Data didn’t provide built-in support for synchronization, they had to move that data to a new sync database backend.
According to Adam, they had two options for this kind of database migration. They could either do it all at once by “ripping everything apart” and putting it back together, hoping it would work, or “make a series of lots of very tiny changes” and gradually abandon the former framework in favor of a custom-made one. Adam and his team chose the latter, since tearing down the codebase and rebuilding it from scratch could leave it in a messy state.
The move into the new world started by replacing Core Data concrete classes with protocols. Adam explains that they first moved all the Core Data code behind one particular implementation of those protocols. They then gradually built a second implementation of the same protocols in the Sync database layer, switching one interface at a time until everything had been migrated. While Adam admits it was a challenging transition, he recognizes that the gradual pace made it possible to deal with the complexity of their application.
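The protocol-first migration described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in Muse's own language, and every name in it (`BoardStore`, `LegacyStore`, `SyncStore`, `rename_board`) is hypothetical, not taken from the Muse codebase. The point is only that once application code depends on an interface, a second implementation can be swapped in one interface at a time.

```python
from typing import List, Optional, Protocol, Tuple


class BoardStore(Protocol):
    """Hypothetical interface the app depends on instead of a concrete class."""

    def save_title(self, board_id: str, title: str) -> None: ...

    def title(self, board_id: str) -> Optional[str]: ...


class LegacyStore:
    """Stands in for the original, local-only implementation."""

    def __init__(self) -> None:
        self._rows = {}

    def save_title(self, board_id: str, title: str) -> None:
        self._rows[board_id] = title

    def title(self, board_id: str) -> Optional[str]:
        return self._rows.get(board_id)


class SyncStore:
    """Stands in for the new sync-aware implementation of the same interface."""

    def __init__(self) -> None:
        self._rows = {}
        # Changes waiting to be sent to other devices (assumed design).
        self.pending_changes: List[Tuple[str, str]] = []

    def save_title(self, board_id: str, title: str) -> None:
        self._rows[board_id] = title
        self.pending_changes.append((board_id, title))

    def title(self, board_id: str) -> Optional[str]:
        return self._rows.get(board_id)


def rename_board(store: BoardStore, board_id: str, title: str) -> Optional[str]:
    # Application code only sees the protocol, so either backend can be passed in.
    store.save_title(board_id, title)
    return store.title(board_id)
```

Because `rename_board` never names a concrete class, call sites stay untouched while the backing store changes underneath them.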
Shoring up the new database
Unit testing was also a big part of the process. Because the entire database structure was brand new, there was a risk of bugs in the new system. By thoroughly testing the new Sync layer, developers were confident that the user’s data and system behavior were safe after migration, and that this new layer could perform all the necessary functions required by the database system, including data storage and synchronization.
Additionally, Adam says they needed to acknowledge the different issues that come with synchronization: “In the Core Data world, if you delete content, you can just delete it and it’s physically gone from the database; in the Sync world, it’s a lot harder to do that because I can delete something from my iPad that has synchronized to my Mac, that on my Mac I had moved somewhere else.”
Adam found the solution in a conflict-free replicated data type (CRDT) structure. In distributed systems, CRDTs are a class of data structures that are replicated across multiple nodes which may not always be in sync with each other. CRDTs ensure that the data remains consistent across the system, even in the presence of network partitions or other failures. They achieve this by guaranteeing that merging updates is commutative and associative (and, in state-based designs, idempotent), which means updates can be applied in any order without changing the result; eventually, all nodes converge to the same state, even if they receive updates in a different order or at different times.
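To make the convergence property concrete, here is a minimal sketch of one of the simplest state-based CRDTs, a grow-only counter. It is a textbook example rather than Muse's actual data structure, and the device names are made up. Each device only increments its own slot, and merging takes the per-device maximum, which is what makes the merge order-independent.

```python
from typing import Dict

# Device id -> number of increments made on that device.
GCounter = Dict[str, int]


def increment(state: GCounter, device: str) -> GCounter:
    """Return a new state with one more increment recorded for `device`."""
    updated = dict(state)
    updated[device] = updated.get(device, 0) + 1
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Combine two replicas by keeping the highest count seen per device."""
    return {device: max(a.get(device, 0), b.get(device, 0)) for device in a.keys() | b.keys()}


def value(state: GCounter) -> int:
    """The counter's value is the sum of all per-device counts."""
    return sum(state.values())
```

Merging the iPad's state into the Mac's, or the Mac's into the iPad's, yields the same counter, and merging a state with itself changes nothing.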
When changes are made to Muse data on multiple devices simultaneously, there is a risk of conflicting updates. In other words, if two devices make different changes to the same data at the same time, it is possible that these changes conflict with each other and cannot be easily merged. To resolve such conflicts, data synchronization systems often use a “last writer wins” approach, where the most recent update to the data is considered the “correct” one and all previous updates are discarded.
However, this approach can lead to data loss, as the changes made on the other devices are overwritten and lost forever. In a way, it also relies on a clock that can tell which device was the latest to make changes, and for that to work as expected the clock needs to be set to the same time on all devices. If one device had its time set in the future, changes from other devices would keep losing to it until their clocks caught up with that time. What the Muse team uses instead is a Hybrid Logical Clock (HLC). Rather than relying on a centralized time source to synchronize all devices, a hybrid logical clock determines the order of events by combining the behavior of logical and physical clocks.
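A minimal sketch of a hybrid logical clock may help here. It follows the commonly published HLC update rules and is not Muse's implementation; the class name, the millisecond resolution, and the injectable `now_ms` function are assumptions made for illustration. A timestamp is a pair of wall-clock time and a logical counter, and pairs are compared in that order.

```python
import time
from typing import Callable, Tuple

# (wall-clock milliseconds, logical counter); tuples compare in that order.
Timestamp = Tuple[int, int]


class HybridLogicalClock:
    def __init__(self, now_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self._now_ms = now_ms  # injectable so the clock can be tested
        self.wall = 0
        self.counter = 0

    def tick(self) -> Timestamp:
        """Stamp a local change."""
        physical = self._now_ms()
        if physical > self.wall:
            # Physical time moved forward: adopt it and reset the counter.
            self.wall, self.counter = physical, 0
        else:
            # Physical time is behind what we have seen: only the counter advances.
            self.counter += 1
        return (self.wall, self.counter)

    def observe(self, remote: Timestamp) -> Timestamp:
        """Fold in a timestamp received from another device."""
        remote_wall, remote_counter = remote
        new_wall = max(self.wall, remote_wall, self._now_ms())
        if new_wall == self.wall and new_wall == remote_wall:
            self.counter = max(self.counter, remote_counter) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == remote_wall:
            self.counter = remote_counter + 1
        else:
            self.counter = 0
        self.wall = new_wall
        return (self.wall, self.counter)
```

With these rules, a device whose physical clock is behind can still produce timestamps that sort after a change it has already received from a device running ahead, because the logical counter keeps advancing.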
Putting identifiers in everything
Synchronization also has to make sure that changes made from different devices don’t nullify each other. This could happen, for example, when a user edits a board from one device offline and then performs additional changes on the same board from another device. To solve this issue, the backend organizes the content through attribute-value pairs that label the data. These allow decomposing objects into more granular components, such as the color or size of a particular element. In this way, changes merge seamlessly, and “if two devices are editing the same object from a user’s perspective, they’re very probably changing different attributes of that object.”
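The attribute-level merge described above can be sketched as follows. This is an illustrative assumption about how such a merge could work, not Muse's actual code: each attribute carries its own timestamp, here a hypothetical `(wall ms, counter, device id)` tuple, and the newer write wins per attribute rather than per object.

```python
from typing import Any, Dict, Tuple

Stamp = Tuple[int, int, str]   # (wall ms, logical counter, device id), hypothetical layout
Entry = Tuple[Stamp, Any]      # (when it was written, the value)
Record = Dict[str, Entry]      # attribute name -> latest known entry


def merge_records(a: Record, b: Record) -> Record:
    """Merge two replicas of one object, attribute by attribute."""
    merged = dict(a)
    for attribute, entry in b.items():
        # Keep whichever entry carries the later stamp for this attribute.
        if attribute not in merged or entry[0] > merged[attribute][0]:
            merged[attribute] = entry
    return merged
```

If the iPad changes a card's color while the Mac changes its size, both edits survive the merge; only edits to the same attribute compete with each other.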
The attribute-value pair design also helps the application deal with the load of having multiple users working simultaneously without crashing or having performance issues. The application backend collects many of these granular changes that affect the same object into a pack that is sent to the server. The server cannot read the data itself, just a label specifying its “scope,” which identifies the data as part of the same box, e.g., a board, a PDF document, or an image.
The server knows who the users are, their access and editing permissions, and what devices they are using, but doesn’t know anything about what’s inside the scopes. Hence, the server doesn’t have to deal with the complexity of this data. Besides, developers can introduce new features within the data in those scopes without having to modify the server. Adam affirms that this design choice “has made managing scale on the server a much easier problem to handle and can be completely disassociated from the problems of the Muse application.”
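The division of labor between client and server can be sketched like this. All names (`Pack`, `build_packs`, `RelayServer`) are hypothetical, and JSON merely stands in for whatever encoding the app really uses; the sketch only shows a server that routes by scope and never parses the payload.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Pack:
    scope: str      # the only field the server inspects
    payload: bytes  # opaque to the server


def build_packs(changes: List[Tuple[str, str, Any]]) -> List[Pack]:
    """Client side: group (scope, attribute, value) changes into one pack per scope."""
    by_scope: Dict[str, list] = defaultdict(list)
    for scope, attribute, value in changes:
        by_scope[scope].append([attribute, value])
    return [Pack(scope, json.dumps(items).encode("utf-8")) for scope, items in by_scope.items()]


class RelayServer:
    """Server side: stores packs per scope without ever decoding them."""

    def __init__(self) -> None:
        self.stored: Dict[str, List[bytes]] = defaultdict(list)

    def receive(self, pack: Pack) -> None:
        self.stored[pack.scope].append(pack.payload)
```

Since `RelayServer` never looks inside `payload`, a client could add new attributes or object types without any server change, which is the decoupling Adam describes.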
The granularity of attribute-value pairs also helps with version compatibility. Version compatibility ensures that users can work together seamlessly, regardless of the version of the application they are using. This is particularly important in situations where users need to share files or collaborate on projects, as it ensures that everyone can access and work with the same data.
The Muse team can’t guarantee that all of a user’s devices are running the same version of the app. However, they can label the content with a TypeCode, a four-byte value used to identify different data types in Mac applications. The TypeCode specifies what the object is, e.g., a board, an image, or a PDF. If the application doesn’t recognize an object’s label, it ignores that object and loads only the data types it recognizes.
Lastly, the Muse team also uses identifiers in requests and objects to help them debug, map, and trace requests. They also use versioning to keep track of changes in different parts of the system while ensuring that different versions of the application can communicate and work with each other seamlessly. If a device is on an older version of the application and receives data for a newer version, it will not be able to understand it due to the lack of Protobuf definitions. In this case, the device will set aside the newer version data and continue to function with the older version data. The device will not be able to use any new features or functionality that have been added to the newer version of the application until it has been upgraded; then, it will retrieve the newer version data and operate as intended.
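The forward-compatibility behavior described above can be approximated with a short sketch. The `Loader` class, the type codes `"bord"` and `"vide"`, and the idea of keeping unrecognized items in a local list until an upgrade are all assumptions made for illustration; the source only says that newer data is set aside and picked up after upgrading.

```python
from typing import Callable, Dict, List, Tuple

# (four-character type code, undecoded payload)
Item = Tuple[str, bytes]


class Loader:
    def __init__(self, decoders: Dict[str, Callable[[bytes], str]]):
        self.decoders = dict(decoders)   # type codes this app version understands
        self.loaded: List[str] = []      # decoded objects the app can use
        self.set_aside: List[Item] = []  # data from newer versions, kept untouched

    def receive(self, item: Item) -> None:
        code, payload = item
        decoder = self.decoders.get(code)
        if decoder is None:
            # Unknown type: do not fail, just keep it for a later version.
            self.set_aside.append(item)
        else:
            self.loaded.append(decoder(payload))

    def upgrade(self, new_decoders: Dict[str, Callable[[bytes], str]]) -> None:
        """Simulate an app update, then revisit everything that was set aside."""
        self.decoders.update(new_decoders)
        pending, self.set_aside = self.set_aside, []
        for item in pending:
            self.receive(item)
```

An older client built this way keeps working with the types it knows, and the data it could not read becomes usable as soon as the matching decoder ships.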
The bottom line
Learn more about Muse at museapp.com. Adam also takes part in different open-source projects in the iOS and macOS development sphere. For UIBezierPath, a class in the UIKit framework that represents a vector path, he shared with us two tools:
- PerformanceBezier: a framework that adds caching into UIBezierPath’s default data structure to increase its performance.
- ClippingBezier: a library that can find intersection points between paths and perform related operations.
Adam is also responsible for the development of PonyExpress, a Swift package conceived as a type-safe alternative to NotificationCenter.
Adam volunteers with the Prison Entrepreneurship Program, which provides entrepreneurship education for felons during their last years of incarceration. Once they are released, they can use these skills to integrate into society.
You can follow Adam and his latest work at adamwulf.me.