Last week was my 20th work anniversary - yippee 🥳 - and I decided to celebrate it by writing about one of my favorite topics. I can now say that I have been around this tech world long enough to be underwhelmingly impressed by the chants of cloud vendor lock-in fatalists. Just to be clear, I am talking about the supposedly inescapable dependency on a single cloud platform's services, where switching to another platform is said to be impossible without substantial cost or inconvenience.
Undeniably, the threat of being tethered to a single vendor casts a long shadow over numerous IT departments and tech companies. However, in the quest for operational freedom, these teams might be falling prey to the paradox of overengineering against lock-in, ironically trapping themselves in a web of complexity that serves little practical benefit.
The high price of overpreparation
Let’s break down this conundrum through the lens of a decision model chart that Gregor Hohpe recently introduced in his brilliant talk "Do modern cloud applications lock you in?" at AWS re:Invent 2023.
The graph plots the total cost of ownership against a backdrop of different strategies—ranging from Naive to Overinsured. It suggests that a Sensible approach balances upfront investment against the potential liability of switching costs, multiplied by the likelihood of actually needing to switch services. The key takeaway here is that the fear of future costs (liability) can lead to an overinvestment in flexibility that may never be utilized and an underinvestment in today's productivity. Even at the risk of sounding pedantic, I put this differently in the book Building Software Platforms:
By not leveraging the system defaults (in this case, the underlying cloud platform), there is a risk that you take on an overhead that you would not have had to worry about had you stayed with the widely used platform defaults. Arguably, this overhead will become more expensive than just writing the code twice in the unlikely event of portability.
The curves on the chart above illustrate how the total cost comprises both the upfront investment (Option price) and the potential future costs (Strike price). As organizations move from a naive to a sensible approach, they invest more upfront, presumably in more open, flexible solutions to mitigate future switching costs. However, past a certain point, this investment does not bring additional value but rather introduces unnecessary complexity and cost—hence, the curve bending upwards again, marking the zone of overinsurance.
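To make the shape of that curve concrete, here is a minimal sketch of the arithmetic behind it: upfront investment plus switching cost weighted by the probability of switching. The strategy figures are hypothetical numbers in arbitrary cost units, chosen only for illustration; they do not come from Gregor's talk, and the names `Strategy`, `expected_total_cost`, and `cheapest` are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    upfront_investment: float  # "option price": paid today, whether or not you ever switch
    switching_cost: float      # "strike price": paid only if you actually switch


def expected_total_cost(strategy: Strategy, switch_probability: float) -> float:
    """Upfront investment plus switching cost weighted by the chance of switching."""
    if not 0.0 <= switch_probability <= 1.0:
        raise ValueError("switch_probability must be between 0 and 1")
    return strategy.upfront_investment + switch_probability * strategy.switching_cost


# Hypothetical figures (assumption): more upfront spend buys a lower switching cost.
STRATEGIES = [
    Strategy("Naive", upfront_investment=0, switching_cost=1000),
    Strategy("Sensible", upfront_investment=50, switching_cost=300),
    Strategy("Overinsured", upfront_investment=250, switching_cost=20),
]


def cheapest(switch_probability: float) -> Strategy:
    """Return the strategy with the lowest expected total cost."""
    return min(STRATEGIES, key=lambda s: expected_total_cost(s, switch_probability))
```

With these made-up numbers, a 10% chance of switching makes the Sensible strategy the cheapest, and the Overinsured one only pays off when a switch is very likely. The exact crossover points depend entirely on the figures you plug in, which is precisely why it is worth estimating the probability before buying more insurance.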
For example, AWS offers many services that could cause some to fear lock-in. A service like AWS Lambda provides serverless computing: you pay only for the compute time you consume, which leads to better scalability (and arguably cost savings). However, the unique nature of a serverless architecture could tie a company to AWS's implementation. That's why some IT departments and tech companies respond by designing their own containerized solutions or by using abstracted orchestration platforms that promise easier migration to other serverless runtimes or on-premises environments. Or, even worse, they respond by creating yet another custom abstraction layer on top of the cloud platform service. But the graph warns us that this might be an overcorrection, especially when, in Wardley mapping terms, serverless services are a commodity: they are based on standards, which is why they are largely interoperable, with very few corrections needed. But more on this later.
This graph symbolically represents the trap of overthinking vendor lock-in. The pursuit of absolute flexibility and avoidance of cloud vendor lock-in can lead to a significant increase in complexity and upfront costs. This can be counterproductive, as it hampers agility and diverts resources from innovation to the maintenance of unnecessarily complex systems. And this is where the corollary of Gregor's law kicks in:
More options are desirable, but wanting to have all options all the time will result in unnecessary complexity, as is often the case with overly elaborate abstraction layers or massive configuration frameworks.
Productivity before portability
In the same re:Invent session, Gregor Hohpe also highlights the importance of immediate utility over potential future flexibility. In other words, productivity should take precedence over portability, because being unproductive can be more detrimental than being locked into a particular vendor. Fast-moving companies don’t bog themselves down with complex portability frameworks; their focus is on delivering value quickly.
Moreover, while using cloud-managed services based on open-source projects can offer a middle ground, it is crucial to consider how these services fit into the broader picture of the company's operational velocity. Reducing undifferentiated heavy lifting is beneficial, but it must be balanced with keeping switching costs reasonable without overcomplicating the infrastructure.
Design patterns and hexagonal architectures to the rescue
We need to stop approaching our architectural solutions in terms of cloud services available in a catalog and start thinking (again) about design patterns. Focusing solely on platform services can cause a loss of the application’s design intent and inadvertently lead to a different form of lock-in—mental lock-in. Instead, thinking in design patterns preserves the application's intent and facilitates more straightforward transitions if and when the need arises.
A good way of looking at this challenge is through the prism of the evolutionary properties of hexagonal architectures, as described by Luca Mezzalira in his fantastic blog post Developing evolutionary architecture with AWS Lambda. This pattern is used in software design and aims at creating loosely coupled application components that can be easily connected to their software environment by means of ports and adapters. The principles of hexagonal architectures can help you isolate the domain logic of your service component from the implementation details of the underlying cloud platform.
Designing hexagonal architectures requires a little bit of upfront time and investment since careful thought is needed to create a good design. But, in any case, this design exercise is guided by a change-friendly pattern that will give you some guard rails and future-proofing benefits. In the unlikely event of platform portability, the changes in your component will be concentrated mainly in two dimensions:
- Input adapters for transforming potentially new message formats used by the clients accessing your component in the new platform.
- Output adapters (the implementations behind your output ports) for accessing the new cloud services as per the new programming interfaces. Although the underlying protocols may still be based on the same standards (e.g., HTTP endpoints) and the semantics may be equivalent, the developer experience may differ.
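A minimal sketch of what this looks like in practice, assuming a hypothetical order service. Every name here (`Order`, `OrderStore`, `PlaceOrder`, the adapters) is invented for illustration, and the event and message shapes are assumptions rather than any vendor's actual format. The point is only to show where a platform change would land: in the input adapters and in whatever sits behind the output port, not in the domain logic.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int


class OrderStore(Protocol):
    """Output port: what the domain needs from storage, expressed in domain terms."""

    def save(self, order: Order) -> None: ...


class PlaceOrder:
    """Domain logic: knows nothing about any cloud platform."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def execute(self, order_id: str, amount_cents: int) -> Order:
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        order = Order(order_id, amount_cents)
        self._store.save(order)
        return order


class InMemoryOrderStore:
    """Stand-in output adapter; a real one would wrap a vendor SDK call instead."""

    def __init__(self) -> None:
        self.saved = {}  # maps order_id -> Order

    def save(self, order: Order) -> None:
        self.saved[order.order_id] = order


def http_event_adapter(event: dict, use_case: PlaceOrder) -> dict:
    """Input adapter for an assumed HTTP-style event carrying a JSON string under "body"."""
    payload = json.loads(event["body"])
    order = use_case.execute(payload["order_id"], int(payload["amount_cents"]))
    return {"statusCode": 201, "body": json.dumps({"order_id": order.order_id})}


def queue_message_adapter(message: bytes, use_case: PlaceOrder) -> None:
    """Second input adapter for a different, hypothetical "id,amount" message format."""
    order_id, amount = message.decode("utf-8").split(",")
    use_case.execute(order_id, int(amount))
```

Both input adapters drive the same `PlaceOrder` use case, and swapping `InMemoryOrderStore` for a class that talks to a different storage service requires no change to the domain code as long as it offers the same `save` method.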
Conclusion
In summary, IT departments and tech companies must find a balance between the need for agility and the fear of vendor lock-in. A sensible middle path that values immediate productivity and utility without completely disregarding future flexibility is essential. After all, the ultimate goal is to deliver value, not to construct an impenetrable fortress of options that may never be exercised.