The Technical Reality of Vendor Lock-In

You inherited a VMware environment. Three years ago it ran beautifully, the licensing was predictable, and your infrastructure team knew every corner of the stack. Then Broadcom completed its acquisition of VMware in late 2023, and within months the pricing model for the platform your entire business depended on was unrecognisable. Per-CPU perpetual licensing was gone. Bundled SKUs replaced the modular options your finance team had spent years optimising. Renewal conversations that used to be routine became emergency board-level discussions.

That is vendor lock-in at its most visible. But the version that should worry you more is the kind you cannot yet see: the API dependencies quietly accumulating in your application code, the egress cost buried four layers deep in your cloud bill, the proprietary managed service that has replaced three open-source components you could have operated yourself.

This article dissects vendor lock-in from the infrastructure layer upward. It explains where lock-in forms, how each major platform enforces it, and which engineering decisions actually reduce it in practice.

What Vendor Lock-In Actually Means at an Engineering Level

Vendor lock-in is not a purchasing problem. It is an architectural condition that emerges when the cost of moving a workload from one platform or provider to another exceeds the cost of staying, regardless of how poor the current deal has become.

That framing matters because it reframes the question. You are not trying to avoid a bad vendor. You are trying to design systems where the switching cost stays low enough that market forces can keep your vendor honest.
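The stay-versus-move comparison can be made concrete with a small break-even calculation. The following is a minimal sketch, not a full cost model: the class name, field names, and all figures are hypothetical, and it deliberately ignores risk, downtime, and discounting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SwitchingCase:
    """Hypothetical inputs for a stay-versus-move comparison."""
    one_off_migration_cost: float   # engineering time, egress, parallel running
    monthly_cost_staying: float     # current platform run rate
    monthly_cost_after_move: float  # projected run rate on the target platform


def break_even_months(case: SwitchingCase) -> Optional[float]:
    """Months until cumulative savings repay the migration, or None if never."""
    monthly_saving = case.monthly_cost_staying - case.monthly_cost_after_move
    if monthly_saving <= 0:
        return None  # moving never pays back on cost alone
    return case.one_off_migration_cost / monthly_saving


# Illustrative numbers only: a 240,000 migration that saves 20,000 a month.
example = SwitchingCase(240_000, 50_000, 30_000)
print(break_even_months(example))  # 12.0
```

The shorter the break-even period, the more credible the threat of leaving. Architectural choices that lower the one-off migration cost shorten it directly.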

At a technical level, lock-in accumulates across five distinct layers, and most engineers only think about one or two of them.

The Five Layers Where Lock-In Builds

The Hypervisor and VM Format Layer

This is the layer most teams think of first, and for good reason. VMware’s VMDK disk format and its associated ecosystem of vSphere constructs, distributed virtual switches, vSAN storage policies, and NSX network overlays create a deeply proprietary runtime environment. When you build your operational muscle memory around vCenter, your runbooks, your automation scripts, your monitoring hooks, and your failover procedures all assume VMware-specific APIs and behaviours.

The Open Virtualization Format, published by the DMTF standards body, was specifically designed to provide a vendor-neutral envelope for virtual machines. An OVF package describes a VM image in a platform-independent way, and OVA is the same package delivered as a single tar archive. The practical challenge is that hypervisor-specific features such as hardware version dependencies, paravirtualised drivers, and custom storage controllers do not travel cleanly across OVF imports. The standard describes the container, not the content inside it.

OpenNebula, the open-source cloud management platform, addresses this at the orchestration layer rather than the image layer. Its architecture documentation describes a hypervisor-agnostic model where the same management plane can orchestrate KVM, LXC, and Firecracker workloads, and where VM templates are defined in a normalised format decoupled from any single hypervisor vendor. This is why Nubius’s virtualisation platform, built on OpenNebula, can reduce virtualisation costs by 20 to 50 percent relative to a VMware equivalent: the licensing cost of the orchestration layer is eliminated, and the hypervisor is KVM, which ships in the Linux kernel.

The migration path away from VMware is real but requires planning. Cold migration via OVA export works for straightforward workloads. Workloads that depend on vSphere HA, DRS cluster placement, or vSAN storage policies require a phased approach where the application layer is decoupled from its infrastructure dependencies before the hypervisor is changed.

The Data Layer

Storage lock-in is the most expensive layer when it finally surfaces, because by the time you notice it your data is already where it is.

Public cloud providers build proprietary storage tiers optimised for their own internal hardware generations. AWS S3 exposes its own HTTP-based object storage API. Azure Blob Storage exposes a different one. The two are not interoperable at the API level, which means application code that writes directly to S3 APIs cannot be pointed at Azure Blob Storage without code changes. The S3 API has nonetheless become a de facto standard, and many providers, including open-source object stores like MinIO, implement S3-compatible endpoints specifically to reduce this form of lock-in. But the application-level code change is only part of the problem.

The larger problem is egress pricing. AWS publishes its data transfer pricing at aws.amazon.com/ec2/pricing/on-demand, and at the time of writing the rate for data transferred out to the internet from EC2 is $0.09 per GB in the first paid tier, after a free allowance of 100 GB per month. Azure publishes its equivalent at azure.microsoft.com/en-us/pricing/details/bandwidth. Inbound data transfer is free on both platforms. This asymmetry is not accidental. It creates a structural economic penalty for leaving that grows with the volume of data you have accumulated in the platform.

For organisations running analytics workloads, media pipelines, backup repositories, or large-scale transactional databases in a single public cloud, the egress cost alone can make migration economically impossible even when the underlying service is dramatically more expensive than alternatives. This is a deliberate part of the pricing architecture, not an incidental outcome.
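To put a number on that penalty, the $0.09 per GB figure and the 100 GB monthly allowance quoted above can be turned into a rough estimate. This is a minimal sketch under a simplifying assumption: it applies one flat rate to the whole volume, whereas published price lists are tiered and vary by region, so treat the result as an upper-bound illustration rather than a quote. The function name and defaults are illustrative.

```python
from decimal import Decimal


def egress_cost_usd(data_gb: int,
                    rate_per_gb: Decimal = Decimal("0.09"),
                    free_gb: int = 100) -> Decimal:
    """Estimate internet egress cost at a single flat rate.

    Assumption: one rate for the whole volume. Real price lists step
    down at higher volumes, so check the provider's current pricing page.
    """
    billable_gb = max(data_gb - free_gb, 0)
    return rate_per_gb * billable_gb


# Moving 50 TB (taken here as 50,000 GB) out in a single month.
print(egress_cost_usd(50_000))  # 4491.00
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters once the estimate feeds into a budget line.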

Google Cloud recognised this publicly in early 2024 and waived egress fees for customers migrating their data off GCP, but only under specific conditions and via approved migration paths. AWS and Azure announced comparable exit waivers in the months that followed. These are narrow concessions for customers who are leaving entirely, not a change to the pricing model: standard egress pricing for day-to-day traffic remains in place on all three platforms.

Distributed storage architectures built on open protocols, such as StorPool, which Nubius uses for its distributed storage service, store data in formats accessible via standard block device interfaces. The data is not encapsulated in a proprietary object namespace that requires vendor tooling to read. That distinction is operationally significant when you need to migrate at scale.

The API and Managed Service Layer

This is the layer where lock-in grows fastest and is hardest to detect during development.

Every major cloud provider offers a catalogue of managed services (database engines, message queues, stream processors, and machine learning runtimes) that are built on open-source components but exposed through proprietary management APIs. AWS RDS for PostgreSQL and Azure Database for PostgreSQL both run PostgreSQL under the hood, but the APIs you use to create instances, configure failover, set server parameters, and manage read replicas are entirely different between the two. Application code that talks to the PostgreSQL wire protocol is portable. Infrastructure-as-code that provisions and configures the database service is not.

AWS Lambda, Azure Functions, and Google Cloud Functions all execute serverless compute but differ in their invocation models, execution environment APIs, event source integrations, and deployment package formats. Code written to the AWS Lambda handler interface does not run on Azure Functions without modification. The Cloud Native Computing Foundation maintains the CNCF landscape specifically to map which projects exist to abstract over these differences: Knative for serverless, Crossplane for infrastructure provisioning, Dapr for distributed application runtime. These are the open-source escape routes from managed service lock-in, and they exist because the problem is real enough that major enterprises fund them.

AWS also offers proprietary services that have no meaningful open-source or multi-cloud equivalent. DynamoDB’s provisioned capacity model, Aurora’s shared storage layer, and the Bedrock model invocation API are designed without portability in mind. That is not a criticism: proprietary depth is sometimes the right engineering trade-off. The risk is that teams adopt these services for convenience and only discover the exit cost years later when their business requirements change.

The Operational and Tooling Layer

Operations teams develop muscle memory around specific platforms. VMware vCenter, AWS CloudWatch, Azure Monitor, and GCP Cloud Console each represent years of accumulated tribal knowledge: custom dashboards, alerting rules written in platform-specific query languages, runbooks that assume specific CLI tools, and automation code that calls platform-specific APIs.

This is the least discussed layer of lock-in because it does not appear on any invoice. But it is often the primary reason migrations are slower and more expensive than the technical complexity alone would suggest. When your senior engineers have spent five years mastering VMware NSX-T microsegmentation, the cost of relearning equivalent Calico or OVN-Kubernetes constructs is real, even if the new platform is technically superior.

The operational analogue to open standards is investing in tooling that abstracts over platform differences: Terraform for infrastructure provisioning, Ansible for configuration management, Prometheus and Grafana for observability, and Kubernetes for workload orchestration. These tools have stable APIs that span cloud providers and on-premises environments. An organisation whose infrastructure is codified in Terraform modules and whose workloads run in Kubernetes has a substantially lower switching cost than one whose operations are built around a single vendor’s console.

Nubius’s cloud operations service is structured precisely around this model: platform-agnostic operational support across AWS, Azure, GCP, and on-premises environments using the same team, tooling, and processes regardless of which underlying provider hosts the workload. That is a direct expression of the open-standards architecture philosophy.

The Licensing Layer

Licensing lock-in is the layer that makes vendor lock-in political inside organisations. It is where the engineering consequences of architectural decisions become visible to the board.

The Broadcom acquisition of VMware is the defining case study for this layer in the current decade. Broadcom announced the transition of VMware’s product portfolio to subscription-only licensing in December 2023, within weeks of closing the acquisition. Sales of perpetual licenses ended, as did renewals of the support contracts attached to them. Customers who had purchased perpetual VMware vSphere licenses found that their renewal options had fundamentally changed. VMware’s official licensing documentation at support.broadcom.com describes the current subscription structure, which replaces the prior per-CPU and per-VM models with per-core subscriptions that bundle products many customers did not need.

The consequence for organisations running VMware at scale was a sudden exposure of switching cost that had been building for years. The lock-in was always there. Broadcom’s pricing decision made it expensive enough that organisations could no longer afford to ignore it.
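A rough sketch shows why a change of counting unit alone can restructure a cost basis. Everything here is illustrative: the host counts are hypothetical, and the per-socket minimum is an assumption (a 16-core floor has been widely reported for the new subscriptions, but the figure that matters is the one in your own contract).

```python
def per_cpu_licences(hosts: int, sockets_per_host: int) -> int:
    """Licence count under a per-CPU (per-socket) model."""
    return hosts * sockets_per_host


def billable_cores(hosts: int, sockets_per_host: int, cores_per_socket: int,
                   min_cores_per_socket: int = 16) -> int:
    """Billable cores under a per-core model with a per-socket floor.

    min_cores_per_socket is an assumption for illustration only.
    """
    counted = max(cores_per_socket, min_cores_per_socket)
    return hosts * sockets_per_host * counted


# Hypothetical estate: 500 dual-socket hosts.
print(per_cpu_licences(500, 2))    # 1000 licences, regardless of core count
print(billable_cores(500, 2, 12))  # 16000: low-core hosts are rounded up
print(billable_cores(500, 2, 32))  # 32000: dense hosts count every core
```

The same hardware yields a very different billable quantity depending on core density, which is why renewal quotes diverged so sharply between otherwise similar estates.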

This pattern has precedents. Oracle’s licensing audit programme has created similar dynamics in enterprise database environments for over a decade. Red Hat’s 2023 decision to restrict access to RHEL source code triggered comparable conversations about the risk of building on commercially controlled open-source platforms.

How to Audit Your Own Lock-In Risk Before It Becomes a Crisis

An honest lock-in audit covers four questions for every component in your stack.

The first question is: can I export my data in a format a competing platform can ingest without proprietary tooling? For PostgreSQL on RDS, the answer is yes via pg_dump. For DynamoDB, the answer is yes but with effort: table exports land in S3 as DynamoDB JSON or Amazon Ion, and the target system has to remodel the data. For VMs stored on vSAN datastores, the answer is yes via OVA export with some loss of advanced storage policy configuration. Documenting the answer for every major data store in your environment tells you where migration effort will concentrate.

The second question is: which of my infrastructure-as-code modules use provider-specific resources that have no equivalent in another provider? A Terraform module that provisions an AWS VPC and its associated resources is not portable to Azure. A module that provisions a Kubernetes cluster using the Kubernetes provider API is substantially more portable. This audit turns lock-in from a vague concern into a specific list of refactoring candidates.

The third question is: what does it cost to move one terabyte of my data out of each platform today? For AWS, calculate the egress cost from the EC2 pricing page. For Azure, calculate it from the bandwidth pricing page. For your on-premises environment, the cost is your operational effort to copy the data. This number should be part of your total cost of ownership model, not an afterthought.

The fourth question is: which of my engineering teams have platform-specific certifications, and what proportion of their operational procedures depend on vendor-specific tooling? This is a capability audit, not a blame exercise. It tells you where you need to invest in platform-neutral skills and tooling to reduce the operational component of your switching cost.
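The second question lends itself to a first-pass automated answer. The sketch below counts Terraform resource blocks by provider prefix, relying on the convention that a resource type such as aws_vpc starts with its provider name. It is a regex over text rather than a real HCL parser, so treat the counts as a starting list for review; the function name and sample module are hypothetical.

```python
import re
from collections import Counter

# Matches lines like:  resource "aws_vpc" "main" {
# and captures the provider prefix before the first underscore.
RESOURCE_RE = re.compile(r'^\s*resource\s+"([a-z0-9]+)_[a-z0-9_]+"', re.MULTILINE)


def provider_resource_counts(hcl_text: str) -> Counter:
    """Count resource blocks per provider prefix in Terraform source text."""
    return Counter(RESOURCE_RE.findall(hcl_text))


sample_module = '''
resource "aws_vpc" "main" {}
resource "aws_subnet" "private_a" {}
resource "kubernetes_deployment" "api" {}
'''

counts = provider_resource_counts(sample_module)
print(counts["aws"], counts["kubernetes"])  # 2 1
```

Run across every .tf file in a repository, the per-provider totals give a crude but useful ranking of which modules are refactoring candidates.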

What the VMware-to-OpenNebula Migration Actually Involves

Because the VMware to OpenNebula migration path is the most common scenario Nubius works on, it is worth describing what the technical work actually entails at a practitioner level.

The starting point is a workload inventory. Not every workload in a VMware environment is a lock-in candidate. Many VMs are straightforward Linux workloads running application software that has no VMware dependency whatsoever. The disk image is in VMDK format, but the operating system has no vSphere-specific drivers loaded. These workloads convert cleanly to KVM using qemu-img, which is part of the standard QEMU toolchain. The QEMU documentation covers the conversion process from VMDK to qcow2, the native KVM image format, as well as from VMDK to raw image formats that OpenNebula can register as images in its image repository.
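For the clean-conversion case, the qemu-img call can be wrapped in a few lines. This is a minimal sketch that assumes qemu-img is installed and on the PATH; the file names are hypothetical. The -f, -O, and -p options are the standard input-format, output-format, and progress flags of qemu-img convert.

```python
import subprocess
from pathlib import Path
from typing import List


def qemu_img_convert_cmd(src: Path, dst: Path, out_format: str = "qcow2") -> List[str]:
    """Build the qemu-img command that converts a VMDK disk to out_format."""
    return ["qemu-img", "convert", "-p",
            "-f", "vmdk", "-O", out_format,
            str(src), str(dst)]


def convert(src: Path, dst: Path, out_format: str = "qcow2") -> None:
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(qemu_img_convert_cmd(src, dst, out_format), check=True)


# Hypothetical file names, shown without executing anything.
print(" ".join(qemu_img_convert_cmd(Path("web01.vmdk"), Path("web01.qcow2"))))
```

Keeping command construction separate from execution makes it easy to log or dry-run a whole batch before touching any disk images.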

Workloads that run VMware Tools require a driver substitution step. VMware Tools provides paravirtualised drivers for disk I/O and network I/O optimised for the ESXi hypervisor. The equivalent under KVM is the VirtIO driver set, which provides comparable or superior I/O performance. The migration sequence requires booting the VM from a rescue environment, uninstalling VMware Tools, installing the VirtIO drivers, adjusting the disk controller configuration in the VM’s hardware definition, and then registering the resulting image in OpenNebula.

Network configuration is the most context-dependent part of the migration. VMware NSX-T configurations, distributed virtual switch port groups, and vSphere network I/O control rules all need equivalents in the target environment. OpenNebula’s virtual network documentation covers the supported network drivers including VXLAN, VLAN 802.1Q, and OVS (Open vSwitch) overlay networks. Mapping the NSX-T microsegmentation rules to an equivalent policy in the target environment is engineering work that requires understanding both the source and destination network models.

High-availability configuration changes. VMware HA is configured through vCenter, but failure detection runs in agents on the ESXi hosts themselves, which use network and datastore heartbeats to distinguish host failures from network partitions. OpenNebula provides HA for VMs through its scheduler and health-checking subsystem, documented in the OpenNebula HA and fault tolerance documentation. The conceptual model is similar but the configuration interface and the underlying mechanisms differ.

This is why the Nubius cloud migration consulting service uses a phased approach: non-production workloads first, then production workloads in dependency order, with a defined rollback window at each phase. The technical work is tractable, but it requires discipline and sequencing.
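The sequencing rule can be sketched with a topological sort. This is an illustration under stated assumptions, not a description of the Nubius tooling: the workload names are hypothetical, "dependency order" is taken to mean that a workload moves only after the things it depends on, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def migration_order(depends_on: Dict[str, Set[str]],
                    non_production: Set[str]) -> List[str]:
    """Order workloads: non-production first, then production.

    depends_on maps each workload to the workloads it depends on.
    Within each phase a workload appears after its dependencies.
    """
    ordered = list(TopologicalSorter(depends_on).static_order())
    phase_one = [w for w in ordered if w in non_production]
    phase_two = [w for w in ordered if w not in non_production]
    return phase_one + phase_two


# Hypothetical inventory: web depends on api, api depends on db.
plan = migration_order(
    {"web": {"api"}, "api": {"db"}, "staging-web": set()},
    non_production={"staging-web"},
)
print(plan)
```

TopologicalSorter raises CycleError when workloads depend on each other in a loop, which is a useful early signal that a group has to move together in one window.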

Multi-Cloud Architecture as a Lock-In Mitigation Strategy

Multi-cloud is often proposed as the solution to vendor lock-in, and it is a genuine mitigation, but it introduces its own operational complexity that needs to be managed deliberately.

Running workloads across AWS, Azure, and GCP simultaneously means you are managing three sets of networking constructs, three sets of IAM models, three sets of monitoring and logging integrations, and three sets of managed service APIs. Without a consistent abstraction layer, multi-cloud can increase operational complexity faster than it reduces lock-in risk.

The abstraction layers that make multi-cloud operationally viable are Kubernetes for workload orchestration, Terraform or OpenTofu for infrastructure provisioning, and a cloud-neutral observability stack such as Prometheus with remote write configured to a centralised Thanos or Grafana Mimir backend. The Kubernetes documentation describes the container runtime interface and the cloud controller manager interface that decouple workloads from specific cloud providers’ underlying infrastructure APIs.

The hybrid cloud model, running workloads partly on-premises and partly in public cloud, is a structural hedge against both egress cost and licensing risk. Latency-sensitive workloads and large data repositories stay on-premises where storage and compute costs are fully controllable. Burst workloads, disaster recovery replicas, and globally distributed application tiers use public cloud. The hybrid cloud architecture described in the Nubius blog covers the network connectivity, identity federation, and data replication patterns that make this operationally viable.

For a deeper technical discussion of the service mesh and API gateway patterns that make multi-cloud workloads portable, the service-oriented architecture guide published on the Nubius blog covers the decoupling mechanisms that allow services to be relocated between cloud environments without upstream application changes.

The Infrastructure Portability Checklist

The following is a practitioner-level checklist for assessing and improving portability for each layer of your stack.

For your compute layer, confirm that your VM images are stored in a hypervisor-portable format such as qcow2 or raw, or that your containerised workloads are defined in OCI-compliant container images stored in a registry that supports the OCI distribution specification. Confirm that your VM templates do not depend on hypervisor-specific hardware versions or paravirtualised drivers that would prevent booting on an alternative hypervisor.

For your storage layer, confirm that your application data can be accessed via a standard block device interface, NFS export, or S3-compatible object store endpoint. Verify that no application writes directly to a storage API that is specific to a single provider. Calculate your egress cost for a full data migration scenario and model it as a risk line item in your infrastructure budget.

For your networking layer, confirm that your network policies are expressible in a vendor-neutral format such as Kubernetes NetworkPolicy objects or a Terraform module that abstracts over provider-specific security group constructs. Document which VPN or interconnect constructs tie you to a specific provider’s backbone.

For your application layer, confirm that your infrastructure-as-code does not use provider-specific Terraform resources for functions that have cross-provider equivalents. Review your application code for direct calls to vendor-managed service APIs that lack open-source equivalents. Identify which services you could replace with a self-managed open-source component if pricing or availability of the managed version became unacceptable.

For your observability layer, confirm that your monitoring data is stored in a format you can export. CloudWatch metrics, Azure Monitor logs, and GCP Cloud Logging data are all accessible via export APIs, but the ongoing cost of that export and the tooling required to consume the exported data varies significantly.
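The compute-layer item in this checklist can be partly automated. The sketch below inspects the JSON that qemu-img info --output=json prints for a disk image and checks the reported format against the portable formats named above. The sample JSON strings are hand-written stand-ins for real command output, and the function name is illustrative.

```python
import json

# Formats the checklist treats as hypervisor-portable.
PORTABLE_FORMATS = {"qcow2", "raw"}


def is_portable_image(qemu_img_info_json: str) -> bool:
    """Return True if qemu-img reports a portable disk format.

    Expects the text produced by: qemu-img info --output=json <image>
    """
    info = json.loads(qemu_img_info_json)
    return info.get("format") in PORTABLE_FORMATS


# Hand-written samples standing in for real qemu-img output.
print(is_portable_image('{"filename": "web01.qcow2", "format": "qcow2"}'))  # True
print(is_portable_image('{"filename": "web01.vmdk", "format": "vmdk"}'))    # False
```

A format check says nothing about the drivers inside the guest, so it complements rather than replaces the boot test on the target hypervisor.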

How Open Standards Create Negotiating Power

The practical benefit of reducing lock-in is not just that migration becomes cheaper. It is that the credible threat of migration changes your negotiating position with every vendor in your stack.

When Broadcom announced VMware’s new subscription pricing in early 2024, the organisations that had been quietly evaluating OpenNebula and KVM as an alternative for two or three years were in a fundamentally different position from those that had never considered it. The former had cost models, pilot data, and engineering teams that understood the migration path. They could walk into the renewal conversation with a real alternative. The latter had to accept the new pricing or face a migration project they had not budgeted for and had no experience executing.

Nubius’s virtualisation platform service exists specifically to give organisations the production-ready OpenNebula environment they need, either to migrate from VMware or to run a credible alternative alongside their existing VMware footprint. The OpenNebula project documentation covers the features of the platform in detail, including the KVM driver configuration, the VM lifecycle management API, the image management system, and the distributed storage integration interfaces that allow StorPool or Ceph to serve as the underlying storage layer.

The same principle applies to application-level lock-in. An organisation that runs its application tier on Kubernetes, uses Helm charts for deployment, and stores its application state in PostgreSQL on a self-managed cluster has the ability to move that entire application to a different cloud provider or a different on-premises environment by repointing DNS and running a database migration. The cost of that move is engineering time, not a structural penalty imposed by a vendor’s pricing architecture.

Where Lock-In Reduction Starts in Practice

Most organisations cannot rewrite their entire infrastructure to eliminate lock-in in a single project. The practical approach is sequential and risk-ordered.

Start with data. Egress pricing creates the largest structural penalty, and the organisations that renegotiate their cloud contracts most successfully are the ones that have demonstrated they can move their data. A pilot migration of a non-critical data store to a cloud-neutral storage layer, such as a StorPool-backed block storage cluster or a self-managed Ceph cluster, builds the operational muscle and the cost model that makes larger migrations credible.

Continue with the hypervisor. A parallel OpenNebula environment running your non-production workloads gives you the operational experience needed to migrate production workloads when the business case is clear. The Nubius managed AppOps service supports the application middleware layer across both environments during a parallel-run period, so your development and staging environments can operate on OpenNebula while production remains on VMware until the migration window opens.

Address managed service dependencies last, because they require application code changes, not just infrastructure reconfiguration. The pattern is to introduce an abstraction layer, typically a thin service interface or a provider-agnostic client library, between your application code and the vendor-specific managed service API. Once that abstraction layer is in place, the underlying implementation can be switched without touching application code.
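A thin service interface of this kind can be very small. The following is a minimal sketch with hypothetical names (BlobStore, InMemoryBlobStore, archive_invoice): application code depends only on the interface, and a vendor-backed implementation would be one more class with the same two methods.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """The only storage surface application code is allowed to see."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in backend for tests; a real one would wrap a vendor SDK."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


def archive_invoice(store: BlobStore, invoice_id: str, pdf: bytes) -> str:
    """Application logic: no vendor import, no vendor types."""
    key = f"invoices/{invoice_id}.pdf"
    store.put(key, pdf)
    return key


store = InMemoryBlobStore()
key = archive_invoice(store, "2024-001", b"%PDF-sample")
print(key)  # invoices/2024-001.pdf
```

The design choice is to keep the interface narrower than any one vendor's API. Every method added to it is a method each future backend has to implement, so the interface itself becomes the record of what the application genuinely needs.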

The cloud operations guide published on the Nubius blog covers the operational monitoring and cost optimisation practices that make a multi-phase migration manageable without sacrificing visibility into system performance during the transition.

The Long-Term Cost of Inaction

The engineering cost of vendor lock-in is not always visible until a pricing event, a product discontinuation, or an acquisition forces it into the open. By the time it is visible, the options are constrained.

An organisation running 500 VMware hosts on perpetual licenses in 2022 had a straightforward upgrade path, predictable renewal costs, and years of accumulated operational knowledge. By late 2024, the same organisation was looking at a per-core subscription model that restructured its cost basis, eliminated the flexibility of its existing license model, and required a renewal conversation with a fundamentally different counterparty in Broadcom. The engineering architecture had not changed. The risk that architecture carried had always been there. The pricing event just made it visible.

The same dynamic plays out more slowly in cloud environments. Each AWS Lambda function written without a portability layer, each DynamoDB table provisioned without a data export strategy, and each CloudFormation stack with no equivalent in a cross-provider tool such as Terraform is a unit of switching cost that accumulates invisibly in your infrastructure balance sheet.

The cloud deployment models guide on the Nubius blog provides context for evaluating which deployment model, public, private, hybrid, or community cloud, carries the most appropriate lock-in risk profile for different workload categories. Not every workload needs the same level of portability, and the engineering investment in portability should be proportional to the criticality and longevity of the workload.

Conclusion

Vendor lock-in is an architectural condition that accumulates across the hypervisor layer, the data layer, the API and service layer, the operational tooling layer, and the licensing layer. It does not announce itself until a pricing event or a platform change makes the switching cost visible. By that point, the cost of the options you have is determined by the architectural decisions you made two or three years earlier.

The engineering response is not to avoid all managed services or to refuse all proprietary platforms. It is to make deliberate trade-offs, document the portability constraints of each one, and invest in the open-standards tooling that keeps your switching cost proportional to your engineering effort rather than to a vendor’s pricing architecture.

If you are currently in a VMware renewal conversation, facing an AWS egress bill that is constraining your architecture decisions, or trying to understand whether your application is more locked in than your infrastructure, a structured technical review is the starting point. The Nubius cloud migration consulting service begins with exactly that kind of assessment, mapping your current lock-in risk by layer before recommending a migration sequence.

Open standards are not ideological. They are a negotiating strategy. And negotiating from a position of credible alternatives is what keeps your infrastructure cost rational over time.