Crossplane Case Study

Published on September 08, 2022

Table of Contents

Overview
The Workflow
The Vision
The Current State
In Conclusion

Special thanks to Nic Cope (Upbound), Brandon McNama (Nirvana Money), Rene Scheepers (Shopify), Maria Ntalla (Shopify), Marty Henderson (Ocelot Consulting), and my peers at SuperOrbital for their help reviewing this post and providing invaluable feedback.

Overview


In this article, we will explore and discuss the open-source Crossplane project: what it is, how it can be used, and the current state of the technology.

Crossplane allows you to manage cloud resources like storage buckets, databases, users, and virtual machines as Kubernetes resources. This means you can create them through the same API you use to manage Kubernetes-native resources like Deployments and Services. This has a few benefits: it allows you to define the cloud resources right next to your application YAML¹, it uses the declarative and idempotent Kubernetes API, and it provides a strong separation of concerns between infrastructure owners and application developers.

Crossplane is a CNCF² Incubating project, and Upbound is the commercial entity that founded the project and contributed the core components to the open source community. In this article, we will focus on the open-source version of Crossplane rather than any commercial offerings from Upbound.

Unlike Terraform and similar tools, Crossplane is a control plane that is built on top of the Kubernetes ecosystem. It consists of one or more providers, a.k.a. operators, that understand how to create and manage resources via third-party APIs³, like AWS⁴, GCP⁵, and even SQL⁶, based on declarative YAML manifests.

The Workflow

So, what is it that Crossplane is designed to help us accomplish? At its simplest, Crossplane is designed to allow platform teams to create abstracted APIs for resource lifecycle management, so that developers have an easy way to request and manage the cloud resources they need, without being forced to deal directly with the complex cloud ecosystems that are generally well outside their core expertise. The most basic Crossplane workflow looks like this:

Create a valid Kubernetes manifest that describes one or more cloud resources (or any API object) and the desired configuration for those resources.
Apply that manifest to Kubernetes.
As soon as the Crossplane operator for the required provider (AWS, GCP, etc.) starts the next reconciliation loop, those resources will be created exactly as they are described in the manifest.
If any changes are made to those cloud objects by other processes, Crossplane will remediate those changes during the next reconciliation loop.

In this manner, Crossplane can be used to easily deploy cloud resources directly alongside the applications that require them and ensure that those resources do not drift for long and are configured exactly as required.
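
The reconciliation behavior described above can be sketched in a few lines of Python. This is only an illustration of the general control-loop idea, not Crossplane's actual implementation: FakeCloud, reconcile, and the bucket settings are all hypothetical names invented for this example, and deletion handling is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a cloud API: resource name -> configuration."""
    resources: dict = field(default_factory=dict)


def reconcile(desired: dict, cloud: FakeCloud) -> list:
    """Run one reconciliation pass and return the actions that were taken."""
    actions = []
    for name, spec in desired.items():
        observed = cloud.resources.get(name)
        if observed is None:
            # The resource does not exist yet, so create it as described.
            cloud.resources[name] = dict(spec)
            actions.append(("create", name))
        elif observed != spec:
            # Something changed out-of-band, so push it back to the desired state.
            cloud.resources[name] = dict(spec)
            actions.append(("update", name))
    return actions


# Desired state, as it might be described by an applied manifest (values are made up).
desired = {"app-bucket": {"versioning": True, "region": "us-east-1"}}
cloud = FakeCloud()

first = reconcile(desired, cloud)                     # creates the bucket
cloud.resources["app-bucket"]["versioning"] = False   # simulated out-of-band change
second = reconcile(desired, cloud)                    # remediates the drift
third = reconcile(desired, cloud)                     # nothing left to do
```

The point of the sketch is that the loop never asks how the drift happened. It only compares the desired state to the observed state and closes the gap on the next pass.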

The overall process for implementing and utilizing Crossplane today looks something like this:

An operations team sets up a Kubernetes cluster and installs Crossplane and the necessary providers for their environment. For this discussion, the AWS provider is all that is required.

Then, a platform team creates a service catalog by designing and assembling a series of Composite Resources (XR), each representing one or more Managed Resources (MR), which a development team might want to request via Kubernetes.

A Composite Resource Definition (XRD) and Composition pair is the primary abstraction tool, enabling the platform team to hide the implementation details and only expose the minimal variables that the organization needs or wants the end users to specify.

Finally, a development team deploys a new application to Kubernetes containing a Composite Resource Claim (XRC). This will cause the related Crossplane operator to create and manage a set of resources, as defined by the requested XR. This Resource/Claim pattern should be familiar to anyone who has worked with storage (StorageClasses, PersistentVolumes, and PersistentVolumeClaims) in Kubernetes.

For easy reference, most data returned by the API about a resource (e.g., an IP address) will be stored underneath the Managed Resource’s status field. Secrets that are returned (e.g., user credentials) can easily be stored in a new Kubernetes secret or, via a new alpha feature, pushed into an external secret store, like Vault.

The development team uses these resources until they no longer have a need, and then they delete the XRC, and Crossplane deletes the related cloud resources.

[Figure: Crossplane flow]

This workflow means that platform teams can focus on defining well-designed resources that ensure best practices are followed. Development teams can then stay tightly focused on their application’s requirements, and Crossplane and Kubernetes can be relied on to ensure that the system is kept in the expected state whenever possible, and any divergence from this desired state will be reported.

The Vision

Crossplane is a very intriguing project for people already all-in with Kubernetes. In many ways, just as the Argo project re-imagined CI/CD for a Kubernetes world, Crossplane is attempting to make an infrastructure management tool that leverages Kubernetes’ strengths while also considering what both operators and developers individually need from such a tool.

This is a great goal and something that I truly want to see them succeed with.

The Current State

So, how easy is Crossplane to adopt today?

Documentation

There are various documents and blog posts that walk through the technical steps to install Crossplane and use it to create a standard cloud resource like a database or even a managed Kubernetes cluster. Still, after some investigation, it becomes clear that the documentation is very light, and the primary example used almost everywhere only covers the creation of a cloud-based PostgreSQL instance.

It would be very useful to see a variety of real-world Composition examples that could be used in all 3 major cloud providers⁷. Some possibilities include:

Create an encrypted object store with versioning and 2 users. One user with read/write access and a second user with only read access.
Create an SQL database with 3 users and a basic backup policy. One user with full admin rights inside the database, a second user with read/write access and a third user with only read access.
Create a simple, but realistic, Kubernetes cluster.

It would also be very helpful to have some more in-depth tutorials on how to build your own Compositions. Currently, this primarily requires looking at the minimal examples and then parsing YAML and APIs to figure out everything that is required.

At the moment, the documentation lacks a lot of detail.

Providers

Even during the initial installation and configuration of Crossplane, some important decisions must be made. At the moment, there are two Crossplane providers for most cloud vendors.

In the case of Amazon Web Services, there is the native AWS Crossplane provider. This works well and is built to take advantage of everything that Crossplane has to offer. However, the current version only supports 173 AWS resources.

To help make it easier for people to transition to Crossplane, the developers created a tool called Terrajet, which makes it possible to create a Crossplane provider from an existing Terraform provider. This has already been done for the major cloud providers, so it is possible to also use the Terrajet AWS Provider for Crossplane to manage your resources in Amazon Web Services. This provider supports 780 AWS resources.

The underlying Terraform core for these providers is certainly well-tested, which is a big plus. But does this also mean that the Terrajet providers are just using Terraform behind the scenes? No, not exactly. Although the actual Terraform provider for Crossplane basically wraps up Terraform as we know it today, the Terrajet-based providers take another approach. Each Terrajet provider primarily only makes use of the CRUD⁸ logic inside each upstream Terraform provider, to enable it to create and manage all of the individual resources that each provider supports. Out of necessity, the Terrajet providers make use of the terraform binary behind the scenes, but they do not use most of the other Terraform functionality, like HCL, the DAG⁹, plans, etc.

This reliance on Terraform components might complicate the decision for people trying to get away from Terraform. However, there are some benefits that might not be apparent. It is likely that your organization is already very familiar with how Terraform and its standard providers work, and with Terrajet-based providers, there is no longer any HCL to manage, and Crossplane will internally manage all of the Terraform state files.

As it turns out, Terrajet-based providers create each individual Managed Resource separately. This means that the resulting state files are very small. Every Terrajet-based provider keeps the generated state files cached in ephemeral storage within the pod. However, all of this data is duplicated into the related Managed Resource’s manifest and related secrets that are stored in Kubernetes’ etcd instance (you are backing this up, right?), which means that Crossplane can easily re-create the state for each object at any time. So, this could actually be a big win.

The Terrajet providers extend a workflow that is already well tested. But these providers are mostly intended to accelerate adoption versus providing fine-tuned providers for the Crossplane workflow. Due to this bifurcation, some people have even decided to use both providers in their Crossplane environments. Nothing is preventing this and it allows organizations to lean on the strengths of both solutions, but it is also more complicated and still does not make the road ahead any clearer.

It is unclear which providers (native or Terrajet) should be preferred¹⁰.

The native Crossplane providers appear to be the long-term goal. They integrate into the Crossplane workflow very cleanly and don’t require a future costly provider migration (from Terrajet to the native version). Yet the native versions still don’t have the broad resource coverage or testing that the Terraform providers have.

The APIs

This brings us to the APIs. There are a lot of them, and the majority of them are labeled v1alpha1 or v1beta1, which means that although they all work to some degree or another, many of them are not yet at a stage where users should expect them to be stable and free from breaking changes between releases¹¹. Although it is common to see organizations use beta APIs in their production Kubernetes clusters, alpha APIs are rarely deployed unless a feature is so important to an organization that they are willing to accept the additional risks that come with very early adoption.

As an example, when working with AWS, we will need to use either the native AWS provider API, which is currently at aws.crossplane.io/v1beta1, or the Terrajet-based AWS provider, which exposes aws.jet.crossplane.io/v1alpha1.

And then, when creating an AWS EC2¹² instance via the provider, we would need to use one of the following APIs depending on which provider we settled on:

ec2.aws.crossplane.io/v1alpha1
ec2.aws.jet.crossplane.io/v1alpha2
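
One way to keep track of how much alpha surface area a set of manifests depends on is to classify each API version string by its stability suffix. The following is a minimal sketch that assumes the usual Kubernetes-style version naming (v1, v1beta1, v1alpha2, and so on). The stability helper is a hypothetical name, and the four API strings are simply the ones mentioned in this section.

```python
import re

# Matches Kubernetes-style versions such as v1, v1beta1, or v1alpha2.
_VERSION = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")


def stability(api_version: str) -> str:
    """Return 'alpha', 'beta', or 'stable' for a 'group/version' string."""
    version = api_version.rsplit("/", 1)[-1]
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    # Group 2 is only set when an alpha or beta suffix is present.
    return match.group(2) or "stable"


apis = [
    "aws.crossplane.io/v1beta1",
    "aws.jet.crossplane.io/v1alpha1",
    "ec2.aws.crossplane.io/v1alpha1",
    "ec2.aws.jet.crossplane.io/v1alpha2",
]

report = {api: stability(api) for api in apis}
needs_review = [api for api, level in report.items() if level == "alpha"]
```

A check like this could run in CI to flag any newly introduced alpha API before it reaches a production cluster.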

Another thing to be aware of is that these 2 APIs are often similar but not 100% compatible. Since Terrajet utilizes Terraform and the Terraform providers, the naming of fields in the API matches what Terraform uses. But those field names may be different in the native Crossplane providers.

An example of this can be seen when defining the image that should be used to spin up an AWS EC2 instance.

With the native AWS provider, the user would set the value imageId, while in the Terrajet AWS provider this value is called ami. In most cases, these differences should only be a hassle for the platform teams building out the abstracted Compositions for developers to utilize, but they may still make successful adoption and future migrations trickier than desired.

The Crossplane APIs are still moving targets that are actively evolving.

It is also worth noting that Crossplane can easily install hundreds of CRDs into your Kubernetes cluster. This can cause some issues for older Kubernetes releases, so you should be running a recent release.

Building a Platform

Although creating Managed Resources (MR) like a single AWS S3 bucket is reasonably easy via Crossplane, it does not provide any abstraction for the cloud resources that developers might want to leverage. Because of this, Crossplane expects organizations to have a platform team that will define and build Compositions for Crossplane which cover all the use cases that the developers need.

For example, when using Crossplane, an organization’s platform team could create a Composition called CloudObjectStore, which developers could then request to receive an optimized object store for their use case. The Composition should abstract away all the details that the developers do not need to consider and should only require them to provide a few simple, cloud-agnostic details in their Claim (like whether versioning should be enabled). When a development team makes this request, they will get all of the required components configured and spun up properly for their environment, whether that be development, production, or even an integration instance running in a secondary cloud provider that the organization supports.
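
To make the abstraction idea more concrete, here is a rough Python sketch of what a Composition conceptually does: it takes a small Claim and copies selected values into a fully specified base resource. The key names fromFieldPath and toFieldPath are loosely modeled on Composition patches, but everything else here (the render function, the versioningEnabled field, and the manifest contents) is a simplified assumption for illustration and not the real Crossplane schema.

```python
import copy


def get_path(obj: dict, path: str):
    """Read a value from nested dicts using a dotted path."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def set_path(obj: dict, path: str, value) -> None:
    """Write a value into nested dicts using a dotted path."""
    keys = path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def render(claim: dict, composition: dict) -> list:
    """Produce concrete resources by patching each base with claim values."""
    rendered = []
    for entry in composition["resources"]:
        resource = copy.deepcopy(entry["base"])  # never mutate the template
        for patch in entry["patches"]:
            value = get_path(claim, patch["fromFieldPath"])
            set_path(resource, patch["toFieldPath"], value)
        rendered.append(resource)
    return rendered


# Hypothetical Composition: the platform team fixes the ACL and exposes
# only the versioning switch to developers.
composition = {
    "resources": [
        {
            "base": {"kind": "Bucket", "spec": {"forProvider": {"acl": "private"}}},
            "patches": [
                {
                    "fromFieldPath": "spec.versioning",
                    "toFieldPath": "spec.forProvider.versioningEnabled",
                }
            ],
        }
    ]
}

# Hypothetical Claim: the only detail the developer has to provide.
claim = {"kind": "CloudObjectStore", "spec": {"versioning": True}}

resources = render(claim, composition)
```

The developer only ever sees the one-field Claim, while the platform team owns everything inside the base resource.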

The providers that you use will have a direct impact on what you can build into your compositions. If your chosen provider does not support a resource you want to create, you can’t include it in your Composition.

This is also an area where the current documentation makes things challenging and forces users to lean heavily on blog posts, source code, and the Crossplane Slack workspace.

The lack of documentation and fully-featured providers might create a significant barrier to entry for teams that are not composed of strong Kubernetes engineers who are comfortable with Go development and happy to dive into the Crossplane codebase to figure things out and contribute improvements back into the evolving toolset.

When using Terraform, there are lots of public modules that are available, but since public modules need to be flexible by design, these don’t abstract away many of the underlying details. Crossplane strongly encourages simplification via abstraction, which means that it is up to individual organizations to define what they need and want to expose to their developers. Only time will tell if a public collection of Compositions might be developed, but for the moment, this is left up to each organization to generate in-house.

Are you committed to defining and building a platform on top of Crossplane?

Day-to-Day Interactions

Crossplane provides a pretty standard YAML-based workflow that is built directly on top of the underlying JSON REST API, which should be familiar to anyone who already works directly with Kubernetes. This is a mixed blessing. YAML is a data serialization language that is great for writing declarative manifests, but it is not a programming language and cannot model complex logic in a reasonable way. On the other hand, the JSON REST API can be leveraged by most modern programming languages to do just about anything; you just have to write the code.

In theory, a manifest describes what we want, and the logic is what operators are supposed to provide for us, but implementing a little bit of additional logic should not require one to write a complete Kubernetes operator. Maybe the Kubernetes community needs a pattern for cluster admins to add plugins to operators and hook some custom logic into them. To address this need for extensibility, Crossplane is currently working on a design document for Composition Functions, which would support implementing additional custom logic in Compositions.

Logic is tricky in Terraform’s HCL¹³. It is even harder in YAML.

Crossplane will only destroy resources when you explicitly tell it to, by removing from Kubernetes the manifests that requested the object creation. Unlike in Terraform, changes to fields that would cause the re-creation of an object are essentially ignored by Crossplane. This is actually a good thing, since Crossplane has no equivalent to a dry run mode, like terraform plan, and developers therefore receive no feedback before their changes are implemented.

At the moment, the biggest problem with this is that I discovered cases where I could apply a manifest change to something that would normally force the cloud object to be recreated (e.g., the instanceType for an AWS EC2 instance), and Crossplane’s native AWS provider would continue to tell me that everything was in sync, despite the fact that the requested state (spec) deviated from the actual state (status). This issue has been reported upstream and has recently been updated to reference the underlying issue that contributes to this problem. It looks like in the future these fields will be marked immutable so that you cannot change them without deleting the object.

Resources that require re-creation need special handling.
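
Until fields like this are marked immutable, a small external check can at least surface the mismatch between what was requested and what actually exists. The sketch below is purely illustrative: out_of_sync is a hypothetical helper, the flat spec and observed dictionaries are a simplification of a real Managed Resource, and the set of immutable fields is something you would have to maintain yourself.

```python
def out_of_sync(spec: dict, observed: dict, immutable: set) -> dict:
    """Return every field whose requested value differs from the observed one."""
    findings = {}
    for name, wanted in spec.items():
        actual = observed.get(name)
        if actual != wanted:
            findings[name] = {
                "requested": wanted,
                "observed": actual,
                # Fields listed here cannot be changed in place.
                "requires_recreation": name in immutable,
            }
    return findings


# Made-up values: the manifest was edited to request a larger instance,
# but the running instance was never replaced.
spec = {"instanceType": "t3.large", "imageId": "ami-example"}
observed = {"instanceType": "t3.micro", "imageId": "ami-example"}

findings = out_of_sync(spec, observed, immutable={"instanceType"})
```

Here the helper reports instanceType as diverged and flags that fixing it would mean deleting and re-creating the instance, which is exactly the information that is missing when the provider simply reports that everything is in sync.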

The lack of a dry run mode can make development tricky. The issue is a bit complicated, and a dry run mode might be less necessary than it is in Terraform, since Crossplane will not delete resources that are not explicitly deleted. However, dry run modes are also about building trust. They provide a way for humans or applications to confirm that a code change, no matter how complicated, should result in the expected changes and avoid any surprises. The complete lack of this functionality makes it trickier to support the local development of Compositions, since every change must essentially be applied to a real environment to confirm that it is generally correct. Some of these issues could likely be mitigated by introducing some CLI¹⁴ tooling that could scan the files of a new Composition and make sure that they are all valid, so that at least some issues can be detected as early in the cycle as possible. Implementing a custom CI¹⁵ workflow that tests Crossplane-related PRs¹⁶ in an ephemeral Kubernetes cluster would also be beneficial.

The lack of a dry-run mode requires trust in code reviews and Crossplane.
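
As a rough idea of what such early validation tooling might look like, the following sketch lints a Composition that has already been parsed into a Python dictionary. The lint_composition function is hypothetical, the handful of checks is nowhere near exhaustive, and the example group and kind names are placeholders. A real tool would want to validate against the actual CRD schemas instead.

```python
def lint_composition(manifest: dict) -> list:
    """Return a list of human-readable problems found in a parsed Composition."""
    problems = []
    if manifest.get("kind") != "Composition":
        problems.append("kind must be 'Composition'")
    if not manifest.get("apiVersion"):
        problems.append("apiVersion is missing")

    spec = manifest.get("spec") or {}
    type_ref = spec.get("compositeTypeRef") or {}
    for key in ("apiVersion", "kind"):
        if not type_ref.get(key):
            problems.append(f"spec.compositeTypeRef.{key} is missing")

    resources = spec.get("resources") or []
    if not resources:
        problems.append("spec.resources is empty")
    for index, entry in enumerate(resources):
        base = entry.get("base") or {}
        for key in ("apiVersion", "kind"):
            if not base.get(key):
                problems.append(f"spec.resources[{index}].base.{key} is missing")
    return problems


# Placeholder manifests; the example.org group and kinds are invented.
good = {
    "apiVersion": "apiextensions.crossplane.io/v1",
    "kind": "Composition",
    "spec": {
        "compositeTypeRef": {"apiVersion": "example.org/v1alpha1", "kind": "XCloudObjectStore"},
        "resources": [{"base": {"apiVersion": "example.org/v1alpha1", "kind": "Bucket"}}],
    },
}
broken = {
    "apiVersion": "apiextensions.crossplane.io/v1",
    "kind": "Composition",
    "spec": {
        "compositeTypeRef": {"kind": "XCloudObjectStore"},
        "resources": [{"base": {"kind": "Bucket"}}],
    },
}

good_problems = lint_composition(good)
broken_problems = lint_composition(broken)
```

Even a shallow check like this catches structural mistakes before anything is applied to a cluster, although it says nothing about whether the resulting cloud resources will behave as intended.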

When things break, troubleshooting can be a bit tricky, because of the various layers and operators that are involved. If you create a Composition and it applies cleanly, and then a developer creates a Claim for that resource that never materializes, where is the problem?

It is completely possible to figure out the answer, but there are a lot of layers that need to be investigated, and if you are using the Terrajet provider, you have the additional embedded Terraform process to investigate.
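
The layer-by-layer investigation can be thought of as a walk down a tree: start at the Claim, follow its references to the Composite Resource and then to each Managed Resource, and stop at the deepest objects that are not ready. The sketch below models that walk over plain dictionaries. The children key is a hypothetical stand-in for the real reference fields, the resource names and reasons are made up, and no Kubernetes client is involved.

```python
def ready_condition(resource: dict) -> dict:
    """Return the resource's Ready condition, or a placeholder if absent."""
    for condition in resource.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition
    return {"type": "Ready", "status": "Unknown", "reason": "NoCondition"}


def find_blockers(name: str, resources: dict) -> list:
    """Walk down from `name` and return the deepest resources that are not ready."""
    resource = resources.get(name)
    if resource is None:
        return [(name, "NotFound")]
    condition = ready_condition(resource)
    if condition.get("status") == "True":
        return []
    blockers = []
    for child in resource.get("children", []):
        blockers.extend(find_blockers(child, resources))
    # If no child explains the failure, the problem is at this layer.
    return blockers or [(name, condition.get("reason", "Unknown"))]


def not_ready(reason: str) -> dict:
    return {"conditions": [{"type": "Ready", "status": "False", "reason": reason}]}


# Made-up snapshot of a Claim, its Composite Resource, and two Managed Resources.
resources = {
    "claim/my-store": {"children": ["xr/my-store-abc"], "status": not_ready("Creating")},
    "xr/my-store-abc": {
        "children": ["bucket/my-store-abc-1", "user/my-store-abc-2"],
        "status": not_ready("Creating"),
    },
    "bucket/my-store-abc-1": {
        "status": {"conditions": [{"type": "Ready", "status": "True", "reason": "Available"}]}
    },
    "user/my-store-abc-2": {"status": not_ready("Unavailable")},
}

blockers = find_blockers("claim/my-store", resources)
```

In this example the Claim and the Composite Resource are both waiting, but the walk points at the one Managed Resource that is actually holding everything up.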

If people find Terraform hard to work with, is this going to be an improvement? As with most things, the real measure of a tool comes when things are not going to plan, instead of when everything is running smoothly. Does the tool make us more efficient and remove more barriers than it creates?

Troubleshooting resource creation issues can be a manual and complex process.

In Conclusion

So, where does this leave us?

Crossplane has a ton of potential, and it is usable today, but the opinion that I have developed from my testing and discussions with current users is that adopting it today is similar to adopting Docker in 2013. You can do it, and you can be successful with it. However, it will take a significant amount of implementation effort, you will need to be very active in the community, and you are very likely going to be reminded of why they call it the bleeding edge.

¹ YAML Ain’t Markup Language
² Cloud Native Computing Foundation
³ Application Programming Interface
⁴ Amazon Web Services
⁵ Google Cloud Platform
⁶ Structured Query Language
⁷ Upbound is working on some documentation along these lines, but this is outside the scope of the open source project, which is the focus of this review.
⁸ Create, Read, Update, and Delete
⁹ Directed Acyclic Graph (terraform graph)
¹⁰ This is an intentional decision by the project to encourage multiple approaches across the community and allow the best provider to rise to the top of the stack naturally.
¹¹ A feature lifecycle document has very recently been added to the Crossplane repo to make the meaning of, and expectations for, an alpha, beta, or stable feature much clearer.
¹² Elastic Compute Cloud
¹³ HashiCorp Configuration Language
¹⁴ Command-Line Interface
¹⁵ Continuous Integration
¹⁶ Pull Request