Published on February 09, 2024
This is part of our series on Cluster API and how it can be a solution for managing large numbers of clusters at scale. For the first part of this series, see Cluster API: A Deep Dive On Declarative Cluster Lifecycle Management.
At SuperOrbital, we often encounter challenging projects where customers want to quickly scale their Kubernetes clusters, deploy their workloads on them, and have complete control over the management of these clusters without overburdening their DevOps team. Most of them make use of AWS’s EKS offering, which simplifies the Kubernetes management aspect. However, copy-pasting Terraform configuration files over and over for every cluster, or refactoring large modules to try to DRY the code, can become a pain, as can the lengthy terraform apply sessions needed to keep the state of all the clusters up-to-date. This is where CAPI comes in handy with the AWS infrastructure provider: Cluster API Provider AWS, also known as CAPA!
What is CAPA?
CAPA is the CAPI infrastructure provider for AWS; it allows users to deploy CAPI-managed clusters on AWS infrastructure. Its list of features includes (but is not limited to):
- Fully-featured Kubernetes clusters on EC2 instances
- No need to faff around with the network configuration for control planes since CAPA will do that all for you
- Support for managing EKS clusters
- Ability to separate the clusters into different AWS accounts, even for the management cluster
- Cost savings through support for Spot instances
- Best practices for HA, such as the ability to deploy a cluster’s nodes across different availability zones by default.
The only prerequisite for CAPA is access to an administrative AWS account, which is used to bootstrap the IAM roles CAPA needs to create and manage all the resources for these clusters.
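If you're curious what that bootstrap step looks like, the standard CAPA tooling can create those IAM resources for you via a CloudFormation stack. A minimal sketch, assuming clusterawsadm is installed and administrative credentials are exported in your shell:

# Create (or update) the CloudFormation stack containing the IAM roles and
# policies CAPA needs. Run this once per AWS account with admin credentials.
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID="<ADMIN ACCESS KEY>"
export AWS_SECRET_ACCESS_KEY="<ADMIN SECRET KEY>"

clusterawsadm bootstrap iam create-cloudformation-stack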
Creating the Management Cluster
The process for deploying CAPA can be a point of friction, as it requires an existing cluster and manually executing a series of commands one after another. To help you get your feet wet and try out CAPA’s capabilities, we at SuperOrbital built the capa-bootstrap Terraform configuration, which automates provisioning a single-node management cluster with everything needed to start creating and managing production-ready clusters in AWS! The bootstrapper creates a single EC2 instance, sets it up as a cluster, and installs all the CAPI and CAPA controllers so that it can serve as our management cluster.
WARNING: For a production-ready CAPA installation, it’s recommended that the management cluster have multiple nodes to ensure CAPA is always available, since any downtime means that no cluster or node can be created, modified, or deleted. For the sake of this tutorial, capa-bootstrap installs CAPI and CAPA on a single-node cluster. Additionally, as the management cluster itself is now a critical piece of infrastructure, be sure to back it up using your normal process. The capa-bootstrap tool is meant only for educational purposes, and production usage is highly discouraged!
Clone the capa-bootstrap repository and cd into it:
$ git clone https://github.com/superorbital/capa-bootstrap.git
Cloning into 'capa-bootstrap'...
remote: Enumerating objects: 232, done.
remote: Counting objects: 100% (232/232), done.
remote: Compressing objects: 100% (83/83), done.
remote: Total 232 (delta 126), reused 229 (delta 123), pack-reused 0
Receiving objects: 100% (232/232), 40.25 KiB | 2.87 MiB/s, done.
Resolving deltas: 100% (126/126), done.
$ cd capa-bootstrap/
Set the required variables aws_secret_key and aws_access_key by creating a .tfvars file (using the provided example file) or by exporting them as environment variables:
export TF_VAR_aws_access_key="<MY ACCESS KEY>"
export TF_VAR_aws_secret_key="<MY SECRET KEY>"
Review and modify any other optional variables, such as the instance type and the Kubernetes version for the management cluster if desired.
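If you prefer a file over environment variables, the same two values can live in a .tfvars file. A minimal sketch with only the two required variables named above; check the repository's variables.tf for any other variable names and their defaults:

# Write a minimal terraform.tfvars with the two required variables.
cat > terraform.tfvars <<'EOF'
aws_access_key = "<MY ACCESS KEY>"
aws_secret_key = "<MY SECRET KEY>"
EOF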
Execute terraform init, followed by terraform apply:
$ terraform apply
data.aws_ami.latest_ubuntu: Reading...
data.aws_ami.latest_ubuntu: Read complete after 0s [id=ami-04ab94c703fb30101]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# aws_instance.capa_server will be created
+ resource "aws_instance" "capa_server" {
+ ami = "ami-0c7217cdde317cfec"
+ arn = (known after apply)
+ associate_public_ip_address = (known after apply)
+ availability_zone = (known after apply)
+ cpu_core_count = (known after apply)
+ cpu_threads_per_core = (known after apply)
+ disable_api_stop = (known after apply)
+ disable_api_termination = (known after apply)
+ ebs_optimized = (known after apply)
+ get_password_data = false
+ host_id = (known after apply)
+ host_resource_group_arn = (known after apply)
+ iam_instance_profile = (known after apply)
+ id = (known after apply)
+ instance_initiated_shutdown_behavior = (known after apply)
+ instance_lifecycle = (known after apply)
+ instance_state = (known after apply)
+ instance_type = "m5a.large"
+ ipv6_address_count = (known after apply)
+ ipv6_addresses = (known after apply)
+ key_name = (known after apply)
+ monitoring = (known after apply)
+ outpost_arn = (known after apply)
+ password_data = (known after apply)
+ placement_group = (known after apply)
+ placement_partition_number = (known after apply)
+ primary_network_interface_id = (known after apply)
+ private_dns = (known after apply)
+ private_ip = (known after apply)
+ public_dns = (known after apply)
+ public_ip = (known after apply)
+ secondary_private_ips = (known after apply)
+ security_groups = (known after apply)
+ source_dest_check = true
+ spot_instance_request_id = (known after apply)
+ subnet_id = (known after apply)
+ tags = {
+ "Name" = "superorbital-quickstart-capa-server"
+ "Owner" = "capa-bootstrap"
}
+ tags_all = {
+ "Name" = "superorbital-quickstart-capa-server"
+ "Owner" = "capa-bootstrap"
}
+ tenancy = (known after apply)
+ user_data = (known after apply)
+ user_data_base64 = (known after apply)
+ user_data_replace_on_change = false
+ vpc_security_group_ids = (known after apply)
...<SNIPPED>...
Review the plan and accept!
Plan: 10 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ capa_node_ip = (known after apply)
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
After a successful terraform apply, you should have a cluster running CAPA!
tls_private_key.global_key: Creating...
tls_private_key.global_key: Creation complete after 0s [id=235a6d5c450ee1dd91714d4b4f68bd0639e5c59f]
local_file.ssh_public_key_openssh: Creating...
local_sensitive_file.ssh_private_key_pem: Creating...
local_file.ssh_public_key_openssh: Creation complete after 0s [id=2c4b12d7db822a33a5f52c8485a1390976543a89]
local_sensitive_file.ssh_private_key_pem: Creation complete after 0s [id=34e9aab08bdc9ff2becd847de034699154104190]
aws_key_pair.capa_bootstrap_key_pair: Creating...
aws_security_group.capa_bootstrap_sg_allowall: Creating...
aws_key_pair.capa_bootstrap_key_pair: Creation complete after 0s [id=superorbital-quickstart-capa-bootstrap-20240126213038814400000001]
aws_security_group.capa_bootstrap_sg_allowall: Creation complete after 3s [id=sg-0bac29de0faa35c77]
aws_instance.capa_server: Creating...
aws_instance.capa_server: Still creating... [10s elapsed]
aws_instance.capa_server: Still creating... [20s elapsed]
aws_instance.capa_server (remote-exec): Waiting for cloud-init to complete...
aws_instance.capa_server: Still creating... [30s elapsed]
aws_instance.capa_server (remote-exec): Completed cloud-init!
aws_instance.capa_server: Creation complete after 35s [id=i-0e2961b43b81829d1]
module.capa.ssh_resource.install_k3s: Creating...
module.capa.ssh_resource.install_k3s: Still creating... [10s elapsed]
module.capa.ssh_resource.install_k3s: Creation complete after 11s [id=474554630196844793]
module.capa.ssh_resource.retrieve_config: Creating...
module.capa.ssh_resource.install_capa: Creating...
module.capa.ssh_resource.retrieve_config: Creation complete after 1s [id=6493639830213559672]
module.capa.local_file.kube_config_server_yaml: Creating...
module.capa.local_file.kube_config_server_yaml: Creation complete after 0s [id=d57ac94d6db2c9bd6d6f7bcd08e5024e4f79c833]
module.capa.ssh_resource.install_capa: Still creating... [10s elapsed]
module.capa.ssh_resource.install_capa: Still creating... [20s elapsed]
module.capa.ssh_resource.install_capa: Still creating... [30s elapsed]
module.capa.ssh_resource.install_capa: Still creating... [40s elapsed]
module.capa.ssh_resource.install_capa: Still creating... [50s elapsed]
module.capa.ssh_resource.install_capa: Creation complete after 52s [id=2560992061155103173]
Apply complete! Resources: 10 added, 0 changed, 0 destroyed.
Outputs:
capa_node_ip = "3.81.60.22"
Now that the management cluster is created, we can check if the Pods are running with kubectl get pods. The kubeconfig for the management cluster will be placed in your current directory by Terraform.
$ kubectl --kubeconfig capa-management.kubeconfig get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-cainjector-c778d44d8-nl57j 1/1 Running 0 20m
cert-manager cert-manager-7d75f47cc5-zjptm 1/1 Running 0 20m
cert-manager cert-manager-webhook-55d76f97bb-cwg6n 1/1 Running 0 20m
kube-system coredns-6799fbcd5-xrk5c 1/1 Running 0 20m
kube-system local-path-provisioner-84db5d44d9-qnmpg 1/1 Running 0 20m
kube-system helm-install-traefik-crd-zrvch 0/1 Completed 0 20m
kube-system metrics-server-67c658944b-wblth 1/1 Running 0 20m
kube-system svclb-traefik-27bfa6a0-gnmlx 2/2 Running 0 20m
kube-system helm-install-traefik-4bdww 0/1 Completed 1 20m
kube-system traefik-f4564c4f4-6g6q5 1/1 Running 0 20m
capi-system capi-controller-manager-855f9f859-w8c4r 1/1 Running 0 20m
capi-kubeadm-bootstrap-system capi-kubeadm-bootstrap-controller-manager-75b968db86-cgvcv 1/1 Running 0 20m
capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75758b5479-2f4zg 1/1 Running 0 20m
capa-system capa-controller-manager-6b48bcb87c-cb9wt 1/1 Running 0 20m
One last thing to note is that if you wish to poke into the cluster directly, you can SSH into the EC2 instance where the management cluster is running, using the public/private SSH key pair that was created in the same directory.
$ ls | grep id_rsa
id_rsa
id_rsa.pub
$ ssh -i id_rsa ubuntu@<NODE IP ADDRESS>
Creating a Managed Cluster
With CAPA up and running, there are two ways of creating managed clusters: using the clusterctl command to generate manifests and apply them directly to the cluster, or creating the objects in the cluster with a dedicated Kubernetes client. The latter method is better suited for a future post where we talk about adding APIs on top of CAPI, so today we’ll focus on applying manifests.
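For reference, the clusterctl route looks roughly like this. Treat it as a sketch: the cluster name is illustrative, the environment variables shown are the ones CAPA's default templates typically expect, and the flags and values should be adjusted for your own versions and account:

# Variables consumed by CAPA's default cluster templates (adjust as needed).
export AWS_REGION=us-east-1
export AWS_SSH_KEY_NAME=default
export AWS_CONTROL_PLANE_MACHINE_TYPE=t3.medium
export AWS_NODE_MACHINE_TYPE=t3.medium

# Render a cluster manifest from the management cluster's templates and apply it.
clusterctl generate cluster aws-cluster-2 \
  --kubeconfig capa-management.kubeconfig \
  --infrastructure aws \
  --kubernetes-version v1.28.3 \
  --control-plane-machine-count=3 \
  --worker-machine-count=3 > aws-cluster-2.yaml

kubectl --kubeconfig capa-management.kubeconfig apply -f aws-cluster-2.yaml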
The capa-bootstrap directory already has two example manifests for creating a simple cluster on AWS using EC2 instances with a CAPI-managed control plane and a simple EKS cluster with an AWS-managed control plane. Let’s take a look at the YAML defined for the CAPI-managed control plane cluster:
# Namespace (1)
apiVersion: v1
kind: Namespace
metadata:
name: aws-cluster-1
---
# Cluster definition (2)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: aws-cluster-1
namespace: aws-cluster-1
spec:
clusterNetwork:
pods:
cidrBlocks:
- 192.168.0.0/16 # (5)
controlPlaneRef:
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
name: aws-cluster-1-control-plane
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
name: aws-cluster-1
---
# AWSCluster definition (2)
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
name: aws-cluster-1
namespace: aws-cluster-1
spec:
region: us-east-1
sshKeyName: default
---
# KubeadmControlPlane definition (3)
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
name: aws-cluster-1-control-plane
namespace: aws-cluster-1
spec:
machineTemplate:
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
name: aws-cluster-1-control-plane
replicas: 3 # (4)
version: v1.28.3 # (6)
Here, we’re defining a namespace (1), a Cluster object that is configured as an “AWSCluster”-type cluster (2), and the configuration of our desired control plane for this cluster (3). We have complete control over the number of control plane replicas (4), the pod CIDR (5), and the Kubernetes version (6) that will be deployed for this control plane.
# AWSMachineTemplate (control plane)
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
name: aws-cluster-1-control-plane
namespace: aws-cluster-1
spec:
template:
spec:
iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
instanceType: t3.medium # (9)
sshKeyName: default
---
# MachineDeployment
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
name: aws-cluster-1-md-0
namespace: aws-cluster-1
spec:
clusterName: aws-cluster-1
replicas: 3 # (7)
selector:
matchLabels: null
template:
spec:
bootstrap:
configRef:
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
name: aws-cluster-1-md-0
clusterName: aws-cluster-1
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
name: aws-cluster-1-md-0
version: v1.28.3 # (10)
---
# AWSMachineTemplate (worker nodes)
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
name: aws-cluster-1-md-0
namespace: aws-cluster-1
spec:
template:
spec:
iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
instanceType: t3.medium # (9)
sshKeyName: default
---
# KubeadmConfigTemplate (8)
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
name: aws-cluster-1-md-0
namespace: aws-cluster-1
spec:
template:
spec:
joinConfiguration:
nodeRegistration:
kubeletExtraArgs:
cloud-provider: aws
name: '{{ ds.meta_data.local_hostname }}'
We have a similar degree of configurability for the nodes in the cluster themselves, including the number of replicas in your MachineDeployments (7), how the kubelet is bootstrapped (8), and the instance type used for each node (9). All of these manifests can be modified to suit the needs of your managed cluster – maybe you’d like your cluster to run an older version of Kubernetes than the one used in the manifest (10), or maybe a single MachineDeployment is not enough for your use case because you want to mix ARM and x86 workloads in a single cluster.
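As a rough sketch of that last scenario (this is not one of the repo’s example manifests; the names, replica count, and Graviton instance type are all illustrative, and you’d additionally need to point the template at an arm64 AMI, which is omitted here), an extra worker pool for ARM nodes could look something like this:

kubectl --kubeconfig capa-management.kubeconfig apply -f - <<'EOF'
# Hypothetical second worker pool for ARM (Graviton) nodes.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: aws-cluster-1-md-arm
  namespace: aws-cluster-1
spec:
  clusterName: aws-cluster-1
  replicas: 2
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: aws-cluster-1-md-0   # reuse the existing bootstrap template
      clusterName: aws-cluster-1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: aws-cluster-1-md-arm
      version: v1.28.3
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: aws-cluster-1-md-arm
  namespace: aws-cluster-1
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: m6g.medium       # arm64 instance type; requires an arm64 AMI
      sshKeyName: default
EOF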
Creating these clusters is as simple as applying these manifests, which creates all the objects the controllers need in order to provision the cloud resources. Doing so for the EKS cluster shows the following output:
$ kubectl --kubeconfig capa-management.kubeconfig apply -f examples/eks-cluster-1.yaml
cluster.cluster.x-k8s.io/eks-cluster-1 created
awsmanagedcluster.infrastructure.cluster.x-k8s.io/eks-cluster-1 created
awsmanagedcontrolplane.controlplane.cluster.x-k8s.io/eks-cluster-1-control-plane created
machinedeployment.cluster.x-k8s.io/eks-cluster-1-md-0 created
awsmachinetemplate.infrastructure.cluster.x-k8s.io/eks-cluster-1-md-0 created
eksconfigtemplate.bootstrap.cluster.x-k8s.io/eks-cluster-1-md-0 created
The cluster is now being created, so sit back and relax while CAPI and CAPA do the hard work! For details on the progress of the cluster creation, you can check the events on the Cluster object itself, or check the status of the various components (control plane, machines, and cluster) in the logs of the controllers running on the management cluster.
$ kubectl --kubeconfig capa-management.kubeconfig describe cluster -n eks-cluster-1 eks-cluster-1
...
Status:
Conditions:
Last Transition Time: 2023-09-21T19:10:55Z
Message: 4 of 10 completed
Reason: RouteTableReconciliationFailed
Severity: Warning
Status: False
Type: Ready
Last Transition Time: 2023-09-21T19:07:35Z
Message: Waiting for control plane provider to indicate the control plane has been initialized
Reason: WaitingForControlPlaneProviderInitialized
Severity: Info
Status: False
Type: ControlPlaneInitialized
Last Transition Time: 2023-09-21T19:10:55Z
Message: 4 of 10 completed
Reason: RouteTableReconciliationFailed
Severity: Warning
Status: False
Type: ControlPlaneReady
Last Transition Time: 2023-09-21T19:07:35Z
Status: True
Type: InfrastructureReady
Infrastructure Ready: true
Observed Generation: 1
Phase: Provisioning
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 8m37s (x2 over 8m37s) cluster-controller Cluster eks-cluster-1 is Provisioning
Normal InfrastructureReady 8m37s cluster-controller Cluster eks-cluster-1 InfrastructureReady is now true
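If you’d rather watch progress from the controller side, the logs and intermediate objects are useful too. A few commands we find handy (the deployment name matches the capa-system pod listed earlier):

# Follow the CAPA controller logs while the cluster reconciles.
kubectl --kubeconfig capa-management.kubeconfig -n capa-system \
  logs deploy/capa-controller-manager -f

# Inspect the intermediate CAPI objects created for the new cluster.
kubectl --kubeconfig capa-management.kubeconfig get machinedeployments,machines -A
kubectl --kubeconfig capa-management.kubeconfig -n eks-cluster-1 get awsmanagedcontrolplane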
Once the cluster is created, congratulations! We can see the status of our created clusters with the kubectl get clusters command on our management cluster:
$ kubectl get clusters -A
NAMESPACE NAME PHASE AGE VERSION
development-aws aws-cluster-1 Provisioned 105m
production-eks bravo Provisioned 104m
development-eks eks-cluster-1 Provisioned 13m
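Day-2 changes work the same way: you edit or scale the objects on the management cluster and the controllers reconcile the infrastructure to match. As a sketch, growing a worker pool (using the namespace and names from the manifest we walked through earlier) could look like this:

# MachineDeployments support the scale subresource, so kubectl scale works directly.
kubectl --kubeconfig capa-management.kubeconfig -n aws-cluster-1 \
  scale machinedeployment aws-cluster-1-md-0 --replicas=5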
The last thing to note is that each cluster type (AWS EC2 or EKS) has a different kind of kubeconfig file available. EKS-flavored clusters have two kubeconfig secrets: one for users, and one reserved for CAPI’s own administrative use:
$ kubectl get secrets -n development-eks | grep kubeconfig
eks-cluster-1-user-kubeconfig cluster.x-k8s.io.secret 1 9m32s
eks-cluster-1-kubeconfig cluster.x-k8s.io.secret 1 9m32s
For EKS, only the user kubeconfig should be used to access the managed cluster:
$ kubectl get secrets -n development-eks eks-cluster-1-user-kubeconfig -o jsonpath='{.data.value}' | base64 -d > eks-cluster-1.kubeconfig
$ kubectl --kubeconfig eks-cluster-1.kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-127-104.ec2.internal Ready <none> 3m v1.22.17-eks-0a21954
$ kubectl --kubeconfig eks-cluster-1.kubeconfig get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-68bnp 1/1 Running 0 3m16s
kube-system coredns-7f5998f4c-gl24x 1/1 Running 0 12m
kube-system coredns-7f5998f4c-jqjjp 1/1 Running 0 12m
kube-system kube-proxy-4fxtj 1/1 Running 0 3m16s
The AWS EC2 clusters only have a single admin kubeconfig that’s maintained by CAPI, so care must be taken when using it, as it is equivalent to having root access on the managed cluster.
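For completeness, retrieving that admin kubeconfig follows the same pattern as the EKS example above, since CAPI stores it in a <cluster-name>-kubeconfig secret; clusterctl also has a helper for this. The namespace and names below are taken from the earlier manifest:

# Pull the CAPI-managed admin kubeconfig for the EC2-based cluster.
kubectl --kubeconfig capa-management.kubeconfig -n aws-cluster-1 \
  get secret aws-cluster-1-kubeconfig -o jsonpath='{.data.value}' | base64 -d > aws-cluster-1.kubeconfig

# Or, equivalently, with clusterctl:
clusterctl --kubeconfig capa-management.kubeconfig get kubeconfig aws-cluster-1 -n aws-cluster-1 > aws-cluster-1.kubeconfig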
Extra Credit: Bootstrap + Pivot
OK, this is pretty advanced, and fairly mind-bendy. If the idea of using Terraform to manage the cluster bothers you when you’re trying to go all in on CAPI, you can have CAPI manage itself! First, you’d use CAPI to create a single Kubernetes cluster (see all the previous steps up until now). Then you can promote this new cluster to be your management cluster by pivoting the installation to it from the original single-node cluster. You can now safely destroy the cluster provisioned by Terraform.
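Mechanically, the pivot is driven by clusterctl. A rough sketch, assuming the new workload cluster’s kubeconfig has been saved as new-management.kubeconfig and that both clusters run matching CAPI/CAPA versions:

# Install the CAPI and CAPA controllers on the soon-to-be management cluster.
clusterctl init --kubeconfig new-management.kubeconfig --infrastructure aws

# Move the Cluster API objects (Clusters, Machines, related secrets, ...) from
# the original single-node cluster to the new one, one namespace at a time.
clusterctl move --kubeconfig capa-management.kubeconfig \
  --to-kubeconfig new-management.kubeconfig -n aws-cluster-1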
While CAPI and CAPA can now manage their own management cluster just like the other managed clusters, this removes Terraform from the cluster lifecycle, and deleting these management resources entirely becomes more of a challenge.
For more information on how to do this and how it works, see the documentation in the CAPI book.
What’s next?
We’ve seen how CAPI and CAPA are capable of managing our infrastructure, but what about the workloads in these clusters? For our next post, we’ll be going over a few options that we have available to easily deploy and update our workloads in the managed clusters.
Subscribe (yes, we still ❤️ RSS) or join our mailing list below to stay updated!