Published on February 22, 2021
Special thanks to Andra Paraschiv from AWS and Sascha Wise and Andrei Khvalko from M10 for their help with this project!
What are AWS Nitro Enclaves?
The AWS Nitro Enclaves architecture enables EC2 instances to start secure virtual machines that can run a trusted and verifiable codebase. Unlike traditional VMs or containers, these enclaves cannot be altered or inspected by anything or anyone on the EC2 instance, including root. Enclaves are fully isolated virtual machines that have no persistent storage, no interactive access, and no external networking.
This is incredibly powerful when you need to run a highly secure service that the system must completely trust and that must not allow any sort of memory or code inspection or alteration.
The Project
M10, a client of SuperOrbital, is building a platform for central banks that want to support digital currencies in their financial markets. Due to the financial nature of the platform, there are components of the system that must be completely trusted. SGX is one approach to solving this issue, but it is only available on Intel hardware. Nitro enclaves provide similar functionality on AWS.
Almost all of M10’s platform runs on EKS in AWS. Although AWS does not currently support running Nitro enclaves within Kubernetes clusters, one of the AWS engineering teams provided us with invaluable guidance to figure out how to accomplish this with the current set of tools.
In this post we will discuss the steps we took to:
- Spin up an EKS cluster with a subset of enclave-enabled worker Nodes.
- Verify that we could start and stop an enclave from the host’s console and from within a Linux container on the host.
- Give a Kubernetes Pod access to the host system’s /dev/nitro_enclaves device.
- Manage an enclave from within a Kubernetes Pod running in our EKS cluster.
- Investigate the resources that are used by the Nitro Enclave Allocator and understand how they might interact with Kubernetes.
Step One: Get at least one enclave-enabled K8s worker Node
M10 uses Terraform to automate and manage all of their Kubernetes clusters. They’re primarily using the AWS provider and the terraform-aws-eks module to manage this on AWS.
We began by working with the Terraform EKS module team to add enclave support for worker Nodes to the module. This change was merged and released in v14.0.0.
At this point, we could start spinning up EKS clusters that had worker Nodes with enclave support enabled.
The primary EKS related Terraform code looks something like this:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.22"
    }
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "14.0.0"

  cluster_name    = var.name
  cluster_version = "1.18"
  subnets         = module.vpc.public_subnets
  vpc_id          = module.vpc.vpc_id

  worker_groups = [
    {
      # This is the default worker group for our normal workloads.
      name                 = "client-node-pool"
      asg_desired_capacity = 3
      bootstrap_extra_args = "--enable-docker-bridge true"
      instance_type        = "i3en.xlarge"
      key_name             = "aws-ssh"
      kubelet_extra_args   = "--kube-reserved=memory=0.3Gi,ephemeral-storage=1Gi --system-reserved=memory=0.2Gi,ephemeral-storage=1Gi --eviction-hard=memory.available<200Mi,nodefs.available<10% --node-labels=designated-for=client-node-pool"
    }
  ]

  worker_groups_launch_template = [
    {
      # This is a launch template for a second group of workers that will have enclave support.
      name                 = "client-node-pool-enclave"
      asg_desired_capacity = 2
      bootstrap_extra_args = "--enable-docker-bridge true"
      enclave_support      = true
      instance_type        = "i3en.xlarge"
      key_name             = "aws-ssh"
      kubelet_extra_args   = "--kube-reserved=memory=0.3Gi,ephemeral-storage=1Gi --system-reserved=memory=0.2Gi,ephemeral-storage=1Gi --eviction-hard=memory.available<200Mi,nodefs.available<10% --node-labels=designated-for=client-node-pool,enclave.example.com/type=nitro,smarter-device-manager=enabled --register-with-taints=node.example.com/enclave=nitro:NoSchedule"
    }
  ]

  workers_group_defaults = {
    additional_security_group_ids = [aws_security_group.worker_group_mgmt.id]
    # At the moment we are using "amazon-eks-node-1.18-v20210125"
    # From: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20210125
    ami_id           = "ami-092298bf0c9ae11be"
    asg_max_size     = "20"
    asg_min_size     = "0"
    root_volume_size = 100
    root_volume_type = "gp2"
    pre_userdata     = file("${path.module}/pre_userdata.sh")
  }
}
The most important parts of this are:

- We ensured that we were using a very recent EKS AMI for our instances by setting ami_id = "ami-092298bf0c9ae11be". This is likely not required anymore, but we were doing a ton of testing at the time and wanted to ensure we were using the most current release. The AWS recommended AMI version can be discovered with a command like this:

aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.18/amazon-linux-2/recommended/image_id --region us-west-2 --query "Parameter.Value" --output text

- We created a new worker_groups_launch_template that manages the enclave-enabled K8s worker Nodes. We needed to ensure that we were using an EC2 instance type that supports Nitro enclaves, with instance_type = "i3en.xlarge". We also registered a K8s taint with --register-with-taints=node.example.com/enclave=nitro:NoSchedule so that we could ensure that other workloads did not get scheduled on these Nodes. Then we added two Node labels to the enclave workers:
  - enclave.example.com/type=nitro allows us to easily target the enclave-enabled Nodes and differentiate them from other Nodes if we need to.
  - smarter-device-manager=enabled will become important in step three.

- We also leveraged an AWS user data script to configure any instance that has the /dev/nitro_enclaves device with the tools and configuration required to manage Nitro enclaves.
These are the contents of the pre_userdata.sh script, which is prepended onto the AWS user data and runs at the start of the instance bootstrapping process.
set -uxo pipefail

# Test to see if the Nitro enclaves module is loaded
lsmod | grep -q nitro_enclaves
RETURN=${?}
set -e

# Set up Nitro enclaves on the host if the module is available as expected.
if [ ${RETURN} -eq 0 ]; then
  amazon-linux-extras install aws-nitro-enclaves-cli -y
  yum install aws-nitro-enclaves-cli-devel -y
  usermod -aG ne ec2-user
  usermod -aG docker ec2-user
  # If needed, install a custom allocator config here: /etc/nitro_enclaves/allocator.yaml
  systemctl start nitro-enclaves-allocator.service
  systemctl enable nitro-enclaves-allocator.service
  systemctl start docker
  systemctl enable docker
  #
  # Note: After some testing we discovered that there is an apparent bug in the
  # Nitro CLI RPM or underlying tools, that does not properly reload the
  # udev rules when inside the AWS bootstrap environment. This means that we must
  # manually fix the device permissions on `/dev/nitro_enclaves`, if we
  # don't want to be forced to restart the instance to get everything working.
  # See: https://github.com/aws/aws-nitro-enclaves-cli/issues/227
  #
  chgrp ne /dev/nitro_enclaves
  echo "Done with AWS Nitro enclave Setup"
fi
At this point, the EKS module’s default user data script attempts to join the instance to the EKS cluster.
Once this is all applied and we have authenticated against the resulting Kubernetes cluster, we should be able to find our Nitro enclave-enabled Node with this command:
$ kubectl get nodes -o wide -l enclave.example.com/type=nitro
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-42-42.us-west-2.compute.internal Ready <none> 14h v1.18.9-eks-d1db3c 10.0.42.42 55.555.555.55 Amazon Linux 2 4.14.209-160.339.amzn2.x86_64 docker://19.3.6
Step Two: Ensuring that the enclave-enabled instance works
Next, we needed to confirm that the instance could indeed be used to build and deploy a Nitro enclave. This work eventually led to the creation of this POC repo, which automated many of the steps and facilitated the testing that we needed.
For manual confirmation testing, you can ssh into your enclave-enabled instance as the ec2-user and do the following:
$ mkdir enclave
$ cd enclave
$ wget -q https://raw.githubusercontent.com/aws/aws-nitro-enclaves-cli/main/examples/x86_64/hello/Dockerfile
$ wget -q https://raw.githubusercontent.com/aws/aws-nitro-enclaves-cli/main/examples/x86_64/hello/hello.sh
$ chmod a+rx hello.sh
$ docker build -t hello-enclave:1.0 .
Sending build context to Docker daemon 3.072kB
Step 1/4 : FROM busybox
latest: Pulling from library/busybox
5c4213be9af9: Pull complete
Digest: sha256:c6b45a95f932202dbb27c31333c4789f45184a744060f6e569cc9d2bf1b9ad6f
Status: Downloaded newer image for busybox:latest
---> 491198851f0c
Step 2/4 : ENV HELLO="Hello from the enclave side!"
---> Running in d5bc60ef3dd2
Removing intermediate container d5bc60ef3dd2
---> 64bb9ca9fb92
Step 3/4 : COPY hello.sh /bin/hello.sh
---> 16cd1282145d
Step 4/4 : CMD ["/bin/hello.sh"]
---> Running in b0bab8180fed
Removing intermediate container b0bab8180fed
---> 85a8751b17bb
Successfully built 85a8751b17bb
Successfully tagged hello-enclave:1.0
$ nitro-cli build-enclave --docker-uri hello-enclave:1.0 --output-file hello.eif
Start building the Enclave Image...
Enclave Image successfully created.
{
"Measurements": {
"HashAlgorithm": "Sha384 { ... }",
"PCR0": "ec...07",
"PCR1": "c3...42",
"PCR2": "5f...71"
}
}
$ nitro-cli run-enclave --cpu-count 2 --memory 512 --eif-path hello.eif --debug-mode > /tmp/output.json
Start allocating memory...
Started enclave with enclave-cid: 16, memory: 512 MiB, cpu-ids: [1, 3]
$ cat /tmp/output.json
{
"EnclaveID": "i-09...78",
"ProcessID": 23953,
"EnclaveCID": 16,
"NumberOfCPUs": 2,
"CPUIDs": [
1,
3
],
"MemoryMiB": 512
}
$ EID=$(cat /tmp/output.json | jq -r .EnclaveID)
$ nitro-cli describe-enclaves
[
{
"EnclaveID": "i-09...78",
"ProcessID": 23953,
"EnclaveCID": 16,
"NumberOfCPUs": 2,
"CPUIDs": [
1,
3
],
"MemoryMiB": 512,
"State": "RUNNING",
"Flags": "DEBUG_MODE"
}
]
# Type [Control-C] to exit the enclave console
$ nitro-cli console --enclave-id ${EID}
Connecting to the console for enclave 16...
Successfully connected to the console.
...
[ 1] Hello from the enclave side!
[ 2] Hello from the enclave side!
[ 3] Hello from the enclave side!
[ 4] Hello from the enclave side!
^C
$
We can tell from the successful nitro-cli run-enclave
command and the output from running nitro-cli console
that we have succeeded in spinning up a Nitro enclave on the instance. The following command can now be used to terminate the enclave.
$ nitro-cli terminate-enclave --enclave-id ${EID}
Successfully terminated enclave i-09...78.
{
"EnclaveID": "i-09...78",
"Terminated": true
}
Using the Docker image from the POC repo, we can also verify that we can run an enclave from inside a Linux container on the instance:
$ docker run -ti -v /var/run/docker.sock:/var/run/docker.sock --device=/dev/nitro_enclaves:/dev/nitro_enclaves:rw spkane/nitro-cli:latest /enclave/build.sh run
Unable to find image 'spkane/nitro-cli:latest' locally
latest: Pulling from spkane/nitro-cli
...
Digest: sha256:823465ff1f20c9aec6af2743583161db4016c51efb3800b4023dd0ae778960d9
Status: Downloaded newer image for spkane/nitro-cli:latest
+ docker build -f ./hello/Dockerfile -t hello-world-nitro-enclave:1.0 ./hello
Sending build context to Docker daemon 3.072kB
Step 1/4 : FROM busybox
---> 491198851f0c
Step 2/4 : ENV HELLO="Hello from the enclave side!"
---> Using cache
---> 64bb9ca9fb92
Step 3/4 : COPY hello.sh /bin/hello.sh
---> a28148e544c7
Step 4/4 : CMD ["/bin/hello.sh"]
---> Running in 2a3d44c97c62
Removing intermediate container 2a3d44c97c62
---> d08627167758
Successfully built d08627167758
Successfully tagged hello-world-nitro-enclave:1.0
+ nitro-cli build-enclave --docker-uri hello-world-nitro-enclave:1.0 --output-file hello.eif
Start building the Enclave Image...
Enclave Image successfully created.
{
"Measurements": {
"HashAlgorithm": "Sha384 { ... }",
"PCR0": "ed...3f",
"PCR1": "ef...70",
"PCR2": "56...7b"
}
}
+ '[' run == run ']'
+ /enclave/run.sh
+ nitro-cli run-enclave --cpu-count 2 --memory 512 --eif-path hello.eif --debug-mode
Start allocating memory...
Started enclave with enclave-cid: 17, memory: 512 MiB, cpu-ids: [1, 3]
+ cat /tmp/output.json
{
"EnclaveID": "i-09...90",
"ProcessID": 40,
"EnclaveCID": 17,
"NumberOfCPUs": 2,
"CPUIDs": [
1,
3
],
"MemoryMiB": 512
}
++ cat /tmp/output.json
++ jq -r .EnclaveID
+ EID=i-09...90
+ echo
+ echo
+ nitro-cli describe-enclaves
[
{
"EnclaveID": "i-09...90",
"ProcessID": 40,
"EnclaveCID": 17,
"NumberOfCPUs": 2,
"CPUIDs": [
1,
3
],
"MemoryMiB": 512,
"State": "RUNNING",
"Flags": "DEBUG_MODE"
}
]
+ echo
+ nitro-cli terminate-enclave --enclave-id i-09...90
Successfully terminated enclave i-09...90.
{
"EnclaveID": "i-09...90",
"Terminated": true
}
Now that we are confident that we can actually build and create an enclave on the instance, we need to see if we can get this all working from within a Kubernetes pod.
Step Three: Exposing /dev/nitro_enclaves to Kubernetes Pods
The first step in this process was determining the proper way to mount a Linux device into the Pod. Unfortunately, it is not as easy as it is in Docker, but it is possible with Kubernetes Device Plugins. Various hardware vendors like Intel and NVIDIA have used this mechanism to expose their hardware devices inside Kubernetes. AWS does not currently offer this for enclaves, and in many ways, an enclave-specific solution feels like the wrong approach.
A bit of research eventually led us to a project by ARM Research, called Smarter Device Manager (blog), which was built for Internet of Things (IoT) devices that often need to expose serial, audio, and video hardware to the onboard software.
After familiarizing ourselves with the project, we were able to configure Smarter Device Manager and install the necessary components in our cluster using a set of YAML manifests like the one here.
In essence, this set of manifests:
- Creates a Namespace.
- Creates a ConfigMap which is used to configure Smarter Device Manager.
- And finally creates a DaemonSet which deploys the Manager.
The most important part of the manifest is our configuration for making /dev/nitro_enclaves an available Pod resource.
apiVersion: v1
kind: ConfigMap
metadata:
  name: smarter-device-nitro
  namespace: smarter-device
data:
  conf.yaml: |
    # nummaxdevices denotes how many of these devices Smarter Device
    # Manager should consider available for reservation.
    - devicematch: ^nitro_enclaves$
      nummaxdevices: 1
Way back in step one, we added the Node label smarter-device-manager=enabled to all of our enclave-enabled Nodes. Now that Smarter Device Manager is deployed, it will see this label and use it to determine which Nodes it should manage device access on.
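For reference, the mechanism here is an ordinary Kubernetes nodeSelector on the Smarter Device Manager DaemonSet. The relevant fragment looks something like this (a sketch based on the Node label we applied, not the verbatim upstream manifest):

```yaml
# DaemonSet spec excerpt (sketch): run the device manager only on
# Nodes carrying the label from step one.
spec:
  template:
    spec:
      nodeSelector:
        smarter-device-manager: enabled
```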
At this point in the process, it should be possible to spin up a Pod that can work with Nitro enclaves.
- Note: It turns out that access to Docker is required to build Nitro enclaves. Since there are security concerns with that, and we would strongly prefer to build these within our CI pipeline anyhow, we realized that we only need our production Pods to be able to run (start and terminate) Nitro enclaves. Fortunately, this means that we do not need to give the Pod any access to Docker.
Step Four: Managing a Nitro enclave from a Kubernetes Pod
The final step was to see if we could control the creation and termination of a Nitro enclave from within a Pod in our Kubernetes cluster. To do this, we created a new Pod manifest for testing.
The first step was to set up the Pod with a resource request for /dev/nitro_enclaves, like this:
resources:
  limits:
    smarter-devices/nitro_enclaves: "1"
  requests:
    smarter-devices/nitro_enclaves: "1"
This tells Smarter Device Manager that our Pod needs to be deployed onto a Node with the nitro_enclaves device and that it requires the only available reservation for that device.
- Note: At the moment, only one Nitro enclave per EC2 instance can be launched. If multiple Pods had access to the Nitro Enclaves device, an enclave launched from inside one Pod would need to be terminated before another Pod could launch a new enclave. This is why we have told Smarter Device Manager that only one reservation is allowed, via the nummaxdevices configuration option.
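Because of this one-enclave-per-instance constraint, it can be useful to check whether an enclave is already running before launching one. Below is a minimal, hypothetical pre-launch guard that parses the JSON shape emitted by nitro-cli describe-enclaves; for the sake of a self-contained snippet, the sample output from earlier is fed in via a heredoc, where a real Node would capture the actual command output instead:

```shell
# Hypothetical pre-launch guard: count RUNNING enclaves before
# calling run-enclave.
# On a real Node this would be: DESCRIBE_OUTPUT=$(nitro-cli describe-enclaves)
DESCRIBE_OUTPUT=$(cat <<'EOF'
[
  {
    "EnclaveID": "i-09...78",
    "State": "RUNNING",
    "Flags": "DEBUG_MODE"
  }
]
EOF
)
# grep -c counts the lines that report a RUNNING enclave
RUNNING=$(printf '%s\n' "${DESCRIBE_OUTPUT}" | grep -c '"State": "RUNNING"')
echo "running enclaves: ${RUNNING}"
if [ "${RUNNING}" -gt 0 ]; then
  echo "terminate the existing enclave before launching a new one"
fi
```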
Because Smarter Device Manager is already limiting which Node our Pod will get scheduled to, we did not need to use a selector that specifically targeted our custom Node label enclave.example.com/type=nitro. We did, however, have to ensure that our Pod had a Kubernetes toleration defined that would allow it to be scheduled on the enclave-enabled Node, since we gave that Node a Kubernetes taint to keep other Pods off of it.
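The toleration mirrors the taint we registered via --register-with-taints in step one. A Pod spec fragment along these lines would do it (a sketch based on our taint key and value):

```yaml
tolerations:
  - key: node.example.com/enclave
    operator: Equal
    value: nitro
    effect: NoSchedule
```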
At this point we thought that everything would just work, but instead we ran into one last issue. Attempting to run an enclave from inside the Pod failed with a new error:
$ nitro-cli run-enclave --cpu-count 2 --memory 512 --eif-path hello.eif --debug-mode
Start allocating memory...
[ E35 ] EIF file parsing error. Such error appears when attempting to fill a memory region with a section of the EIF file, but reading the entire section fails.
For more details, please visit https://docs.aws.amazon.com/enclaves/latest/user/cli-errors.html#E35
If you open a support ticket, please provide the error log found at "/var/log/nitro_enclaves/err2021-02-19T23:19:46.661390373+00:00.log"
Failed connections: 1
[ E39 ] Enclave process connection failure. Such error appears when the enclave manager fails to connect to at least one enclave process for retrieving the description information.
For more details, please visit https://docs.aws.amazon.com/enclaves/latest/user/cli-errors.html#E39
If you open a support ticket, please provide the error log found at "/var/log/nitro_enclaves/err2021-02-19T23:19:46.661536416+00:00.log"
We went around in circles for about a day looking at container permissions, device permissions, and resource-related settings. After spending some time comparing the Docker container to the Kubernetes Pod container, we finally discovered the missing piece: the Nitro Enclaves CLI makes use of Linux hugepages, and we needed to make these available inside the Pod.
To enable hugepages in our Pod we needed to adjust our manifest to look like this.
resources:
  limits:
    smarter-devices/nitro_enclaves: "1"
    hugepages-2Mi: 512Mi
    memory: 2Gi
    cpu: 250m
  requests:
    smarter-devices/nitro_enclaves: "1"
    hugepages-2Mi: 512Mi
volumeMounts:
  - mountPath: /dev/hugepages
    name: hugepage
    readOnly: false
volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
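As a quick sanity check on the numbers above (a sketch, assuming the standard 2 MiB hugepage size): the hugepages-2Mi: 512Mi request corresponds exactly to the 512 MiB of memory that we pass to nitro-cli run-enclave.

```shell
# 512 MiB of enclave memory backed by 2 MiB hugepages
ENCLAVE_MEMORY_MIB=512
HUGEPAGE_SIZE_MIB=2
PAGES=$((ENCLAVE_MEMORY_MIB / HUGEPAGE_SIZE_MIB))
echo "2MiB hugepages required: ${PAGES}"   # 512 / 2 = 256 pages
```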
We also submitted a small upstream issue and PR to ensure that the docs better reflected this requirement.
Once this final step was applied we were able to successfully spin up a Pod inside an EKS cluster that could start and terminate Nitro enclaves without any issues.
$ kubectl exec -ti hello-world-enclave -- /enclave/run.sh
+ nitro-cli run-enclave --cpu-count 2 --memory 512 --eif-path hello.eif --debug-mode
Start allocating memory...
Started enclave with enclave-cid: 16, memory: 512 MiB, cpu-ids: [1, 3]
...
{
"EnclaveID": "i-00...b3",
"ProcessID": 16,
"EnclaveCID": 16,
"NumberOfCPUs": 2,
"CPUIDs": [
1,
3
],
"MemoryMiB": 512
}
...
+ nitro-cli describe-enclaves
[
{
"EnclaveID": "i-00...b3",
"ProcessID": 16,
"EnclaveCID": 16,
"NumberOfCPUs": 2,
"CPUIDs": [
1,
3
],
"MemoryMiB": 512,
"State": "RUNNING",
"Flags": "DEBUG_MODE"
}
]
...
+ nitro-cli terminate-enclave --enclave-id i-00...b3
Successfully terminated enclave i-00...b3.
{
"EnclaveID": "i-00...b3",
"Terminated": true
}
Step Five: Understanding Resource Utilization Interactions
At this point we wanted to better understand how the Nitro enclaves were using resources and whether there was anything that we would need to configure in Kubernetes to ensure that they did not conflict.
The Nitro Enclave Allocator service is responsible for setting aside the node resources defined in /etc/nitro_enclaves/allocator.yaml
. By default, this file defines:
# How much memory to allocate for enclaves (in MiB).
memory_mib: 512
#
# How many CPUs to reserve for enclaves.
cpu_count: 2
but for this test, we set the configuration to:
memory_mib: 1024
cpu_count: 2
and then disabled the Nitro Enclave Allocator service so that we could run a few tests to better understand what was happening on the system.
$ sudo systemctl disable nitro-enclaves-allocator.service
Removed symlink /etc/systemd/system/multi-user.target.wants/nitro-enclaves-allocator.service.
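The allocator carves its memory_mib reservation out of system RAM as hugepages, which is why a hugepages resource later shows up on the Node. A rough sketch of the arithmetic for our memory_mib: 1024 setting (assuming 1 GiB pages, which is what we observed on our instance):

```shell
MEMORY_MIB=1024                    # from /etc/nitro_enclaves/allocator.yaml
GIB_PAGES=$((MEMORY_MIB / 1024))   # expressed as 1 GiB hugepages
MIB2_PAGES=$((MEMORY_MIB / 2))     # or, equivalently, as 2 MiB hugepages
echo "1GiB pages: ${GIB_PAGES} (or ${MIB2_PAGES} x 2MiB pages)"
```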
We then rebooted the enclave enabled instance and ran a few commands to see what resources were available without the allocator running.
$ lscpu | grep "CPU(s) list"
On-line CPU(s) list: 0-3
$ free -h
total used free shared buff/cache available
Mem: 30G 655M 29G 1.8M 777M 30G
Swap: 0B 0B 0B
We can see that there are 4 vCPUs online and 29 GB of memory free.
If we run kubectl describe node and look at the allocatable resources, we see this:
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 4
ephemeral-storage: 94477937300
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 31736096Ki
pods: 58
smarter-devices/nitro_enclaves: 5
At this point we can manually start up the Nitro Enclave Allocator service.
$ sudo systemctl start nitro-enclaves-allocator.service
Once the service is started, we can re-examine our system resources.
$ lscpu | grep "CPU(s) list"
On-line CPU(s) list: 0,2
Off-line CPU(s) list: 1,3
$ free -h
total used free shared buff/cache available
Mem: 30G 1.6G 28G 1.8M 779M 29G
Swap: 0B 0B 0B
If we compare this to the original data from before the allocator service was started, we notice that two vCPUs have been taken offline using the Linux CPU hotplug functionality, and that an additional GB of memory is now in use.
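The CPU hotplug mechanism in question is just the kernel's sysfs interface: writing 0 to /sys/devices/system/cpu/cpuN/online takes that CPU offline. Here is a toy illustration of the interface, using a scratch directory in place of the real /sys tree (do not casually write to the real files on a live system):

```shell
# Simulate taking cpu1 and cpu3 offline, mirroring what we saw on the Node.
FAKE_SYS=$(mktemp -d)
for cpu in 0 1 2 3; do
  mkdir -p "${FAKE_SYS}/cpu${cpu}"
  echo 1 > "${FAKE_SYS}/cpu${cpu}/online"    # every CPU starts online
done
echo 0 > "${FAKE_SYS}/cpu1/online"           # what the allocator effectively
echo 0 > "${FAKE_SYS}/cpu3/online"           # does for each reserved CPU ID
# List the CPU IDs whose online flag is still 1
ONLINE=$(grep -l '^1$' "${FAKE_SYS}"/cpu*/online \
  | sed 's|.*/cpu\([0-9]*\)/online|\1|' | tr '\n' ',' | sed 's/,$//')
echo "online CPUs: ${ONLINE}"
rm -rf "${FAKE_SYS}"
```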
If we re-run kubectl describe node and look at the allocatable resources, we now see this:
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 2
ephemeral-storage: 94477937300
hugepages-1Gi: 1Gi
hugepages-2Mi: 0
memory: 30687520Ki
pods: 58
smarter-devices/nitro_enclaves: 5
Looking at what has changed we can see these differences:
- cpu: 4
+ cpu: 2
- hugepages-1Gi: 0
+ hugepages-1Gi: 1Gi
- memory: 31736096Ki
+ memory: 30687520Ki
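The memory delta between the two kubectl outputs works out to exactly the allocator's reservation. A quick check using the values above:

```shell
BEFORE_KI=31736096   # allocatable memory before the allocator service started
AFTER_KI=30687520    # allocatable memory after
DIFF_KI=$((BEFORE_KI - AFTER_KI))
echo "reserved by the allocator: ${DIFF_KI} Ki = $((DIFF_KI / 1024)) MiB"
```

That comes to 1048576 Ki, or 1024 MiB: exactly the memory_mib: 1024 value we configured.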
These results are very encouraging, because they suggest that simply starting the Nitro Enclave Allocator service at boot time is enough for Kubernetes to ignore the CPUs and memory that are reserved for the enclave. The kubelet also registers a new hugepages-1Gi resource, which we might want to ensure remains unused by Kubernetes, but we will leave this for some future exploration via this Kubernetes core issue that we filed.
Finally, to return things to their proper state, the last thing we want to do is re-enable the Nitro Enclave Allocator service so that it starts again whenever the enclave-enabled instance is restarted.
$ sudo systemctl enable nitro-enclaves-allocator.service
Created symlink from /etc/systemd/system/multi-user.target.wants/nitro-enclaves-allocator.service to /usr/lib/systemd/system/nitro-enclaves-allocator.service.
Conclusion
The results of this work will enable M10 to deliver incredibly secure compute services on AWS to their customers while also giving back to the broader tech community.
This is exactly the sort of project that makes SuperOrbital thrive. Getting the chance to work with engineers from multiple companies, contributing to the broader open source community, and actively pushing the boundaries of what people have done in the past is incredibly rewarding.
We all hope that this work helps others trying to achieve similar goals on AWS and Kubernetes.