Detecting a Container Escape with Cilium and eBPF
If you run Cloud Native Workloads, you better secure them. After all, services are often exposed to the public and Workloads might belong to various tenants. In this blog post we will show you how an attacker with access to your Kubernetes cluster could do a container escape: running a pod to gain root privileges, escaping the pod onto the host, and persisting the attack with invisible pods and fileless executions. And we will show you how to detect these attacks with Isovalent Cilium Enterprise.
During a container escape an attacker breaks the isolation boundary between the host and the container, ending up escaping into what is eventually a Kubernetes control plane or a worker node. In this case, the attackers can see other containers that are running on the same host, gather their secrets, read or write data on the host file system, attack kubelet and escalate privileges; or exploit a Kubernetes bug and persist in the environment by deploying an invisible pod.
Applying security best practises on a Kubernetes environment can limit these types of attacks but a container breakout is still possible, an attacker can use a privileged pod or exploit an existing vulnerability to gain privileges. Security Teams need to measure if hardening configurations are suitable and applied protections are working.
One way to achieve this is observability following a data-driven approach: collect data from Kubernetes workloads and hosts, observe feedback, and make continuous data-driven decisions to protect the Kubernetes environment.
By using eBPF, Security Teams can get unique visibility directly into any Kubernetes workloads, such as pods. Because pods on a Kubernetes node share a single kernel, each of the processes within a pod are visible to a single eBPF program. This can provide full visibility into each process running on a node whether they are long running processes on the host managed by systemd or short lived processes running inside of containers.
Cilium uses eBPF to very efficiently monitor all network and process behaviour inside of Kubernetes workloads and outside on the host and gives you Kubernetes Identity Aware and OS Level Process Visibility into those behaviours.
Cilium deploys as a daemonset inside of a Kubernetes environment. So, there is a Cilium agent running on each Kubernetes node and it is communicating with the Kubernetes API server to understand Kubernetes Pod Identities, Network Policies, Services etc. Then based on the identity of each one of the workloads deployed inside of a Kubernetes environment, Cilium installs a highly efficient eBPF program to do Connectivity, Observability and Security for those workloads.
Rich Security Events
Cilium is both able to observe and enforce what behaviour happened inside of a Linux system. It can collect and filter out Security Observability data directly in the kernel and export it to user space as JSON events and / or store them in a specific log file via a Daemonset called hubble-enterprise. These JSON events are enriched with Kubernetes Identity Aware Information including services, labels, namespaces, pods and containers and with OS Level Process Visibility data including process binaries, pids, uids, parent binaries with the full Process Ancestry Tree. These events can then be exported in a variety of formats and sent to external systems such as a SIEM, e.g: Elasticsearch, Splunk or stored in an S3 bucket. For simplicity, in this blog post they will be directly consumed from the log file.
By leveraging this real-time time Network and Process-Level Visibility Data from the kernel via Cilium, Security Teams are able to see all the processes that have been executed in their Kubernetes environment which helps them to make continuous data driven decisions and improve the security posture of their system. One such example is detecting a container escape.
Let’s reach the host namespace
In this example, we are using a privileged pod with host namespace configuration to represent a container escape attack. This is possible in a hardened Kubernetes environment, as we demonstrate it here. Note, that there are multiple ways to perform a breakout, for example an attacker can exploit a vulnerability as well to gain privileges and escape out of the container sandbox.
The first and easiest step for an attacker to perform a container escape would be to spin up a pod with a privileged Pod spec. Note: Kubernetes allows this by default and the privileged flag grants the container all available kernel capabilities. The hostPID and hostNetwork flag places the container into the host PID and networking namespace, so it can see and interact with all process and network resources. One easy example can be found in the following yaml file:
$ cat privileged.yaml apiVersion: v1 kind: Pod metadata: name: privileged-the-pod spec: hostPID: true hostNetwork: true containers: - name: privileged-the-pod image: nginx:latest ports: - containerPort: 80 securityContext: privileged: true
So, let’s apply that privileged Pod spec:
$ kubectl apply -f privileged.yaml pod/privileged-the-pod created $ kubectl get pods NAME READY STATUS RESTARTS AGE privileged-the-pod 1/1 Running 0 11s
Now, the attacker has a privileged pod up and running which gives them the same permissions as they would have if they were root on the underlying node. Why is it so powerful? Because of the capabilities that the pod has started with, including CAP_SYS_ADMIN, which is essentially the “new root” in Linux and also gives access to all devices on the host machine. This capability in combination with HostPID gives the attacker access to breaking out of all container namespaces put in place, so they can interact with and exploit any other process or filesystem on the underlying node where the privileged pod is deployed.
By using Cilium, Security Teams can detect any privileged container execution by picking up the following process_exec event exported to userspace by executing the following command:
kubectl logs -n kube-system ds/hubble-enterprise -c export-stdout
Secondly, they can see the related:
- Kubernetes Identity Aware Information, such as the namespace:
default, the pod name:
privileged-the-pod, the container-id and the label
- OS Level Visibility Information, such as the binary:
0and the arguments:
nginx -g \"daemon off;
- Full Process Ancestry Tree which includes
/usr/bin/containerd-shimas a direct parent process binary
- Capabilities that the container has started which includes
As a second step, the attacker can use kubectl exec to get shell access to
kubectl exec -it privileged-the-pod -- /bin/bash root@minikube:/#
A shell suddenly popping up in a container log after it was started is of course an event the Security Team is interested in. They can detect the bash execution by picking up the following process_exec event exported to userspace via Cilium. The Process Information can be seen between line 4 and 11 while the Kubernetes Identity Aware Information can be seen between line 12 and 24.
As a third step, the attacker can use the nsenter command to enter into the host namespace and run the bash command as root on the host.
root@minikube:/# nsenter -t 1 -a bash bash-5.0#
nsenter command executes commands in specified namespaces. The first flag,
-t defines the target namespace where the attacker wants to go. Every Linux machine runs a process with PID
1 which always runs in the host namespace. The other command line arguments define the other namespaces where the attacker also wants to enter, in this case,
-a describes all the namespaces.
So, the attacker is breaking out from the container in every possible way and running the
bash command as
root on the host.
Security Teams can identify this breakout by picking up two process_exec events. In the first event, they are able to observe the executed
nsenter command in line 8 with the appropriate namespace arguments
-t 1 -a in line 9. They can also see the source pod name in line 15, which is
privileged-the-pod and all the Kubernetes Identity Aware and OS Level Visibility Information:
By picking up the second process_exec event, Security Teams are able to detect the bash execution on the host namespace having
nsenter as a parent process binary. The Parent Process Information can be seen between line 151 and 177 and the source binary name can be seen in line 8, which is
Now, the attacker has reached the host namespace on a node in a Kubernetes cluster and is running bash. We have used a privileged container with hostPID associations in this example. In the real world, this could also have been an unprivileged container with its own process namespace that then managed to exploit a kernel vulnerability to gain privileges and break out. What can they do? The attacker can see containers that are running on the same controller node, gather secrets associated with them, read data from the host file system, attack kubelet and escalate privileges; or exploit a special Kubernetes behavior and persist the breakout by firing up an invisible container. Let’s assume the attacker chooses the last option.
Container where are you?
Persist the break out by creating an invisible container by the Kubernetes API server
Apart from stealing sensitive information from other Kubernetes workloads, peaking into other Kubernetes namespaces, the attacker can persist the breakout by starting a “hidden”, static pod.
There are many ways an attacker could create a persistent process to hide traces of further activities. For this example, we are going to use a static Kubernetes pod. Unfortunately, many Security Teams make the assumption that every time
kubelet launches a workload all the configs have been statically analyzed by the Kubernetes API server and its webhooks. They don’t take into account that if an attacker inserts a Pod spec under the
/etc/kubernetes/manifests directory on a