The Cilium project recently became a graduated CNCF project and is the only graduated project in the CNCF Cloud Native Networking category.
While Cilium can do many things – Ingress, Service Mesh, Observability, Encryption – its popularity initially soared as a pure Container Network Interface (CNI): a high-performance, feature-rich container network plugin.
However, we tend to take for granted what a CNI actually does. This blog post will demystify the role of the CNI and even explain how to build a basic one from scratch.
While most users will not need that depth of understanding to use Cilium or our Enterprise edition, understanding the core functionality of the CNI can only make operating Kubernetes easier.
This blog post is based on a session I presented at the recent Cloud Native Rejekts conference, preceding KubeCon Paris 2024.
What is a Container Network Interface (CNI)?
The Container Network Interface (CNI) is a CNCF project that specifies the relationship between a Container Runtime Interface (CRI), such as containerd, responsible for container creation, and a CNI plugin tasked with configuring network interfaces within the container upon execution.
Ultimately, it’s the CNI plugin that performs the substantive tasks, while CNI primarily denotes the interaction framework.
However, it’s common practice to simply refer to the CNI plugin as “CNI”, a convention we’ll adhere to in this post.
Container networking explained
Containers do not possess their own kernel: instead, they rely on the kernel of the host system on which they are running.
This design choice renders containers more lightweight but less isolated compared to Virtual Machines (VMs).
To provide some level of isolation, containers utilize a kernel feature known as namespaces.
Namespaces partition system resources, such as network interfaces, assigning them to a specific namespace and preventing them from being visible in other namespaces of the same type.
Note that by namespaces, I refer to Linux namespaces: a Linux kernel feature that is not directly related to Kubernetes namespaces.
While containers consist of various namespaces, we will concentrate on the network namespace for the purposes of this blog. Typically each container has its own network namespace.
This isolation ensures that interfaces outside the container’s network namespace are not visible inside it, and that processes in different namespaces can bind to the same port without conflict.
To facilitate networking, containers employ a specialized device known as a virtual ethernet device (veth).
Veth devices are always created in interconnected pairs, ensuring that packets reaching one end are transmitted to the other, similar to two systems being linked via a cable.
To enable communication between a container and the host system, one of the veth interfaces resides within the container’s network namespace, while the other resides within the host’s network namespace.
This configuration allows seamless communication between the container and the host system.
As a result, containers on the same node are able to communicate with each other through the host system.
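The building blocks described above can be reproduced by hand with plain iproute2 commands. The following is a minimal sketch (it requires root; the namespace name and addresses are arbitrary choices for this demo, not values used by any real CNI):

```shell
# Create a network namespace and a veth pair (requires root).
ip netns add demo
ip link add veth_host type veth peer name veth_ns

# Move one end into the namespace; the two ends now live in different namespaces.
ip link set veth_ns netns demo

# Address both ends and bring them up.
ip addr add 192.168.50.1/24 dev veth_host
ip link set veth_host up
ip netns exec demo ip addr add 192.168.50.2/24 dev veth_ns
ip netns exec demo ip link set veth_ns up

# The host can now reach the namespace, like two machines linked via a cable.
ping -c 1 192.168.50.2

# Clean up when done.
ip netns delete demo
```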
How does a CNI work?
Picture a scenario where a user initiates the creation of a Pod and submits the request to the kube-apiserver.
Following the scheduler’s determination of the node where the Pod should be deployed, the kube-apiserver contacts the corresponding kubelet.
The kubelet, rather than directly creating containers, delegates this task to a CRI.
The CRI’s responsibility encompasses container creation, including the establishment of a network namespace, as previously discussed.
Once this setup is complete, the CRI calls upon a CNI plugin to generate and configure virtual ethernet devices and necessary routes.
Please note that CNIs typically do not handle traffic forwarding or load balancing.
kube-proxy serves as the default network proxy in Kubernetes, utilizing technologies like iptables or IPVS to direct incoming network traffic to the relevant Pods within the cluster.
However, Cilium offers a superior alternative by loading eBPF programs directly into the kernel, achieving the same tasks with significantly higher speed.
For more information on this topic see “What is Kube-Proxy and why move from iptables to eBPF?“.
Writing a CNI from scratch
A common misconception suggests that Kubernetes-related components must be coded exclusively in Go.
However, CNI dispels this notion by being language agnostic, focusing solely on defining the interaction between the CRI and a CNI plugin.
The language used is inconsequential; what matters is that the plugin is executable.
To demonstrate this flexibility, we’ll develop a CNI plugin using bash.
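To make the “any executable works” point concrete, here is a hypothetical, deliberately useless stub: a bash function that does nothing but print a syntactically valid (empty) CNI result. It is not our actual plugin, just an illustration of how low the bar for “speaking CNI” is.

```shell
# Hypothetical stub: any executable that reads the CNI contract and
# prints a valid result JSON qualifies as a CNI plugin.
cni_stub() {
  printf '{"cniVersion": "0.3.1"}\n'
}
```

Calling `cni_stub` simply emits the empty result; a real plugin would configure the network before printing it.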
Before delving into the implementation, let’s examine the steps in more detail:
1. Following the CRI’s creation of a network namespace, it will load the first file located in /etc/cni/net.d/. Therefore, we’ll generate a file named /etc/cni/net.d/10-demystifying.conf. This file must adhere to a specific JSON structure outlined in the CNI specification. The line "type": "demystifying" indicates the presence of an executable file named demystifying, which the CRI will execute in the next step.
2. The CRI will search in the directory /opt/cni/bin/ and execute our CNI plugin, demystifying. For that reason we will create our bash script at /opt/cni/bin/demystifying. When the CRI invokes a CNI plugin, it passes data to the executable: the JSON retrieved from the previous step is conveyed via STDIN, while details about the container, including the relevant network namespace indicated by CNI_NETNS, are conveyed as environment variables.
3. The first task our CNI plugin has to achieve is to create a virtual ethernet device. This action results in the creation of two veth interfaces, which we’ll subsequently configure. One of the interfaces will be named veth_netns and the other one veth_host to make it easier to follow further steps.
4. Next, we’ll move one of the veth interfaces, veth_netns, into the container’s network namespace. This allows for a connection between the container’s network namespace and the host’s network namespace.
5. While the veth interfaces are automatically assigned MAC addresses, they lack an IP address. Typically, each node possesses a dedicated CIDR range from which an IP address is selected. The IP assigned to the veth interface inside the container’s network namespace is what is considered the Pod IP. For simplicity, we’ll statically set 10.244.0.20 as the IP address and rename the interface based on the CNI_IFNAME environment variable. Keep in mind that Pod IPs must be unique in order not to create routing issues further down the line. In reality, one would therefore keep track of all assigned IPs, a detail we are skipping for simplicity.
6. The veth interface on the host will receive another IP address, serving as the default gateway within the container’s network namespace. We’ll statically assign 10.244.0.101 as the IP address. Irrespective of the number of Pods created on the node, this IP can stay the same, as its sole purpose is to serve as a destination for a route within the container’s network namespace.
7. Now it is time to add routes. Inside the container’s network namespace, we need to specify that all traffic should be routed through 10.244.0.101, directing it to the host. On the host side, all traffic destined for 10.244.0.20 must be directed through veth_host. This configuration achieves bidirectional communication between the container and the host.
8. Finally, we need to inform the CRI of our actions. To accomplish this, we’ll print a JSON via STDOUT containing various details about the configuration performed, including the interfaces and IP addresses created.
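The steps above can be sketched as a single bash script. Treat this as a rough sketch under stated assumptions rather than the finished plugin: the static IPs come from the steps above, while the /24 prefix, deriving a namespace name from CNI_NETNS with basename, and the complete absence of error handling and IP address management are simplifying assumptions.

```shell
#!/bin/bash
# Sketch of /opt/cni/bin/demystifying; not production-ready.

# Step 2: the CRI passes the network config JSON via STDIN
# and container details via environment variables.
config=$(cat /dev/stdin)   # unused in this minimal sketch
# CNI_NETNS is a path like /var/run/netns/<id>; derive a name `ip` can use.
netns=$(basename "$CNI_NETNS")

# Step 3: create the veth pair.
ip link add veth_netns type veth peer name veth_host

# Step 4: move one end into the container's network namespace.
ip link set veth_netns netns "$netns"

# Step 5: rename the container end to CNI_IFNAME and assign the Pod IP.
ip -n "$netns" link set veth_netns name "$CNI_IFNAME"
ip -n "$netns" addr add 10.244.0.20/24 dev "$CNI_IFNAME"
ip -n "$netns" link set "$CNI_IFNAME" up

# Step 6: give the host end an IP to serve as the container's default gateway.
ip addr add 10.244.0.101/24 dev veth_host
ip link set veth_host up

# Step 7: route everything in the container via the host; on the host,
# send traffic for the Pod IP through veth_host.
ip -n "$netns" route add default via 10.244.0.101
ip route add 10.244.0.20/32 dev veth_host

# Step 8: report the result back to the CRI via STDOUT.
cat <<EOF
{
  "cniVersion": "0.3.1",
  "interfaces": [
    { "name": "$CNI_IFNAME", "sandbox": "$CNI_NETNS" }
  ],
  "ips": [
    { "version": "4", "address": "10.244.0.20/24", "interface": 0 }
  ]
}
EOF
```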
Now it is time to test it for yourself.
Try your own CNI
To test it out, we’re going to be using a kind cluster. The following GitHub repo includes the cluster configuration file (note that the default CNI is not installed as we will be building our own) and a Makefile: simply run make cluster to create the cluster (and make destroy once you’re done with this tutorial).
As previously outlined, we will have to create two files on the node.
The first one will be /etc/cni/net.d/10-demystifying.conf:
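A minimal configuration satisfying the network configuration format of the CNI specification could look like this (cniVersion 0.3.1 is an assumption; the repo’s file may pin a different version):

```json
{
  "cniVersion": "0.3.1",
  "name": "demystifying",
  "type": "demystifying"
}
```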
Create it locally and copy it to the kind node with the following command:
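For example (the node name kind-control-plane is an assumption: it is the default for a kind cluster named kind, and the repo’s configuration may use a different name):

```shell
# Copy the CNI network configuration onto the kind node (a Docker container).
docker cp 10-demystifying.conf kind-control-plane:/etc/cni/net.d/10-demystifying.conf
```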
The second one is the executable CNI plugin /opt/cni/bin/demystifying:
Again, you can create it locally and copy it to the container with:
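Again assuming the default node name kind-control-plane:

```shell
# Copy the plugin script into the directory the CRI searches for CNI plugins.
docker cp demystifying kind-control-plane:/opt/cni/bin/demystifying
```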
As the CNI plugin must be executable, we’ll need to modify the file permissions using chmod:
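Under the same node-name assumption as above:

```shell
# Make the plugin executable so the CRI can invoke it.
docker exec kind-control-plane chmod +x /opt/cni/bin/demystifying
```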
With that, we’ve constructed an operational CNI. The next time a Pod is created on the node, it will follow the outlined steps, and our CNI will be invoked:
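For example, a Pod can be created like this (the Pod name and the nginx image are arbitrary choices for this demo):

```shell
# Create a Pod; scheduling it triggers the CRI, which in turn invokes our CNI plugin.
kubectl run demystifying-pod --image=nginx

# The IP column should show the address assigned by our plugin.
kubectl get pod demystifying-pod -o wide
```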
As you can see, the Pod is running as expected. Note that the Pod’s IP address is 10.244.0.20, as set by our CNI. With everything configured correctly, the node can successfully reach the Pod and receive a response:
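Assuming the nginx Pod and default node name from above:

```shell
# From the node, request the Pod IP assigned by our plugin;
# nginx should answer with its welcome page.
docker exec kind-control-plane curl -s 10.244.0.20
```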
Keep in mind that this setup isn’t suitable for production environments and comes with limitations, as real-world scenarios require additional steps. For instance, assigning different IP addresses to the veth interface within each container network namespace and ensuring unique names for the veth interfaces in the host namespace are essential to support multiple Pods. Additionally, adding network configuration is only one aspect of the tasks a CNI plugin must support: removing and checking that configuration, for example, are triggered by the CRI setting the CNI_COMMAND environment variable to DEL or CHECK respectively when invoking the CNI plugin. There are several other tasks; the specific requirements for full compatibility vary across versions and are outlined in the CNI specification. Nevertheless, the concepts outlined in this blog post hold true regardless of version and offer valuable insights into the workings of a CNI.
Summary
At the heart of Kubernetes networking lies the Container Network Interface (CNI) specification which defines the exchange between the Container Runtime Interface (CRI) and the executable CNI plugin which resides on every node within the Kubernetes cluster. While the CRI establishes a container’s network namespace, it is the CNI plugin’s role to execute intricate network configurations. These configurations involve creating virtual ethernet interfaces and managing network settings, ensuring seamless connectivity both to and from the newly established container network namespace.
We hope this blog post demystified the CNI and the role it plays in cloud native architectures. You will have seen a CNI in its most basic form; to access the most advanced one, check out our free Enterprise labs or talk to our experts.
Filip is currently working at Isovalent – the creators of Cilium and eBPF.
He is actively contributing to a number of CNCF projects such as Cilium, Tetragon, ArgoCD, Minikube and many more.
With his networking and platform engineering background he has worked in several industries and focuses on improving cloud native technologies.