
Demystifying the CNI by Writing One From Scratch

Filip Nikolic

The Cilium project recently became a graduated CNCF project and is the only graduated project in the CNCF Cloud Native Networking category.
While Cilium can do many things – Ingress, Service Mesh, Observability, Encryption – its popularity initially soared as a pure Container Network Interface (CNI): a high-performance, feature-rich container network plugin.

However, we tend to take for granted what a CNI actually does. This blog post will demystify the role of the CNI and even explain how to build a basic one from scratch.

While most users will not need that depth of understanding to use Cilium or our Enterprise edition, understanding the core functionality of the CNI can only make operating Kubernetes easier.

This blog post is based on a session I presented at the recent Cloud Native Rejekts conference, preceding KubeCon Paris 2024.

What is a Container Network Interface (CNI)?

The Container Network Interface (CNI) is a CNCF project that specifies the interaction between a Container Runtime Interface (CRI), such as containerd, which is responsible for container creation, and a CNI plugin tasked with configuring network interfaces within the container once it has been created.
Ultimately, it’s the CNI plugin that performs the substantive tasks, while CNI primarily denotes the interaction framework.
However, it’s common practice to simply refer to the CNI plugin as “CNI”, a convention we’ll adhere to in this post.

Container networking explained

Containers do not possess their own kernel: instead, they rely on the kernel of the host system on which they are running.
This design choice renders containers more lightweight but less isolated compared to Virtual Machines (VMs).
To provide some level of isolation, containers utilize a kernel feature known as namespaces.
These namespaces partition system resources, such as network interfaces, assigning them to a specific namespace and preventing them from being visible in other namespaces of the same type.

Note that by namespaces, I refer to Linux namespaces: a Linux kernel feature that is not directly related to Kubernetes namespaces.

While containers make use of several namespace types, we will concentrate on the network namespace for the purposes of this blog. Typically, each container has its own network namespace.
This isolation ensures that interfaces outside of the container’s namespace are not visible within it, and that processes in different namespaces can bind to the same port without conflict.
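
If you want to see this isolation first-hand, you can experiment with network namespaces directly on any Linux host using the ip tool. Below is a minimal sketch (run as root); the namespace name demo is just an example:

$ ip netns add demo
$ ip link show
$ ip netns exec demo ip link show

The first ip link show lists all of the host’s interfaces, while the second one, executed inside the new namespace, only shows an isolated loopback interface.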

To facilitate networking, containers employ a specialized device known as a virtual ethernet device (veth).
Veth devices are always created in interconnected pairs, ensuring that packets reaching one end are transmitted to the other, similar to two systems being linked via a cable.

To enable communication between a container and the host system, one of the veth interfaces resides within the container’s network namespace, while the other resides within the host’s network namespace.
This configuration allows seamless communication between the container and the host system.
As a result, containers on the same node are able to communicate with each other through the host system.
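
Continuing the sketch from above (and assuming the demo namespace still exists), a veth pair can be created by hand and one end moved into the namespace; once both ends have an IP address and are up, the host and the namespace can reach each other. The interface names veth_a/veth_b and the 192.168.50.0/24 range are arbitrary examples:

$ ip link add veth_a type veth peer name veth_b
$ ip link set veth_b netns demo
$ ip addr add 192.168.50.1/24 dev veth_a
$ ip link set veth_a up
$ ip netns exec demo ip addr add 192.168.50.2/24 dev veth_b
$ ip netns exec demo ip link set veth_b up
$ ping -c 1 192.168.50.2
$ ip netns delete demo

The final command cleans everything up: deleting the namespace also removes the veth pair, since destroying one end of a veth pair destroys the other.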

How does a CNI work?

Picture a scenario where a user initiates the creation of a Pod and submits the request to the kube-apiserver.
Following the scheduler’s determination of the node where the Pod should be deployed, the kube-apiserver contacts the corresponding kubelet.
The kubelet, rather than directly creating containers, delegates this task to a CRI.
The CRI’s responsibility encompasses container creation, including the establishment of a network namespace, as previously discussed.
Once this setup is complete, the CRI calls upon a CNI plugin to generate and configure virtual ethernet devices and necessary routes.

Please note that CNIs typically do not handle traffic forwarding or load balancing.
By default, kube-proxy serves as the network proxy in Kubernetes, utilizing technologies like iptables or IPVS to direct incoming network traffic to the relevant Pods within the cluster.
However, Cilium offers a superior alternative by loading eBPF programs directly into the kernel, accomplishing the same tasks at significantly higher speed.
For more information on this topic see “What is Kube-Proxy and why move from iptables to eBPF?“.
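
If you want to see what this looks like in practice, on a node running kube-proxy in its default iptables mode you can list the NAT chain it maintains for Services (the exact rules will differ from cluster to cluster):

$ iptables -t nat -L KUBE-SERVICES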

Writing a CNI from scratch

A common misconception suggests that Kubernetes-related components must be coded exclusively in Go.
However, CNI dispels this notion by being language agnostic, focusing solely on defining the interaction between the CRI and a CNI plugin.
The language used is inconsequential; what matters is that the plugin is executable.
To demonstrate this flexibility, we’ll develop a CNI plugin using bash.

Before delving into the implementation, let’s examine the steps in more detail:

1. Following the CRI’s creation of a network namespace, it will load the first file located in /etc/cni/net.d/. Therefore, we’ll generate a file named /etc/cni/net.d/10-demystifying.conf. This file must adhere to a specific JSON structure outlined in the CNI specification. The line "type": "demystifying" indicates the presence of an executable file named demystifying, which the CRI will execute in the next step.

2. The CRI will search in the directory /opt/cni/bin/ and execute our CNI plugin, demystifying. For that reason we will create our bash script at /opt/cni/bin/demystifying. When the CRI invokes a CNI plugin, it passes data to the executable: the JSON retrieved from the previous step is conveyed via STDIN, while details about the container, including the relevant network namespace indicated by CNI_NETNS, are conveyed as environment variables (a sketch of such an invocation is shown after this list).

3. The first task our CNI plugin has to achieve is to create a virtual ethernet device. This action results in the creation of two veth interfaces, which we’ll subsequently configure. One of the interfaces will be named veth_netns and the other one veth_host to make it easier to follow further steps.

4. Next, we’ll move one of the veth interfaces, veth_netns, into the container’s network namespace. This allows for a connection between the container’s network namespace and the host’s network namespace.

5. While the veth interfaces are automatically assigned MAC addresses, they lack an IP address. Typically, each node possesses a dedicated CIDR range from which an IP address is selected. The IP assigned to the veth interface inside the container’s network namespace is what is considered the Pod IP. For simplicity, we’ll statically set 10.244.0.20 as the IP address and rename the interface based on the CNI_IFNAME environment variable. Keep in mind that Pod IPs must be unique in order to not create routing issues further down the line. In reality, one would therefore keep track of all assigned IPs, a detail we skip here for simplicity.

6. The veth interface on the host will receive another IP address, serving as the default gateway within the container’s network namespace. We’ll statically assign 10.244.0.101 as the IP address. Irrespective of the number of Pods created on the node, this IP can stay the same, as its sole purpose is to serve as a destination for a route within the container’s network namespace.

7. Now it is time to add routes. Inside the container’s network namespace, we need to specify that all traffic should be routed through 10.244.0.101, directing it to the host. On the host side, all traffic destined for 10.244.0.20 must be directed through veth_host. This configuration achieves bidirectional communication between the container and the host.

8. Finally, we need to inform the CRI of our actions. To accomplish this, we’ll print a JSON via STDOUT containing various details about the configuration performed, including the interfaces and IP addresses created.
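
To make this interaction more tangible before writing any code, here is a rough sketch of how the CRI’s call to our plugin could be reproduced by hand; the container ID and network namespace path below are made-up examples:

$ CNI_COMMAND=ADD \
  CNI_CONTAINERID=0123456789abcdef \
  CNI_NETNS=/var/run/netns/cni-example \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/demystifying < /etc/cni/net.d/10-demystifying.conf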

Now it is time to test it for yourself.

Try your own CNI

To test it out, we’re going to be using a kind cluster. The following GitHub repo includes the cluster configuration file (note that the default CNI is not installed as we will be building our own) and a Makefile: simply run make cluster to create the cluster (and make destroy once you’re done with this tutorial).

$ git clone https://github.com/f1ko/demystifying-cni
Cloning into 'demystifying-cni'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 33 (delta 9), reused 30 (delta 6), pack-reused 0
Receiving objects: 100% (33/33), 1.54 MiB | 9.30 MiB/s, done.
$ cd demystifying-cni 
$ make cluster
kind create cluster --config kind.yaml --name demystifying-cni
Creating cluster "demystifying-cni" ...
 ✓ Ensuring node image (kindest/node:v1.30.0) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-demystifying-cni"
You can now use your cluster with:

kubectl cluster-info --context kind-demystifying-cni

Thanks for using kind! 😊
kubectl delete deploy -n kube-system coredns
deployment.apps "coredns" deleted
kubectl delete deploy -n local-path-storage local-path-provisioner
deployment.apps "local-path-provisioner" deleted

As previously outlined, we will have to create two files on the node.

The first one will be /etc/cni/net.d/10-demystifying.conf:

{
  "cniVersion": "1.0.0",
  "name": "fromScratch",
  "type": "demystifying"
}

Create it locally and copy it to the kind node with the following command:

$ docker cp 10-demystifying.conf demystifying-cni-control-plane:/etc/cni/net.d/10-demystifying.conf
Successfully copied 2.05kB to demystifying-cni-control-plane:/etc/cni/net.d/10-demystifying.conf
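
If you want to double-check that the file ended up where the CRI expects it, you can print it from the node:

$ docker exec demystifying-cni-control-plane cat /etc/cni/net.d/10-demystifying.conf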

The second one is the executable CNI plugin /opt/cni/bin/demystifying:

#!/usr/bin/env bash

# create veth
VETH_HOST=veth_host
VETH_NETNS=veth_netns
ip link add ${VETH_HOST} type veth peer name ${VETH_NETNS}

# put one of the veth interfaces into the new network namespace
NETNS=$(basename ${CNI_NETNS})
ip link set ${VETH_NETNS} netns ${NETNS}

# assign IP to veth interface inside the new network namespace
IP_VETH_NETNS=10.244.0.20
CIDR_VETH_NETNS=${IP_VETH_NETNS}/32
ip -n ${NETNS} addr add ${CIDR_VETH_NETNS} dev ${VETH_NETNS}

# assign IP to veth interface on the host
IP_VETH_HOST=10.244.0.101
CIDR_VETH_HOST=${IP_VETH_HOST}/32
ip addr add ${CIDR_VETH_HOST} dev ${VETH_HOST}

# rename veth interface inside the new network namespace
ip -n ${NETNS} link set ${VETH_NETNS} name ${CNI_IFNAME}

# ensure all interfaces are up
ip link set ${VETH_HOST} up
ip -n ${NETNS} link set ${CNI_IFNAME} up

# add routes inside the new network namespace so that it knows how to get to the host
ip -n ${NETNS} route add ${IP_VETH_HOST} dev ${CNI_IFNAME}
ip -n ${NETNS} route add default via ${IP_VETH_HOST} dev ${CNI_IFNAME}

# add route on the host to let it know how to reach the new network namespace
ip route add ${IP_VETH_NETNS}/32 dev ${VETH_HOST} scope host

# return a JSON via stdout
RETURN_TEMPLATE='
{
  "cniVersion": "1.0.0",
  "interfaces": [
    {
      "name": "%s",
      "mac": "%s"
    },
    {
      "name": "%s",
      "mac": "%s",
      "sandbox": "%s"
    }
  ],
  "ips": [
    {
      "address": "%s",
      "interface": 1
    }
  ]
}'

MAC_HOST_VETH=$(ip link show ${VETH_HOST} | grep link | awk '{print $2}')
MAC_NETNS_VETH=$(ip -n ${NETNS} link show ${CNI_IFNAME} | grep link | awk '{print $2}')

RETURN=$(printf "${RETURN_TEMPLATE}" "${VETH_HOST}" "${MAC_HOST_VETH}" "${CNI_IFNAME}" "${MAC_NETNS_VETH}" "${CNI_NETNS}" "${CIDR_VETH_NETNS}")
echo "${RETURN}"

Again, you can create it locally and copy it to the container with:

$ docker cp demystifying demystifying-cni-control-plane:/opt/cni/bin/demystifying
Successfully copied 3.58kB to demystifying-cni-control-plane:/opt/cni/bin/demystifying

As the CNI plugin must be executable, we’ll need to modify the file permissions using chmod:

$ docker exec demystifying-cni-control-plane chmod +x /opt/cni/bin/demystifying
$ docker exec demystifying-cni-control-plane ls -l /opt/cni/bin/demystifying

-rwxr-xr-x 1 501 dialout 1783 Aug  8 09:12 /opt/cni/bin/demystifying

With that, we’ve constructed an operational CNI. The next time a Pod is created on the node, it will follow the outlined steps, and our CNI will be invoked:

$ kubectl run best-app-ever --image=nginx
pod/best-app-ever created
$ kubectl get pods -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE   READINESS GATES
best-app-ever   1/1     Running   0          34s   10.244.0.20   demystifying-cni-control-plane   <none>           <none>

As evident, the Pod is running as expected. Note that the Pod’s IP address is 10.244.0.20, as set by our CNI. With everything configured correctly, the node can successfully reach the Pod and receive a response:

$ docker exec demystifying-cni-control-plane curl -sI 10.244.0.20

HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Thu, 08 Aug 2024 09:23:29 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 28 May 2024 13:22:30 GMT
Connection: keep-alive
ETag: "6655da96-267"
Accept-Ranges: bytes
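
If you are curious about what our plugin actually configured on the node, you can inspect the host side of the veth pair and the route pointing at the Pod IP (the MAC address and interface index in the output will differ on your machine):

$ docker exec demystifying-cni-control-plane ip addr show veth_host
$ docker exec demystifying-cni-control-plane ip route get 10.244.0.20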

Keep in mind that this setup isn’t suitable for production environments and comes with limitations, as real-world scenarios require additional steps. For instance, to support multiple Pods, each veth interface inside a container network namespace needs its own IP address, and the veth interfaces in the host namespace need unique names. Additionally, adding network configuration is only one of the operations a CNI plugin must support: it also has to tear down and verify that configuration, tasks triggered by the CRI setting the CNI_COMMAND environment variable to DEL or CHECK respectively when invoking the CNI plugin. There are several other requirements; the specifics for full compatibility vary across versions and are outlined in the CNI specification. Nevertheless, the concepts outlined in this blog post hold true regardless of version and offer valuable insights into the workings of a CNI.
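
As a very rough illustration of what supporting these commands could look like, our script could branch on CNI_COMMAND near the top; the DEL branch below simply removes the host-side veth interface and is a sketch only, not a spec-compliant implementation:

case ${CNI_COMMAND} in
  DEL)
    # best-effort cleanup: deleting one end of a veth pair removes both ends
    ip link del veth_host 2>/dev/null || true
    exit 0
    ;;
  CHECK)
    # a real plugin would verify here that the interfaces, IPs, and routes still exist
    exit 0
    ;;
  *)
    # ADD (and anything else): continue with the setup logic shown earlier
    ;;
esac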

Summary

At the heart of Kubernetes networking lies the Container Network Interface (CNI) specification, which defines the exchange between the Container Runtime Interface (CRI) and the executable CNI plugin that resides on every node within the Kubernetes cluster. While the CRI establishes a container’s network namespace, it is the CNI plugin’s role to carry out the actual network configuration. This involves creating virtual ethernet interfaces, assigning IP addresses, and managing routes, ensuring seamless connectivity both to and from the newly established container network namespace.

We hope this blog post demystified the CNI and the role it plays in cloud native architectures. You will have seen a CNI in its most basic form; to access the most advanced one, check out our free Enterprise labs or talk to our experts.

Filip Nikolic, Senior Solutions Architect
