Tutorial: Azure CNI Powered by Cilium

Nov 14, 2022

At KubeCon North America 2022, Microsoft announced the availability of a new eBPF-based dataplane in Azure Kubernetes Service (AKS): Azure CNI Powered by Cilium.

This is not the first major announcement of a Cilium integration on a hyper-scaler cloud: it follows those with AWS and their EKS Anywhere platform, Google and GKE's Dataplane v2, Alibaba Cloud, and more.

You can find out more about this announcement by reading Thomas' blog post, Microsoft's own announcement, and the official Azure documentation. This tutorial will go deeper into this new feature.

We will start with a detailed walkthrough of the various AKS networking models, then explain how Cilium can be installed on top of AKS and how to install and configure Azure CNI Powered by Cilium, and finally dive into the enhancements Cilium brings to AKS networking and security.

Step 1: Prerequisites

If you’re reading this, then I assume you are already familiar with AKS and the Azure CLI. If not, follow “How to install the Azure CLI” to install it.

As I write this blog post (November 2022), Azure CNI Powered by Cilium is still in Preview, so the first steps require enabling your account to support it.

You’ll need to use the aks-preview CLI extension to leverage these features. As you can see in the listing below, it was already installed on my machine. It had been installed previously when I was documenting another AKS feature that integrates well with Cilium: Bring Your Own CNI.

nicovibert:~$ az extension list
[
  {
    "experimental": false,
    "extensionType": "whl",
    "name": "aks-preview",
    "path": "/Users/nicovibert/.azure/cliextensions/aks-preview",
    "preview": true,
    "version": "0.5.91"
  }
]

Version 0.5.91 is, however, not recent enough: you need the aks-preview extension 0.5.109 or later to support this feature. Let's update it:

nicovibert:~$ az extension update --name aks-preview
nicovibert:~$ az extension list                     
[
  {
    "experimental": false,
    "extensionType": "whl",
    "name": "aks-preview",
    "path": "/Users/nicovibert/.azure/cliextensions/aks-preview",
    "preview": true,
    "version": "0.5.113"
  }
]

If you didn’t have the aks-preview CLI extension already, install it with az extension add --name aks-preview.
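If you'd rather not check by hand, the install-or-update decision can be scripted. A minimal sketch (ensure_aks_preview is a hypothetical helper name; assumes the Azure CLI is on your PATH):

```shell
# Sketch: install the aks-preview extension if it is missing, update it otherwise.
# ensure_aks_preview is a hypothetical helper name; assumes az is installed.
ensure_aks_preview() {
  if az extension show --name aks-preview >/dev/null 2>&1; then
    az extension update --name aks-preview
  else
    az extension add --name aks-preview
  fi
}
# Usage: ensure_aks_preview
```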

Let's now register the new CiliumDataplanePreview preview feature under the Microsoft.ContainerService namespace:

nicovibert:~$ az feature register --namespace "Microsoft.ContainerService" --name "CiliumDataplanePreview"
Once the feature 'CiliumDataplanePreview' is registered, invoking 'az provider register -n Microsoft.ContainerService' is required to get the change propagated
{
  "id": "/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/CiliumDataplanePreview",
  "name": "Microsoft.ContainerService/CiliumDataplanePreview",
  "properties": {
    "state": "Registering"
  },
  "type": "Microsoft.Features/providers/features"
}

Note that registering a new feature can take some time to propagate (typically between 5 and 15 minutes).
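Since a registration can sit in the Registering state for a while, a small polling helper saves repeated manual checks. A minimal sketch (wait_for_feature is a hypothetical name; assumes the Azure CLI is installed and logged in):

```shell
# Sketch: poll a preview feature flag until it reports "Registered", then
# (per the CLI output above) refresh the provider registration.
# wait_for_feature is a hypothetical helper name; assumes az is installed.
wait_for_feature() {
  local name="$1" state=""
  while true; do
    state=$(az feature show --namespace Microsoft.ContainerService \
              --name "$name" --query properties.state -o tsv)
    [ "$state" = "Registered" ] && break
    sleep 30
  done
  echo "$name: $state"
}
# Usage:
#   wait_for_feature CiliumDataplanePreview
#   wait_for_feature AzureOverlayPreview
#   az provider register --namespace Microsoft.ContainerService
```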

Let’s also enable the Preview feature for the Azure CNI Overlay feature (more on this shortly):

nicovibert:~$ az feature register --namespace Microsoft.ContainerService --name AzureOverlayPreview
Once the feature 'AzureOverlayPreview' is registered, invoking 'az provider register -n Microsoft.ContainerService' is required to get the change propagated
{
  "id": "/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/AzureOverlayPreview",
  "name": "Microsoft.ContainerService/AzureOverlayPreview",
  "properties": {
    "state": "Registering"
  },
  "type": "Microsoft.Features/providers/features"
}
nicovibert:~$ 

Again, it might take between 5 and 15 minutes for the features to show as Registered. You can check whether they are registered with the following commands:

nicovibert:~$ az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/AzureOverlayPreview')].{Name:name,State:properties.state}"
Name                                            State
----------------------------------------------  ----------
Microsoft.ContainerService/AzureOverlayPreview  Registered
nicovibert:~$ 
nicovibert:~$ az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/CiliumDataplanePreview')].{Name:name,State:properties.state}"
Name                                               State
-------------------------------------------------  ----------
Microsoft.ContainerService/CiliumDataplanePreview  Registered

As you can see, both features are now Registered and we can progress to the next step.

Step 2: AKS Networking Mode Selection

While Azure CNI Powered by Cilium is the main focus of the article, it’s important to understand the various AKS Networking Modes and their evolution over time.

Azure has traditionally supported two networking modes: kubenet and Azure CNI. Let’s explain briefly the main differences and the evolution of these modes:

  1. When compared to kubenet, Azure CNI is the more advanced option and is better integrated with native Azure services. But Azure CNI requires subnet planning prior to deploying the cluster: as pods get their IP addresses allocated from the VNet subnet, users have to plan their future requirements carefully before creating the virtual network. Azure CNI actually pre-allocates IP addresses to pods, based on the max-pods setting defined during creation (by default, there is a maximum of 30 pods per node, so 30 additional IP addresses are pre-assigned for pods that might eventually be scheduled on the node).
  2. kubenet doesn’t require as much planning with regards to IP address management: pods get their IP addresses from a private subnet that is distinct from the VNet address range. However, kubenet comes with a major drawback: it requires user-defined routes (UDRs) to route traffic back into the cluster. There is a hard limit of 400 UDRs per route table, which means an AKS cluster with kubenet cannot scale beyond 400 nodes.
  3. Last year, Microsoft introduced a more flexible IP Address Management model to Azure CNI: Dynamic IP allocation and enhanced subnet support in AKS gives users the ability to have distinct subnets for nodes and pods (albeit from the same broader VNet CIDR). It also means that IP addresses will be dynamically assigned when the pods are created, instead of being pre-allocated.
  4. Finally, Microsoft recently announced a new networking model that is meant to give users the best of both worlds: Azure CNI Overlay. Still in Preview mode as I write this post, Azure CNI Overlay acts like kubenet from an IP Address Management viewpoint (IP addresses are assigned to pods from an address space logically different from the VNet) but with even simpler network configuration (it eliminates the need for UDR) and with better performance and scale.

To summarize:

  • kubenet: Conserves IP address space but has a scale limitation, requires User-Defined Routes (UDRs) and adds minor latency due to an extra hop. Pods get their IP assigned from a private range different from the VNet address range.
  • Azure CNI (Classic): Provides full virtual network connectivity but requires more IP address space and careful planning. Pods get their IP assigned from the same range as the nodes.
  • Azure CNI (Dynamic): Provides full virtual network connectivity and is more flexible than the classic Azure CNI model. Pods get their IP assigned from a range that is distinct from the one used for the nodes but remains within the VNet address space.
  • Azure CNI Overlay: Provides full virtual network connectivity, high performance, and requires no additional routing configuration. Pods get their IP assigned from a private range different from the VNet address range (similar to kubenet). Still in Preview and currently limited to a subset of regions (North Central US and West Central US at time of publication).

AKS Network Model Comparisons

Step 3: Cilium Mode Selection

Now that we understand the various AKS network models, let’s look at how Cilium can be installed, as there are several options available:

  1. The best experience for users looking to deploy Cilium with AKS is Azure CNI Powered by Cilium.
  2. For those who want more flexibility, users can leverage BYOCNI to install a fully customizable Cilium (the Cilium configuration on Azure CNI Powered by Cilium is managed by AKS and cannot be modified).
  3. Finally, some users might want to use the “legacy” Cilium model with Azure IPAM, when custom Azure API integration and Cilium customization are required.

The first option – Azure CNI Powered by Cilium – is the preferred option and supports two methods:

  • Assign IP addresses from a VNet (similar to the Azure CNI (Dynamic) model where pods can get an IP address different from the node pool subnet but still within the VNet CIDR space)
  • Assign IP addresses from an overlay (based on Azure CNI Overlay, with pods getting IP addresses from a network range different from the one used by nodes).

In either mode, Azure CNI performs all IPAM actions and acts as the IPAM plugin for Cilium, handing Cilium the IP address and interface information for each pod.

In the rest of the tutorial, I will be using Azure CNI Powered by Cilium based on Azure CNI Overlay.

Step 4: AKS Network Creation

Let’s create a new Resource Group before setting up the network.

Note the location is important as the Azure CNI Overlay feature is currently only available in a couple of regions: North Central US and West Central US. If you want to use Cilium with Azure CNI (Dynamic) instead, you can deploy in the region of your choice as this is already available in all public cloud regions.

nicovibert:~$ resourceGroup="myResourceGroupCilium"
nicovibert:~$ vnet="myVirtualNetworkCilium"
nicovibert:~$ clusterName="myOverlayClusterCilium"
nicovibert:~$ location="westcentralus"
nicovibert:~$ az group create --name $resourceGroup --location $location
{
  "id": "/subscriptions/a-b-c-d/resourceGroups/myResourceGroupCilium",
  "location": "westcentralus",
  "name": "myResourceGroupCilium",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "type": "Microsoft.Resources/resourceGroups"
}
nicovibert:~$ 

Let’s now create a VNet and the VNet subnet that the nodes will use for their own IP addresses:

nicovibert:~$ az network vnet create -g $resourceGroup --location $location --name $vnet --address-prefixes 10.0.0.0/8 -o none
nicovibert:~$ az network vnet subnet create -g $resourceGroup --vnet-name $vnet --name nodesubnet --address-prefix 10.10.0.0/16 -o none

Let’s get the subnetId of the subnet we’ve just created as we’ll need it for the cluster deployment:

nicovibert:~$ az network vnet subnet list --resource-group $resourceGroup --vnet-name $vnet -o json | jq .'[0].id'
"/subscriptions/a-b-c-d/resourceGroups/myResourceGroupCilium/providers/Microsoft.Network/virtualNetworks/myVirtualNetworkCilium/subnets/nodesubnet"
nicovibert:~$ subnetId="/subscriptions/a-b-c-d/resourceGroups/myResourceGroupCilium/providers/Microsoft.Network/virtualNetworks/myVirtualNetworkCilium/subnets/nodesubnet"
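Rather than listing the subnets with jq and pasting the ID by hand, the value can be captured directly with a JMESPath query. A sketch (get_subnet_id is a hypothetical helper name; assumes the Azure CLI is logged in):

```shell
# Sketch: fetch the subnet ID straight into a variable.
# get_subnet_id is a hypothetical helper name; assumes az is installed.
get_subnet_id() {
  az network vnet subnet show -g "$1" --vnet-name "$2" -n "$3" --query id -o tsv
}
# Usage: subnetId=$(get_subnet_id "$resourceGroup" "$vnet" nodesubnet)
```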

Step 5: Cluster Creation

Let’s now deploy our cluster. Note below that network-plugin is set to azure and that network-plugin-mode is set to overlay as I chose to deploy Cilium in this particular mode.

Notice as well that pod-cidr is set to 192.168.0.0/16, which is a different range from the VNet subnet created previously.

Finally, enable-cilium-dataplane enables the Cilium dataplane feature set.

nicovibert:~$ az aks create -n $clusterName -g $resourceGroup -l $location \
  --max-pods 250 \
  --node-count 2 \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --vnet-subnet-id $subnetId \
  --enable-cilium-dataplane

It should take about 5 to 10 minutes to create the cluster.
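If you'd rather not poll manually, the CLI can block until provisioning completes. A sketch (wait_for_cluster is a hypothetical helper name; assumes the Azure CLI):

```shell
# Optional sketch: block until the cluster finishes provisioning instead of
# checking the portal. wait_for_cluster is a hypothetical helper name.
wait_for_cluster() {
  az aks wait -n "$1" -g "$2" --created --interval 30
}
# Usage: wait_for_cluster "$clusterName" "$resourceGroup"
```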

Let’s verify in the Azure portal what I explained earlier with regards to pods and nodes addressing: once I deploy some pods (see next section), my pods get IP addresses from the 192.168.0.0/16 range…

AzureCNI Pods IP addresses

…which is distinct from the network that nodes get their IP addresses from.

AzureCNI Nodes IP addresses

Step 6: Cluster and Cilium Health Check

Let’s look at the cluster once the deployment is completed. Let’s first connect to the cluster:

nicovibert:~$ az aks get-credentials -n $clusterName -g $resourceGroup      
The behavior of this command has been altered by the following extension: aks-preview
Merged "myOverlayClusterCilium" as current context in /Users/nicovibert/.kube/config

We can now run kubectl commands. Let’s start by checking the status of the nodes.

nicovibert:~$ kubectl get nodes
NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-33954605-vmss000000   Ready    agent   26h   v1.23.12
aks-nodepool1-33954605-vmss000001   Ready    agent   26h   v1.23.12

Cilium is healthy (I am using the Cilium CLI to verify the status of Cilium – follow the steps on the Cilium docs to install it if you don’t have it already).

nicovibert:~$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Containers:       cilium-operator    Running: 2
                  cilium             Running: 2
Cluster Pods:     13/13 managed by Cilium
Image versions    cilium-operator    mcr.microsoft.com/oss/cilium/operator-generic:1.12.2: 2
                  cilium             mcr.microsoft.com/oss/cilium/cilium:1.12.2: 2

Let’s also check the node-to-node health with cilium-health status:

nicovibert:~$ kubectl -n kube-system exec ds/cilium -- cilium-health status
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
Probe time:   2022-11-01T16:34:34Z
Nodes:
  aks-nodepool1-33954605-vmss000001 (localhost):
    Host connectivity to 10.10.0.5:
      ICMP to stack:   OK, RTT=256.012µs
      HTTP to agent:   OK, RTT=208.41µs
  aks-nodepool1-33954605-vmss000000:
    Host connectivity to 10.10.0.4:
      ICMP to stack:   OK, RTT=1.31046ms
      HTTP to agent:   OK, RTT=540.124µs

We can even run a cilium connectivity test (an automated test that checks that Cilium has been deployed correctly and tests intra-node connectivity, inter-node connectivity and network policies) to verify that everything is working as expected.

nicovibert:~$ cilium connectivity test
ℹ️  Monitor aggregation detected, will skip some flow validation steps
[myOverlayClusterCilium] Creating namespace cilium-test for connectivity check...
[myOverlayClusterCilium] Deploying echo-same-node service...
[myOverlayClusterCilium] Deploying DNS test server configmap...
[myOverlayClusterCilium] Deploying same-node deployment...
[myOverlayClusterCilium] Deploying client deployment...
[myOverlayClusterCilium] Deploying client2 deployment...
[myOverlayClusterCilium] Deploying echo-other-node service...
[myOverlayClusterCilium] Deploying other-node deployment...
[omitted for brevity]
ℹ️  Cilium version: 1.12.2
🏃 Running tests...

[=] Test [no-policies]
....................................
[=] Test [allow-all-except-world]
..............
[=] Test [client-ingress]
..
[=] Test [echo-ingress]
....
[=] Test [client-ingress-icmp]
..
[=] Test [client-egress]
....
[=] Test [client-egress-expression]
....
[=] Test [client-egress-to-echo-service-account]
....
[=] Test [to-entities-world]
......
[=] Test [to-cidr-1111]
....
[=] Test [echo-ingress-from-other-client-deny]
......
[=] Test [client-ingress-from-other-client-icmp-deny]
......
[=] Test [client-egress-to-echo-deny]
......
[=] Test [client-ingress-to-echo-named-port-deny]
....
[=] Test [client-egress-to-echo-expression-deny]
....
[=] Test [client-egress-to-echo-service-account-deny]
....
[=] Test [client-egress-to-cidr-deny]
....

✅ All 17 tests (114 actions) successful, 6 tests skipped, 0 scenarios skipped.

Step 7: Cilium Benefits

Let’s now demonstrate three of the benefits that the Cilium dataplane brings:

  • Built-in network policy support
  • Network Monitoring
  • Replacement of kube-proxy for better performance and reduced latency
Azure CNI Powered by Cilium - Architecture Overview

Network Policy Support

Kubernetes doesn’t natively enforce Network Policies: it needs a network plugin to do that. For AKS clusters, installing and managing a separate network policy engine was previously required.

That’s no longer the case once you deploy clusters with Azure CNI powered by Cilium: that’s automatically built-in.

Let’s verify it. First, I am going to create three namespaces and three identical applications across each namespace. Each application is based on a pair of pods: a frontend-service and a backend-service. A frontend-service can communicate over HTTP to a backend-service.

nicovibert:~$ kubectl create ns tenant-a
nicovibert:~$ kubectl create ns tenant-b
nicovibert:~$ kubectl create ns tenant-c
nicovibert:~$ kubectl create -f https://docs.isovalent.com/v1.11/public/tenant-services.yaml -n tenant-a
nicovibert:~$ kubectl create -f https://docs.isovalent.com/v1.11/public/tenant-services.yaml -n tenant-b
nicovibert:~$ kubectl create -f https://docs.isovalent.com/v1.11/public/tenant-services.yaml -n tenant-c
namespace/tenant-a created
namespace/tenant-b created
namespace/tenant-c created
service/frontend-service created
pod/frontend-service created
service/backend-service created
pod/backend-service created
service/frontend-service created
pod/frontend-service created
service/backend-service created
pod/backend-service created
service/frontend-service created
pod/frontend-service created
service/backend-service created
pod/backend-service created
nicovibert:~$ 

Let’s verify communications before we apply any network policies. My frontend-service in the tenant-a namespace should be able to communicate with any of the backends across the tenant-a, tenant-b and tenant-c namespaces. It can also connect to the public Twitter APIs. As you can also see below, these curl requests are sent to FQDNs, so any network policy will need to keep DNS resolution working.

nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI backend-service.tenant-a
HTTP/1.1 200 OK
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Last-Modified: Fri, 07 Oct 2022 13:29:16 GMT
ETag: W/"809-183b2a2c7e0"
Content-Type: text/html; charset=UTF-8
Content-Length: 2057
Date: Fri, 28 Oct 2022 16:37:52 GMT
Connection: keep-alive
Keep-Alive: timeout=5

nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI backend-service.tenant-b
HTTP/1.1 200 OK
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Last-Modified: Fri, 07 Oct 2022 13:29:16 GMT
ETag: W/"809-183b2a2c7e0"
Content-Type: text/html; charset=UTF-8
Content-Length: 2057
Date: Fri, 28 Oct 2022 16:37:57 GMT
Connection: keep-alive
Keep-Alive: timeout=5

nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI backend-service.tenant-c
HTTP/1.1 200 OK
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Last-Modified: Fri, 07 Oct 2022 13:29:16 GMT
ETag: W/"809-183b2a2c7e0"
Content-Type: text/html; charset=UTF-8
Content-Length: 2057
Date: Fri, 28 Oct 2022 16:38:00 GMT
Connection: keep-alive
Keep-Alive: timeout=5

nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI --max-time 5 api.twitter.com
HTTP/1.1 301 Moved Permanently
location: https://api.twitter.com/
x-connection-hash: f3daae715022b056cf839c8e308a1f7b7a313e817b6e63d78b4859a0d1686cde
date: Fri, 28 Oct 2022 16:37:42 GMT
server: tsa_b
transfer-encoding: chunked

Let’s say we want to enforce some segmentation between tenants and prevent pods in tenant-a from accessing services in tenant-b and tenant-c. Imagine we also want to block communications to destinations outside the cluster. Let’s build a network policy for this use case.

nicovibert:~$ cat netpol.yaml 
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-policy
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}
  egress:
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
    - to:
        - podSelector: {}

To be perfectly honest, I still find creating and editing network policies tricky. To make life easier, I just used the Cilium Network Policy editor. Note that we still need to explicitly authorize DNS for name resolution to succeed. Watch the video below to see how I created this policy.

As soon as I apply the NetworkPolicy, traffic to the Twitter APIs and to the backends outside the tenant-a namespace is dropped, while traffic within the namespace is still allowed.

nicovibert:~$ kubectl apply -f netpol.yaml 
networkpolicy.networking.k8s.io/default-policy created
nicovibert:~$ 
nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI --max-time 5 api.twitter.com
command terminated with exit code 28
nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI --max-time 5 backend-service.tenant-b
command terminated with exit code 28
nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI --max-time 5 backend-service.tenant-c
command terminated with exit code 28
nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI --max-time 5 backend-service.tenant-a
HTTP/1.1 200 OK
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Last-Modified: Fri, 07 Oct 2022 13:29:16 GMT
ETag: W/"809-183b2a2c7e0"
Content-Type: text/html; charset=UTF-8
Content-Length: 2057
Date: Fri, 28 Oct 2022 16:58:24 GMT
Connection: keep-alive
Keep-Alive: timeout=5

The full network policy demo can be watched below:

Network Policy Demo

Note that the more advanced Cilium Network Policies, including Layer 7-based filtering, are not yet supported. The Cilium configuration you get with Azure CNI Powered by Cilium cannot be changed: if you want all the bells and whistles of Cilium, consider deploying it in BYOCNI mode instead.
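For comparison, a hedged sketch of what creating a cluster in BYOCNI mode looks like, where you then install and fully configure Cilium yourself (create_byocni_cluster is a hypothetical helper name; assumes the Azure CLI):

```shell
# Sketch: BYOCNI clusters are created with no CNI plugin pre-installed,
# leaving you free to deploy and customize Cilium afterwards.
# create_byocni_cluster is a hypothetical helper name; assumes az is installed.
create_byocni_cluster() {
  az aks create -n "$1" -g "$2" -l "$3" --network-plugin none
}
# Usage: create_byocni_cluster myByoCniCluster "$resourceGroup" "$location"
```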

Network Monitoring

Another feature not yet available with Azure CNI Powered by Cilium is Hubble, the observability platform. However, there is an alternative for users who want to understand flow connectivity or need to troubleshoot networking issues: you can use cilium monitor to track network traffic.

Let’s go back to the example above where frontend-service in the tenant-a namespace is unsuccessfully trying to access backend-service in tenant-b as a network policy drops this traffic. We can track this, using cilium monitor --type drop from the Cilium agent shell.

Let’s get the agent’s name first:

nicovibert:~$ kubectl get pods -n kube-system | grep cilium                                              
cilium-k8lf6                          1/1     Running   0               4d
cilium-nzbr8                          1/1     Running   0               4d
cilium-operator-c69fd4946-j96v8       1/1     Running   3 (3d12h ago)   4d
cilium-operator-c69fd4946-mbs9x       1/1     Running   3 (30h ago)     4d

While I retry a curl from frontend-service/tenant-a:

nicovibert:~$ kubectl exec -n tenant-a frontend-service -- curl -sI --max-time 5 backend-service.tenant-b
command terminated with exit code 28

I can see on my agent this particular flow and why the packet was dropped (Policy denied):

nicovibert:~$ kubectl -n kube-system exec -it cilium-nzbr8 -- cilium monitor --type drop 

xx drop (Policy denied) flow 0xa20c6c5f to endpoint 0, file bpf_lxc.c line 1179, , identity 13886->466: 192.168.2.132:39044 -> 192.168.2.115:80 tcp SYN
xx drop (Policy denied) flow 0xc8601b95 to endpoint 0, file bpf_lxc.c line 1179, , identity 13886->466: 192.168.2.132:39044 -> 192.168.2.115:80 tcp SYN
xx drop (Policy denied) flow 0x435450c6 to endpoint 0, file bpf_lxc.c line 1179, , identity 13886->466: 192.168.2.132:39044 -> 192.168.2.115:80 tcp SYN

I can get even more insight by filtering based on the endpoint ID (an identifier that uniquely represents an object in Cilium and can be found via kubectl get cep). Note how we can also see the DNS requests to core-dns (192.168.1.194:53):

nicovibert:~$ kubectl get cep -n tenant-a --field-selector metadata.name=frontend-service -o json | jq '.items[0].status.id' 
263
nicovibert:~$ kubectl -n kube-system exec -it cilium-nzbr8 -- cilium monitor  --from 263  
Policy verdict log: flow 0xcedbf7e0 local EP ID 263, remote ID 53170, proto 17, egress, action allow, match L3-L4, 192.168.2.132:41326 -> 192.168.1.194:53 udp
-> stack flow 0xcedbf7e0 , identity 13886->53170 state new ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:41326 -> 192.168.1.194:53 udp
-> endpoint 263 flow 0x26aea76c , identity 53170->13886 state reply ifindex 0 orig-ip 192.168.1.194: 192.168.1.194:53 -> 192.168.2.132:41326 udp
Policy verdict log: flow 0x69df841b local EP ID 263, remote ID 466, proto 6, egress, action deny, match none, 192.168.2.132:51554 -> 192.168.2.115:80 tcp SYN
xx drop (Policy denied) flow 0x69df841b to endpoint 0, file bpf_lxc.c line 1179, , identity 13886->466: 192.168.2.132:51554 -> 192.168.2.115:80 tcp SYN

Let’s now look at a successful connection between frontend-service and backend-service in the tenant-a namespace. In this instance, you can see how the Network Policy allows the traffic through. You can also see a successful TCP 3-way handshake.

nicovibert:~$ kubectl -n kube-system exec -it cilium-nzbr8 -- cilium monitor  --from 263  
Policy verdict log: flow 0x9373c462 local EP ID 263, remote ID 53170, proto 17, egress, action allow, match L3-L4, 192.168.2.132:57145 -> 192.168.1.194:53 udp
-> stack flow 0x9373c462 , identity 13886->53170 state new ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:57145 -> 192.168.1.194:53 udp
-> endpoint 263 flow 0xa3309bb3 , identity 53170->13886 state reply ifindex 0 orig-ip 192.168.1.194: 192.168.1.194:53 -> 192.168.2.132:57145 udp
Policy verdict log: flow 0x16eb06db local EP ID 263, remote ID 53170, proto 17, egress, action allow, match L3-L4, 192.168.2.132:40471 -> 192.168.2.146:53 udp
-> stack flow 0x16eb06db , identity 13886->53170 state new ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:40471 -> 192.168.2.146:53 udp
-> endpoint 263 flow 0x0 , identity 53170->13886 state reply ifindex 0 orig-ip 192.168.2.146: 192.168.2.146:53 -> 192.168.2.132:40471 udp
Policy verdict log: flow 0x640d2639 local EP ID 263, remote ID 48999, proto 6, egress, action allow, match L3-Only, 192.168.2.132:45674 -> 192.168.2.187:80 tcp SYN
-> stack flow 0x640d2639 , identity 13886->48999 state new ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:45674 -> 192.168.2.187:80 tcp SYN
-> endpoint 263 flow 0xee33bb12 , identity 48999->13886 state reply ifindex 0 orig-ip 192.168.2.187: 192.168.2.187:80 -> 192.168.2.132:45674 tcp SYN, ACK
-> stack flow 0x640d2639 , identity 13886->48999 state established ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:45674 -> 192.168.2.187:80 tcp ACK
-> stack flow 0x640d2639 , identity 13886->48999 state established ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:45674 -> 192.168.2.187:80 tcp ACK
-> endpoint 263 flow 0xee33bb12 , identity 48999->13886 state reply ifindex 0 orig-ip 192.168.2.187: 192.168.2.187:80 -> 192.168.2.132:45674 tcp ACK
-> stack flow 0x640d2639 , identity 13886->48999 state established ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:45674 -> 192.168.2.187:80 tcp ACK, FIN
-> endpoint 263 flow 0xee33bb12 , identity 48999->13886 state reply ifindex 0 orig-ip 192.168.2.187: 192.168.2.187:80 -> 192.168.2.132:45674 tcp ACK, FIN
-> stack flow 0x640d2639 , identity 13886->48999 state established ifindex 0 orig-ip 0.0.0.0: 192.168.2.132:45674 -> 192.168.2.187:80 tcp ACK

Kubeproxy Replacement

One of the additional benefits of using Cilium is its extremely efficient dataplane. It’s particularly useful at scale, as the standard kube-proxy is based on iptables, a technology that was never designed for the churn and scale of large Kubernetes clusters.

There have been many presentations, benchmarks and case studies that explain the performance benefits of moving away from iptables to Cilium’s eBPF kube-proxy replacement so I will keep this outside the scope of this blog post.

Still, you might want to check whether any iptables rules are created when using Cilium.

For this test, I used a script to create 100 Kubernetes Services on my cluster. When I did this test on a standard Kubernetes cluster (as documented on my personal blog a few months ago), I got over 400 rules created.

root@aks-nodepool1-20100607-vmss000000:/# iptables-save | grep -c KUBE-SEP
432
root@aks-nodepool1-20100607-vmss000000:/# iptables-save | grep -c KUBE-SVC
423
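The script mentioned above can be sketched roughly like this (make_services is a hypothetical name; assumes kubectl and a workload for the Services to select):

```shell
# Sketch: create N ClusterIP Services to watch the iptables rule count grow.
# make_services is a hypothetical helper name; assumes kubectl is configured
# and each Service maps port 80 to a (hypothetical) backend on 8080.
make_services() {
  local n="$1" i
  for i in $(seq 1 "$n"); do
    kubectl create service clusterip "svc-$i" --tcp=80:8080
  done
}
# Usage: make_services 100
```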

Once you have thousands of services, the table becomes huge and latency creeps in: every incoming packet has to be matched against the rules, and since the table is traversed linearly, a packet that matches a rule at the end of a table with thousands of entries can encounter significant delay.

If you use Azure CNI Powered by Cilium instead, you benefit from Cilium’s eBPF-based kube-proxy replacement: the iptables configuration is simply much shorter and doesn’t grow as you add services.

nicovibert:~$ kubectl get svc -A | grep -c ClusterIP
106
nicovibert:~$ kubectl-exec aks-nodepool1-33954605-vmss000000                                                                        
Kuberetes client version is 1.25. Generator will not be used since it is deprecated.
creating pod "aks-nodepool1-33954605-vmss000000-exec-24918" on node "aks-nodepool1-33954605-vmss000000"

If you don't see a command prompt, try pressing enter.

root@aks-nodepool1-33954605-vmss000000:/# 
root@aks-nodepool1-33954605-vmss000000:/# iptables-save | grep -c KUBE-SVC
0
root@aks-nodepool1-33954605-vmss000000:/# iptables-save | grep -c KUBE-SEP
0

Conclusion

Hopefully this post gave you a good overview of how and why you would use Azure CNI Powered by Cilium, while presenting you with the various networking options offered with AKS.

If you have any feedback on the solution, please share it with us. You’ll find us on the Cilium Slack channel.

Learn More

Author: Nico Vibert, Senior Technical Marketing Engineer