Last updated: 12th June 2023
At KubeCon North America 2022, Microsoft announced the availability of a new eBPF-based dataplane in Azure Kubernetes Service (AKS): Azure CNI Powered by Cilium.
This is not the first major announcement of a Cilium integration on a hyper-scaler cloud: it follows those with AWS and their EKS Anywhere platform, Google and GKE's Dataplane V2, Alibaba Cloud, etc. – the list goes on.
You can find out more about this announcement by reading Thomas' blog post, Microsoft's own announcement, and the official Azure documentation. This tutorial will go deeper into this new feature.
We will first walk through the various AKS networking models, then explain how Cilium can be installed on top of AKS and how to install and configure Azure CNI Powered by Cilium, and finally dive into the enhancements Cilium brings to AKS networking and security.
Step 1: Prerequisites
If you’re reading this, then I assume you are already familiar with AKS and the Azure CLI. If not, follow “How to install the Azure CLI” to install it.
When I initially wrote this blog post (November 2022), Azure CNI Powered by Cilium had just been launched in Preview, so some additional steps were required.
With Azure CNI Powered by Cilium now Generally Available (GA), if you are running Azure CLI version 2.48.1 or later, you can go straight to Step 2. If you’re not ready to upgrade your Azure CLI yet, follow the rest of the instructions below.
If you're not using Azure CLI 2.48.1 yet, you will need to use the `aks-preview` CLI extension.
As you can see in the listing below, it was already installed on my machine. It had been installed previously when I was documenting another AKS feature that integrates well with Cilium: Bring Your Own CNI.
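To check whether the extension is installed, and which version you have, you can list your Azure CLI extensions. A minimal check (in my case, the listing reported version 0.5.91, as noted next):

```bash
# List the aks-preview extension, if installed, together with its version
az extension list --query "[?name=='aks-preview']" -o table
```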
Version 0.5.91 is however not recent enough – you need the `aks-preview` extension 0.5.109 or later to support this feature. Let's update it:
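```bash
# Update the aks-preview extension to the latest available version
az extension update --name aks-preview
```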
If you didn't have the `aks-preview` CLI extension already, install it with `az extension add --name aks-preview`.
Let's now register the new `CiliumDataplanePreview` feature flag on the `Microsoft.ContainerService` resource provider:
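```bash
# Register the CiliumDataplanePreview feature flag
az feature register --namespace "Microsoft.ContainerService" --name "CiliumDataplanePreview"
```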
Note that registering a new feature can take some time to propagate (typically, between 5 and 15 minutes).
Let's also register the Preview feature flag for Azure CNI Overlay (more on this shortly):
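Assuming the overlay preview flag is named `AzureOverlayPreview` (the name used in the Azure documentation during the preview; treat it as an assumption and check the current docs):

```bash
# Register the Azure CNI Overlay preview feature flag
az feature register --namespace "Microsoft.ContainerService" --name "AzureOverlayPreview"
```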
Again, it might take between 5 and 15 minutes for the features to show as `Registered`. You can check whether the features are registered with the following command:
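For example, with `az feature show` (the overlay flag name is the same assumption as above):

```bash
# Check the registration state of both preview features
az feature show --namespace "Microsoft.ContainerService" --name "CiliumDataplanePreview" -o table
az feature show --namespace "Microsoft.ContainerService" --name "AzureOverlayPreview" -o table
```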
As you can see, both features are now `Registered` and we can progress to the next step.
Step 2: AKS Networking Mode Selection
While Azure CNI Powered by Cilium is the main focus of the article, it’s important to understand the various AKS Networking Modes and their evolution over time.
Azure has traditionally supported two networking modes: kubenet and Azure CNI. Let's briefly explain the main differences between them and how these modes have evolved:
- When compared to `kubenet`, `Azure CNI` is the more advanced option and is better integrated with native Azure services. But `Azure CNI` requires subnet planning prior to deploying the cluster: as pods get their IP addresses allocated from the VNet subnet, users had to plan their future requirements carefully before creating the virtual network. `Azure CNI` actually pre-allocates IP addresses to pods, based on the `max-pods` setting defined during creation (by default, there is a maximum of 30 pods per node, and therefore 30 additional IP addresses are pre-assigned for pods that might eventually be scheduled on the node).
- `kubenet` doesn't require as much planning with regards to IP address management: pods get their IP addresses from a private subnet that is distinct from the VNet address range. However, using `kubenet` comes with a main drawback: it requires user-defined routes (UDRs) to route traffic back into the cluster. There is a hard limit of 400 UDRs per route table, which means an AKS cluster with `kubenet` cannot scale beyond 400 nodes.
- Last year, Microsoft introduced a more flexible IP Address Management model to `Azure CNI`: Dynamic IP allocation and enhanced subnet support in AKS gives users the ability to have distinct subnets for nodes and pods (albeit from the same broader VNet CIDR). It also means that IP addresses are dynamically assigned when pods are created, instead of being pre-allocated.
- Finally, Microsoft recently announced a new networking model that is meant to give users the best of both worlds: Azure CNI Overlay. Still in Preview mode as I write this post, Azure CNI Overlay acts like `kubenet` from an IP Address Management viewpoint (IP addresses are assigned to pods from an address space logically different from the VNet) but with even simpler network configuration (it eliminates the need for UDRs) and with better performance and scale.
To summarize:
| Network Model | Considerations |
| --- | --- |
| kubenet | Conserves IP address space but has scale limitations, requires User-Defined Routes (UDR), and has minor additional latency due to an extra hop. Pods get their IP assigned from a private range different from the VNet address range. |
| Azure CNI (Classic) | Provides full virtual network connectivity but requires more IP address space and careful planning. Pods get their IP assigned from the same range as the nodes. |
| Azure CNI (Dynamic) | Provides full virtual network connectivity and is more flexible than the classic Azure CNI model. Pods get their IP assigned from a range that is distinct from the one used for the nodes but that remains within the VNet address space. |
| Azure CNI Overlay | Provides full virtual network connectivity, high performance, and does not require additional routing configuration. Pods get their IP assigned from a private range different from the VNet address range (similar to kubenet). Now GA and currently limited to a subset of regions (North Central US and West Central US at time of publication). |
Step 3: Cilium Mode Selection
Now that we understand the various AKS network models, let's look at how Cilium can be installed, as there are several options available to us:
- The best experience for users looking at deploying Cilium with AKS will be Azure CNI Powered by Cilium.
- For those who want more flexibility, users can leverage BYOCNI to install a fully customizable Cilium (the Cilium configuration on Azure CNI Powered by Cilium is managed by AKS and cannot be modified).
- Finally, some users might want to use the “legacy” Cilium model with Azure IPAM, when custom Azure API integration and Cilium customization are required.
The first option – Azure CNI Powered by Cilium – is the preferred option and supports two methods:
- Assign IP addresses from a VNet (similar to the Azure CNI (Dynamic) model where pods can get an IP address different from the node pool subnet but still within the VNet CIDR space)
- Assign IP addresses from an overlay (based on Azure CNI Overlay, with pods getting IP addresses from a network range different from the one used by nodes).
In either mode, AKS CNI performs all IPAM actions and acts as the IPAM plugin for Cilium. Cilium is given the IP address and interface information for the pod.
In the rest of the tutorial, I will be using Azure CNI Powered by Cilium based on Azure CNI Overlay.
Step 4: AKS Network Creation
Let’s create a new Resource Group before setting up the network.
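The resource group name and region below are placeholders I chose for this walkthrough, not values from the original listing (West Central US is one of the regions where Azure CNI Overlay is available):

```bash
# Create a resource group in a region where Azure CNI Overlay is available
az group create --name cilium-aks-rg --location westcentralus
```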
Let’s now create a VNet and the VNet subnet that the nodes will use for their own IP addresses:
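A sketch of the VNet and node subnet creation (names and address ranges are my own placeholders; the only constraint from this tutorial is that the node subnet range must differ from the pod CIDR used later):

```bash
# Create a VNet and a subnet for the AKS nodes
az network vnet create \
  --resource-group cilium-aks-rg \
  --name cilium-vnet \
  --location westcentralus \
  --address-prefixes 10.0.0.0/8

az network vnet subnet create \
  --resource-group cilium-aks-rg \
  --vnet-name cilium-vnet \
  --name node-subnet \
  --address-prefixes 10.240.0.0/16
```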
Let's get the `subnetId` of the subnet we've just created, as we'll need it for the cluster deployment:
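```bash
# Store the subnet resource ID for the cluster creation command
SUBNET_ID=$(az network vnet subnet show \
  --resource-group cilium-aks-rg \
  --vnet-name cilium-vnet \
  --name node-subnet \
  --query id -o tsv)
```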
Step 5: Cluster Creation
Let's now deploy our cluster. Note below that `network-plugin` is set to `azure` and that `network-plugin-mode` is set to `overlay`, as I chose to deploy Cilium in this particular mode. Notice as well that the `pod-cidr` is set to `192.168.0.0/16`, which is a different range from the VNet subnet created previously. Finally, `enable-cilium-dataplane` will enable the Cilium dataplane feature set.
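A sketch of the cluster creation command using the flags called out above (cluster name, node count, and region are placeholders; `--enable-cilium-dataplane` was the preview-era flag referenced in this post):

```bash
# Create an AKS cluster using Azure CNI in overlay mode with the Cilium dataplane
az aks create \
  --resource-group cilium-aks-rg \
  --name cilium-aks-cluster \
  --location westcentralus \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --vnet-subnet-id "$SUBNET_ID" \
  --enable-cilium-dataplane \
  --node-count 2
```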
It should take about 5 to 10 minutes to create the cluster.
Let's verify in the Azure portal what I explained earlier with regards to pod and node addressing: once I deploy some pods (see next section), my pods get IP addresses from the 192.168.0.0/16 range…
…which is distinct from the network that nodes get their IP addresses from.
Step 6: Cluster and Cilium Health Check
Let’s look at the cluster once the deployment is completed. Let’s first connect to the cluster:
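```bash
# Merge the cluster credentials into your local kubeconfig
az aks get-credentials --resource-group cilium-aks-rg --name cilium-aks-cluster
```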
We can now run `kubectl` commands. Let's start by checking the status of the nodes.
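```bash
# Check that the nodes are Ready and note their (VNet) IP addresses
kubectl get nodes -o wide
```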
Cilium is healthy (I am using the Cilium CLI to verify the status of Cilium – follow the steps on the Cilium docs to install it if you don’t have it already).
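If you want to reproduce the check, the status command looks like this:

```bash
# Check overall Cilium health with the Cilium CLI
cilium status --wait
```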
Let's also check the node-to-node health with `cilium-health status`:
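```bash
# Run cilium-health from within one of the Cilium agent pods
kubectl -n kube-system exec ds/cilium -- cilium-health status
```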
We can even run a `cilium connectivity test` (an automated test that checks that Cilium has been deployed correctly and tests intra-node connectivity, inter-node connectivity, and network policies) to verify that everything is working as expected.
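```bash
# Run the automated Cilium connectivity test suite
cilium connectivity test
```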
Step 7: Cilium Benefits
Let’s now demonstrate three of the benefits that Cilium Dataplane brings:
- Built-in network policy support
- Network Monitoring
- Replacement of kube-proxy for better performance and reduced latency
Network Policy Support
Kubernetes doesn’t natively enforce Network Policies – it needs a network plugin to do that for us. For AKS clusters, installing and managing a separate network policy engine was previously required.
That’s no longer the case once you deploy clusters with Azure CNI powered by Cilium: that’s automatically built-in.
Let's verify it. First, I am going to create three namespaces and three identical applications, one per namespace. Each application is based on a pair of pods: a `frontend-service` and a `backend-service`. A frontend-service can communicate over HTTP with a backend-service.
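The exact manifests aren't reproduced here, but a minimal sketch of the setup could look like this (the image choices and the use of `kubectl create`/`kubectl run` are my own assumptions):

```bash
# Create three tenant namespaces, each with a backend Service and a curl-capable frontend pod
for tenant in tenant-a tenant-b tenant-c; do
  kubectl create namespace "$tenant"
  kubectl -n "$tenant" create deployment backend-service --image=nginx
  kubectl -n "$tenant" expose deployment backend-service --port=80
  kubectl -n "$tenant" run frontend-service --image=curlimages/curl --command -- sleep 999999
done
```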
Let's verify communications before we apply any network policies. My `frontend-service` in the `tenant-a` namespace should be able to communicate with any of the backends across the `tenant-a`, `tenant-b` and `tenant-c` namespaces. It can also connect to the public Twitter APIs. As you can also see below, these `curl` requests are sent to FQDNs, and therefore a network policy will need to ensure DNS still works.
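Using the sketch above, the pre-policy checks could look like this (service FQDNs follow the standard `<service>.<namespace>.svc.cluster.local` pattern):

```bash
# All of these should succeed before any network policy is applied
for ns in tenant-a tenant-b tenant-c; do
  kubectl -n tenant-a exec frontend-service -- \
    curl -s -o /dev/null -w "backend-service.$ns: %{http_code}\n" \
    "http://backend-service.$ns.svc.cluster.local"
done

# External connectivity (Twitter APIs)
kubectl -n tenant-a exec frontend-service -- \
  curl -s -o /dev/null -w "api.twitter.com: %{http_code}\n" https://api.twitter.com
```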
Let's say we want to enforce some segmentation between my tenants and prevent pods in `tenant-a` from accessing services in `tenant-b` and `tenant-c`. Imagine we also want to prevent communications to the outside of the cluster. Let's build a network policy for this use case.
To be perfectly honest, I still find creating and editing network policies tricky. To make life easier for me, I just used the Cilium Network Policy editor. Note that we still need to explicitly authorize DNS in order for name resolution to be successful. Watch the video below to see how I created this policy.
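The exact policy from the demo isn't reproduced here, but an egress policy of this shape matches the description: allow DNS towards kube-system, allow traffic within `tenant-a`, and drop everything else (the selectors are assumptions, not the generated policy):

```bash
# Apply an egress policy in tenant-a: DNS plus intra-namespace traffic only
kubectl -n tenant-a apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-tenant-a-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Explicitly allow DNS so that FQDN resolution keeps working
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Allow traffic to any pod within the tenant-a namespace
    - to:
        - podSelector: {}
EOF
```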
As soon as I apply the `NetworkPolicy`, traffic to the Twitter APIs and to the backends outside the `tenant-a` namespace is dropped, while traffic within the namespace is still allowed.
The full network policy demo can be watched below:
Note that there is not yet support for the more advanced Cilium Network Policies, including the Layer 7-based filtering. The Cilium configuration you get with Azure CNI Powered by Cilium cannot be changed – if you want all the bells and whistles of Cilium, consider deploying it in BYOCNI mode instead.
Network Monitoring
Another feature not yet available with Azure CNI Powered by Cilium is the observability platform Hubble. However, there is an alternative for users who want to understand flow connectivity or need to troubleshoot networking issues: you can use `cilium monitor` to track network traffic.
Let's go back to the example above, where `frontend-service` in the `tenant-a` namespace unsuccessfully tries to access `backend-service` in `tenant-b` because a network policy drops this traffic. We can track this using `cilium monitor --type drop` from the Cilium agent shell.
Let’s get the agent’s name first:
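```bash
# List the Cilium agent pods and the nodes they run on
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
```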
While I retry a `curl` from `frontend-service` in `tenant-a`:
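```bash
# Retry the request that the policy should now block (it times out instead of connecting)
kubectl -n tenant-a exec frontend-service -- \
  curl -s --max-time 5 http://backend-service.tenant-b.svc.cluster.local
```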
I can see on my agent this particular flow and why the packet was dropped (`Policy denied`):
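The monitor session, started from the agent shell before retrying the request, looks like this (`<cilium-agent-pod>` is a placeholder for one of the pod names listed above; output not reproduced here):

```bash
# Watch drop events from inside a Cilium agent; the blocked request shows up as "Policy denied"
kubectl -n kube-system exec -it <cilium-agent-pod> -- cilium monitor --type drop
```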
I can get even more insight by filtering based on the endpoint ID (an identifier that uniquely represents an object in Cilium and can be found via `kubectl get cep`). Note how we can also see the DNS requests to core-dns (`192.168.1.194:53`):
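A hedged example, assuming `cilium monitor`'s `--related-to` filter and the same placeholder agent pod name as before:

```bash
# Find the endpoint ID of the destination pod
kubectl -n tenant-b get cep

# Filter monitor events related to that endpoint (replace 1234 with the actual endpoint ID)
kubectl -n kube-system exec -it <cilium-agent-pod> -- cilium monitor --related-to 1234
```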
Let's now look at a successful connection between `frontend-service` and `backend-service` in the `tenant-a` namespace. In this instance, you can see how the Network Policy allows the traffic through. You can also see a successful TCP 3-way handshake.
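For the allowed path, the same kind of monitor session (without the `--type drop` filter) shows the forwarded packets while the request below succeeds (placeholder names as before):

```bash
# An intra-namespace request that the policy still allows
kubectl -n tenant-a exec frontend-service -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://backend-service.tenant-a.svc.cluster.local

# Observe the forwarded traffic from the agent
kubectl -n kube-system exec -it <cilium-agent-pod> -- cilium monitor
```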
Kube-proxy Replacement
One of the additional benefits of using Cilium is its extremely efficient data plane. It's particularly useful at scale, as the standard kube-proxy is based on a technology – iptables – that was never designed for the churn and scale of large Kubernetes clusters.
There have been many presentations, benchmarks and case studies that explain the performance benefits of moving away from iptables to Cilium’s eBPF kube-proxy replacement so I will keep this outside the scope of this blog post.
Still, you might want to check whether any iptables rules are created when using Cilium.
For this test, I used a script to create 100 Kubernetes Services on my cluster. When I did this test on a standard Kubernetes cluster (as documented on my personal blog a few months ago), I got over 400 rules created.
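The script itself isn't included in this post, but a sketch of the idea looks like this (deployment names and image are placeholders):

```bash
# Create 100 Deployments and expose each one as a ClusterIP Service
for i in $(seq 1 100); do
  kubectl create deployment "svc-test-$i" --image=nginx
  kubectl expose deployment "svc-test-$i" --port=80
done
```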
Once you start having thousands of services, the table becomes huge and latency is introduced, as any incoming packet has to be matched against the iptables rules: the table is traversed linearly, so a packet that matches a rule at the end of a table with thousands of entries can encounter significant delay.
If you use Azure CNI Powered by Cilium instead, you benefit from Cilium's eBPF-based kube-proxy replacement. The iptables configuration is simply much shorter and doesn't grow as you add services.
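To compare for yourself, you can count the kube-proxy service rules on a node (this requires a shell on the node, for example via a privileged debug pod; the exact count will vary by cluster):

```bash
# On a kube-proxy based cluster this number grows with every Service;
# with Cilium's kube-proxy replacement it stays flat
iptables-save | grep -c 'KUBE-SVC'
```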
Conclusion
Hopefully this post gave you a good overview of how and why you would use Azure CNI Powered by Cilium, while presenting you with the various networking options offered with AKS.
If you have any feedback on the solution, please share it with us. You’ll find us on the Cilium Slack channel.
Learn More
- Learn more about Cilium
- Request a demo of Isovalent Enterprise for Cilium
- All you need to know about Isovalent, Cilium and Microsoft
- Azure CNI Powered by Cilium Announcement (Microsoft announcement)
- Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) (Azure Documentation)
- Configure Azure CNI Overlay networking in Azure Kubernetes Service (AKS) (Azure CNI Overlay Documentation)
- AKS Networking Update (excellent video by John Savill)
Prior to joining Isovalent, Nico worked in many different roles—operations and support, design and architecture, and technical pre-sales—at companies such as HashiCorp, VMware, and Cisco.
In his current role, Nico focuses primarily on creating content to make networking a more approachable field and regularly speaks at events like KubeCon, VMworld, and Cisco Live.