IP address planning is not unlike urban planning (a field in which I’ve got decades of experience): Tokyo wasn’t designed for 40 million residents, but regardless, it is now the largest metropolis on Earth and accommodates a huge population through compact housing and vertical expansion.
But let’s be frank: subnetting is nowhere near as much fun as playing SimCity. It’s probably one of the least enjoyable aspects of network architecture. Select a subnet that is too small and you might quickly run out of IPs to assign to your workloads. Settle on a network that is too big and you might end up without enough prefixes to allocate across all your environments.
The tedium of subnet planning is not reserved for traditional networks: we also feel it in Kubernetes. Platform engineers often underestimate how quickly their platform can become popular, and they run into IP address exhaustion issues as developers embrace Kubernetes and microservices are deployed at scale.
Yet, just like Tokyo found smart urban strategies to expand, Cilium also offers multiple methods to overcome IP address exhaustion.
In this blog post, we will start by recapping the multiple methods available to assign IP addresses to Kubernetes entities, explain the differences between them, and walk through how Cilium can give you that little extra breathing space when clusters get tight.
IP Address Management in Kubernetes
In traditional networking, a DHCP server allocates an IP address to a server as it comes online (unless static addressing is used).
While Kubernetes nodes typically receive their IP address over DHCP, we do not use DHCP for pods. Instead, we call the process of assigning IPs to pods and services within Kubernetes IPAM (IP Address Management).
In Kubernetes, the Container Network Interface (CNI) plugin is often responsible for assigning an IP address to your pod.
When a new Pod is added in Kubernetes, it is first assigned to a node by the Kubernetes Scheduler. The Kubelet running on that node is then notified of the new pod and takes action to create the containers specified in the pod manifest.
For the networking part, the Kubelet uses the CNI. The first step is for the Kubelet to check the CNI configuration on the node, which is located in /etc/cni/net.d/. When Cilium is installed on the cluster, each Cilium agent creates a configuration file in /etc/cni/net.d/05-cilium.conf, which instructs the Kubelet on how to configure Pod networking.
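If you want to peek at that configuration yourself, a quick check on one of the nodes looks like this (the exact file name and contents vary with the Cilium version, so treat it as illustrative):

```bash
# Inspect the CNI configuration directory on a node (via SSH or a debug pod with
# the host filesystem mounted). On recent releases the Cilium-generated file may
# be named 05-cilium.conflist instead.
ls /etc/cni/net.d/
cat /etc/cni/net.d/05-cilium.conf
```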
The IP assigned to a pod comes from a subnet referred to as PodCIDR. You might remember the concept of CIDR – Classless Inter-Domain Routing – from doing subnet calculations. In Kubernetes, just think of a PodCIDR as the subnet from which the pod receives an IP address.
Cilium supports multiple IPAM modes. Some are cloud provider-specific while some can be used in any environment. In this tutorial, we will cover the following IPAM modes:
- Kubernetes Host Scope
- Cluster Scope (Default)
- Multi-Pool (Beta)
- AWS ENI (without and with Prefix Delegation)
- LoadBalancer Service IPAM
- Egress Gateway IPAM
Kubernetes Host Scope IPAM Mode
The Kubernetes Host Scope IPAM is probably the simplest IPAM option for a generic Kubernetes cluster, so we’ll start with this one.
When using the Kubernetes Host Scope, Cilium relies on CIDRs already allocated to the Kubernetes Node resources by the Kubernetes Controller Manager.
In this mode, a single large cluster-wide prefix is assigned to the cluster, and Kubernetes carves out a subnet of that prefix for every node. Cilium then assigns pod IP addresses from each node’s subnet.
Let’s take a look at a cluster with Cilium deployed in this mode:
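A minimal sketch of what that check looks like, assuming the Cilium CLI is installed locally (the node names come from my kind lab):

```bash
# Confirm the IPAM mode in the Cilium configuration, then list the PodCIDR that
# the Kubernetes Controller Manager allocated to each Node resource.
cilium config view | grep ipam
kubectl get nodes -o custom-columns=NODE:.metadata.name,PODCIDR:.spec.podCIDR
```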
Cilium will use the PodCIDRs associated with each node (using the Node resources) and assign IPs from these subnets to the pods started on the nodes. In our cluster, the kind-worker node is assigned the 10.244.1.0/24 PodCIDR.
Executing cilium-dbg status in a Cilium agent will let you check how many IP addresses were assigned out of that prefix (note that the cilium-dbg CLI running on the agent is different from the Cilium CLI typically used for installation):
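Here is one way to run it, assuming the standard Cilium DaemonSet in kube-system (kubectl picks one of the agent pods here; exec into the agent on the specific node if you want that node’s numbers):

```bash
# Run cilium-dbg inside a Cilium agent and focus on the IPAM section, which
# reports how many IPs of the node's PodCIDR are currently allocated.
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep -A4 IPAM
```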
Let’s deploy a pod. It is scheduled on kind-worker2 and receives an IP address from the prefix assigned to that node.
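A minimal sketch, using the nicolaka/netshoot image and the pod name you will see again in later outputs:

```bash
# Start a throwaway pod, then check which node it landed on and which IP it got.
kubectl run netshoot --image=nicolaka/netshoot --command -- sleep infinity
kubectl get pod netshoot -o wide
```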
While this mode is simple, it is inflexible: it is not possible to configure the size of the CIDRs allocated to each node, nor is it possible to add additional CIDRs to the cluster or to individual nodes, making precise IP address planning crucial prior to cluster deployment.
Cluster Scope IPAM Mode
The Cluster Scope mode works in a similar way to the Kubernetes Host Scope mode, but Cilium allocates pod CIDRs to the nodes itself (instead of the Kubernetes Controller Manager). As it doesn’t require any specific configuration of the Kubernetes cluster, this mode is the default option in Cilium.
In this mode, the Cilium Operator is in charge of allocating pod CIDRs to each node. It uses the CiliumNode resources instead of the Node resources to achieve this (which avoids potential clashes with CIDRs assigned by the Kubernetes Controller Manager).
By default, Cilium uses the 10.0.0.0/8 CIDR for pods. Let’s verify in a cluster deployed in Cluster Scope mode (which is sometimes referred to as “Cluster-Pool”):
You’ll also notice the cluster-pool-ipv4-mask-size parameter, which is set to 24. This means that Cilium will split the cluster CIDR into /24 subnets and assign one to each node in the cluster by adding it to the spec.ipam.podCIDRs field in their respective CiliumNode resources.
Let’s list the CIDRs associated with each node:
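Since Cluster Scope stores the allocations on the CiliumNode objects, we can read them straight from there:

```bash
# The Cilium Operator writes each node's pod CIDR(s) into spec.ipam.podCIDRs of
# the corresponding CiliumNode resource.
kubectl get ciliumnodes -o custom-columns=NODE:.metadata.name,PODCIDRS:.spec.ipam.podCIDRs
```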
Notice that they are the first 3 /24 subnets derived from the 10.0.0.0/8 CIDR.
Let’s deploy a pod and check the IPAM status on the Cilium agent located on the same node as the pod.
The node is now using at least 3 IPs, which we can review using cilium-dbg status --verbose, under the “Allocated addresses” section:
Three IPs have been taken from the node’s Pod CIDR range:
- the IP of the new pod (default/netshoot)
- the internal IP of the node (router)
- the health IP of the node (health)
One advantage Cluster Scope IPAM has over Kubernetes Host Scope IPAM is that we can allocate multiple CIDRs to the cluster. This provides more flexibility, although it doesn’t necessarily prevent IP address exhaustion.
Let’s deploy Cilium in Cluster Scope IPAM mode with two small CIDRs assigned to the cluster to illustrate what happens when they get exhausted. Let’s use the values.yaml below for Cilium’s starting configuration:
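Something along these lines; the two /28 ranges below are placeholders I picked for this walkthrough, what matters is that there are two small pools and a /29 mask size:

```bash
# values.yaml for a deliberately tiny Cluster Scope (cluster-pool) configuration.
cat <<'EOF' > values.yaml
ipam:
  mode: cluster-pool
  operator:
    # Two small /28 pools (illustrative ranges).
    clusterPoolIPv4PodCIDRList:
      - 10.242.0.0/28
      - 10.243.0.0/28
    # Each node receives a /29 carved out of the pools above.
    clusterPoolIPv4MaskSize: 29
EOF
```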
Since we’ve specified 29 as the mask size, each node will receive a /29 subnet, which corresponds to 6 usable IPs (8 IPs, minus 2 IPs reserved for the Cilium Host and Cilium Health interfaces).
Let’s install Cilium in this mode and verify the settings:
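Assuming a Helm-based installation with the Cilium chart repository already added (version flags omitted), that looks roughly like:

```bash
# Install (or upgrade) Cilium with the values file above, then confirm the
# cluster-pool settings made it into the Cilium configuration.
helm upgrade --install cilium cilium/cilium --namespace kube-system -f values.yaml
cilium config view | grep cluster-pool
```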
Since the CIDRs we passed are /28 and we’re allocating one /29 per node, only two nodes can consume IPs from the first CIDR block. For this reason, the third node gets a subnet from the second block, which is why we see three subnets configured.
Let’s create a deployment that spreads 10 pods over the cluster:
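For example, a simple 10-replica deployment of the same netshoot image (the name and image are mine, any workload will do):

```bash
# Ask for 10 pods spread across the cluster, then check which ones got an IP.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot
spec:
  replicas: 10
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["sleep", "infinity"]
EOF
kubectl get pods -o wide
```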
Notice that some pods did not get an IP address and are now stuck at creation. When looking at the pod events, we can see there are no available IP addresses:
The kubelet on the node is trying to retrieve IPs from the Cilium CNI plugin to assign to the new pod but failing to do so. The CNI plugin is replying that the range is full. We can verify on the node that all IP addresses have been allocated:
This mode is easy to understand and lets you assign multiple IP ranges per cluster, but it comes with some limitations: 1) it doesn’t let you add PodCIDRs dynamically to a cluster and 2) it doesn’t give users as much control as they might like over where Pods receive their IP addresses from.
Let’s explore the mode that can address these limitations.
Multi-Pool IPAM Mode
The Multi-Pool mode is the most recent (it was introduced with Cilium 1.14) and the most flexible one. It supports allocating PodCIDRs from multiple different IPAM pools, depending on properties of the workload defined by the user, e.g., annotations. Pods on the same node can receive IP addresses from various ranges. In addition, PodCIDRs can be dynamically added to a node as and when needed.
Let’s try it. My cluster is deployed with Cilium in Multi-Pool mode and set to automatically create a cluster-wide pool at startup, split into /27 pools as and when needed.
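The Helm values for such a setup look roughly like this; the 10.10.0.0/16 range for the default pool is an assumption on my part, and the extra settings Multi-Pool needs (native routing, per-endpoint routes) are omitted for brevity:

```bash
# values.yaml sketch for Multi-Pool IPAM with an auto-created "default" pool,
# carved into /27 chunks on demand.
cat <<'EOF' > values.yaml
ipam:
  mode: multi-pool
  operator:
    autoCreateCiliumPodIPPools:
      default:
        ipv4:
          cidrs:
            - 10.10.0.0/16
          maskSize: 27
EOF
```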
Let’s verify the pool was deployed at start-up:
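Pools are regular custom resources, so kubectl can list them:

```bash
# Auto-created and user-defined pools both show up as CiliumPodIPPool objects.
kubectl get ciliumpodippools.cilium.io
```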
Let’s also deploy another Pod IP pool called mars:
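A manifest along these lines does the trick; the 10.20.0.0/16 range matches the pool referenced later in this post, and the /27 mask mirrors the default pool:

```bash
# Create a second pod IP pool named "mars".
cat <<'EOF' | kubectl apply -f -
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  name: mars
spec:
  ipv4:
    cidrs:
      - 10.20.0.0/16
    maskSize: 27
EOF
```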
One of the advantages of Multi-Pool is that it gives us more control over how we allocate IP addresses to Pods. We can force a pod – or all pods in a particular namespace – to pick up IPs from a particular pool.
Let’s create two deployments with two pods each. One will use IP addresses from the default pool while the other one will receive IPs from the mars pool:
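The pool is selected with the ipam.cilium.io/ip-pool annotation on the pod template. Here is a sketch of the mars-backed deployment; the other deployment is identical minus the annotation:

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mars
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mars
  template:
    metadata:
      labels:
        app: mars
      annotations:
        # Ask Cilium's Multi-Pool IPAM to allocate this pod's IP from "mars".
        ipam.cilium.io/ip-pool: mars
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["sleep", "infinity"]
EOF
```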
Let’s verify:
Before we scale and observe what happens, let’s look at our initial pools. Both nodes have a /27 from each broader pool.
Our two /27 subnets are only enough for 64 pods, so scaling up the mars deployment to 70 pods causes Cilium to react:
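A rough way to reproduce and observe it (the exact layout of the CiliumNode resource may differ between versions):

```bash
# Scale well past what the currently allocated /27s can hold, then watch the
# per-node pool allocations on the CiliumNode resources grow.
kubectl scale deployment mars --replicas=70
kubectl get ciliumnodes -o yaml | grep -A8 'pools:'
```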
Note the needed field above: as the number of required IPs significantly increased, another /27 CIDR from the cluster-wide 10.20.0.0/16 pool was allocated to each Cilium node.
Multi-Pool works nicely with the Multi-Network feature, giving users the ability to connect a Pod to multiple network interfaces.
Multi-Network Lab
Learn how you can connect Kubernetes Pods to multiple interfaces with Isovalent Enterprise for Cilium 1.14 multi-network!
IPAM ENI Mode & Prefix Delegation for AWS EKS
As highlighted earlier, when Cilium provides networking and security services for a managed Kubernetes service, it sometimes comes with an IP Address Management mode specific to that cloud.
Let’s focus on AWS EKS (Elastic Kubernetes Service).
IPAM ENI Mode
In this mode, IP allocation is based on IPs of AWS Elastic Network Interfaces (ENI).
Each node creates a CiliumNode custom resource when Cilium starts up for the first time on that node. It contacts the EC2 metadata API to retrieve the instance ID, instance type, and VPC information, then populates the custom resource with this information.
Let’s take a look at an EKS cluster with Cilium:
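For reference, an ENI-mode installation on EKS is typically driven by a handful of Helm values like these; treat the exact set (routing mode, masquerading interface) as a sketch to adapt to your environment:

```bash
# Install Cilium on EKS with ENI IPAM: pods get VPC-routable IPs straight from
# ENIs, so native routing is used rather than an overlay.
helm upgrade --install cilium cilium/cilium --namespace kube-system \
  --set ipam.mode=eni \
  --set eni.enabled=true \
  --set routingMode=native \
  --set egressMasqueradeInterfaces=eth0
```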
Let’s deploy the Star Wars-inspired demo app, made up of 4 pods. All pods receive an IP address as you would expect.
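If I remember the path correctly, the demo app can be applied straight from the Cilium repository (it creates the deathstar deployment plus the tiefighter and xwing pods):

```bash
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/HEAD/examples/minikube/http-sw-app.yaml
kubectl get pods -o wide
```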
The pod IPs are actually taken from our EC2 instance’s network interfaces and listed as secondary private IPs, as you can see in the AWS console.
This provides a clear benefit: pods receive ENI IPs that are directly routable in the AWS VPC. No NAT is required, which makes networking and observability easier for operators. We can also see the IPs being populated on the CiliumNode resource.
However, there’s a finite number of pod IPs per instance, which depends on the EC2 instance type used.
Let’s check for M5 models:
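The per-instance-type limits can be pulled from the EC2 API; a sketch with the AWS CLI (requires credentials allowed to call ec2:DescribeInstanceTypes):

```bash
# How many ENIs, and how many IPv4 addresses per ENI, each m5 size supports.
aws ec2 describe-instance-types \
  --filters "Name=instance-type,Values=m5.*" \
  --query "InstanceTypes[].{Type:InstanceType,MaxENI:NetworkInfo.MaximumNetworkInterfaces,IPv4PerENI:NetworkInfo.Ipv4AddressesPerInterface}" \
  --output table
```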
My EKS node group is made up of two m5.large instances. This instance type only supports up to 3 network interfaces and 10 IPs per interface. We quickly run out of IPs when we scale our deployment to 50 Deathstars across the 2-node cluster:
The logs on the Cilium operator confirm our suspicions:
How can we get over the limit, without having to scale horizontally with more nodes?
Prefix Delegation
With Prefix Delegation, we can assign a private CIDR range to our network interface, which is then used to assign IP addresses to pods.
A prefix still counts towards the limit of IPs – so, for example, assigning a prefix (typically a /28) to an interface on my m5.large would count as one of the 10 IPs I can assign. Still, going from 10 individual IP addresses to 10 /28 prefixes (up to 160 IP addresses) would alleviate many IP addressing concerns.
Let’s try it. Prefix Delegation only works for new nodes that join our cluster, so we have to add a new node group to our cluster in this particular case.
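On the Cilium side, prefix delegation is a single extra Helm value on top of the ENI-mode settings shown earlier (the new node group itself is created with your usual EKS tooling, e.g. eksctl or Terraform):

```bash
# Enable AWS prefix delegation: nodes joining from now on get /28 prefixes on
# their ENIs instead of individual secondary IPs.
helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set eni.awsEnablePrefixDelegation=true
```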
Scaling to 150 pods – which would have failed in the standard ENI mode on a 4-node cluster – is now seamless.
All my pods have received an IP address and are functioning. This time though, there are no secondary private IPv4 addresses attached to my node’s network interface. Instead, prefixes have been automatically assigned to it:
My teammate Amit provided a much more detailed walkthrough in his personal blog post on Medium – read it for a more thorough tutorial.
IPAM for LoadBalancer and Egress
While the focus of this blog post is on assigning IP addresses to pods, we ought to cover some additional IP Address Management capabilities that are native to Cilium or to our enterprise edition.
LoadBalancer IPAM
We covered this feature in detail at launch with Cilium 1.13 but as a reminder: Cilium can assign IP addresses to Kubernetes Services of the type LoadBalancer. Often used with Ingress/Gateway API, these services expose applications running inside the cluster to the outside world.
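As a quick refresher, an address pool for LoadBalancer IPAM is just another custom resource. A sketch below; note that the field holding the ranges is called blocks on recent releases and cidrs on older ones, and the range itself is illustrative:

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: external-pool
spec:
  blocks:
    # LoadBalancer Services will be assigned IPs from this range.
    - cidr: 192.168.100.0/24
EOF
```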
Many users have typically relied on MetalLB for this purpose, but with Cilium offering this functionality natively, we see many of them migrating from MetalLB to Cilium. You can find out more about the migration process in our tutorial below:
Migrating from MetalLB to Cilium
In this blog post, you will learn how to migrate from MetalLB to Cilium for local service advertisement over Layer 2.
Egress IPAM
Enterprise only
Egress Gateway is a popular Cilium feature that enables users to set a predictable IP for traffic leaving the cluster. Often used alongside external firewalls, it ensures that traffic for a specific namespace or pod exits via a specific node and that the source IP is masqueraded to one of the IP addresses of said node. This has typically relied on secondary IPs configured on the nodes’ interfaces, which comes with the manual burden of configuring and managing the nodes themselves.
With a recent update to Egress Gateway in Isovalent Enterprise for Cilium, users get additional control over the IP addresses distributed to their Egress Gateway nodes.
The IPAM feature allows you to specify an IP pool in the IsovalentEgressGatewayPolicy from which Cilium leases egress IPs and assigns them to the selected egress interfaces.
Combine it with BGP support for Egress to advertise the IP address range to the rest of the network, enabling return traffic to come back safely to the cluster.
Final Thoughts
I hope this tutorial clarifies all the different ways Cilium can assign IP addresses to your Kubernetes entities. I have not mentioned the most obvious way to overcome IPv4 address exhaustion: IPv6! If you’d like to learn more about it, read this IPv6 tutorial with Cilium.
Thanks for reading.
Prior to joining Isovalent, Nico worked in many different roles—operations and support, design and architecture, and technical pre-sales—at companies such as HashiCorp, VMware, and Cisco.
In his current role, Nico focuses primarily on creating content to make networking a more approachable field and regularly speaks at events like KubeCon, VMworld, and Cisco Live.