
Overcoming Kubernetes IP Address Exhaustion with Cilium

Nico Vibert

IP address planning is not unlike urban planning (a field in which I’ve got decades of experience): Tokyo wasn’t designed for 40 million residents, but regardless, it is now the largest metropolis on Earth and accommodates a huge population through compact housing and vertical expansion.

But let’s be frank: subnetting is nowhere near as much fun as playing SimCity. It’s probably one of the least enjoyable aspects of network architecture. Select a subnet that is too small and you might quickly run out of IPs to assign to your workloads. Settle on a network that is too big and you might not have enough prefixes left to allocate across all your environments.

The tedium of subnet planning is not reserved for traditional networks: we also feel it in Kubernetes. Platform engineers often underestimate how quickly their platform can become popular, and run into IP address exhaustion issues as developers embrace Kubernetes and microservices are deployed at scale.

Yet, just like Tokyo found smart urban strategies to expand, Cilium also offers multiple methods to overcome IP address exhaustion.

In this blog post, we will recap the various methods available to assign IP addresses to Kubernetes entities, explain the differences between them, and walk through how Cilium can give you that little extra breathing space when clusters get tight.

IP Address Management in Kubernetes

In traditional networking, a DHCP server allocates IP addresses to a server as it comes online (unless static addressing is used).

While Kubernetes nodes typically receive their IP address over DHCP, we do not use DHCP for pods. Instead, IPAM (IP Address Management) refers to the process of assigning IP addresses to pods and services within Kubernetes.

In Kubernetes, the Container Network Interface (CNI) plugin is often responsible for assigning an IP address to your pod.

Cilium CNI

When a new Pod is created in Kubernetes, it is first assigned to a node by the Kubernetes Scheduler. The Kubelet running on that node is then notified of the new pod and takes action to create the containers described in the pod manifest.

For the networking part, the Kubelet uses the CNI plugin. The first step is for the Kubelet to check the CNI configuration on the node, located in /etc/cni/net.d/. When Cilium is installed on the cluster, each Cilium agent creates a configuration file at /etc/cni/net.d/05-cilium.conf, which instructs the Kubelet on how to configure Pod networking.
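If you are curious, you can inspect that file directly on a node. On a kind cluster, where each node is a Docker container, a quick way to do so looks like this (the node name matches the cluster used below; the exact filename can vary between Cilium versions):

$ docker exec kind-worker cat /etc/cni/net.d/05-cilium.conf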

The IP assigned to a pod comes from a subnet referred to as PodCIDR. You might remember the concept of CIDR – Classless Inter-Domain Routing – from doing subnet calculations. In Kubernetes, just think of a PodCIDR as the subnet from which the pod receives an IP address.

Cilium supports multiple IPAM modes. Some are cloud provider-specific while some can be used in any environment. In this tutorial, we will cover the following IPAM modes:

  • Kubernetes Host Scope
  • Cluster Scope (Default)
  • Multi-Pool (Beta)
  • AWS ENI (without and with Prefix Delegation)
  • LoadBalancer Service IPAM
  • Egress Gateway IPAM

Kubernetes Host Scope IPAM Mode

The Kubernetes Host Scope IPAM is probably the simplest IPAM option for a generic Kubernetes cluster, so we’ll start with this one.

When using the Kubernetes Host Scope, Cilium relies on the PodCIDRs already allocated to the Kubernetes Node resources by the Kubernetes Controller Manager.

Kubernetes Host Scope IPAM

In this mode, a single large cluster-wide prefix is assigned to the cluster, and Kubernetes carves out a subnet of that prefix for every node. Cilium then assigns pod IP addresses from each node’s subnet.
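For reference, here is roughly how a cluster like the one used below can be provisioned – a sketch, assuming kind (whose default pod subnet is 10.244.0.0/16) and a Helm-based Cilium installation with ipam.mode=kubernetes:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true      # let Cilium provide pod networking
  podSubnet: "10.244.0.0/16"   # cluster-wide prefix carved up per node by Kubernetes
nodes:
  - role: control-plane
  - role: worker
  - role: worker

$ kind create cluster --config kind-config.yaml
$ # assumes the Cilium Helm repository has already been added
$ helm install cilium cilium/cilium --namespace kube-system --set ipam.mode=kubernetes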

Let’s take a look at a cluster with Cilium deployed in this mode:

$ cilium config view | grep ipam
ipam                                       kubernetes

Cilium will use the PodCIDRs associated with each node (using the Node resources) and assign IPs from these subnets to the pods started on the nodes. In our cluster, the kind-worker node is assigned the 10.244.1.0/24 PodCIDR.

$ kubectl get ciliumnode kind-worker -o jsonpath='{.spec.ipam}'
{"podCIDRs":["10.244.1.0/24"]}

Executing cilium-dbg status in a Cilium agent lets you check how many IP addresses have been assigned out of that prefix (note that cilium-dbg, the CLI embedded in the agent, is different from the cilium CLI binary typically used for installation):

$ WORKER_CILIUM_POD=$(kubectl -n kube-system get po -l k8s-app=cilium --field-selector spec.nodeName=kind-worker -o name)
$ echo $WORKER_CILIUM_POD
pod/cilium-7krnk
$ kubectl -n kube-system exec -ti $WORKER_CILIUM_POD -c cilium-agent -- cilium-dbg status | grep IPAM
IPAM:                    IPv4: 5/254 allocated from 10.244.1.0/24

Let’s deploy a pod. It is scheduled on kind-worker2 and receives an IP address from the prefix assigned to that node.

$ kubectl run netshoot --image nicolaka/netshoot --command "sleep" "infinite"
pod/netshoot created
$ kubectl get pod netshoot -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
netshoot   1/1     Running   0          19s   10.244.2.39   kind-worker2   <none>           <none>
$ kubectl get ciliumnode kind-worker2 -o jsonpath='{.spec.ipam}'
{"podCIDRs":["10.244.2.0/24"]}

While this mode is simple, it is inflexible: it is not possible to configure the size of the CIDRs allocated to each node, nor to add additional CIDRs to the cluster or to individual nodes, which makes precise IP address planning crucial prior to cluster deployment.

Cluster Scope IPAM Mode

The Cluster Scope mode works in a similar way to the Kubernetes Host Scope mode, but Cilium allocates the pod CIDRs to the nodes itself (instead of the Kubernetes Controller Manager). As it doesn’t require any specific configuration of the Kubernetes cluster, this mode is the default option in Cilium.

In this mode, the Cilium Operator is in charge of allocating pod CIDRs to each node. It uses the CiliumNode resources instead of the Node resources to achieve this (this avoids potential clashes with CIDRs assigned by the Kubernetes Controller Manager).

Cluster Scope IPAM Mode

By default, Cilium uses the 10.0.0.0/8 CIDR for pods. Let’s verify in a cluster deployed in Cluster Scope mode (which is sometimes referred to as “Cluster-Pool”):

$ cilium config view | grep cluster-pool
cluster-pool-ipv4-cidr                     10.0.0.0/8
cluster-pool-ipv4-mask-size                24
ipam                                       cluster-pool

You’ll also notice the cluster-pool-ipv4-mask-size parameter, which is set to 24. This means that Cilium will split the cluster CIDR into /24 subnets and assign one to each node in the cluster by adding it to the spec.ipam.podCIDRs field of their respective CiliumNode resources. Splitting a /8 into /24 blocks yields 2^16 = 65,536 per-node subnets, so the defaults leave plenty of headroom.

Let’s list the CIDRs associated to each node:

$ kubectl get ciliumnode -o jsonpath='{range .items[*]}{.metadata.name} {.spec.ipam.podCIDRs[]}{"\n"}{end}' | column -t
kind-control-plane  10.0.0.0/24
kind-worker         10.0.1.0/24
kind-worker2        10.0.2.0/24

Notice that they are the first 3 /24 subnets derived from the 10.0.0.0/8 CIDR.

Let’s deploy a pod and check the IPAM status on the Cilium agent located on the same node as the pod.

$ kubectl run netshoot --image nicolaka/netshoot --command "sleep" "infinite"
pod/netshoot created
$ NETSHOOT_NODE=$(kubectl get po netshoot -o jsonpath='{.spec.nodeName}')
$ echo $NETSHOOT_NODE
kind-worker
$ NETSHOOT_CILIUM_POD=$(kubectl -n kube-system get po -l k8s-app=cilium --field-selector spec.nodeName=$NETSHOOT_NODE -o name)
$ echo $NETSHOOT_CILIUM_POD
pod/cilium-7v9mq
$ kubectl -n kube-system exec -ti $NETSHOOT_CILIUM_POD -c cilium-agent -- cilium-dbg status | grep IPAM
IPAM:                    IPv4: 3/254 allocated from 10.0.1.0/24, 

The node is now using at least 3 IPs, which we can review using cilium-dbg status --verbose, under the “Allocated addresses” section:

$ kubectl -n kube-system exec -ti $NETSHOOT_CILIUM_POD -c cilium-agent -- cilium-dbg status --verbose | grep -A6 "Allocated"
Allocated addresses:
  10.0.1.165 (default/netshoot)
  10.0.1.174 (router)
  10.0.1.227 (health)
BandwidthManager:       Disabled
Host Routing:           Legacy
Masquerading:           IPTables [IPv4: Enabled, IPv6: Disabled]

Three IPs have been taken from the node’s Pod CIDR range:

  • the IP of the new pod (default/netshoot)
  • the internal IP of the node (router)
  • the health IP of the node (health)

One advantage Cluster Scope IPAM has over the Kubernetes Host Scope mode is that we can allocate multiple CIDRs to the cluster. This provides more flexibility, although it doesn’t necessarily overcome IP address exhaustion.

Let’s deploy Cilium in Cluster Scope IPAM with two small CIDRs assigned to the cluster to illustrate what happens when they get exhausted. Let’s use the values.yaml below for Cilium’s starting configuration:

ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4MaskSize: '29'
    clusterPoolIPv4PodCIDRList:
      - 10.0.42.0/28
      - 10.0.84.0/28

Since we’ve specified 29 as the mask size, each node will receive a /29 subnet, which corresponds to 6 usable IPs (8 IPs, minus 2 IPs reserved for Cilium Host and Cilium Health interfaces).

Let’s install Cilium in this mode and verify the settings:

$ cilium config view | grep cluster-pool
cluster-pool-ipv4-cidr                     10.0.42.0/28 10.0.84.0/28
cluster-pool-ipv4-mask-size                29
ipam                                       cluster-pool
$ kubectl get ciliumnode -o jsonpath='{range .items[*]}{.metadata.name} {.spec.ipam.podCIDRs[*]}{"\n"}{end}' | column -t
kind-control-plane  10.0.42.0/29
kind-worker         10.0.84.0/29
kind-worker2        10.0.42.8/29

Since the CIDRs we passed are /28 and we’re allocating one /29 per node, only two nodes can consume IPs from the first CIDR block. For this reason, the third node gets a subnet from the second block, which is why we see three subnets configured.

Let’s create a deployment that runs 10 pods across the cluster:

$ kubectl create deployment netshoot --image nginx --replicas 10
deployment.apps/netshoot created
$ kubectl get po -o wide
NAME                        READY   STATUS              RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
netshoot-6748849bf8-6bn9j   1/1     Running             0          60s   10.0.42.13   kind-worker2   <none>           <none>
netshoot-6748849bf8-79hr9   1/1     Running             0          60s   10.0.84.5    kind-worker    <none>           <none>
netshoot-6748849bf8-b97rn   1/1     Running             0          60s   10.0.42.12   kind-worker2   <none>           <none>
netshoot-6748849bf8-dzwvq   0/1     ContainerCreating   0          60s   <none>       kind-worker2   <none>           <none>
netshoot-6748849bf8-gcnz7   1/1     Running             0          60s   10.0.84.2    kind-worker    <none>           <none>
netshoot-6748849bf8-k9gcg   1/1     Running             0          60s   10.0.84.6    kind-worker    <none>           <none>
netshoot-6748849bf8-p9xss   1/1     Running             0          60s   10.0.84.4    kind-worker    <none>           <none>
netshoot-6748849bf8-r6q9l   1/1     Running             0          60s   10.0.42.11   kind-worker2   <none>           <none>
netshoot-6748849bf8-tgldt   1/1     Running             0          60s   10.0.42.14   kind-worker2   <none>           <none>
netshoot-6748849bf8-vzg6s   0/1     ContainerCreating   0          60s   <none>       kind-worker    <none>           <none>

Notice that some pods did not get an IP address and are now stuck at creation. Looking at the pod’s events (with kubectl describe pod), we can see there are no available IP addresses:

Warning  FailedCreatePodSandBox  1s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b851fbfc4a052dc64fd53c6bafc567964e19acea2b6866c30430515626fdb53e": plugin type="cilium-cni" name="cilium" failed (add): unable to allocate IP via local cilium agent: [POST /ipam][502] postIpamFailure  range is full

The kubelet on the node is trying to retrieve IPs from the Cilium CNI plugin to assign to the new pod but failing to do so. The CNI plugin is replying that the range is full. We can verify on the node that all IP addresses have been allocated:

$ kubectl -n kube-system exec -ti cilium-7lzbb -c cilium-agent -- cilium-dbg status | grep IPAM
IPAM:                    IPv4: 6/6 allocated from 10.0.42.8/29,
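Beyond inspecting a single agent, the same sandbox failures surface as Kubernetes events, which makes it easy to spot every affected pod at once – a quick check, assuming the pods run in the current namespace:

$ kubectl get events --field-selector reason=FailedCreatePodSandBox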

This mode is easy to understand and lets you assign multiple IP ranges per cluster, but it comes with some limitations: 1) it doesn’t let you add PodCIDRs dynamically to a cluster, and 2) it doesn’t give users much control over which range a given Pod receives its IP address from.

Let’s explore the mode that can address these limitations.

Multi-Pool IPAM Mode

The Multi-Pool mode is the most recent (it was introduced with Cilium 1.14) and the most flexible one. It supports allocating PodCIDRs from multiple different IPAM pools, depending on properties of the workload defined by the user, e.g., annotations. Pods on the same node can receive IP addresses from various ranges. In addition, PodCIDRs can be dynamically added to a node as and when needed.

Multi-Pool IPAM Mode

Let’s try it. My cluster is deployed with Cilium in Multi-Pool mode and set to automatically create a cluster-wide pool at startup, carved into /27 blocks that are handed out to nodes as and when needed.

$ cilium config view | grep multi-pool
ipam                                              multi-pool
$ cilium config view | grep auto-create
auto-create-cilium-pod-ip-pools                   default=ipv4-cidrs:10.10.0.0/16;ipv4-mask-size:27
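For reference, the Helm values behind this configuration would look roughly like the snippet below. This is a sketch: the exact layout of the ipam.operator.autoCreateCiliumPodIPPools value is an assumption mirroring the auto-create-cilium-pod-ip-pools entry above, so check the Helm reference for your Cilium version.

ipam:
  mode: multi-pool
  operator:
    autoCreateCiliumPodIPPools:
      default:
        ipv4-cidrs:
          - 10.10.0.0/16
        ipv4-mask-size: 27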

Let’s verify the pool was deployed at start-up:

$ kubectl get ciliumpodippools.cilium.io default -o yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  creationTimestamp: "2024-12-18T16:41:42Z"
  generation: 1
  name: default
  resourceVersion: "2270"
  uid: cd576280-11eb-45ac-83d2-03f230d3e252
spec:
  ipv4:
    cidrs:
    - 10.10.0.0/16
    maskSize: 27

Let’s also create another Pod IP pool called mars:

$ cat <<EOF | kubectl apply -f -
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  name: mars
spec:
  ipv4:
    cidrs:
    - 10.20.0.0/16
    maskSize: 27
EOF
ciliumpodippool.cilium.io/mars created

One of the advantages of Multi-Pool is that it gives us more control over how we allocate IP addresses to Pods. We can force a pod – or all pods in a particular namespace – to pick up IPs from a particular pool.
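The pool is requested with the ipam.cilium.io/ip-pool annotation on a pod (as the Deployment below shows). In recent Cilium versions, the same annotation can also be applied to a namespace so that every pod in it inherits the pool – a minimal sketch, assuming a hypothetical planets namespace:

$ kubectl annotate namespace planets ipam.cilium.io/ip-pool=mars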

Let’s create two deployments with two pods each. One will use IP addresses from the default pool while the other one will receive IPs from the mars pool:

$ cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-default
spec:
  selector:
    matchLabels:
      app: nginx-default
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-default
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.1
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-mars
spec:
  selector:
    matchLabels:
      app: nginx-mars
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-mars
      annotations:
        ipam.cilium.io/ip-pool: mars
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.1
        ports:
        - containerPort: 80
EOF
deployment.apps/nginx-default created
deployment.apps/nginx-mars created

Let’s verify:

$ kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
nginx-default-7f687944f7-64nm7   1/1     Running   0          2m53s   10.10.0.88   kind-worker    <none>           <none>
nginx-default-7f687944f7-p4scr   1/1     Running   0          2m53s   10.10.0.54   kind-worker2   <none>           <none>
nginx-mars-58bb784756-hpsq7      1/1     Running   0          2m53s   10.20.0.42   kind-worker    <none>           <none>
nginx-mars-58bb784756-pz9v7      1/1     Running   0          2m53s   10.20.0.29   kind-worker2   <none>           <none>

Before we scale and observe what happens, let’s look at our initial pools. Both nodes have a /27 from each broader pool.

$ kubectl get ciliumnodes kind-worker -o yaml | yq .spec.ipam.pools
allocated:
  - cidrs:
      - 10.10.0.64/27
    pool: default
  - cidrs:
      - 10.20.0.32/27
    pool: mars
requested:
  - needed:
      ipv4-addrs: 16
    pool: default
  - needed:
      ipv4-addrs: 1
    pool: mars
$ kubectl get ciliumnodes kind-worker2 -o yaml | yq .spec.ipam.pools
allocated:
  - cidrs:
      - 10.10.0.32/27
    pool: default
  - cidrs:
      - 10.20.0.0/27
    pool: mars
requested:
  - needed:
      ipv4-addrs: 16
    pool: default
  - needed:
      ipv4-addrs: 1
    pool: mars

Our two /27s are only enough for 64 pods, so scaling the mars deployment up to 70 pods causes Cilium to react:

$ kubectl scale deployment nginx-mars --replicas=70
deployment.apps/nginx-mars scaled
$ kubectl get ciliumnodes kind-worker -o yaml | yq .spec.ipam.pools
allocated:
  - cidrs:
      - 10.10.0.64/27
    pool: default
  - cidrs:
      - 10.20.0.32/27
      - 10.20.0.64/27
    pool: mars
requested:
  - needed:
      ipv4-addrs: 16
    pool: default
  - needed:
      ipv4-addrs: 35
    pool: mars
$ kubectl get ciliumnodes kind-worker2 -o yaml | yq .spec.ipam.pools
allocated:
  - cidrs:
      - 10.10.0.32/27
    pool: default
  - cidrs:
      - 10.20.0.0/27
      - 10.20.0.96/27
    pool: mars
requested:
  - needed:
      ipv4-addrs: 16
    pool: default
  - needed:
      ipv4-addrs: 35
    pool: mars

Note the needed field above: as the number of required IPs significantly increased, another /27 CIDR from the cluster-wide 10.20.0.0/16 was allocated to each Cilium Node.

Multi-Pool works nicely with the Multi-Network feature, giving users the ability to connect a Pod to multiple network interfaces.

Multi-Network Lab

Learn how you can connect Kubernetes Pods to multiple interfaces with Isovalent Enterprise for Cilium 1.14 multi-network!

Start Lab!

IPAM ENI Mode & Prefix Delegation for AWS EKS

As highlighted earlier, when Cilium provides networking and security services for a managed Kubernetes service, it sometimes offers an IP Address Management mode specific to that platform.

Let’s focus on AWS EKS (Elastic Kubernetes Service).

IPAM ENI Mode

In this mode, IP allocation is based on IPs of AWS Elastic Network Interfaces (ENI).

IPAM ENI Mode

Each node gets a CiliumNode custom resource when Cilium starts up for the first time on that node. The Cilium agent contacts the EC2 metadata API to retrieve the instance ID, instance type, and VPC information, then populates the custom resource with it.

Let’s take a look at an EKS cluster with Cilium:

$ cilium config view | grep ipam
ipam                                              eni
ipam-cilium-node-update-rate                      15s
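For context, this is roughly how Cilium is installed in ENI mode on EKS – a sketch along the lines of the Cilium AWS installation guide (exact flags vary between versions):

$ helm install cilium cilium/cilium --namespace kube-system \
    --set eni.enabled=true \
    --set ipam.mode=eni \
    --set egressMasqueradeInterfaces=eth0 \
    --set routingMode=native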

Let’s deploy the Star Wars-inspired demo app, made up of 4 pods. All pods receive an IP address as you would expect.

$ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/1.16.5/examples/minikube/http-sw-app.yaml
service/deathstar created
deployment.apps/deathstar created
pod/tiefighter created
pod/xwing created
$ kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP                NODE                                           NOMINATED NODE   READINESS GATES
deathstar-689f66b57d-bmlst   1/1     Running   0          97s   192.168.158.231   ip-192-168-132-54.eu-west-1.compute.internal   <none>           <none>
deathstar-689f66b57d-px2p6   1/1     Running   0          97s   192.168.164.141   ip-192-168-183-94.eu-west-1.compute.internal   <none>           <none>
tiefighter                   1/1     Running   0          97s   192.168.159.145   ip-192-168-132-54.eu-west-1.compute.internal   <none>           <none>
xwing                        1/1     Running   0          97s   192.168.151.7     ip-192-168-132-54.eu-west-1.compute.internal   <none>           <none>

The pod IPs are actually taken from our EC2 instance’s network interfaces and listed as secondary private IPs, as you can see in the AWS console.

IPAM ENI
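If you prefer the CLI over the console, the same secondary IPs can be listed with the AWS CLI – here using one of the ENI IDs that appear in the CiliumNode output below:

$ aws ec2 describe-network-interfaces \
    --network-interface-ids eni-083218a1ee1c4bd1f \
    --query "NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress"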

This provides a clear benefit: pods receive ENI IPs that are directly routable in the AWS VPC. No NAT is required, which makes networking and observability easier for operators. We can also see the IPs populated on the CiliumNode resource.

$ kubectl get cn ip-192-168-132-54.eu-west-1.compute.internal -o yaml | yq .status.eni     
enis:
  eni-029baffa2438ce3a9:
    addresses:
      - 192.168.157.182
      - 192.168.158.68
      - 192.168.145.21
      - 192.168.152.182
    description: Cilium-CNI (i-0a037536e1a597052)
    id: eni-029baffa2438ce3a9
    ip: 192.168.145.216
    mac: 02:3b:a3:ca:34:31
    number: 1
    security-groups:
      - sg-01719c628e359ba72
    subnet:
      cidr: 192.168.128.0/19
      id: subnet-0906c59ade9ad366f
    tags:
      io.cilium/cilium-managed: "true"
      io.cilium/cluster-name: nvibert-9739-eu-west-1-eksctl-io
    vpc:
      id: vpc-09e981838492247ce
      primary-cidr: 192.168.0.0/16
  eni-083218a1ee1c4bd1f:
    addresses:
      - 192.168.129.40
      - 192.168.144.184
      - 192.168.157.188
      - 192.168.143.60
      - 192.168.151.110
      - 192.168.153.78
      - 192.168.159.145
      - 192.168.151.7
      - 192.168.158.231
    id: eni-083218a1ee1c4bd1f
    ip: 192.168.132.54
    mac: 02:0d:2d:77:50:fb
    security-groups:
      - sg-01719c628e359ba72
    subnet:
      cidr: 192.168.128.0/19
      id: subnet-0906c59ade9ad366f
    tags:
      Name: nvibert-9739-ng-1-Node
      alpha.eksctl.io/nodegroup-name: ng-1
      alpha.eksctl.io/nodegroup-type: managed
      cluster.k8s.amazonaws.com/name: nvibert-9739
      eks:cluster-name: nvibert-9739
      eks:nodegroup-name: ng-1
      node.k8s.amazonaws.com/instance_id: i-0a037536e1a597052
    vpc:
      id: vpc-09e981838492247ce
      primary-cidr: 192.168.0.0/16

However, there’s a finite number of Pod IPs per instance, and it depends on the EC2 instance type used.

Let’s check for M5 models:

$ aws ec2 describe-instance-types \
    --filters "Name=instance-type,Values=m5.*" \
    --query "InstanceTypes[].{ \
        Type: InstanceType, \
        MaxENI: NetworkInfo.MaximumNetworkInterfaces, \
        IPv4addr: NetworkInfo.Ipv4AddressesPerInterface}" \
    --output table
---------------------------------------
|        DescribeInstanceTypes        |
+----------+----------+---------------+
| IPv4addr | MaxENI   |     Type      |
+----------+----------+---------------+
|  30      |  8       |  m5.8xlarge   |
|  50      |  15      |  m5.24xlarge  |
|  30      |  8       |  m5.12xlarge  |
|  50      |  15      |  m5.metal     |
|  15      |  4       |  m5.xlarge    |
|  30      |  8       |  m5.4xlarge   |
|  15      |  4       |  m5.2xlarge   |
|  10      |  3       |  m5.large     |
|  50      |  15      |  m5.16xlarge  |
+----------+----------+---------------+

My EKS node group is made up of 2 m5.large instances. This instance type only supports up to 3 network interfaces and 10 IPs per interface – roughly 30 addresses per node, some of which are consumed by the node itself. We quickly run out of IPs when we scale our deployment to 50 Deathstars across the 2-node cluster:

$ kubectl scale deployment deathstar --replicas=50
deployment.apps/deathstar scaled
$ kubectl get pods | grep Pending                 
deathstar-689f66b57d-2zxt5   0/1     Pending             0          3s
deathstar-689f66b57d-79qfp   0/1     Pending             0          3s
deathstar-689f66b57d-pxhdg   0/1     Pending             0          3s

The logs on the Cilium operator confirm our suspicions:

$ kubectl logs -n kube-system cilium-operator-558888ffb5-mfmb2
time="2024-12-19T12:37:52Z" level=warning msg="Unable to assign additional IPs to interface, will create new interface" error="operation error EC2: AssignPrivateIpAddresses, https response error StatusCode: 400, RequestID: ddbdbf61-ccd4-4567-9fb2-a9f7f22c6d67, api error PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit." instanceID=i-0a037536e1a597052 ipsToAllocate=5 name=ip-192-168-132-54.eu-west-1.compute.internal selectedInterface=eni-0e2c19f3b2da72c6c subsys=ipam
time="2024-12-19T12:37:52Z" level=warning msg="Instance is out of interfaces" instanceID=i-0a037536e1a597052 name=ip-192-168-132-54.eu-west-1.compute.internal subsys=ipam

How can we get over the limit, without having to scale horizontally with more nodes?

Prefix Delegation

With Prefix Delegation, we can assign a private CIDR range to our network interface, which is then used to allocate IP addresses to pods.

A prefix still counts towards the per-interface IP limit – so, for example, assigning a prefix (typically a /28) to an interface on my m5.large consumes one of the 10 slots I can assign. Still, going from 10 individual IP addresses to 10 /28 prefixes (up to 160 IP addresses per interface) would alleviate most IP addressing concerns.
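Enabling it on the Cilium side is a single setting, shown below as a Helm sketch – the key name is an assumption mirroring the aws-enable-prefix-delegation flag visible in the agent configuration further down:

$ helm upgrade cilium cilium/cilium --namespace kube-system \
    --reuse-values \
    --set eni.awsEnablePrefixDelegation=true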

Let’s try it. Prefix Delegation only applies to new nodes joining the cluster, so in this particular case we have to add a new node group.

$ cilium config view | grep prefix          
aws-enable-prefix-delegation                      true
$ eksctl get nodegroup --cluster=nvibert-9739                                                             
CLUSTER         NODEGROUP       STATUS  CREATED                 MIN SIZE        MAX SIZE        DESIRED CAPACITY        INSTANCE TYPE   IMAGE ID                ASG NAME                                               TYPE
nvibert-9739    ng-1            ACTIVE  2024-12-19T09:53:16Z    2               2               2                       m5.large        AL2023_x86_64_STANDARD  eks-ng-1-aec9ef2a-f868-d5b4-d082-ddd3ab993a5f          managed
nvibert-9739    ng-234b44f9     ACTIVE  2024-12-19T15:06:27Z    2               2               2                       m5.large        AL2_x86_64              eks-ng-234b44f9-c4c9efba-561a-7f18-0155-da164147e6c6   managed
$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-132-54.eu-west-1.compute.internal   Ready    <none>   5h52m   v1.30.7-eks-59bf375
ip-192-168-183-94.eu-west-1.compute.internal   Ready    <none>   5h52m   v1.30.7-eks-59bf375
ip-192-168-53-45.eu-west-1.compute.internal    Ready    <none>   39m     v1.30.7-eks-59bf375
ip-192-168-82-2.eu-west-1.compute.internal     Ready    <none>   39m     v1.30.7-eks-59bf375

Scaling to 150 pods – which would have failed in the standard ENI mode for a 4-node cluster – is now seamless.

$ kubectl scale deployment deathstar --replicas=150                                                          
deployment.apps/deathstar scaled
$ kubectl get deployment deathstar  
NAME        READY     UP-TO-DATE   AVAILABLE   AGE
deathstar   150/150   150          150         29m

All my pods have received an IP address and are functioning. This time, though, there are no secondary private IPv4 addresses attached to my node’s network interface. Instead, prefixes have been automatically assigned to it:

IPAM ENI with Prefix Delegation

My teammate Amit provided a much more detailed walkthrough in his personal blog post on Medium – read it for a thorough tutorial.

IPAM for LoadBalancer and Egress

While the focus of this blog post is around assigning IP addresses for pods, we ought to cover some additional IP Address Management capabilities that are native to Cilium or to our enterprise edition.

LoadBalancer IPAM

We covered this feature in detail at launch with Cilium 1.13, but as a reminder: Cilium can assign IP addresses to Kubernetes Services of type LoadBalancer. Often used with Ingress/Gateway API, these services expose applications running inside the cluster to the outside world.

LB-IPAM
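Address ranges are declared with a CiliumLoadBalancerIPPool resource, from which Cilium then assigns IPs to matching LoadBalancer Services. A minimal sketch – the 192.0.2.0/24 range is an arbitrary example, and older releases used spec.cidrs instead of spec.blocks:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: external-pool
spec:
  blocks:
    - cidr: 192.0.2.0/24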

Many users have typically relied on MetalLB for this purpose, but with Cilium offering this functionality natively, we see many of them migrating from MetalLB to Cilium. You can find out more about the migration process in our tutorial below:

Migrating from MetalLB to Cilium

In this blog post, you will learn how to migrate from MetalLB to Cilium for local service advertisement over Layer 2.

Start Reading

Egress IPAM

Enterprise only

Egress Gateway is a popular Cilium feature that lets users set a predictable IP for traffic leaving the cluster. Often used alongside external firewalls, it ensures traffic from a specific namespace or pod exits via a specific node, with the source IP masqueraded to one of that node’s IP addresses. This typically relied on secondary IP addresses configured on the nodes’ interfaces, which comes with the manual burden of configuring and managing the nodes themselves.

With a recent update to Egress Gateway in Isovalent Enterprise for Cilium, users get additional control of the IP addressing distributed to their Egress Gateway nodes.

The IPAM feature allows you to specify an IP pool in the IsovalentEgressGatewayPolicy from which Cilium leases egress IPs and assigns them to the selected egress interfaces.

Combine it with BGP support for Egress Gateway to advertise the IP address range to the rest of the network, so that return traffic can safely find its way back to the cluster.

BGP support for Egress Gateway

Final Thoughts

I hope this tutorial clarifies the different ways Cilium can assign IP addresses to your Kubernetes entities. I have not mentioned the most obvious way to overcome IPv4 address exhaustion: IPv6! If you’d like to learn more about it, read this IPv6 tutorial with Cilium.

Thanks for reading.

Nico Vibert, Senior Staff Technical Marketing Engineer
