
Optimizing Enterprise Networks: Addressing Overlapping CIDR with Cilium

Amit Gupta

Modern enterprise networks are complex and dynamic, designed to support a wide range of applications and services while ensuring security, scalability, and reliability. Kubernetes has become a go-to platform for enterprises looking to modernize their application deployment and management processes. With multi-cloud and hybrid-cloud support, enterprises can leverage multiple cloud providers or combine on-premises and cloud resources, optimizing cost and performance for their specific needs. Now imagine a team spinning up a sandbox environment in which some Pods end up with the same IP addresses as Pods running on another cloud provider, leading to massive network outages and application downtime. Isovalent Enterprise for Cilium offers a unique way out of this problem: it connects multiple Kubernetes clusters, or swarms of clusters, across cloud providers and hybrid-cloud setups while mitigating the overlapping Pod IP addressing problem. This tutorial will guide you through setting up Isovalent Enterprise for Cilium’s Cluster Mesh with overlapping Pod CIDRs.

What leads to this scenario in an Enterprise Network?

Overlapping IP addresses occur when identical IP ranges are allocated across different networks or applications, causing significant communication problems. Here are the key causes:

  • Merger and Acquisition
    • When organizations merge, they may have existing networks with overlapping CIDR blocks. Integrating these networks without proper planning can lead to conflicts.
  • Dynamic Environment
    • In dynamic environments (e.g., cloud-native applications), rapid resource provisioning can lead to overlapping CIDRs if not managed carefully, especially in auto-scaling or microservices architectures.
  • Multiple Cloud Providers
    • Using different cloud providers without a unified IP addressing strategy can lead to conflicts when the same CIDR ranges are employed across providers.
  • Legacy Systems
    • Legacy systems may have fixed IP addresses or ranges that conflict with newer allocations, especially if the legacy systems are poorly documented.
  • Suboptimal IP Address Management (IPAM)
    • Failure to use robust IP address management tools can result in misallocations and overlaps, particularly in large or complex networks.
  • Testing Environments
    • Duplicate IP ranges may be used in test or staging environments that mirror production setups, leading to conflicts when integrating these environments.

How does Isovalent address this issue?

You can minimize these interruptions by using Isovalent Enterprise for Cilium’s Cluster Mesh support for Overlapping Pod CIDR or deploying an Egress Gateway. Let’s look at the Overlapping Pod CIDR support from Isovalent.

How do packets traverse a Cluster Mesh with Overlapping Pod CIDRs?

The following diagram provides an overview of the packet flow in a Cluster Mesh with overlapping Pod CIDR support.

  • In this mode, inter-cluster communication must go through a Global or Phantom service.
    • Global Service – Isovalent can load balance traffic to Pods across all clusters in a Cluster Mesh. This is achieved by using global services. A global service is a service created with the same spec in each cluster and annotated with service.cilium.io/global: "true", for example:

apiVersion: v1
kind: Service
metadata:
  name: httpbin-service
  annotations:
    service.cilium.io/global: "true"

      • A global service load balances across all available backends in all clusters by default.
    • Phantom Service – Global services require an identical service to be present in each cluster from which the service is accessed. Phantom services lift this requirement, allowing a given service to be accessed from remote clusters even if it is not defined there (a minimal example is sketched after this list).
      • A phantom service is a LoadBalancer service associated with at least one VIP and annotated with service.isovalent.com/phantom: "true".
      • This makes the phantom service LoadBalancer IP address reachable from all clusters in the Cluster Mesh. Source IP addresses and identities are preserved for cross-cluster communication.
  • When the traffic crosses the cluster boundary, the source IP address is translated to Node IP.
  • For Intra-Cluster communication, the source IP address is preserved.
    • Intra-Cluster communication via a service leads to a destination IP translation.
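For reference, a phantom service might look like the following minimal sketch. Only the service.isovalent.com/phantom annotation and the LoadBalancer type come from the description above; the service name, selector, and ports are illustrative assumptions.

apiVersion: v1
kind: Service
metadata:
  name: httpbin-phantom                      # illustrative name
  annotations:
    service.isovalent.com/phantom: "true"    # marks this service as a phantom service
spec:
  type: LoadBalancer                         # phantom services are LoadBalancer services with at least one VIP
  selector:
    app: httpbin                             # illustrative selector
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80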

What is Isovalent Enterprise for Cilium?

Isovalent Cilium Enterprise is an enterprise-grade, hardened distribution of the open-source projects Cilium, Hubble, and Tetragon, built and supported by the Cilium creators. Cilium enhances networking and security at the network layer, while Hubble ensures thorough network observability and tracing. Tetragon ties it all together with runtime enforcement and security observability, offering a well-rounded solution for connectivity, compliance, multi-cloud, and security concerns.

Why Isovalent Enterprise for Cilium?

For enterprise customers requiring support and usage of Advanced Networking, Security, and Observability features, “Isovalent Enterprise for Cilium” is recommended with the following benefits:

  • Advanced network policy: advanced network policy capabilities that enable fine-grained control over network traffic for micro-segmentation and improved security.
  • Hubble flow observability + User Interface: real-time network traffic flow, policy visualization, and a powerful User Interface for easy troubleshooting and network management.
  • Multi-cluster connectivity via Cluster Mesh: seamless networking and security across multiple cloud providers like AWS, Azure, Google, and on-premises environments.
  • Advanced Security Capabilities via Tetragon: Tetragon provides advanced security capabilities such as protocol enforcement, IP and port whitelisting, and automatic application-aware policy generation to protect against the most sophisticated threats. Built on eBPF, Tetragon can easily scale to meet the needs of the most demanding cloud-native environments.
  • Service Mesh: Isovalent Cilium Enterprise provides sidecar-free, seamless service-to-service communication and advanced load balancing, making deploying and managing complex microservices architectures easy.
  • Enterprise-grade support: Enterprise-grade support from Isovalent’s experienced team of experts ensures that issues are resolved promptly and efficiently. Additionally, professional services help organizations deploy and manage Cilium in production environments.

Overlapping Pod CIDR Feature Compatibility Matrix

Feature                               Status
Inter-cluster service communication   Supported
L3/L4 network policy enforcement      Supported
L7 Network Policies                   Roadmap
Transparent Encryption                Roadmap
Endpoint Health Checking              Roadmap
Socket-based Load Balancing           Roadmap

Pre-Requisites

The following prerequisites need to be taken into account before you proceed with this tutorial:

  • Two up-and-running Kubernetes clusters. For this tutorial, we will create two Azure Kubernetes Service (AKS) clusters using BYOCNI (Bring Your Own CNI) as the network plugin.
  • The following dependencies should be installed:
  • Cluster Mesh with overlapping Pod CIDR requires Isovalent Enterprise for Cilium 1.13 or later.
  • Users can contact their partner Sales/SE representative(s) at sales@isovalent.com for more detailed insights into the features described in this tutorial and to access the requisite documentation and Hubble CLI software images.

Creating the AKS clusters

Let’s briefly walk through the commands to create the AKS clusters with the BYOCNI network plugin.

Create an AKS cluster with BYOCNI.

Set the subscription

Choose the subscription you want to use if you have multiple Azure subscriptions.

  • Replace SubscriptionName with your subscription name.
  • You can also use your subscription ID instead of your subscription name.
az account set --subscription SubscriptionName
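If you are unsure which subscription name or ID to use, you can list the subscriptions available to your account first with standard Azure CLI commands:

az account list --output table
az account show --output table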

AKS Resource Group Creation

Create a Resource Group

clusterName="amitgag-18499"
resourceGroup="amitgag-18499"
vnet="amitgag-18499"
location="westus2"

az group create --name $resourceGroup --location $location

Create a Virtual Network (VNet).

az network vnet create \
    --resource-group "${resourceGroup}" \
    --name "${resourceGroup}-cluster-net" \
    --address-prefixes 192.168.10.0/24 \
    --subnet-name "${clusterName}-node-subnet" \
    --subnet-prefix 192.168.10.0/24

Store the ID of the created Subnet.

export NODE_SUBNET_ID=$(az network vnet subnet show \
    --resource-group "${resourceGroup}" \
    --vnet-name "${clusterName}-cluster-net" \
    --name "${clusterName}-node-subnet" \
    --query id \
    -o tsv)
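It is worth confirming that the variable was populated; if it comes back empty, list the subnets in the VNet to double-check the names:

echo "${NODE_SUBNET_ID}"

az network vnet subnet list \
    --resource-group "${resourceGroup}" \
    --vnet-name "${resourceGroup}-cluster-net" \
    --output table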

AKS Cluster creation

Pass the --network-plugin parameter with the value none. During creation, we also request "10.10.0.0/16" as the Pod CIDR and "10.11.0.0/16" as the Services CIDR.

az aks create --name $clusterName --resource-group $resourceGroup \
--network-plugin none \
--pod-cidr "10.10.0.0/16" \
--service-cidr "10.11.0.0/16" \
--dns-service-ip "10.11.0.10" \
--vnet-subnet-id "${NODE_SUBNET_ID}" \
--kubernetes-version 1.29

Set the Kubernetes Context

Log in to the Azure portal, browse to Kubernetes services, select the respective Kubernetes service you created (the AKS cluster), and click Connect. This shows the commands to connect to your AKS cluster and set the respective Kubernetes context:

az aks get-credentials --name $clusterName --resource-group $resourceGroup
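To confirm that the kubeconfig context was set as expected, you can check it with standard kubectl commands:

kubectl config current-context
kubectl config get-contexts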

Cluster status check

Check the status of the nodes and make sure they are in a “Ready” state.

kubectl get nodes -o wide

NAME                                STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-nodepool1-28392065-vmss000000   Ready    <none>   10h   v1.29.7   192.168.10.5   <none>        Ubuntu 22.04.4 LTS   5.15.0-1071-azure   containerd://1.7.20-1
aks-nodepool1-28392065-vmss000001   Ready    <none>   10h   v1.29.7   192.168.10.6   <none>        Ubuntu 22.04.4 LTS   5.15.0-1071-azure   containerd://1.7.20-1
aks-nodepool1-28392065-vmss000002   Ready    <none>   10h   v1.29.7   192.168.10.4   <none>        Ubuntu 22.04.4 LTS   5.15.0-1071-azure   containerd://1.7.20-1
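The second cluster is created the same way with its own resource group and VNet. The condensed sketch below uses the cluster name (amitgag-30384) and node subnet (192.168.20.0/24) that appear later in this tutorial; the region is an assumption, so adjust it to your environment. Note that both clusters deliberately use the same, overlapping Pod CIDR of 10.10.0.0/16.

clusterName="amitgag-30384"
resourceGroup="amitgag-30384"
location="eastus2"   # assumed second region; pick any region that suits you

az group create --name $resourceGroup --location $location

az network vnet create \
    --resource-group "${resourceGroup}" \
    --name "${resourceGroup}-cluster-net" \
    --address-prefixes 192.168.20.0/24 \
    --subnet-name "${clusterName}-node-subnet" \
    --subnet-prefix 192.168.20.0/24

export NODE_SUBNET_ID=$(az network vnet subnet show \
    --resource-group "${resourceGroup}" \
    --vnet-name "${resourceGroup}-cluster-net" \
    --name "${clusterName}-node-subnet" \
    --query id \
    -o tsv)

az aks create --name $clusterName --resource-group $resourceGroup \
--network-plugin none \
--pod-cidr "10.10.0.0/16" \
--service-cidr "10.11.0.0/16" \
--dns-service-ip "10.11.0.10" \
--vnet-subnet-id "${NODE_SUBNET_ID}" \
--kubernetes-version 1.29

az aks get-credentials --name $clusterName --resource-group $resourceGroup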

Install Isovalent Enterprise for Cilium

Validate Cilium version

Check the version and status of Cilium with cilium status:

kubectl -n kube-system exec ds/cilium -- cilium status

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.29 (v1.29.7) [linux/amd64]
Kubernetes APIs:         ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    True   [eth0   192.168.10.5 fe80::222:48ff:fec0:9a09 (Direct Routing)]
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
CNI Config file:         successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                  Ok   1.15.8-cee.1 (v1.15.8-cee.1-44b1b109)
NodeMonitor:             Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok
IPAM:                    IPv4: 5/254 allocated from 10.10.0.0/24,
ClusterMesh:             1/1 remote clusters ready, 1 global-services
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       38/38 healthy
Proxy Status:            OK, ip 10.10.0.70, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 65536, max 131071
Hubble:                  Ok         Current/Max Flows: 4095/4095 (100.00%), Flows/s: 19.65   Metrics: Disabled
Encryption:              Disabled
Cluster health:                     Probe disabled

Cilium Health Check

cilium-health is a tool available in Cilium that provides visibility into the overall health of the cluster’s networking connectivity. You can check node-to-node health with cilium-health status:

kubectl -n kube-system exec ds/cilium -- cilium-health status

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
Probe time:   2024-09-25T11:12:50Z
Nodes:
  amitgag-18499/aks-nodepool1-28392065-vmss000000 (localhost):
    Host connectivity to 192.168.10.5:
      ICMP to stack:   OK, RTT=286.502µs
      HTTP to agent:   OK, RTT=216.102µs
  amitgag-18499/aks-nodepool1-28392065-vmss000001:
    Host connectivity to 192.168.10.6:
      ICMP to stack:   OK, RTT=1.143609ms
      HTTP to agent:   OK, RTT=705.006µs
  amitgag-18499/aks-nodepool1-28392065-vmss000002:
    Host connectivity to 192.168.10.4:
      ICMP to stack:   OK, RTT=1.402211ms
      HTTP to agent:   OK, RTT=595.304µs

What are the Pod IPs across clusters?

Once the AKS clusters are created, you can check that both clusters were provisioned with the same Pod CIDR, which means Pods across the two clusters can end up with identical IP addresses.

az aks show -n amitgag-18499 -g amitgag-18499 --subscription <subscription-id> | grep podCidr
    "podCidr": "10.10.0.0/16",

How can you Peer the AKS clusters?

Use VNet peering to peer the AKS clusters across the two chosen regions. This step only needs to be done in one direction. The connection will automatically be established in both directions.

  • Login to the Azure Portal
  • Click Home
  • Click Virtual Network
  • Select the respective Virtual Network
  • Click Peerings
  • Click Add
  • Give the local peer a name
  • Select “Allow cluster1 to access cluster2”
  • Give the remote peer a name
  • Select the virtual network deployment model as “Resource Manager”
  • Select the subscription
  • Select the virtual network of the remote peer
  • Select “Allow cluster2 to access cluster1”
  • Click Add
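If you prefer the Azure CLI over the portal, the equivalent peering can be created with az network vnet peering create. Unlike the portal flow above, the CLI creates one direction at a time, so run it once per direction. The second cluster's resource group and VNet names below are assumptions based on the naming convention used in this tutorial.

SUBSCRIPTION_ID=$(az account show --query id -o tsv)

# Cluster 1 -> Cluster 2
az network vnet peering create \
    --name peer-18499-to-30384 \
    --resource-group amitgag-18499 \
    --vnet-name amitgag-18499-cluster-net \
    --remote-vnet "/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/amitgag-30384/providers/Microsoft.Network/virtualNetworks/amitgag-30384-cluster-net" \
    --allow-vnet-access

# Cluster 2 -> Cluster 1
az network vnet peering create \
    --name peer-30384-to-18499 \
    --resource-group amitgag-30384 \
    --vnet-name amitgag-30384-cluster-net \
    --remote-vnet "/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/amitgag-18499/providers/Microsoft.Network/virtualNetworks/amitgag-18499-cluster-net" \
    --allow-vnet-access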

How can you enable Cluster Mesh with overlapping Pod CIDR?

  • To set up cluster-mesh, reach out to sales@isovalent.com to get access to the complete Enterprise documentation.
  • Some key pre-requisites to set up Cluster Mesh:
    • Each cluster must be identified by a unique Cluster ID and Cluster Name.
    • All clusters must be configured with the same datapath mode, in other words, either native routing or encapsulation (using the same encapsulation protocol).
      • Overlapping PodCIDR mandates tunneling.
    • The two clusters should be set up with Kube-Proxy Replacement set to true.
    • Install the cert-manager CRDs and set up a cilium Issuer associated with the same Certificate Authority in all clusters (a minimal Issuer sketch is shown after the sample configuration below).
      • It does not have to be done via cert-manager, but it is highly recommended, as manually copying and pasting the CA certificate is error-prone.
  • Create a sample YAML file (unique per cluster).
    • The yaml configuration file contains the basic properties to set up Cilium, Cluster Mesh, and Hubble.
      • Configures Cilium in CRD identity allocation mode.
      • Enables Hubble and Hubble Relay.
      • Enables the Cluster Mesh API Server and exposes it using a service of Type LoadBalancer. Cloud-provider-specific annotations are added to force the usage of private IP addresses.
      • Enables the automatic generation of the certificates using cert-manager, leveraging the existing cilium Issuer associated with the shared certificate authority.
      • Configures the most granular cross-cluster authentication scheme for improved segregation. 
  • Sample configuration file for Cluster Mesh:
aksbyocni:
  enabled: true

clustermesh:
  useAPIServer: true

  apiserver:
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-internal: "true"

    tls:
      authMode: cluster
      auto:
        enabled: true
        method: certmanager
        certManagerIssuerRef:
          group: cert-manager.io
          kind: Issuer
          name: cilium

hubble:
  enabled: true
  relay:
    enabled: true

  tls:
    auto:
      enabled: true
      method: certmanager
      certManagerIssuerRef:
        group: cert-manager.io
        kind: Issuer
        name: cilium
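The configuration above references a cert-manager Issuer named cilium. A minimal CA Issuer could look like the following sketch, assuming the shared CA certificate and key have already been stored in a Secret; the Secret name cilium-ca and the kube-system namespace are assumptions, so follow the enterprise documentation for the exact procedure.

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cilium            # must match the certManagerIssuerRef in the configuration above
  namespace: kube-system  # assumed; use the namespace where Cilium is installed
spec:
  ca:
    secretName: cilium-ca # assumed Secret holding the shared CA certificate and key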
  • To enable support for the overlapping Pod CIDR feature, set the following Helm values:
enterprise:
  clustermesh:
    enableOverlappingPodCIDRSupport: true
  • Install Isovalent Enterprise for Cilium and connect the clusters using the Cluster Mesh documentation.
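While the exact installation commands come from the enterprise documentation, the overall flow is a Helm install per cluster followed by connecting the clusters with the Cilium CLI. The sketch below is illustrative only: the chart reference and values file names are placeholders, while the cluster names and IDs match the values seen later in this tutorial.

# Install Isovalent Enterprise for Cilium on each cluster; every cluster needs a unique cluster.name and cluster.id.
helm upgrade --install cilium <isovalent-enterprise-chart> \
  --namespace kube-system \
  --set cluster.name=amitgag-18499 \
  --set cluster.id=1 \
  --values clustermesh-values-18499.yaml \
  --kube-context amitgag-18499

helm upgrade --install cilium <isovalent-enterprise-chart> \
  --namespace kube-system \
  --set cluster.name=amitgag-30384 \
  --set cluster.id=2 \
  --values clustermesh-values-30384.yaml \
  --kube-context amitgag-30384

# Connect the two clusters with the Cilium CLI once both installations are healthy.
cilium clustermesh connect --context amitgag-18499 --destination-context amitgag-30384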

How can you verify Cluster Mesh status?

Check the status of the Cluster Mesh by running cilium clustermesh status on either of the clusters. If you use a service of type LoadBalancer, the command also waits for the LoadBalancer to be assigned an IP.

cilium clustermesh status --context amitgag-18499 --wait

✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
  - 192.168.10.7:2379
✅ Deployment clustermesh-apiserver is ready
ℹ️  KVStoreMesh is disabled

✅ All 3 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]

🔌 Cluster Connections:
  - amitgag-30384: 3/3 configured, 3/3 connected

🔀 Global services: [ min:1 / avg:1.0 / max:1 ]

What’s the status of the clustermesh-apiserver pod?

The Cluster Mesh API Server contains an etcd instance to keep track of the cluster’s state. The state from multiple clusters is never mixed. Cilium agents in other clusters connect to the Cluster Mesh API Server to watch for changes and replicate the multi-cluster state into their cluster. Access to the Cluster Mesh API Server is protected using TLS certificates. Access from one cluster to another is always read-only, ensuring failure domains remain unchanged. A failure in one cluster never propagates to other clusters.

Ensure that the clustermesh-apiserver pod is running on both clusters.

kubectl get pods -n kube-system --context amitgag-30384 | grep clustermesh
clustermesh-apiserver-7f565697f-xw4dx   2/2     Running   0          17h

kubectl get pods -n kube-system --context amitgag-18499 | grep clustermesh
clustermesh-apiserver-7f565697f-qxs65   2/2     Running   0          16h
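You can also confirm that the clustermesh-apiserver LoadBalancer service received a private IP address from the node subnet, since the sample configuration above forces an internal load balancer:

kubectl get svc clustermesh-apiserver -n kube-system --context amitgag-18499
kubectl get svc clustermesh-apiserver -n kube-system --context amitgag-30384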

How can you test Inter-Cluster Service communication?

  • Let’s deploy a client on Cluster-1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot:v0.9
        command: ["sleep", "infinite"]
kubectl apply -f client.yaml --context=amitgag-18499
  • Deploy a Deployment and Global Service on both clusters.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  labels:
    app: httpbin
spec:
  replicas: 2
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - name: httpbin
        image: kennethreitz/httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin-service
  annotations:
    service.cilium.io/global: "true"
spec:
  type: ClusterIP
  selector:
    app: httpbin
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
kubectl apply -f deployment-service.yaml --context=amitgag-18499

kubectl apply -f deployment-service.yaml --context=amitgag-30384
  • Check connectivity.
    • Notice that when a backend in the local cluster is selected, the response shows the original source IP of the netshoot Pod.
kubectl get pods -o wide

NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
httpbin-66c877d7d-fj7jc     1/1     Running   0          30h   10.10.0.119   aks-nodepool1-28392065-vmss000000   <none>           <none>
httpbin-66c877d7d-wpn5z     1/1     Running   0          30h   10.10.3.174   aks-nodepool1-28392065-vmss000001   <none>           <none>
netshoot-7cd4fdf959-4fvr6   1/1     Running   0          30h   10.10.3.68    aks-nodepool1-28392065-vmss000001   <none>           <none>

kubectl exec -it deploy/netshoot -- curl http://httpbin-service.default.svc.cluster.local/get
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin-service.default.svc.cluster.local",
    "User-Agent": "curl/7.87.0"
  },
  "origin": "10.10.3.68",
  "url": "http://httpbin-service.default.svc.cluster.local/get"
}
  • Check connectivity again.
    • Notice that when a backend in the remote cluster is selected, the response shows the IP of the node on which the source Pod is running.
kubectl exec -it deploy/netshoot -- curl http://httpbin-service.default.svc.cluster.local/get
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin-service.default.svc.cluster.local",
    "User-Agent": "curl/7.87.0"
  },
  "origin": "192.168.10.6",
  "url": "http://httpbin-service.default.svc.cluster.local/get"
}
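Because the global service load balances across backends in both clusters, a single request may land on either a local or a remote backend. Repeating the request a few times makes it easy to observe both behaviours:

for i in $(seq 1 10); do
  kubectl exec deploy/netshoot -- curl -s http://httpbin-service.default.svc.cluster.local/get | grep origin
done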

How can you use L3/L4 policies in an Overlapping Pod CIDR scenario?

  • When using Cilium, endpoint IP addresses are irrelevant when defining security policies. Instead, you can use the labels assigned to the Pods to define security policies. The policies will be applied to the right Pods based on the labels, irrespective of where or when they run within the cluster.
    • The layer 3 policy establishes the base connectivity rules regarding which endpoints can talk to each other. 
    • The layer 4 policy can be specified on its own or in addition to the layer 3 policies. It restricts an endpoint’s ability to emit and/or receive packets on a particular port using a particular protocol.
  • A Cilium network policy is only ever applied to a single cluster; it is not automatically synced to the remote cluster when Cluster Mesh is enabled.
  • Let’s deploy an egress policy on cluster 1.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-cross-cluster-egress"
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: netshoot
      io.cilium.k8s.policy.cluster: amitgag-18499
  egress:
  - toEndpoints:
    - matchLabels:
        app: httpbin
        io.cilium.k8s.policy.cluster: amitgag-30384
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
  - toEndpoints:
    - matchLabels:
        k8s-app: kube-dns
        io.kubernetes.pod.namespace: kube-system
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
kubectl apply -f egresscluster1.yaml --context=amitgag-18499
  • Let’s deploy an ingress policy on cluster 2.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-cross-cluster-ingress"
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: httpbin
      io.cilium.k8s.policy.cluster: amitgag-30384
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: netshoot
        io.cilium.k8s.policy.cluster: amitgag-18499
kubectl apply -f ingresscluster2.yaml --context=amitgag-30384
  • Check connectivity again. Notice that the Pod can no longer connect to the local backends; with the policies in place, it connects only to the remote backends.
kubectl exec -it deploy/netshoot -- curl http://httpbin-service.default.svc.cluster.local/get
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin-service.default.svc.cluster.local",
    "User-Agent": "curl/7.87.0"
  },
  "origin": "192.168.10.6",
  "url": "http://httpbin-service.default.svc.cluster.local/get"
}
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin-service.default.svc.cluster.local",
    "User-Agent": "curl/7.87.0"
  },
  "origin": "192.168.10.6",
  "url": "http://httpbin-service.default.svc.cluster.local/get"
}
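If Hubble is enabled (as in the sample configuration earlier), you can also confirm that requests towards the local backends are now dropped by the egress policy rather than failing silently. A sketch using the Hubble CLI embedded in the Cilium agent:

kubectl -n kube-system exec ds/cilium -- hubble observe --pod default/netshoot --verdict DROPPED --last 10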

Troubleshooting Cluster Mesh Issues

You can use the following commands to troubleshoot Cluster Mesh-related deployments.

  • Once the clusters are connected via Cluster Mesh, you can check the health of Nodes from either cluster.
    • Notice that nodes from both clusters are displayed.
kubectl -n kube-system exec ds/cilium -- cilium-health status

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
Probe time:   2024-09-26T07:32:50Z
Nodes:
  amitgag-18499/aks-nodepool1-28392065-vmss000000 (localhost):
    Host connectivity to 192.168.10.5:
      ICMP to stack:   OK, RTT=339.001µs
      HTTP to agent:   OK, RTT=201.801µs
  amitgag-18499/aks-nodepool1-28392065-vmss000001:
    Host connectivity to 192.168.10.6:
      ICMP to stack:   OK, RTT=1.212905ms
      HTTP to agent:   OK, RTT=1.192004ms
  amitgag-18499/aks-nodepool1-28392065-vmss000002:
    Host connectivity to 192.168.10.4:
      ICMP to stack:   OK, RTT=1.196305ms
      HTTP to agent:   OK, RTT=643.202µs
  amitgag-30384/aks-nodepool1-15397795-vmss000000:
    Host connectivity to 192.168.20.4:
      ICMP to stack:   OK, RTT=73.772068ms
      HTTP to agent:   OK, RTT=73.182465ms
  amitgag-30384/aks-nodepool1-15397795-vmss000001:
    Host connectivity to 192.168.20.5:
      ICMP to stack:   OK, RTT=74.130569ms
      HTTP to agent:   OK, RTT=73.496767ms
  amitgag-30384/aks-nodepool1-15397795-vmss000002:
    Host connectivity to 192.168.20.6:
      ICMP to stack:   OK, RTT=68.362948ms
      HTTP to agent:   OK, RTT=67.507044ms
  • Check the service endpoints from cluster 1; the remote endpoints carry an @2 suffix, where 2 is the Cluster ID of the remote cluster.
    • Notice that the httpbin-service has cluster IP 10.11.42.230 and that its backends include the remote Pods in cluster 2, with IPs 10.10.0.252@2 and 10.10.4.129@2, alongside the local backends.
kubectl exec -n kube-system -ti ds/cilium -- cilium service list --clustermesh-affinity

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
ID   Frontend             Service Type   Backend
1    10.11.42.230:80      ClusterIP      1 => 10.10.0.252@2:80 (active)
                                         2 => 10.10.4.129@2:80 (active)
                                         3 => 10.10.3.174:80 (active)
                                         4 => 10.10.0.119:80 (active)
2    10.11.0.1:443        ClusterIP      1 => 4.155.152.37:443 (active)
3    10.11.155.247:2379   ClusterIP      1 => 10.10.0.218:2379 (active)
4    192.168.10.7:2379    LoadBalancer   1 => 10.10.0.218:2379 (active)
5    192.168.10.5:31755   NodePort       1 => 10.10.0.218:2379 (active)
6    0.0.0.0:31755        NodePort       1 => 10.10.0.218:2379 (active)
7    10.11.155.89:443     ClusterIP      1 => 192.168.10.5:4244 (active)
8    10.11.0.10:53        ClusterIP      1 => 10.10.3.145:53 (active)
                                         2 => 10.10.0.188:53 (active)
9    10.11.234.73:443     ClusterIP      1 => 10.10.3.243:4443 (active)
                                         2 => 10.10.1.250:4443 (active)

kubectl get svc

NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
httpbin-service   ClusterIP   10.11.42.230   <none>        80/TCP    8d
kubernetes        ClusterIP   10.11.0.1      <none>        443/TCP   9d

kubectl get pods -o wide

NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
httpbin-66c877d7d-fj7jc     1/1     Running   0          31h   10.10.0.119   aks-nodepool1-28392065-vmss000000   <none>           <none>
httpbin-66c877d7d-wpn5z     1/1     Running   0          31h   10.10.3.174   aks-nodepool1-28392065-vmss000001   <none>           <none>
netshoot-7cd4fdf959-4fvr6   1/1     Running   0          31h   10.10.3.68    aks-nodepool1-28392065-vmss000001   <none>           <none>

kubectl get pods -o wide --context=amitgag-30384

NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
httpbin-66c877d7d-pvxtp   1/1     Running   0          32h   10.10.4.129   aks-nodepool1-15397795-vmss000002   <none>           <none>
httpbin-66c877d7d-sshvf   1/1     Running   0          32h   10.10.0.252   aks-nodepool1-15397795-vmss000000   <none>           <none>
  • Verify whether Cilium agents are successfully connected to all remote clusters.
kubectl exec -n kube-system -ti ds/cilium -- cilium-dbg status --all-clusters

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.29 (v1.29.7) [linux/amd64]
Kubernetes APIs:        ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   True   [eth0   192.168.10.5 fe80::222:48ff:fec0:9a09 (Direct Routing)]
Host firewall:          Disabled
SRv6:                   Disabled
CNI Chaining:           none
CNI Config file:        successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                 Ok   1.15.8-cee.1 (v1.15.8-cee.1-44b1b109)
NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 5/254 allocated from 10.10.0.0/24,
ClusterMesh:            1/1 remote clusters ready, 1 global-services
   amitgag-30384: ready, 3 nodes, 8 endpoints, 5 identities, 1 services, 0 reconnections (last: never)
   └  etcd: 1/1 connected, leases=0, lock leases=1, has-quorum=true: endpoint status checks are disabled, ID: e5cffb6071101644
   └  remote configuration: expected=true, retrieved=true, cluster-id=2, kvstoremesh=false, sync-canaries=true
   └  synchronization status: nodes=true, endpoints=true, identities=true, services=true
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       38/38 healthy
Proxy Status:            OK, ip 10.10.0.70, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 65536, max 131071
Hubble:                  Ok         Current/Max Flows: 4095/4095 (100.00%), Flows/s: 19.52   Metrics: Disabled
Encryption:              Disabled
Cluster health:                     Probe disabled

kubectl exec -n kube-system -ti ds/cilium --context amitgag-30384 -- cilium-dbg status --all-clusters

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.29 (v1.29.7) [linux/amd64]
Kubernetes APIs:        ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   True   [eth0   192.168.20.4 fe80::7e1e:52ff:fe43:809c (Direct Routing)]
Host firewall:          Disabled
SRv6:                   Disabled
CNI Chaining:           none
CNI Config file:        successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                 Ok   1.15.8-cee.1 (v1.15.8-cee.1-44b1b109)
NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 4/254 allocated from 10.10.0.0/24,
ClusterMesh:            1/1 remote clusters ready, 1 global-services
   amitgag-18499: ready, 3 nodes, 9 endpoints, 6 identities, 1 services, 2 reconnections (last: 31h58m18s ago)
   └  etcd: 1/1 connected, leases=0, lock leases=0, has-quorum=true: endpoint status checks are disabled, ID: f4644e0a228aaf04
   └  remote configuration: expected=true, retrieved=true, cluster-id=1, kvstoremesh=false, sync-canaries=true
   └  synchronization status: nodes=true, endpoints=true, identities=true, services=true
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       33/33 healthy
Proxy Status:            OK, ip 10.10.0.153, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 131072, max 196607
Hubble:                  Ok         Current/Max Flows: 4095/4095 (100.00%), Flows/s: 16.78   Metrics: Disabled
Encryption:              Disabled
Cluster health:                     Probe disabled
  • Multiple causes can prevent Cilium agents (or KVStoreMesh, when enabled) from correctly connecting to the remote etcd cluster, whether it is the sidecar instance that is part of the clustermesh-apiserver or a separate etcd cluster when Cilium operates in KVStore mode.
    • Cilium features a cilium-dbg troubleshoot clustermesh command, which performs automatic checks to validate DNS resolution, network connectivity, mTLS authentication, etcd authorization, and more, and reports the output in a user-friendly format.
kubectl exec -n kube-system -ti ds/cilium -- cilium-dbg troubleshoot clustermesh

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
Found 1 cluster configurations

Cluster "amitgag-30384":
📄 Configuration path: /var/lib/cilium/clustermesh/amitgag-30384

🔌 Endpoints:
   - https://amitgag-30384.mesh.cilium.io:2379
     ✅ Hostname resolved to: 192.168.20.7
     ✅ TCP connection successfully established to 192.168.20.7:2379
     ✅ TLS connection successfully established to 192.168.20.7:2379
     ℹ️  Negotiated TLS version: TLS 1.3, ciphersuite TLS_AES_128_GCM_SHA256
     ℹ️  Etcd server version: 3.5.15

🔑 Digital certificates:
   ✅ TLS Root CA certificates:
      - Serial number:      ###################################
        Subject:             CN=Cilium CA
        Issuer:              CN=Cilium CA
        Validity:
          Not before:  2024-09-17 07:54:58 +0000 UTC
          Not after:   2027-09-17 07:54:58 +0000 UTC
   ✅ TLS client certificates:
      - Serial number:       ##############################################
        Subject:             CN=remote
        Issuer:              CN=Cilium CA
        Validity:
          Not before:  2024-09-17 07:54:00 +0000 UTC
          Not after:   2027-09-17 07:54:00 +0000 UTC

⚙️ Etcd client:
   ✅ Etcd connection successfully established
   ℹ️  Etcd cluster ID: e5cffb6071101644

kubectl exec -n kube-system -ti ds/cilium --context amitgag-30384 -- cilium-dbg troubleshoot clustermesh

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
Found 1 cluster configurations

Cluster "amitgag-18499":
📄 Configuration path: /var/lib/cilium/clustermesh/amitgag-18499

🔌 Endpoints:
   - https://amitgag-18499.mesh.cilium.io:2379
     ✅ Hostname resolved to: 192.168.10.7
     ✅ TCP connection successfully established to 192.168.10.7:2379
     ✅ TLS connection successfully established to 192.168.10.7:2379
     ℹ️  Negotiated TLS version: TLS 1.3, ciphersuite TLS_AES_128_GCM_SHA256
     ℹ️  Etcd server version: 3.5.15

🔑 Digital certificates:
   ✅ TLS Root CA certificates:
      - Serial number:      ###################################
        Subject:             CN=Cilium CA
        Issuer:              CN=Cilium CA
        Validity:
          Not before:  2024-09-17 07:53:19 +0000 UTC
          Not after:   2027-09-17 07:53:19 +0000 UTC
   ✅ TLS client certificates:
      - Serial number:       ############################################
        Subject:             CN=remote
        Issuer:              CN=Cilium CA
        Validity:
          Not before:  2024-09-17 07:52:00 +0000 UTC
          Not after:   2027-09-17 07:52:00 +0000 UTC

⚙️ Etcd client:
   ✅ Etcd connection successfully established
   ℹ️  Etcd cluster ID: f4644e0a228aaf04

Conclusion

The evolution of network architectures poses challenges, and the Isovalent team is here to help you work through them. Overlapping Pod CIDRs are one such challenge, but as you have seen, it is one that can be readily overcome. Hopefully, this post gave you a good overview of setting up Isovalent Enterprise for Cilium’s Cluster Mesh with overlapping Pod CIDRs. You can schedule a demo with our experts if you’d like to learn more.

Try it out

Start with the Cluster Mesh lab and see how to enable Cluster Mesh in your enterprise environment.

Further Reading

  • Isovalent Enterprise for Cilium 1.13: SRv6 L3VPN, Overlapping CIDR Support, FromFQDN in Network Policy, Grafana plugin and more! – the release announcement introducing Cluster Mesh support for overlapping CIDRs, by Nico Vibert.
  • Cilium Cluster Mesh in AKS – a tutorial on enabling Cilium Cluster Mesh on an AKS cluster running Isovalent Enterprise for Cilium from the Azure Marketplace, by Amit Gupta.
  • Cilium Cluster Mesh lab – a hands-on lab on setting up Cilium Cluster Mesh, which connects the networks of multiple clusters so that Pods in each cluster can discover and access services in all other clusters of the mesh.