What is Cilium Cluster Mesh?
Cilium Cluster Mesh can be described as "Global Services made accessible across a swarm of clusters." These clusters can live within a single cloud provider, span multiple cloud providers, span availability zones, or span on-premises and cloud environments. This tutorial guides you through enabling Cilium Cluster Mesh (using the Azure CLI) on AKS clusters (in Dynamic IP allocation mode) running Isovalent Enterprise for Cilium from the Azure Marketplace.
What is Isovalent Enterprise for Cilium?
Isovalent Cilium Enterprise is an enterprise-grade, hardened distribution of open-source projects Cilium, Hubble, and Tetragon, built and supported by the Cilium creators. Cilium enhances networking and security at the network layer, while Hubble ensures thorough network observability and tracing. Tetragon ties it all together with runtime enforcement and security observability, offering a well-rounded solution for connectivity, compliance, multi-cloud, and security concerns.
Why Isovalent Enterprise for Cilium?
For enterprise customers requiring support and usage of Advanced Networking, Security, and Observability features, “Isovalent Enterprise for Cilium” is recommended with the following benefits:
Advanced network policy: Isovalent Cilium Enterprise provides advanced network policy capabilities, including DNS-aware policy, L7 policy, and deny policy, enabling fine-grained control over network traffic for micro-segmentation and improved security.
Hubble flow observability + User Interface: Isovalent Cilium Enterprise Hubble observability feature provides real-time network traffic flow, policy visualization, and a powerful User Interface for easy troubleshooting and network management.
Multi-cluster connectivity via Cluster Mesh: Isovalent Cilium Enterprise provides seamless networking and security across multiple clouds, including public cloud providers like AWS, Azure, and Google Cloud Platform, as well as on-premises environments.
Advanced Security Capabilities via Tetragon: Tetragon provides advanced security capabilities such as protocol enforcement, IP and port whitelisting, and automatic application-aware policy generation to protect against the most sophisticated threats. Built on eBPF, Tetragon can easily scale to meet the needs of the most demanding cloud-native environments.
Service Mesh: Isovalent Cilium Enterprise provides sidecar-free, seamless service-to-service communication and advanced load balancing, making it easy to deploy and manage complex microservices architectures.
Enterprise-grade support: Isovalent Cilium Enterprise includes enterprise-grade support from Isovalent’s experienced team of experts, ensuring that any issues are resolved promptly and efficiently. Additionally, professional services help organizations deploy and manage Cilium in production environments.
Pre-Requisites
The following prerequisites need to be taken into account before you proceed with this tutorial:
Users can contact their partner Sales/SE representative(s) at sales@isovalent.com for more detailed insights into the features below and access the requisite documentation and Cilium CLI software images.
Azure CLI version 2.48.1 or later. Run az --version to see the currently installed version. If you need to install or upgrade, see Install Azure CLI.
Ensure you have enough quota resources to create an AKS cluster. Go to the Subscription blade, navigate to "Usage + Quotas," and make sure you have enough quota for the following resources:
Regional vCPUs
Standard Dv4 Family vCPUs
Creating the AKS clusters
Let’s briefly see the commands used to bring up the AKS clusters running Azure CNI powered by Cilium.
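A minimal sketch of creating the two clusters is shown below. The resource group, VNet, subnet, region, and VM size values are assumptions for illustration; substitute the names and subnet resource IDs from your own environment.

# Cluster 1 (Azure CNI powered by Cilium, Dynamic IP allocation mode)
az aks create \
  --name cluster1 \
  --resource-group rg-clustermesh \
  --location eastus \
  --node-vm-size Standard_D4_v4 \
  --network-plugin azure \
  --network-dataplane cilium \
  --vnet-subnet-id /subscriptions/<sub-id>/resourceGroups/rg-clustermesh/providers/Microsoft.Network/virtualNetworks/vnet1/subnets/nodesubnet \
  --pod-subnet-id /subscriptions/<sub-id>/resourceGroups/rg-clustermesh/providers/Microsoft.Network/virtualNetworks/vnet1/subnets/podsubnet

# Cluster 2 in a second region, using a second VNet with non-overlapping CIDRs
az aks create \
  --name cluster2 \
  --resource-group rg-clustermesh \
  --location westeurope \
  --node-vm-size Standard_D4_v4 \
  --network-plugin azure \
  --network-dataplane cilium \
  --vnet-subnet-id /subscriptions/<sub-id>/resourceGroups/rg-clustermesh/providers/Microsoft.Network/virtualNetworks/vnet2/subnets/nodesubnet \
  --pod-subnet-id /subscriptions/<sub-id>/resourceGroups/rg-clustermesh/providers/Microsoft.Network/virtualNetworks/vnet2/subnets/podsubnet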
How can you Upgrade your AKS clusters to Isovalent Enterprise for Cilium?
Upgrade the AKS clusters to Isovalent Enterprise for Cilium to enable Cluster Mesh on an AKS cluster. In this tutorial, we will upgrade the AKS clusters using Azure Marketplace.
Search for Isovalent Enterprise for Cilium and select the container app.
Click Create
Select the Subscription, Resource Group, and Region where the cluster was created
Select “No” for the option that says “Create New Dev Cluster”
Click Next
Select the AKS cluster that was created in the previous section.
Click Next
The AKS cluster will now be upgraded to Isovalent Enterprise for Cilium.
What are the Node and Pod IPs across clusters?
Once the AKS clusters are upgraded to Isovalent Enterprise for Cilium, you can check that the nodes and pods on the AKS clusters are on distinct IP addresses.
Note: Distinct, non-overlapping node and pod IP addressing is a mandatory requirement. Support for overlapping pod CIDRs will be available in a future release of Isovalent Enterprise for Cilium from the Azure Marketplace.
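For example, assuming kubeconfig contexts named cluster1 and cluster2, you can list node and pod IPs in each cluster and confirm that the CIDRs do not overlap:

kubectl get nodes -o wide --context cluster1
kubectl get nodes -o wide --context cluster2
kubectl get pods -A -o wide --context cluster1
kubectl get pods -A -o wide --context cluster2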
How can you Peer the AKS clusters?
Use VNet peering to peer the virtual networks of the AKS clusters across the two chosen regions. This step only needs to be done in one direction; the connection will automatically be established in both directions. The portal steps are listed below, followed by an equivalent Azure CLI sketch.
Login to the Azure Portal
Click Home
Click Virtual Network
Select the respective Virtual Network
Click Peerings
Click Add
Give the local peer a name
Select “Allow cluster1 to access cluster2”
Give the remote peer a name
Select the virtual network deployment model as “Resource Manager”
Select the subscription
Select the virtual network of the remote peer
Select “Allow cluster2 to access cluster1”
Click Add
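For reference, here is a minimal Azure CLI equivalent. The VNet names vnet1 and vnet2 and the resource group rg-clustermesh are assumptions; when peering via the CLI, the link must be created explicitly in both directions, and if the VNets live in different resource groups or subscriptions, pass the full resource ID to --remote-vnet.

az network vnet peering create \
  --name cluster1-to-cluster2 \
  --resource-group rg-clustermesh \
  --vnet-name vnet1 \
  --remote-vnet vnet2 \
  --allow-vnet-access

az network vnet peering create \
  --name cluster2-to-cluster1 \
  --resource-group rg-clustermesh \
  --vnet-name vnet2 \
  --remote-vnet vnet1 \
  --allow-vnet-access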
Are the pods and nodes across clusters reachable?
With VNet peering in place, ensure that pods and nodes across clusters are reachable. Pods in one cluster should be reachable from nodes and pods in the other cluster, even though the AKS clusters run in different regions, as in the sketch below.
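A minimal reachability check, assuming a temporary test pod named netshoot-test in cluster 1 and a placeholder <pod-ip-in-cluster2> taken from the pod listing of cluster 2:

kubectl --context cluster1 run netshoot-test --image=nicolaka/netshoot --restart=Never -- sleep 3600
kubectl --context cluster1 exec -ti netshoot-test -- ping -c 3 <pod-ip-in-cluster2>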
How can you enable Cluster Mesh?
You can enable Cluster Mesh via Azure CLI or an ARM template.
Note: Since you are altering the ClusterID, all Cilium identities will be recreated. In a production cluster with existing connections, this should be planned and handled appropriately.
Azure CLI
Follow the detailed instructions on Azure CLI to create and upgrade a cluster on AKS running Isovalent Enterprise for Cilium.
How can you verify Cluster Mesh status?
Check the status of the clusters by running cilium clustermesh status on either of the clusters.
cilium clustermesh status --context cluster1 --wait
cilium clustermesh status --context cluster2 --wait
What’s the status of the clustermesh-api pod?
The Cluster Mesh API Server contains an etcd instance to keep track of the cluster’s state. The state from multiple clusters is never mixed. Cilium agents in other clusters connect to the Cluster Mesh API Server to watch for changes and replicate the multi-cluster state into their cluster. Access to the Cluster Mesh API Server is protected using TLS certificates.
Access from one cluster to another is always read-only, ensuring that failure domains remain unchanged. A failure in one cluster never propagates to other clusters.
Ensure that the clustermesh-api pod is running on both clusters.
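A quick check with kubectl; the k8s-app=clustermesh-apiserver label selector is an assumption based on the upstream Cilium Helm chart and may differ in your deployment:

kubectl --context cluster1 get pods -n kube-system -l k8s-app=clustermesh-apiserver
kubectl --context cluster2 get pods -n kube-system -l k8s-app=clustermesh-apiserver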
Note: Since the Isovalent image available from the Marketplace is based on Cilium 1.12, using the Cilium CLI to connect the clusters is recommended.
Connect the AKS clusters using the Cilium CLI. This step only needs to be done in one direction; the connection will automatically be established in both directions. The output will be similar to the following:
cilium clustermesh connect --context cluster1 --destination-context cluster2
✅ Detected Helm release with Cilium version 1.12.17
⚠️ Cilium Version is less than 1.14.0. Continuing in classic mode.
✨ Extracting access information of cluster cluster2...
🔑 Extracting secrets from cluster cluster2...
ℹ️ Found ClusterMesh service IPs: [20.116.185.121]
✨ Extracting access information of cluster cluster1...
🔑 Extracting secrets from cluster cluster1...
ℹ️ Found ClusterMesh service IPs: [52.142.113.148]
✨ Connecting cluster cluster1 -> cluster2...
🔑 Secret cilium-clustermesh does not exist yet, creating it...
🔑 Patching existing secret cilium-clustermesh...
✨ Patching DaemonSet with IP aliases cilium-clustermesh...
✨ Connecting cluster cluster2 -> cluster1...
🔑 Secret cilium-clustermesh does not exist yet, creating it...
🔑 Patching existing secret cilium-clustermesh...
✨ Patching DaemonSet with IP aliases cilium-clustermesh...
✅ Connected cluster cluster1 and cluster2!
What can you do now that cluster mesh is enabled?
Load-balancing & Service Discovery
The global service discovery of Cilium’s multi-cluster model is built using standard Kubernetes services and designed to be completely transparent to existing Kubernetes application deployments.
Create a global service that can be accessed from both clusters. Establishing load-balancing between clusters is achieved by defining a Kubernetes service with an identical name and namespace in each cluster and adding the annotation io.cilium/global-service: "true" to declare it global. Cilium will automatically perform load-balancing to pods in both clusters.
apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base
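The rebel-base and x-wing workloads referenced below come from the upstream Cilium Cluster Mesh example. A hedged sketch of deploying them, assuming the manifests are still available at these paths in the Cilium repository:

kubectl --context cluster1 apply -f https://raw.githubusercontent.com/cilium/cilium/HEAD/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml
kubectl --context cluster2 apply -f https://raw.githubusercontent.com/cilium/cilium/HEAD/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml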
From either cluster, access the global service. Notice the response is received from pods across both clusters.
kubectl exec -ti deployments/x-wing -- /bin/sh -c 'for i in $(seq 1 10); do curl rebel-base; done'
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
Service Affinity
Load-balancing across multiple clusters might not be ideal in some cases. Using the service affinity annotation, it is possible to load-balance traffic to local endpoints only and send traffic to endpoints in remote clusters only when no local endpoints are available. The annotation io.cilium/service-affinity: "local|remote|none" can be used to specify the preferred endpoint destination.
For example, if the value of the annotation io.cilium/service-affinity is local, the global service will load-balance across healthy local backends and only use remote endpoints if all of the local backends are unavailable or unhealthy.
In cluster 1, add io.cilium/service-affinity="local" to the existing global service:
kubectl annotate service rebel-base io.cilium/service-affinity=local --overwrite
From cluster 1, access the global service. You will see replies from pods in cluster 1 only.
kubectl exec -ti deployments/x-wing -- /bin/sh -c 'for i in $(seq 1 10); do curl rebel-base; done'
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
From cluster 2, access the global service. As usual, you will see replies from pods in both clusters.
kubectl exec -ti deployments/x-wing -- /bin/sh -c 'for i in $(seq 1 10); do curl rebel-base; done'
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-1"}
From cluster 1, check the service endpoints; the local endpoints are marked as preferred.
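One way to inspect this is from the Cilium agent itself. This sketch assumes the agent runs as the cilium DaemonSet in kube-system; the exact output columns and affinity markers vary by Cilium version:

kubectl --context cluster1 -n kube-system exec -ti ds/cilium -- cilium service list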
In cluster 1, change the io.cilium/service-affinity annotation value to remote for the existing global service:
kubectl annotate service rebel-base io.cilium/service-affinity=remote --overwrite
From cluster 1, access the global service. This time, the replies are coming from pods in cluster 2 only.
kubectl exec -ti deployments/x-wing -- /bin/sh -c 'for i in $(seq 1 10); do curl rebel-base; done'
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
{"Galaxy":"Alderaan", "Cluster":"Cluster-2"}
Conclusion
Hopefully, this post gave you a good overview of deploying Cluster Mesh across AKS clusters running Isovalent Enterprise for Cilium. If you have any feedback on the solution, please share it with us. You'll find us on the Cilium Slack channel.
Amit Gupta is a senior technical marketing engineer at Isovalent, powering eBPF cloud-native networking and security. Amit has 21+ years of experience in Networking, Telecommunications, Cloud, Security, and Open-Source. He has previously worked with Motorola, Juniper, Avi Networks (acquired by VMware), and Prosimo. He is keen to learn and try out new technologies that aid in solving day-to-day problems for operators and customers.
He has worked in the Indian start-up ecosystem for a long time and helps new folks in that area outside of work. Amit is an avid runner and cyclist and also spends considerable time helping kids in orphanages.