How to Deploy Cilium and Egress Gateway in Azure Kubernetes Service (AKS)

Amit Gupta
Published: Updated: Isovalent
Integrating Kubernetes into Traditional Infrastructure with HA Egress Gateway

Kubernetes changes the way we think about networking. In an ideal Kubernetes world, the network would be flat, and the Pod network would control all routing and security between the applications using Network Policies. In many Enterprise environments, though, the applications hosted on Kubernetes need to communicate with workloads outside the Kubernetes cluster, subject to connectivity constraints and security enforcement. Because of the nature of these networks, traditional firewalling usually relies on static IP addresses (or at least IP ranges). This can make it difficult to integrate a Kubernetes cluster, which has a varying and, at times, dynamic number of nodes, into such a network. Cilium’s Egress Gateway feature changes this by allowing you to specify which nodes should be used by a pod to reach the outside world. This blog post will walk you through deploying Cilium and Egress Gateway in AKS (Azure Kubernetes Service) using BYOCNI as the network plugin.

What is an Egress Gateway?

The egress gateway feature allows traffic that originates in pods and is destined for specific CIDRs outside the cluster to be routed through particular nodes.

When the egress gateway feature is enabled and egress gateway policies are in place, packets leaving the cluster are masqueraded with selected, predictable IPs associated with the gateway nodes. This feature can be used with legacy firewalls to allow traffic to legacy infrastructure only from specific pods within a given namespace. These pods typically have ever-changing IP addresses. Even if masquerading were to be used to mitigate this, the IP addresses of nodes can also change frequently over time.

Egress IP in Cilium 1.10

For example, with the following resource:

apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"

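With this policy, traffic from the selected pods (here the empty podSelector, which places no label restriction) to any address in the 192.168.11.0/24 range, which is outside the cluster, is routed through a node labeled egress-node=true and leaves with that gateway node’s IP address.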
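To make the destinationCIDRs matching concrete, here is a minimal shell sketch of a CIDR membership test. The function names ip_to_int and ip_in_cidr are hypothetical helpers written for this post; they only illustrate the matching rule and are not how Cilium evaluates policies internally.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP CIDR: succeed if IP falls inside CIDR (for example 192.168.11.0/24).
ip_in_cidr() {
  ip=$1
  net=${2%/*}          # network part, e.g. 192.168.11.0
  bits=${2#*/}         # prefix length, e.g. 24
  # Netmask with the top $bits bits set.
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# A destination inside the policy's CIDR would be sent via the gateway.
ip_in_cidr 192.168.11.4 192.168.11.0/24 && echo "192.168.11.4 matches 192.168.11.0/24"
```

Traffic to a destination that does not match any CIDR in the policy is not redirected by that policy.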

What is Isovalent Enterprise for Cilium?

Isovalent Enterprise for Cilium is an enterprise-grade, hardened distribution of the open-source projects Cilium, Hubble, and Tetragon, built and supported by the Cilium creators. Cilium enhances networking and security at the network layer, while Hubble ensures thorough network observability and tracing. Tetragon ties it all together with runtime enforcement and security observability, offering a well-rounded solution for connectivity, compliance, multi-cloud, and security concerns.

Why Isovalent Enterprise for Cilium?

While Egress Gateway in Cilium is a great step forward, most enterprise environments should not rely on a single point of failure for network routing. For this reason, Isovalent introduced Egress Gateway High Availability (HA), which supports multiple egress nodes. The nodes acting as egress gateways will then load-balance traffic in a round-robin fashion and provide fallback nodes in case one or more egress nodes fail.
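The round-robin and fallback behaviour described above can be pictured with a small conceptual sketch. This is only an illustration under assumptions: pick_gateway is a hypothetical helper, the connection index stands in for whatever the datapath uses to spread connections, and the real selection logic inside Cilium may differ.

```shell
# pick_gateway INDEX "gw1 gw2 ..." "unhealthy1 ..."
# Prints the gateway chosen for the INDEX-th connection, skipping unhealthy ones.
pick_gateway() {
  idx=$1; gateways=$2; unhealthy=$3
  healthy=""
  for gw in $gateways; do
    case " $unhealthy " in
      *" $gw "*) ;;                     # skip gateways marked unhealthy
      *) healthy="$healthy $gw" ;;
    esac
  done
  set -- $healthy                       # healthy gateways become $1..$n
  [ $# -gt 0 ] || return 1              # no healthy gateway left
  shift $(( $idx % $# ))                # rotate through the healthy set
  echo "$1"
}

# Two healthy gateways: connections alternate between them.
pick_gateway 0 "192.168.11.4 192.168.11.5" ""
pick_gateway 1 "192.168.11.4 192.168.11.5" ""
# One gateway down: every connection falls back to the survivor.
pick_gateway 1 "192.168.11.4 192.168.11.5" "192.168.11.5"
```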

Multiple egress nodes can be configured using the egressGroups parameter in the IsovalentEgressGatewayPolicy resource specification, which we will detail in Scenario 2 in the tutorial below.

Pre-Requisites

The following prerequisites need to be taken into account before you proceed with this tutorial:

  • An Azure account with an active subscription. You can create an account for free.
  • Azure CLI version 2.48.1 or later. Run az --version to see the currently installed version. If you need to install or upgrade, see Install Azure CLI.
  • If using ARM templates or the REST API, the AKS API version must be 2022-09-02-preview or later.
  • The kubectl command line tool is installed on your device. The version can be the same as or up to one minor version earlier or later than the Kubernetes version of your cluster. For example, if your cluster version is 1.26, you can use kubectl version 1.25, 1.26, or 1.27 with it. To install or upgrade kubectl, see Installing or updating kubectl.
  • Install Cilium CLI.
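The kubectl version rule above (at most one minor version away from the cluster) can be checked with a short sketch. The helpers minor_of and skew_ok are hypothetical names, and the sketch assumes both versions share the same major version.

```shell
# minor_of VERSION: print the minor number of a version such as 1.26 or v1.29.2.
minor_of() {
  v=${1#v}          # drop a leading "v" if present
  v=${v#*.}         # drop the major version and the first dot
  echo "${v%%.*}"   # keep everything up to the next dot
}

# skew_ok CLIENT CLUSTER: succeed if the minor versions differ by at most one.
skew_ok() {
  diff=$(( $(minor_of "$1") - $(minor_of "$2") ))
  [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
}

# Example from the text: a 1.26 cluster works with kubectl 1.25, 1.26, or 1.27.
for client in 1.25 1.26 1.27; do
  skew_ok "$client" 1.26 && echo "kubectl $client is within the supported skew"
done
```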

Limitations to keep in mind?

Keep the following limitations in mind; this list may change over time.

  • The Egress gateway feature is partially incompatible with L7 policies.
    • Specifically, when an egress gateway policy and an L7 policy both select the same endpoint, traffic from that endpoint does not go through the egress gateway, even if the policy allows it.
  • Egress Gateway is incompatible with Isovalent’s Cluster Mesh feature.

Which network plugin can I use for Egress Gateway in AKS?

For this tutorial, we will create an Azure Kubernetes Service (AKS) cluster with Bring Your Own CNI (BYOCNI) as the network plugin and walk through two scenarios.

Scenario 1- Egress Gateways in a single Availability Zone.

Pre-Requisites:

  • The AKS cluster is created in VNET A, subnet A
  • The Egress Gateway is created in VNET A, subnet B
    • VNET= 192.168.8.0/22
    • Subnet A= 192.168.10.0/24
    • Subnet B= 192.168.11.0/24
  • A test VM is created in VNET A, subnet B

Set the subscription

Choose the subscription you want to use if you have multiple Azure subscriptions.

  • Replace SubscriptionName with your subscription name.
  • You can also use your subscription ID instead of your subscription name.
az account set --subscription SubscriptionName

AKS Cluster creation

Create an AKS cluster with the network plugin as BYOCNI.

az group create -l canadacentral -n byocni

az network vnet create -g byocni --location canadacentral --name byocni-vnet --address-prefixes 192.168.8.0/22 -o none

az network vnet subnet create -g byocni --vnet-name byocni-vnet --name byocni-subnet --address-prefixes 192.168.10.0/24 -o none

az network vnet subnet create -g byocni --vnet-name byocni-vnet --name egressgw-subnet --address-prefixes 192.168.11.0/24 -o none

az aks create -l canadacentral -g byocni -n byocni --network-plugin none --vnet-subnet-id /subscriptions/#############################/resourceGroups/byocni/providers/Microsoft.Network/virtualNetworks/byocni-vnet/subnets/byocni-subnet

az aks get-credentials --resource-group byocni --name byocni
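The --vnet-subnet-id value used above follows a fixed resource ID pattern, so it can be assembled rather than typed by hand. The sketch below uses a hypothetical helper named subnet_id; the all-zero subscription ID is a placeholder, not a real value.

```shell
# subnet_id SUBSCRIPTION_ID RESOURCE_GROUP VNET SUBNET
# Prints the Azure resource ID of a subnet in the form used by --vnet-subnet-id.
subnet_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder subscription ID; with the Azure CLI logged in you could use:
#   SUB_ID=$(az account show --query id -o tsv)
SUB_ID=00000000-0000-0000-0000-000000000000
subnet_id "$SUB_ID" byocni byocni-vnet byocni-subnet
```

The same helper works for the egress gateway node pool by passing egressgw-subnet as the last argument.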

Note- You can also create an AKS cluster with BYOCNI using Terraform.

Create an unmanaged AKS nodepool in a different subnet.

Create an AKS nodepool in the egressgw-subnet (created in the previous step).

az aks nodepool add -g byocni --cluster-name byocni -n egressgw --enable-node-public-ip --node-count 1  --vnet-subnet-id /subscriptions/###############################/resourceGroups/byocni/providers/Microsoft.Network/virtualNetworks/byocni-vnet/subnets/egressgw-subnet

Assign a label to the unmanaged nodepool

  • Update the node pool with a label: pass the node pool name to --name and the labels to --labels. Labels must be key/value pairs with valid syntax.
az aks nodepool update --resource-group byocni --cluster-name byocni --name egressgw --labels io.cilium/egress-gateway=true
  • Check the status of the nodes.
kubectl get nodes -o wide

NAME                               STATUS   ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-byocni-35717205-vmss000000     Ready    <none>   5h34m   v1.29.2   192.168.10.4   <none>          Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
aks-byocni-35717205-vmss000001     Ready    <none>   5h34m   v1.29.2   192.168.10.5   <none>          Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
aks-egressgw-36500661-vmss000000   Ready    <none>   3h52m   v1.29.2   192.168.11.5   52.156.19.241   Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
aks-egressgw-36500661-vmss000001   Ready    <none>   3h52m   v1.29.2   192.168.11.6   53.172.12.120   Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
  • Note: this doesn’t create a new NIC. Traffic from the client pod is redirected by the egress gateway policy to eth0 (192.168.11.5) of the node labeled egress-node: “true”, and from there it is automatically NATed to the node’s assigned public IP.
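To pick out the egress gateway candidates from the node listing, one option is to filter on the EXTERNAL-IP column. This is a sketch that assumes the column layout of kubectl get nodes -o wide shown above; gateway_ips is a hypothetical helper name.

```shell
# gateway_ips: read `kubectl get nodes -o wide` output on stdin and print
# name, internal IP, and public IP for every node that has a public IP.
gateway_ips() {
  # Column 6 is INTERNAL-IP and column 7 is EXTERNAL-IP; NR > 1 skips the header.
  awk 'NR > 1 && $7 != "<none>" { print $1, $6, $7 }'
}

# Usage against a live cluster:
#   kubectl get nodes -o wide | gateway_ips
```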

Install Isovalent Enterprise for Cilium

--set egressGateway.enabled=true \
--set enterprise.egressGatewayHA.enabled=true \
--set bpf.masquerade=true \
--set kubeProxyReplacement=true \
--set l7Proxy=false

Restart Cilium Operator and Cilium Daemonset

  • Restart the cilium operator and cilium daemonset for egress gateway changes to take effect.
kubectl rollout restart ds cilium -n kube-system
kubectl rollout restart deploy cilium-operator -n kube-system
  • Check the status of the pods.
kubectl get pods -o wide -A

NAMESPACE     NAME                                  READY   STATUS    RESTARTS        AGE     IP             NODE                               NOMINATED NODE   READINESS GATES
default       busybox                               1/1     Running   0               3h37m   10.0.0.165     aks-byocni-35717205-vmss000000     <none>           <none>
default       server                                1/1     Running   0               3h38m   10.0.1.84      aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cilium-9q4fx                          1/1     Running   0               3h35m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cilium-gjvft                          1/1     Running   0               3h34m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   cilium-mlxhq                          1/1     Running   0               3h35m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cilium-node-init-kd85n                1/1     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cilium-node-init-p74df                1/1     Running   0               5h27m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   cilium-node-init-t5mrn                1/1     Running   0               5h27m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cilium-operator-7d84bcbbc8-9rxzj      1/1     Running   0               3h34m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cilium-operator-7d84bcbbc8-rrq2f      1/1     Running   0               3h34m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cloud-node-manager-288rv              1/1     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   cloud-node-manager-8hv5x              1/1     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cloud-node-manager-xt222              1/1     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   coredns-5b97789cf4-b27vp              1/1     Running   0               5h26m   10.0.1.105     aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   coredns-5b97789cf4-s4mbm              1/1     Running   0               5h38m   10.0.0.189     aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   coredns-autoscaler-7c88465478-mccff   1/1     Running   7 (4h36m ago)   5h38m   10.0.0.251     aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   csi-azuredisk-node-75bnm              3/3     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   csi-azuredisk-node-85hrg              3/3     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   csi-azuredisk-node-9h9s2              3/3     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   csi-azurefile-node-dwqsk              3/3     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   csi-azurefile-node-lwkrn              3/3     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   csi-azurefile-node-tddf5              3/3     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   konnectivity-agent-844cd49468-dqdkj   1/1     Running   0               4h49m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   konnectivity-agent-844cd49468-tmw6l   1/1     Running   0               4h49m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   kube-proxy-85rm2                      1/1     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   kube-proxy-qsjxx                      1/1     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   kube-proxy-tnrh2                      1/1     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   metrics-server-6bb9c967d6-5cwnh       2/2     Running   0               3h52m   10.0.1.253     aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   metrics-server-6bb9c967d6-8sdft       2/2     Running   0               3h52m   10.0.1.100     aks-byocni-35717205-vmss000001     <none>           <none>
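With this many pods, scanning the listing by eye is error-prone. The sketch below flags any pod whose READY count is incomplete or whose STATUS is not Running. It assumes the column layout of kubectl get pods -A shown above, and not_ready is a hypothetical helper name; note that it would also flag pods in the Completed state.

```shell
# not_ready: read `kubectl get pods -A` output on stdin and print
# namespace/name for every pod that is not fully ready and running.
not_ready() {
  awk 'NR > 1 {
    split($3, r, "/")                  # READY column looks like "1/1"
    if (r[1] != r[2] || $4 != "Running") print $1 "/" $2
  }'
}

# Usage against a live cluster:
#   kubectl get pods -A | not_ready
```

No output means every listed pod is ready.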

Create an Egress Gateway Policy

  • The API provided by Isovalent to drive the Egress Gateway feature is the IsovalentEgressGatewayPolicy resource.
  • The selectors field of an IsovalentEgressGatewayPolicy resource is used to select source pods via a label selector. This can be done using matchLabels:
selectors:
- podSelector:
    matchLabels:
      labelKey: labelVal
  • One or more destination CIDRs can be specified with destinationCIDRs:
destinationCIDRs:
- "a.b.c.d/32"
- "e.f.g.0/24"
  • The group of nodes that should act as gateway nodes for a given policy can be configured with the egressGroups field. Nodes are matched based on their labels, with the nodeSelector field:
egressGroups:
- nodeSelector:
    matchLabels:
      testLabel: testVal
  • Sample policy as below:
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"

Testing Egress Gateway

  • Deploy a client pod and apply the IsovalentEgressGatewayPolicy, and observe that the pod’s connection gets redirected through the Gateway node.
  • The client pod gets deployed to one of the two managed nodes, and the IEGP (Isovalent Egress Gateway Policy) selects one or both of the nodes (depending on the egress gateway IPs specified) as the Gateway node.
  • Sample client pod yaml:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: busybox
  name: busybox
spec:
  containers:
  - image: nginx
    name: nginx
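    command:
    - /bin/sh
    - -c
    - "nginx -g 'daemon off;'"  # assumption: the original omits the command string; this starts the image's default server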
    securityContext:
      capabilities:
        add:
          - NET_ADMIN  # Add the cap_net_admin capability
    env:
    - name: EGRESS_IPS
      value: 192.168.11.5/24, 192.168.11.4/24
    resources: {}
  dnsPolicy: ClusterFirst
  nodeSelector:
    kubernetes.io/hostname: aks-byocni-35717205-vmss000000
  restartPolicy: Always
status: {}
  • Create the client pod and check that it’s up and running and pinned on one of the worker nodes as specified in the yaml file for the client pod.
kubectl apply -f busyboxegressgw.yaml

kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE     IP           NODE                             NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          3h53m   10.0.0.165   aks-byocni-35717205-vmss000000   <none>           <none> 

Apply an Egress Gateway Policy

  • Apply the following Egress Gateway Policy:
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"

Label the Egress Gateway Node

To let the policy select the node designated as the Egress Gateway, apply the label egress-node=true to it:

kubectl label nodes aks-egressgw-36500661-vmss000000 egress-node=true

kubectl get nodes -o wide --show-labels=true | grep egress-node

aks-egressgw-36500661-vmss000000   Ready    <none>   3h57m   v1.29.2   192.168.11.5   52.156.19.241   Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1   agentpool=egressgw,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=linux,egress-node=true,failure-domain.beta.kubernetes.io/region=canadacentral,failure-domain.beta.kubernetes.io/zone=0,io.cilium/egress-gateway=true,kubernetes.azure.com/agentpool=egressgw,kubernetes.azure.com/cluster=MC_byocni_byocni_canadacentral,kubernetes.azure.com/consolidated-additional-properties=04f7d3de-0602-11ef-bb36-22c8fb861105,kubernetes.azure.com/kubelet-identity-client-id=###############################,kubernetes.azure.com/mode=user,kubernetes.azure.com/node-image-version=AKSUbuntu-2204gen2containerd-202404.09.0,kubernetes.azure.com/nodepool-type=VirtualMachineScaleSets,kubernetes.azure.com/os-sku=Ubuntu,kubernetes.azure.com/role=agent,kubernetes.azure.com/storageprofile=managed,kubernetes.azure.com/storagetier=Premium_LRS,kubernetes.io/arch=amd64,kubernetes.io/hostname=aks-egressgw-36500661-vmss000000,kubernetes.io/os=linux,node.kubernetes.io/instance-type=Standard_DS2_v2,storageprofile=managed,storagetier=Premium_LRS,topology.disk.csi.azure.com/zone=,topology.kubernetes.io/region=canadacentral,topology.kubernetes.io/zone=0

Create a test VM in the Egress Gateway subnet

  • Create a VM in the same subnet as Egress Gateway and run a simple service on port 80 (like NGINX) that will respond to traffic sent from a pod on one of the worker nodes.
  • Test VM IP, in this case, is 192.168.11.4
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:22:48:3c:63:51 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.4/24 brd 192.168.11.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::222:48ff:fe3c:6351/64 scope link
       valid_lft forever preferred_lft forever

Traffic Generation (towards the server in Egress GW subnet)

Send traffic toward the test VM.

kubectl exec busybox -- curl -I 192.168.11.4
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Mon, 29 Apr 2024 12:44:07 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Mon, 29 Apr 2024 08:55:12 GMT
Connection: keep-alive
ETag: "662f6070-264"
Accept-Ranges: byte

Traffic Generation (outside of the cluster towards the Internet)

  • Send traffic to a public service.
    • Note the IP it returns is the egress gateway node’s Public IP.
kubectl exec busybox -- curl ifconfig.me
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0    181      0 --:--:-- --:--:-- --:--:--   183
52.156.19.241
  • Take a tcpdump from one of the egress gateway nodes.
    • Install tcpdump on the egress gateway node via apt-get install tcpdump
    • As you can see, 10.0.0.165 is the client pod IP from which the egress gateway node receives packets, and 192.168.11.5 is the egress gateway node’s eth0 IP address.
IP 34.117.118.44.80 > 10.0.0.165.45468: Flags [S.], seq 2883623483, ack 1878158107, win 65535, options [mss 1412,sackOK,TS val 1551427192 ecr 3654296066,nop,wscale 8], length 0
IP 10.0.0.165.45468 > 34.117.118.44.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 3654296076 ecr 1551427192], length 0
13:09:10.372213 IP 192.168.11.5.45468 > 34.117.118.44.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 3654296076 ecr 1551427192], length 0
IP 10.0.0.165.45468 > 34.117.118.44.80: Flags [P.], seq 1:76, ack 1, win 507, options [nop,nop,TS val 3654296077 ecr 1551427192], length 75: HTTP: GET / HTTP/1.1
13:09:10.372313 IP 192.168.11.5.45468 > 34.117.118.44.80: Flags [P.], seq 1:76, ack 1, win 507, options [nop,nop,TS val 3654296077 ecr 1551427192], length 75: HTTP: GET / HTTP/1.1
13:09:10.380927 IP 34.117.118.44.80 > 192.168.11.5.45468: Flags [.], ack 76, win 256, options [nop,nop,TS val 1551427202 ecr 3654296077], length 0
IP 34.117.118.44.80 > 10.0.0.165.45468: Flags [.], ack 76, win 256, options [nop,nop,TS val 1551427202 ecr 3654296077], length 0
13:09:10.430485 IP 34.117.118.44.80 > 192.168.11.5.45468: Flags [P.], seq 1:183, ack 76, win 256, options [nop,nop,TS val 1551427251 ecr 3654296077], length 182: HTTP: HTTP/1.1 200 OK
IP 34.117.118.44.80 > 10.0.0.165.45468: Flags [P.], seq 1:183, ack 76, win 256, options [nop,nop,TS val 1551427251 ecr 3654296077], length 182: HTTP: HTTP/1.1 200 OK
IP 10.0.0.165.45468 > 34.117.118.44.80: Flags [.], ack 183, win 506, options [nop,nop,TS val 3654296135 ecr 1551427251], length 0
13:09:10.430983 IP 192.168.11.5.45468 > 34.117.118.44.80: Flags [.], ack 183, win 506, options [nop,nop,TS val 3654296135 ecr 1551427251], length 0
13:09:10.434396 IP 192.168.10.4.44693 > 192.168.11.5.8472: OTV, flags [I] (0x08), overlay 0, instance 53596
IP 10.0.0.165.45468 > 34.117.118.44.80: Flags [F.], seq 76, ack 183, win 506, options [nop,nop,TS val 3654296139 ecr 1551427251], length 0
13:09:10.434476 IP 192.168.11.5.45468 > 34.117.118.44.80: Flags [F.], seq 76, ack 183, win 506, options [nop,nop,TS val 3654296139 ecr 1551427251], length 0
13:09:10.443468 IP 34.117.118.44.80 > 192.168.11.5.45468: Flags [F.], seq 183, ack 77, win 256, options [nop,nop,TS val 1551427264 ecr 3654296139], length 0
IP 34.117.118.44.80 > 10.0.0.165.45468: Flags [F.], seq 183, ack 77, win 256, options [nop,nop,TS val 1551427264 ecr 3654296139], length 0
IP 10.0.0.165.45468 > 34.117.118.44.80: Flags [.], ack 184, win 506, options [nop,nop,TS val 3654296148 ecr 1551427264], length 0
13:09:10.443787 IP 192.168.11.5.45468 > 34.117.118.44.80: Flags [.], ack 184, win 506, options [nop,nop,TS val 3654296148 ecr 1551427264], length 0
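The capture above shows each outbound packet twice: once with the pod IP as source and once with the gateway node’s IP after masquerading. A small sketch can summarise this by counting packets per source address for a given destination. It assumes the tcpdump line format shown above, and egress_sources is a hypothetical helper name.

```shell
# egress_sources DEST_IP: read tcpdump text output on stdin and print each
# source IP that sent packets to DEST_IP, followed by the packet count.
egress_sources() {
  awk -v dst="$1" '{
    # Find the "IP" token; a timestamp may or may not precede it.
    for (i = 1; i < NF; i++) if ($i == "IP") break
    src = $(i + 1); to = $(i + 3)
    sub(/:$/, "", to)                 # drop the trailing colon
    sub(/\.[0-9]+$/, "", src)         # drop the source port
    sub(/\.[0-9]+$/, "", to)          # drop the destination port
    if (to == dst) count[src]++
  }
  END { for (s in count) print s, count[s] }' | sort
}

# Usage with a saved capture (capture.txt is a hypothetical file name):
#   egress_sources 34.117.118.44 < capture.txt
```

Seeing both the pod IP and the gateway node IP as sources for the same destination is consistent with traffic being redirected to the gateway and then masqueraded there.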

Scenario 2- Egress Gateways in a Multi-Availability Zone environment.

Redundancy across availability zones protects against the loss of a single zone. Combined with HA for the Egress Gateway, it means egress traffic does not depend on one node or one zone, which is why enterprises commonly ask for this setup.

Pre-Requisites:

  • The AKS cluster is created in VNET A, subnet A
  • The Egress Gateway is created in VNET A, subnet B
    • VNET= 192.168.8.0/22
    • Subnet A= 192.168.10.0/24
    • Subnet B= 192.168.11.0/24
  • A test VM is created in VNET A, subnet B

Set the subscription

Choose the subscription you want to use if you have multiple Azure subscriptions.

  • Replace SubscriptionName with your subscription name.
  • You can also use your subscription ID instead of your subscription name.
az account set --subscription SubscriptionName

AKS cluster creation with nodepools across AZs

Create an AKS cluster with the network plugin as BYOCNI and nodepools across different Availability Zones.

az group create -l canadacentral -n byocni

az network vnet create -g byocni --location canadacentral --name byocni-vnet --address-prefixes 192.168.8.0/22 -o none

az network vnet subnet create -g byocni --vnet-name byocni-vnet --name byocni-subnet --address-prefixes 192.168.10.0/24 -o none

az network vnet subnet create -g byocni --vnet-name byocni-vnet --name egressgw-subnet --address-prefixes 192.168.11.0/24 -o none

az aks create -l canadacentral -g byocni -n byocni --network-plugin none --vm-set-type VirtualMachineScaleSets --zones 1 2 --vnet-subnet-id /subscriptions/###############################/resourceGroups/byocni/providers/Microsoft.Network/virtualNetworks/byocni-vnet/subnets/byocni-subnet

az aks get-credentials --resource-group byocni --name byocni

Create an unmanaged AKS nodepool in a different subnet.

Create an AKS nodepool in the egressgw-subnet (created in the previous step).

az aks nodepool add -g byocni --cluster-name byocni -n egressgw --enable-node-public-ip --node-count 2  --vnet-subnet-id /subscriptions/#######################################/resourceGroups/byocni/providers/Microsoft.Network/virtualNetworks/byocni-vnet/subnets/egressgw-subnet --vm-set-type VirtualMachineScaleSets --zones 1 2

Assign a label to the unmanaged nodepool

  • Update the node pool with a label: pass the node pool name to --name and the labels to --labels. Labels must be key/value pairs with valid syntax.
az aks nodepool update --resource-group byocni --cluster-name byocni --name egressgw --labels io.cilium/egress-gateway=true
  • Check the status of the nodes.
kubectl get nodes -o wide

NAME                               STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-byocni-22760683-vmss000000     Ready    <none>   16d   v1.29.2   192.168.10.4   <none>          Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
aks-byocni-22760683-vmss000001     Ready    <none>   16d   v1.29.2   192.168.10.5   <none>          Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
aks-egressgw-27814974-vmss000000   Ready    <none>   15d   v1.29.2   192.168.11.4   20.151.98.78    Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
aks-egressgw-27814974-vmss000001   Ready    <none>   15d   v1.29.2   192.168.11.5   4.172.207.202   Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
  • Note: this doesn’t create a new NIC. Traffic from the client pod is redirected by the egress gateway policy to eth0 (192.168.11.5 or 192.168.11.4) of a node labeled egress-node: “true”, and from there it is automatically NATed to that node’s assigned public IP.
  • Check that all nodes have been created in different Availability Zones.
kubectl describe nodes | grep -e "Name:" -e "topology.kubernetes.io/zone"

Name:               aks-byocni-22760683-vmss000000
                    topology.kubernetes.io/zone=canadacentral-2
Name:               aks-byocni-22760683-vmss000001
                    topology.kubernetes.io/zone=canadacentral-1
Name:               aks-egressgw-27814974-vmss000000
                    topology.kubernetes.io/zone=canadacentral-1
Name:               aks-egressgw-27814974-vmss000001
                    topology.kubernetes.io/zone=canadacentral-2
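The zone check above can also be automated. The following sketch counts the distinct zones used by the egress gateway nodes. It assumes the two-line-per-node output format shown above and that gateway node names contain the node pool name egressgw; egress_zone_count is a hypothetical helper name.

```shell
# egress_zone_count: read the filtered `kubectl describe nodes` output on stdin
# and print how many distinct zones the egress gateway nodes are spread across.
egress_zone_count() {
  awk '
    $1 == "Name:" { node = $2 }                   # remember the current node
    /topology.kubernetes.io\/zone=/ {
      split($1, kv, "=")                          # kv[2] holds the zone name
      if (node ~ /egressgw/) zones[kv[2]] = 1
    }
    END { n = 0; for (z in zones) n++; print n }'
}

# Usage against a live cluster:
#   kubectl describe nodes | grep -e "Name:" -e "topology.kubernetes.io/zone" | egress_zone_count
```

A result of 2 or more indicates that the gateway nodes do not all share one zone.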

Install Isovalent Enterprise for Cilium

--set egressGateway.enabled=true \
--set enterprise.egressGatewayHA.enabled=true \
--set bpf.masquerade=true \
--set kubeProxyReplacement=true \
--set l7Proxy=false

Restart Cilium Operator and Cilium Daemonset

  • Restart the cilium operator and cilium daemonset for egress gateway changes to take effect.
kubectl rollout restart ds cilium -n kube-system
kubectl rollout restart deploy cilium-operator -n kube-system
  • Check the status of the pods.
kubectl get pods -o wide -A

NAMESPACE     NAME                                  READY   STATUS    RESTARTS        AGE     IP             NODE                               NOMINATED NODE   READINESS GATES
default       busybox                               1/1     Running   0               3h37m   10.0.0.165     aks-byocni-35717205-vmss000000     <none>           <none>
default       server                                1/1     Running   0               3h38m   10.0.1.84      aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cilium-9q4fx                          1/1     Running   0               3h35m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cilium-gjvft                          1/1     Running   0               3h34m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   cilium-mlxhq                          1/1     Running   0               3h35m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cilium-node-init-kd85n                1/1     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cilium-node-init-p74df                1/1     Running   0               5h27m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   cilium-node-init-t5mrn                1/1     Running   0               5h27m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cilium-operator-7d84bcbbc8-9rxzj      1/1     Running   0               3h34m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cilium-operator-7d84bcbbc8-rrq2f      1/1     Running   0               3h34m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   cloud-node-manager-288rv              1/1     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   cloud-node-manager-8hv5x              1/1     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   cloud-node-manager-xt222              1/1     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   coredns-5b97789cf4-b27vp              1/1     Running   0               5h26m   10.0.1.105     aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   coredns-5b97789cf4-s4mbm              1/1     Running   0               5h38m   10.0.0.189     aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   coredns-autoscaler-7c88465478-mccff   1/1     Running   7 (4h36m ago)   5h38m   10.0.0.251     aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   csi-azuredisk-node-75bnm              3/3     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   csi-azuredisk-node-85hrg              3/3     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   csi-azuredisk-node-9h9s2              3/3     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   csi-azurefile-node-dwqsk              3/3     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   csi-azurefile-node-lwkrn              3/3     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   csi-azurefile-node-tddf5              3/3     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   konnectivity-agent-844cd49468-dqdkj   1/1     Running   0               4h49m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   konnectivity-agent-844cd49468-tmw6l   1/1     Running   0               4h49m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   kube-proxy-85rm2                      1/1     Running   0               3h55m   192.168.11.5   aks-egressgw-36500661-vmss000000   <none>           <none>
kube-system   kube-proxy-qsjxx                      1/1     Running   0               5h38m   192.168.10.4   aks-byocni-35717205-vmss000000     <none>           <none>
kube-system   kube-proxy-tnrh2                      1/1     Running   0               5h38m   192.168.10.5   aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   metrics-server-6bb9c967d6-5cwnh       2/2     Running   0               3h52m   10.0.1.253     aks-byocni-35717205-vmss000001     <none>           <none>
kube-system   metrics-server-6bb9c967d6-8sdft       2/2     Running   0               3h52m   10.0.1.100     aks-byocni-35717205-vmss000001     <none>           <none>

Create an Egress Gateway Policy

The API provided by Isovalent to drive the Egress Gateway feature is the IsovalentEgressGatewayPolicy resource.

apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
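
To reason about which destinations this policy redirects, it can help to check an address against destinationCIDRs by hand. The following is a minimal shell sketch and not part of Cilium: the helper names ip_to_int and in_cidr are hypothetical, and 10.0.0.241 is used only as an example of an address outside the CIDR.

```shell
# ip_to_int A.B.C.D -> prints the IPv4 address as a 32-bit integer
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP CIDR -> exit status 0 when IP falls inside CIDR
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Compare two sample destinations with the CIDR used in egress-sample.
for dst in 192.168.11.4 10.0.0.241; do
    if in_cidr "$dst" 192.168.11.0/24; then
        echo "$dst: inside destinationCIDRs, the policy applies"
    else
        echo "$dst: outside destinationCIDRs, the policy does not apply"
    fi
done
```

Only traffic whose destination falls inside one of the listed CIDRs is a candidate for redirection; everything else follows the normal datapath.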

Testing Egress Gateway

  • Deploy a client pod and apply the IsovalentEgressGatewayPolicy, and observe that the pod’s connection gets redirected through the Gateway node.
  • The client pod gets deployed to one of the two managed nodes, and the IEGP (Isovalent Egress Gateway Policy) selects one or both of the egress nodes (depending on the egress gateway IPs specified) as the Gateway node.
  • Sample client pod yaml:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: busybox
  name: busybox
spec:
  containers:
  - image: nginx
    name: nginx
    command:
    - /bin/sh
    - -c
    - sleep infinity  # keep the container alive so we can kubectl exec into it
    securityContext:
      capabilities:
        add:
          - NET_ADMIN  # Add the cap_net_admin capability
    env:
    - name: EGRESS_IPS
      value: 192.168.11.5/24, 192.168.11.4/24
    resources: {}
  dnsPolicy: ClusterFirst
  nodeSelector:
    kubernetes.io/hostname: aks-byocni-22760683-vmss000000
  restartPolicy: Always
status: {}
  • Create the client pod and check that it is up and running on the worker node pinned by the nodeSelector in the pod manifest.
kubectl apply -f busyboxegressgw.yaml

kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE     IP           NODE                             NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          3h53m   10.0.0.241   aks-byocni-22760683-vmss000000   <none>           <none> 
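
Apply an Egress Gateway Policy

  • Apply the egress-sample policy shown earlier, which redirects traffic from the selected pods to 192.168.11.0/24 through the nodes labeled egress-node: "true":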
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"

Label the Egress Gateway Node

To let the policy select the nodes designated as Egress Gateways, apply the label egress-node=true to each of them:

kubectl label nodes aks-egressgw-27814974-vmss000000 egress-node=true
node/aks-egressgw-27814974-vmss000000 labeled

kubectl label nodes aks-egressgw-27814974-vmss000001 egress-node=true
node/aks-egressgw-27814974-vmss000001 labeled

kubectl get nodes -o wide --show-labels=true | grep egress-node
aks-egressgw-27814974-vmss000000   Ready    <none>   15d   v1.29.2   192.168.11.4   20.151.98.78    Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1   agentpool=egressgw,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=linux,egress-node=true,failure-domain.beta.kubernetes.io/region=canadacentral,failure-domain.beta.kubernetes.io/zone=canadacentral-1,io.cilium/egress-gateway=true,kubernetes.azure.com/agentpool=egressgw,kubernetes.azure.com/cluster=MC_byocni_byocni_canadacentral,kubernetes.azure.com/consolidated-additional-properties=f57efe65-070c-11ef-a4d6-aab4eb6cd74a,kubernetes.azure.com/kubelet-identity-client-id=f22bbec0-4040-4237-b958-6deee20881a3,kubernetes.azure.com/mode=user,kubernetes.azure.com/node-image-version=AKSUbuntu-2204gen2containerd-202404.16.0,kubernetes.azure.com/nodepool-type=VirtualMachineScaleSets,kubernetes.azure.com/os-sku=Ubuntu,kubernetes.azure.com/role=agent,kubernetes.azure.com/storageprofile=managed,kubernetes.azure.com/storagetier=Premium_LRS,kubernetes.io/arch=amd64,kubernetes.io/hostname=aks-egressgw-27814974-vmss000000,kubernetes.io/os=linux,node.kubernetes.io/instance-type=Standard_DS2_v2,storageprofile=managed,storagetier=Premium_LRS,topology.disk.csi.azure.com/zone=canadacentral-1,topology.kubernetes.io/region=canadacentral,topology.kubernetes.io/zone=canadacentral-1
aks-egressgw-27814974-vmss000001   Ready    <none>   15d   v1.29.2   192.168.11.5   4.172.207.202   Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1   agentpool=egressgw,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=linux,egress-node=true,failure-domain.beta.kubernetes.io/region=canadacentral,failure-domain.beta.kubernetes.io/zone=canadacentral-2,io.cilium/egress-gateway=true,kubernetes.azure.com/agentpool=egressgw,kubernetes.azure.com/cluster=MC_byocni_byocni_canadacentral,kubernetes.azure.com/consolidated-additional-properties=f57efe65-070c-11ef-a4d6-aab4eb6cd74a,kubernetes.azure.com/kubelet-identity-client-id=f22bbec0-4040-4237-b958-6deee20881a3,kubernetes.azure.com/mode=user,kubernetes.azure.com/node-image-version=AKSUbuntu-2204gen2containerd-202404.16.0,kubernetes.azure.com/nodepool-type=VirtualMachineScaleSets,kubernetes.azure.com/os-sku=Ubuntu,kubernetes.azure.com/role=agent,kubernetes.azure.com/storageprofile=managed,kubernetes.azure.com/storagetier=Premium_LRS,kubernetes.io/arch=amd64,kubernetes.io/hostname=aks-egressgw-27814974-vmss000001,kubernetes.io/os=linux,node.kubernetes.io/instance-type=Standard_DS2_v2,storageprofile=managed,storagetier=Premium_LRS,topology.disk.csi.azure.com/zone=canadacentral-2,topology.kubernetes.io/region=canadacentral,topology.kubernetes.io/zone=canadacentral-2
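
The --show-labels output above is hard to scan. As a hedged convenience, the sketch below reduces it to the three values that matter for egress gateway policies: node name, internal IP, and availability zone. The function name gateway_summary is hypothetical; it only assumes the column layout of kubectl get nodes -o wide --show-labels shown above.

```shell
# gateway_summary: read `kubectl get nodes -o wide --show-labels` output on
# stdin and print "name internal-ip zone" for nodes labeled egress-node=true.
gateway_summary() {
    awk '
    /egress-node=true/ {
        zone = "unknown"
        # The last column holds the comma-separated label list.
        n = split($NF, labels, ",")
        for (i = 1; i <= n; i++) {
            if (labels[i] ~ /^topology\.kubernetes\.io\/zone=/) {
                sub(/^[^=]*=/, "", labels[i])
                zone = labels[i]
            }
        }
        # Column 1 is NAME and column 6 is INTERNAL-IP.
        print $1, $6, zone
    }'
}

# Usage against a live cluster:
#   kubectl get nodes -o wide --show-labels | gateway_summary
# kubectl can also filter and add the zone column itself:
#   kubectl get nodes -l egress-node=true -L topology.kubernetes.io/zone
```

The zone printed here is the same topology.kubernetes.io/zone label that the egress policies later in this post match on.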

Create a test VM in the Egress Gateway subnet

  • Create a VM in the same subnet as the Egress Gateway and run a simple service on port 80 (such as NGINX) that responds to traffic sent from a pod on one of the worker nodes.
  • The test VM’s IP address, in this case, is 192.168.11.4:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:22:48:3c:63:51 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.4/24 brd 192.168.11.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::222:48ff:fe3c:6351/64 scope link
       valid_lft forever preferred_lft forever

Traffic Generation (towards the server in the Egress Gateway subnet)

Send traffic toward the test VM.

kubectl exec busybox -- curl -I 192.168.11.4
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Mon, 29 Apr 2024 12:44:07 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Mon, 29 Apr 2024 08:55:12 GMT
Connection: keep-alive
ETag: "662f6070-264"
Accept-Ranges: bytes

Traffic Generation (outside of the cluster towards the Internet)

  • Send traffic to a public service.
    • Note that the IP address it returns is the egress gateway node’s public IP.
kubectl exec busybox -- curl ifconfig.me
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0    161      0 --:--:-- --:--:-- --:--:--   162

4.172.200.224

kubectl exec busybox -- curl ifconfig.me
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0    155      0 --:--:-- --:--:-- --:--:--   156

20.63.116.127
  • Take a tcpdump on one of the egress gateway nodes.
    • Install tcpdump on the egress gateway node via apt-get install tcpdump.
    • As you can see, 10.0.0.225 is the client pod IP from which the egress gateway node receives packets, and 192.168.11.5 is the egress gateway node’s eth0 IP address.
08:15:22.713376 IP 168.63.129.16.53 > 192.168.11.5.34878: 42594 1/0/1 A 34.117.118.44 (56)
IP 168.63.129.16.53 > 10.0.2.21.34878: 42594 1/0/1 A 34.117.118.44 (56)
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [S], seq 102756168, win 64860, options [mss 1410,sackOK,TS val 2905874533 ecr 0,nop,wscale 7], length 0
08:15:22.716722 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [S], seq 102756168, win 64860, options [mss 1410,sackOK,TS val 2905874533 ecr 0,nop,wscale 7], length 0
08:15:22.725284 IP 34.117.118.44.80 > 192.168.11.5.42814: Flags [S.], seq 1883155317, ack 102756169, win 65535, options [mss 1412,sackOK,TS val 3464590701 ecr 2905874533,nop,wscale 8], length 0
IP 34.117.118.44.80 > 10.0.0.225.42814: Flags [S.], seq 1883155317, ack 102756169, win 65535, options [mss 1412,sackOK,TS val 3464590701 ecr 2905874533,nop,wscale 8], length 0
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 2905874545 ecr 3464590701], length 0
08:15:22.727367 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 2905874545 ecr 3464590701], length 0
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [P.], seq 1:76, ack 1, win 507, options [nop,nop,TS val 2905874545 ecr 3464590701], length 75: HTTP: GET / HTTP/1.1
08:15:22.727389 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [P.], seq 1:76, ack 1, win 507, options [nop,nop,TS val 2905874545 ecr 3464590701], length 75: HTTP: GET / HTTP/1.1
08:15:22.735433 IP 34.117.118.44.80 > 192.168.11.5.42814: Flags [.], ack 76, win 256, options [nop,nop,TS val 3464590712 ecr 2905874545], length 0
IP 34.117.118.44.80 > 10.0.0.225.42814: Flags [.], ack 76, win 256, options [nop,nop,TS val 3464590712 ecr 2905874545], length 0
08:15:22.765735 IP 34.117.118.44.80 > 192.168.11.5.42814: Flags [P.], seq 1:183, ack 76, win 256, options [nop,nop,TS val 3464590742 ecr 2905874545], length 182: HTTP: HTTP/1.1 200 OK
IP 34.117.118.44.80 > 10.0.0.225.42814: Flags [P.], seq 1:183, ack 76, win 256, options [nop,nop,TS val 3464590742 ecr 2905874545], length 182: HTTP: HTTP/1.1 200 OK
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [.], ack 183, win 506, options [nop,nop,TS val 2905874585 ecr 3464590742], length 0
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [F.], seq 76, ack 183, win 506, options [nop,nop,TS val 2905874585 ecr 3464590742], length 0
08:15:22.768788 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [.], ack 183, win 506, options [nop,nop,TS val 2905874585 ecr 3464590742], length 0
08:15:22.768857 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [F.], seq 76, ack 183, win 506, options [nop,nop,TS val 2905874585 ecr 3464590742], length 0
08:15:22.777039 IP 34.117.118.44.80 > 192.168.11.5.42814: Flags [F.], seq 183, ack 77, win 256, options [nop,nop,TS val 3464590753 ecr 2905874585], length 0
IP 34.117.118.44.80 > 10.0.0.225.42814: Flags [F.], seq 183, ack 77, win 256, options [nop,nop,TS val 3464590753 ecr 2905874585], length 0
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [.], ack 184, win 506, options [nop,nop,TS val 2905874596 ecr 3464590753], length 0
08:15:22.778466 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [.], ack 184, win 506, options [nop,nop,TS val 2905874596 ecr 3464590753], length 0
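
To make the masquerading in this capture easier to see, one option is to print only the source of each outbound SYN packet. The following is a small sketch with a hypothetical helper name (syn_sources); it assumes the tcpdump text format shown above and nothing else.

```shell
# syn_sources: read tcpdump text on stdin and print "source-ip port port-no"
# for every pure SYN packet (Flags [S]), skipping SYN-ACKs (Flags [S.]).
syn_sources() {
    awk '
    /Flags \[S\],/ {
        # The source address is the field right after the literal "IP".
        for (i = 1; i <= NF; i++) {
            if ($i == "IP") { src = $(i + 1); break }
        }
        # src looks like 10.0.0.225.42814: four octets followed by the port.
        split(src, p, ".")
        printf "%s.%s.%s.%s port %s\n", p[1], p[2], p[3], p[4], p[5]
    }'
}

# Feed it the two SYN lines from the capture above.
syn_sources <<'EOF'
IP 10.0.0.225.42814 > 34.117.118.44.80: Flags [S], seq 102756168, win 64860, options [mss 1410,sackOK,TS val 2905874533 ecr 0,nop,wscale 7], length 0
08:15:22.716722 IP 192.168.11.5.42814 > 34.117.118.44.80: Flags [S], seq 102756168, win 64860, options [mss 1410,sackOK,TS val 2905874533 ecr 0,nop,wscale 7], length 0
EOF
```

Both lines carry the same source port and sequence number; only the source address changes from the pod IP to the gateway node’s eth0 IP, which is the masquerading performed by the egress gateway.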

Availability Zone Affinity

It is possible to control the AZ affinity of the egress gateway traffic with azAffinity. This feature relies on the well-known topology.kubernetes.io/zone node label to match or prefer gateway nodes within the same AZ as the source pods (“local” gateways), based on the configured mode of operation.

The following modes of operation are available:

  • disabled: This mode uses all the active gateways available, regardless of their AZ. This is the default mode of operation.
    • By taking a tcpdump on both egress nodes, we can see that the traffic flows across both of them.
    • Sample egress policy for this mode of operation:
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  azAffinity: disabled
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-1
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-2
  • localOnly: This mode selects only local gateways. If no local gateways are available, traffic will not pass through the non-local gateways and will be dropped.
    • By taking a tcpdump from both the egress nodes, we can see that the traffic flows across one of the local gateways.
    • Sample egress policy for this mode of operation:
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  azAffinity: localOnly
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-1
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-2
  • localOnlyFirst: This mode selects only local gateways as long as at least one is available in a given AZ. When no more local gateways are available, non-local gateways will be selected.
    • By taking a tcpdump from both the egress nodes, we can see that the traffic flows across one of the local gateways.
    • Sample egress policy for this mode of operation:
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  azAffinity: localOnlyFirst
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-1
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-2
  • localPriority: This mode selects all gateways, but local gateways are picked first. In conjunction with maxGatewayNodes, this can prioritize local gateways over non-local ones, allowing for a graceful fallback to non-local gateways in case the local ones become unavailable.
    • By taking a tcpdump on both egress nodes, we can see that the traffic flows across one of the local gateways.
    • Sample egress policy for this mode of operation:
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  destinationCIDRs:
  - "192.168.11.0/24"
  selectors:
  - podSelector: {}
  azAffinity: localPriority
  egressGroups:
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-1
  -
    nodeSelector:
      matchLabels:
        egress-node: "true"
        topology.kubernetes.io/zone: canadacentral-2
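
The four modes can be compared side by side with a toy model. The sketch below only encodes the descriptions given above; it is not Cilium’s implementation. The function name select_gateways, the name:zone notation, the gateway names gw0 and gw1, and the way the limit is applied (as a stand-in for maxGatewayNodes, with 0 meaning no limit) are all assumptions made for illustration.

```shell
# select_gateways MODE POD_ZONE MAX GATEWAY...
# Each GATEWAY is written as name:zone. MAX=0 means "no limit" in this model.
select_gateways() {
    mode=$1; pod_zone=$2; max=$3
    shift 3
    all_gws=""; local_gws=""; remote_gws=""
    for gw in "$@"; do
        name=${gw%%:*}; zone=${gw#*:}
        all_gws="$all_gws $name"
        if [ "$zone" = "$pod_zone" ]; then
            local_gws="$local_gws $name"
        else
            remote_gws="$remote_gws $name"
        fi
    done
    case $mode in
        disabled)       picked=$all_gws ;;                    # ignore zones
        localOnly)      picked=$local_gws ;;                  # empty means drop
        localOnlyFirst) if [ -n "$local_gws" ]; then picked=$local_gws; else picked=$remote_gws; fi ;;
        localPriority)  picked="$local_gws $remote_gws" ;;    # local ones first
        *) echo "unknown azAffinity mode: $mode" >&2; return 1 ;;
    esac
    # Keep at most MAX gateways, in the order chosen above.
    count=0; out=""
    for name in $picked; do
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then break; fi
        out="${out:+$out }$name"
        count=$((count + 1))
    done
    echo "$out"
}

# A pod in canadacentral-2 with one gateway in each zone.
for mode in disabled localOnly localOnlyFirst localPriority; do
    printf '%-15s -> %s\n' "$mode" \
        "$(select_gateways "$mode" canadacentral-2 0 gw0:canadacentral-1 gw1:canadacentral-2)"
done
```

In this model, localOnly returns an empty list when the pod’s zone has no gateway, which corresponds to the traffic being dropped, while localOnlyFirst and localPriority fall back to non-local gateways.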

How can you scale the Egress Gateway solution?

Note: Isovalent support does not approve this solution because the changes are lost when the nodes are rebooted or upgraded. Users are responsible for routing the packets for the IP addresses they add to the egress gateway node.

  • An AKS cluster in BYOCNI mode, with managed and unmanaged nodepools, is created with a single NIC.
    • This limits users who need additional egress IP addresses, for example when they have outbound connections to servers or databases spread across multiple subnets or within the same subnet.
  • AKS doesn’t allow more than one NIC on a nodepool, but you can add more IP addresses to the existing NIC to work around this to some extent, and then reference them in the policy with either the interface: ethX or the egressIP: x.x.x.x field.
    • This is limited to 254 IP addresses per NIC.
    # Specify the IP address used for egress.
    # It must exist as an IP associated with a network interface on the instance.
    egressIP: 10.100.255.50
  -
    # Specify the node or set of nodes that should be part of this egress group.
    # When 'interface' is specified this node selector can target multiple nodes.
    nodeSelector:
        matchLabels:
            node.kubernetes.io/pool: wg-2
            topology.kubernetes.io/zone: canadacentral
    # Specify the interface to be used for egress traffic.
    # A single IP address is expected to be associated with this interface in each node.
    interface: eth1
  • Update the existing nodepool where the egress gateway has been created (in either scenario 1 or scenario 2)
az vmss update --resource-group MC_byocni_byocni_canadacentral --name aks-egressgw-27814974-vmss  --add virtualMachineProfile.networkProfile.networkInterfaceConfigurations[0].ipConfigurations '{"name": "config-2", "primary": false, "privateIpAddressVersion": "IPv4", "publicIpAddressConfiguration": null, "subnet": {"id": "/subscriptions/#######################################################/resourceGroups/byocni/providers/Microsoft.Network/virtualNetworks/byocni-vnet/subnets/egressgw-subnet", "resourceGroup": "MC_byocni_byocni_canadacentral"}}'
  • Once the update goes through, add the new IP addresses on the host itself; how you do this depends on the host OS of the nodepool.
    • In this case, the nodes run Ubuntu 22.04, which uses netplan for OS network management.
  • Verify that the changes are in place.
root@aks-egressgw-27814974-vmss000000:/# ip a | grep eth0

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 192.168.11.4/24 metric 100 brd 192.168.11.255 scope global eth0
    inet 192.168.11.100/24 brd 192.168.11.255 scope global secondary eth0
  • You can now add egress gateway policies for the new IP addresses that have been added and scale the solution.
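
The netplan step itself is not shown above. As a rough sketch, a small helper can render a netplan fragment for the secondary address seen in the verification output. The function name render_netplan and the file name 60-egress-ips.yaml are assumptions, and how the fragment merges with the cloud-init generated netplan file on an AKS node should be verified on a test node first.

```shell
# render_netplan IFACE CIDR...: print a netplan (version 2) fragment that
# assigns the given static addresses to IFACE.
render_netplan() {
    iface=$1
    shift
    printf 'network:\n  version: 2\n  ethernets:\n    %s:\n      addresses:\n' "$iface"
    for cidr in "$@"; do
        printf '        - %s\n' "$cidr"
    done
}

# Render the fragment for the secondary IP used in this walkthrough.
render_netplan eth0 192.168.11.100/24

# To persist it on the node (the file name is an assumption):
#   render_netplan eth0 192.168.11.100/24 | sudo tee /etc/netplan/60-egress-ips.yaml
#   sudo netplan apply
# For a quick, non-persistent test instead:
#   sudo ip addr add 192.168.11.100/24 dev eth0
```

Either way, the address is lost when the node is reimaged or upgraded, which is why this approach carries the caveat noted at the start of this section.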

Common questions about Egress Gateway

  • How is the traffic encapsulated from the worker node to the egress node?
    • The traffic is encapsulated from a worker node to an egress node regardless of the tunnel mode, and in this case, the AKS cluster with BYOCNI uses VXLAN as the encapsulation.
  • How can you find the identity of the source endpoint if the traffic is encapsulated?
    • The VNI in the VXLAN header equals the identity of the source endpoint. In this case, the VNI maps to 53596.
    • You can then track the identity using the Cilium CLI, which indicates that it’s the busybox Pod.
kubectl -n kube-system exec ds/cilium -- cilium identity get 53596

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
ID      LABELS
53596   k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
        k8s:io.cilium.k8s.policy.cluster=default
        k8s:io.cilium.k8s.policy.serviceaccount=default
        k8s:io.kubernetes.pod.namespace=default
        k8s:run=busybox
  • How can you find the identity of the remote endpoint if the traffic is encapsulated?
    • Traffic is encapsulated over VXLAN from the server to the busybox pod behind one of the worker nodes.
    • In this case, VNI=6 is the identity of the server VM called remote-node by Cilium.
    • You can then track the identity using the Cilium CLI, indicating it’s the remote node.
kubectl -n kube-system exec ds/cilium -- cilium identity get 6

Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
ID   LABELS
6    reserved:remote-node

Conclusion

Hopefully, this post gave you a good overview of deploying Cilium and Egress Gateway in AKS (Azure Kubernetes Service) using BYOCNI as the network plugin. If you have any feedback on the solution, please share it with us. Talk to us, and let’s see how Isovalent can help with your use case.

Try it out

Start with the Egress Gateway lab and explore Egress Gateway in action.

Author: Amit Gupta, Senior Technical Marketing Engineer
