
How to Deploy Cilium and Egress Gateway in Elastic Kubernetes Service (EKS)

Raphaël Pinson
Amit Gupta
Integrating Kubernetes into Traditional Infrastructure with HA Egress Gateway

Kubernetes changes the way we think about networking. In an ideal Kubernetes world, the network would be flat, and the Pod network would control all routing and security between the applications using Network Policies. In many Enterprise environments, though, the applications hosted on Kubernetes need to communicate with workloads outside the Kubernetes cluster, subject to connectivity constraints and security enforcement. Because of the nature of these networks, traditional firewalling usually relies on static IP addresses (or at least IP ranges). This can make it difficult to integrate a Kubernetes cluster, which has a varying and at times dynamic number of nodes, into such a network. Cilium’s Egress Gateway feature changes this by allowing you to specify which nodes should be used by a pod to reach the outside world. This blog will walk you through deploying Cilium and Egress Gateway in Elastic Kubernetes Service (EKS).

What is an Egress Gateway?

The Egress Gateway feature was first introduced in Cilium 1.10.

Egress IP in Cilium 1.10

Egress Gateway lets you specify a single egress IP for pods to use when they reach out to one or more CIDRs outside the cluster.

Note: This blog is based on the 1.14 release of Isovalent Enterprise for Cilium. If you are on the open-source version of Cilium and would like to migrate/upgrade to Isovalent Enterprise for Cilium, you can contact sales@isovalent.com.

For example, with the following resource:

apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  egress:
    - podSelector:
        matchLabels:
          app: test-app
  destinationCIDRs:
    - 1.2.3.0/24
  egressSourceIP: 10.1.2.1

All pods that match the app=test-app label will be routed through the 10.1.2.1 IP when they reach out to any address in the 1.2.3.0/24 IP range —which is outside the cluster.

How can you achieve High Availability for an Egress Gateway?

While Egress Gateway in Cilium is a great step forward, most enterprise environments should not rely on a single point of failure for network routing. For this reason, Cilium Enterprise 1.11 introduced Egress Gateway High Availability (HA), which supports multiple egress nodes. The nodes acting as egress gateways will then load-balance traffic in a round-robin fashion and provide fallback nodes in case one or more egress nodes fail.

The multiple egress nodes can be configured using an egressGroups parameter in the IsovalentEgressGatewayPolicy resource specification:

apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-ha-sample
spec:
  egress:
    - podSelector:
        matchLabels:
          app: test-app
  destinationCIDRs:
    - 1.2.3.0/24
  egressGroups:
    - nodeSelector:
        matchLabels:
          egress-group: egress-group-1
      interface: eth1
      maxGatewayNodes: 2

In this example, all pods matching the app=test-app label that reach out to the 1.2.3.0/24 IP range will be routed through a group of two cluster nodes bearing the egress-group=egress-group-1 label, using the first IP address associated with the eth1 network interface of these nodes.

Note that it is also possible to specify an egress IP address instead of an interface.
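For instance, the egressGroups entry above could reference an IP directly rather than an interface. This is a minimal sketch reusing the addresses from the earlier examples; the exact IP would depend on your environment:

  egressGroups:
    - nodeSelector:
        matchLabels:
          egress-group: egress-group-1
      egressIP: 10.1.2.1
      maxGatewayNodes: 2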

Egress animation

Requirements

Cilium needs network-facing interfaces and IP addresses on the designated gateway nodes in order to route egress traffic through them. These interfaces and IP addresses must be provisioned and configured by the operator based on their networking environment. The process is highly dependent on said networking environment. For example, in AWS/EKS, and depending on the requirements, this may mean creating one or more Elastic Network Interfaces with one or more IP addresses and attaching them to instances that serve as gateway nodes so that AWS can adequately route traffic flowing from and to the instances. Other cloud providers have similar networking requirements and constructs.

Enabling the egress gateway feature requires enabling both BPF masquerading and the kube-proxy replacement in the Cilium configuration.
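As a rough sketch, the corresponding Helm values could look like the following. The bpf.masquerade and kubeProxyReplacement keys are standard Cilium values; the enterprise.egressGatewayHA.enabled key is an assumption based on the health-check snippet shown later in this post, and the exact keys may differ between Cilium versions:

bpf:
  masquerade: true
kubeProxyReplacement: true   # older Cilium versions may use "strict" instead
enterprise:
  egressGatewayHA:
    enabled: true            # assumption: enables the HA egress gateway feature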

High Availability and Health Checks

The operator must specify multiple gateway nodes in a policy to achieve a highly available configuration. When multiple gateway nodes are specified and a given node is detected as unhealthy, Cilium removes it from the pool of gateway nodes so that traffic stops being forwarded to it. The period used for the health checks can be configured in the Cilium Helm values:

enterprise:
  egressGatewayHA:
    healthcheckTimeout: 1s  # Defaults to '2s'.

HA Egress Gateway walk-through

Setup

For the following feature walk-through, we set up a monitoring system using Cilium Enterprise 1.11 on top of an AWS EKS cluster.

We are using two containerized apps to monitor traffic:

  • An HTTP echo server that returns the caller’s IP address (similar to ifconfig.me/ip), deployed on an EC2 instance outside of the Kubernetes cluster;
  • A monitoring server that sends HTTP requests to the echo server once every 50ms and provides Prometheus metrics on the requests, such as the HTTP load time, the number of HTTP errors, or whether the returned IP is one of the known egress node IPs (a minimal sketch of this loop is shown below the architecture figure).
Example walk-through architecture
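The monitor itself is essentially a tight request loop. A minimal shell sketch is shown below; the real monitor additionally exports Prometheus metrics, and the echo server address is a placeholder:

# Minimal sketch of the monitoring loop: one request every 50ms,
# printing the outbound IP reported by the echo server.
ECHO_SERVER="http://<echo-server-ip>"   # placeholder address
while true; do
  curl -s --max-time 1 "$ECHO_SERVER" || echo "request failed"
  sleep 0.05
done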

The example walk-through runs in the eu-west-2 AWS region, across four availability zones labeled eu-west-2a, eu-west-2b, eu-west-2c, and eu-west-2d.

For a proper HA setup, and since our EKS cluster is spread across all four zones, we set up one egress node per zone. Each node has one extra Elastic Network Interface attached to it, with an IP in the VPC’s subnet. Each of these IPs is thus specific to an availability zone:

  • 10.2.200.10 for eu-west-2a;
  • 10.2.201.10 for eu-west-2b;
  • 10.2.202.10 for eu-west-2c;
  • 10.2.203.10 for eu-west-2d.

The nodes have a user data script to retrieve and configure the IP assigned to their zone.
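A hedged sketch of such a user data script is shown below; it assumes the secondary ENI is already attached as eth1, and a production version would also need to handle IMDSv2 tokens and bring the interface up:

#!/bin/bash
# Sketch: pick the egress IP matching this node's availability zone
# and assign it to the secondary interface (eth1).
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
case "$AZ" in
  eu-west-2a) EGRESS_IP=10.2.200.10 ;;
  eu-west-2b) EGRESS_IP=10.2.201.10 ;;
  eu-west-2c) EGRESS_IP=10.2.202.10 ;;
  eu-west-2d) EGRESS_IP=10.2.203.10 ;;
esac
ip addr add "${EGRESS_IP}/32" dev eth1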

Note that this setup is specific to AWS; on other platforms, a different method would have to be used to assign IPs to the egress nodes. Prometheus then scrapes the monitoring server, and we built a Grafana dashboard based on the metrics.

Grafana dashboard

This dashboard displays:

  • The stacked number of requests per second (about 20 req/s in total, with a 50ms delay between requests) per outbound IP;
  • The load times per outbound IP;
  • The percentage of HTTP errors out of total requests.

Note that the graphs display rates over 1 minute, so they do not allow unavailability windows to be measured reliably. When calculating unavailability in the example walk-through, we relied on the monitor container logs instead.

Adding an Egress Gateway Policy

To demonstrate the HA Egress Gateway feature, we are going to apply the following Cilium Egress Policy to the cluster:

apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  name: egress-gw-policy
spec:
  destinationCIDRs:
    - 10.2.0.0/16
  egress:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: egress-gw-monitor
          io.kubernetes.pod.namespace: cilium-egress-gateway
  egressGroups:
    - egressIP: 10.2.200.10
      nodeSelector:
        matchLabels:
          io.cilium/egress-gateway: 'true'
          topology.kubernetes.io/zone: eu-west-2a
    - egressIP: 10.2.201.10
      nodeSelector:
        matchLabels:
          io.cilium/egress-gateway: 'true'
          topology.kubernetes.io/zone: eu-west-2b
    - egressIP: 10.2.202.10
      nodeSelector:
        matchLabels:
          io.cilium/egress-gateway: 'true'
          topology.kubernetes.io/zone: eu-west-2c
    - egressIP: 10.2.203.10
      nodeSelector:
        matchLabels:
          io.cilium/egress-gateway: 'true'
          topology.kubernetes.io/zone: eu-west-2d

The podSelector is still very similar to the previous example. The egressGroups parameter is a bit different, though.

With this policy in place, the pods with the app.kubernetes.io/name=egress-gw-monitor label in the cilium-egress-gateway namespace will go through one of the egress nodes when reaching out to any IP in the 10.2.0.0/16 IP range, which includes our echo server instance.

Let’s see what happens when adding the Cilium Egress Policy to the cluster.
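Applying and checking the policy boils down to standard kubectl commands; the file name and the CRD’s plural resource name below are assumptions:

# Apply the policy and confirm it has been created.
kubectl apply -f egress-gw-policy.yaml
kubectl get isovalentegressgatewaypolicies.isovalent.com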

Requests per second

Before applying the policy, all traffic reached the echo server directly using the pod’s IP (10.2.3.165, in red) since our cluster uses AWS ENI with direct routing. After applying it, traffic to the echo server goes through one of the four egress IPs set up on the egress nodes in a round-robin fashion.

Load times per IP

Load times depend on the number of hops and distance. Direct connections (red) are faster, followed by connections through the eu-west-2d instance (where the monitor pod is located) and eu-west-2a, where the echo server is located, and finally, the two other zones, which require a hop through a third availability zone.

How good is the resiliency with Egress Gateway(s)?

Now is the time to test the resiliency of the HA setup by rebooting one or more egress instances.
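To simulate a failure, an egress instance can simply be rebooted from the AWS CLI; the instance ID below is a placeholder:

# Reboot one egress node and watch the dashboard for dropped requests.
aws ec2 reboot-instances --instance-ids i-0123456789abcdef0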

Reboot test

When rebooting a single egress node (in eu-west-2d in this case), a few packets are lost, but the load is quickly rebalanced on the 3 remaining nodes. In our tests using a 1-second health check timeout setting, we recorded timeouts for 3 seconds when the node came down.

Traffic comes back to normal without losses after the node is back up.

Node comes back up

When rebooting three (out of four) egress nodes simultaneously, the timeout period was 3 seconds if all nodes came down at the same time and up to 6 seconds when they went down in multiple batches. After that, all traffic was re-balanced on the last node until the nodes returned. The same happens if we terminate the nodes and let AWS autoscaling spin up new nodes (though the nodes take longer to come back up).

How can you scale the Egress Gateway solution?

  • AWS announced the general availability of VPC Route Server, which simplifies dynamic routing between virtual appliances in your Amazon VPC. Route Server allows you to advertise routing information through Border Gateway Protocol (BGP) from virtual appliances and dynamically update the VPC route tables associated with subnets and the internet gateway—source aws.amazon.com.
  • AWS Route Server allows Cilium to dynamically advertise prefixes through BGP sessions and insert them into the VPC route tables.
  • This has the following benefits:
    • It automates advertisements of prefixes without additional scripting or functions.
    • Any routable IPs or CIDRs that belong to the customer can be dynamically advertised.
    • A resilient floating IP can be achieved in the active/standby and active/active scenarios.
    • A routing policy can be applied through BGP.
    • Traffic is switched quickly to the new path: with BGP keepalives, this takes about 3 seconds, and it can be lowered further using BFD.
  • A drawback of the feature is the additional cost per endpoint.
  • Prior to this feature, static entries had to be added for each IP/CIDR to the VPC route tables, either manually or through scripts/functions.
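For contrast, the pre-Route-Server approach looked roughly like the following, pinning the egress prefix to a specific gateway node’s ENI; the resource IDs are placeholders:

# Static route pointing the egress prefix at a gateway node's ENI.
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.0.100.1/32 \
  --network-interface-id eni-0123456789abcdef0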

Scenario

A Kubernetes cluster can utilize an IP address as the source egress identifier for a selected application. In this case, the IP address 10.0.100.1 is from the BYOIP (Bring Your Own IP) pool advertised via BGP to two Endpoints that belong to a Route Server.

  • There are three worker nodes, two of which are acting as the egress gateway nodes.
  • Egress gateway nodes work in the Active/Standby mode:
    • Both nodes have BGP sessions established with the Endpoints.
    • One worker is an active egress gateway node that performs SNAT and forwards the traffic.
kubectl get nodes -o wide

NAME                                        STATUS   ROLES    AGE    VERSION                INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME

ip-10-0-17-238.eu-west-1.compute.internal   Ready    <none>   6d7h   v1.29.13-eks-5d632ec   10.0.17.238   <none>        Amazon Linux 2   5.10.234-225.921.amzn2.x86_64   containerd://1.7.27
ip-10-0-35-87.eu-west-1.compute.internal    Ready    <none>   6d7h   v1.29.13-eks-5d632ec   10.0.35.87    <none>        Amazon Linux 2   5.10.234-225.921.amzn2.x86_64   containerd://1.7.27
ip-10-0-5-189.eu-west-1.compute.internal    Ready    <none>   6d7h   v1.29.13-eks-5d632ec   10.0.5.189    <none>        Amazon Linux 2   5.10.234-225.921.amzn2.x86_64   containerd://1.7.27
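Before applying the BGP and egress gateway resources in the next section, the two gateway nodes need the labels referenced by their selectors. Assuming ip-10-0-17-238 and ip-10-0-5-189 are the designated gateway nodes (as the policy status later suggests), a sketch:

# Label the two nodes acting as BGP speakers and egress gateways.
kubectl label node \
  ip-10-0-17-238.eu-west-1.compute.internal \
  ip-10-0-5-189.eu-west-1.compute.internal \
  cilium-bgp-peering=instance-1 egressGW=use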

BGP and Egress Gateway configuration

  • Sample BGP configuration
    • BGP nodes should have the label cilium-bgp-peering=instance-1
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPClusterConfig
metadata:
  name: cilium-bgp1
spec:
  nodeSelector:
    matchLabels:
      cilium-bgp-peering: instance-1
  bgpInstances:
  - name: "instance1-65002"
    localASN: 65002
    peers:
    - name: "fw-1"
      peerASN: 65001
      peerAddress: 10.0.6.210
      peerConfigRef:
        name: "fw-1"
    - name: "fw-2"
      peerASN: 65001
      peerAddress: 10.0.29.139
      peerConfigRef:
        name: "fw-2"
---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPPeerConfig
metadata:
  name: fw-1
spec:
  ebgpMultihop: 4
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: "bgp"
---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "EgressGateway"
      selector:            
        matchExpressions:
          - {key: somekey, operator: NotIn, values: ['never-used-value']}
  • Sample Egress Gateway Policy
    • EGW nodes should have the label egressGW=use
apiVersion: isovalent.com/v1
kind: IsovalentEgressGatewayPolicy
metadata:
  labels:
    egw: bgp-advertise
  name: egw-advertise
spec:
  destinationCIDRs:
  - 8.8.8.8/32
  egressGroups:
  - egressIP: 10.0.100.1
    maxGatewayNodes: 1
    nodeSelector:
      matchLabels:
        egressGW: use
  selectors:
  - podSelector:
      matchLabels:
        app: curl
status:
  groupStatuses:
  - activeGatewayIPs:
    - 10.0.17.238
    healthyGatewayIPs:
    - 10.0.17.238
    - 10.0.5.189
  observedGeneration: 2
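The status stanza above is what the controller reports once the policy is active: one active gateway and two healthy ones. It can be inspected with a command along these lines (the plural CRD resource name is an assumption):

kubectl get isovalentegressgatewaypolicies.isovalent.com egw-advertise -o yaml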

EKS Cluster configuration

The EKS cluster is distributed across three Availability Zones.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: amit-6033
  region: ap-southeast-1

vpc:
  id: "vpc-039a8014ccd1bc5ce"
  subnets:
    private:
      ap-southeast-1a:
          id: "subnet-0ab37810d1901bb0e"
      ap-southeast-1b:
          id: "subnet-062a1b00c744fef3a"
      ap-southeast-1c:
          id: "subnet-0f33e4df743d1313b"
    public:
      ap-southeast-1a:
          id: "subnet-021d5c9b8fb06c7b5"
      ap-southeast-1b:
          id: "subnet-09ff93b4d244c70a4"
      ap-southeast-1c:
          id: "subnet-09b1cb3d62fc41576"

managedNodeGroups:
- name: ng-1
  iam:
     attachPolicyARNs:
       - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
       - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
       - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
       - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  desiredCapacity: 2
  privateNetworking: true
  # taint nodes so that application pods are
  # not scheduled/executed until Cilium is deployed.
  # Alternatively, see the note below.
  taints:
   - key: "node.cilium.io/agent-not-ready"
     value: "true"
     effect: "NoExecute"

Interface settings

Disabling source/destination checking on the instances is necessary when AWS is not aware of where an IP/CIDR lives.

  • The same configuration is applied to all nodes. 
  • The BYOIP address 10.0.100.1 is added on eth0 of both EGW nodes.
[root@ip-10-0-5-189 ~]# ip -4 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    inet 10.0.5.189/20 brd 10.0.15.255 scope global dynamic eth0
       valid_lft forever preferred_lft forever
    inet 10.0.100.1/32 scope global eth0
       valid_lft forever preferred_lft forever

[root@ip-10-0-17-238 ~]# ip -4 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    inet 10.0.17.238/20 brd 10.0.31.255 scope global dynamic eth0
       valid_lft forever preferred_lft forever
    inet 10.0.100.1/32 scope global eth0
       valid_lft forever preferred_lft forever
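As a reference, the state above can be reproduced with commands along these lines; the instance ID is a placeholder, and the ip command must be run on each gateway node:

# Disable source/destination checking on each gateway instance.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --no-source-dest-check

# Add the BYOIP address on eth0 of the gateway node.
ip addr add 10.0.100.1/32 dev eth0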

Security Group

The traffic is not restricted by security groups in this scenario.

VPC Route Server

BYOIPs and CIDRs advertised with BGP and a VPC Route Server don’t have to be added as additional CIDRs to the VPC.

  • Route Server associated with a VPC
  • Propagations to three route tables.
  • Two endpoints to establish BGP sessions with.
  • Two advertisements of the same prefix from a Cilium node 10.0.17.238.
  • Four BGP sessions in total, two workers and two endpoints.
  • Two BGP sessions are UP on one of the endpoints.
  • Learnt routes installed in three route tables.
  • A view of the route table with the entry to the advertised prefix by Cilium.
  • Check the status of BGP peers across Cilium and AWS Route Server (Active)
root@ip-10-0-17-238:/home/cilium# cilium bgp peers
Local AS   Peer AS   Peer Address      Session       Uptime      Family         Received   Advertised
65002      65001     10.0.29.139:179   established   49h20m53s   ipv4/unicast   0          1
65002      65001     10.0.6.210:179    established   48h36m37s   ipv4/unicast   0          1
root@ip-10-0-17-238:/home/cilium# cilium bgp routes
(Defaulting to `available ipv4 unicast` routes, please see help for more options)
VRouter   Prefix          NextHop   Age          Attrs
65002     10.0.100.1/32   0.0.0.0   166h20m27s   [{Origin: i} {Nexthop: 0.0.0.0}]
root@ip-10-0-17-238:/home/cilium# cilium bgp routes advertised ipv4 unicast peer 10.0.29.139
VRouter   Prefix          NextHop       Age          Attrs
65002     10.0.100.1/32   10.0.17.238   166h20m30s   [{Origin: i} {AsPath: 65002} {Nexthop: 10.0.17.238}]
root@ip-10-0-17-238:/home/cilium# cilium bgp routes advertised ipv4 unicast peer 10.0.6.210
VRouter   Prefix          NextHop       Age          Attrs
65002     10.0.100.1/32   10.0.17.238   166h20m37s   [{Origin: i} {AsPath: 65002} {Nexthop: 10.0.17.238}]
  • Check the status of BGP peers across Cilium and AWS Route Server (Standby)
root@ip-10-0-5-189:/home/cilium# cilium bgp peers
Local AS   Peer AS   Peer Address      Session       Uptime      Family         Received   Advertised
65002      65001     10.0.29.139:179   established   49h25m17s   ipv4/unicast   0          0
65002      65001     10.0.6.210:179    established   48h38m6s    ipv4/unicast   0          0

Conclusion

Hopefully, this post gave you a good overview of deploying Cilium and Egress Gateway in Elastic Kubernetes Service (EKS). If you have any feedback on the solution, please share it with us. Talk to us, and let’s see how Cilium can help with your use case.

Try it out

Start with the Egress Gateway lab and explore Egress Gateway in action.

Further Reading

To dive deeper into the topic of Egress Gateway, check out these two videos:

Raphaël Pinson, Senior Technical Marketing Engineer
Amit Gupta, Senior Technical Marketing Engineer
