Integrating Kubernetes into Traditional Infrastructure with HA Egress Gateway
Kubernetes changes the way we think about networking. In an ideal Kubernetes world, the network would be entirely flat and all routing and security between the applications would be controlled by the Pod network, using Network Policies.
In many enterprise environments, though, applications hosted on Kubernetes need to communicate with workloads living outside the cluster, which are subject to connectivity constraints and security enforcement. Because of the nature of these networks, traditional firewalling usually relies on static IP addresses (or at least IP ranges). This can make it difficult to integrate a Kubernetes cluster, which has a varying (and at times dynamic) number of nodes, into such a network.
Cilium’s Egress Gateway feature changes this, by allowing you to specify which nodes should be used by a pod in order to reach the outside world.
The Egress IP Gateway feature was first introduced in Cilium 1.10.
In the Open Source version, Egress Gateway allows one to specify a single Egress IP to go through when a pod reaches out to one or more CIDRs outside the cluster. For example, with the following resource:
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-sample
spec:
  egress:
  - podSelector:
      matchLabels:
        app: test-app
  destinationCIDRs:
  - 184.108.40.206/24
  egressSourceIP: 10.1.2.1
All pods matching the app=test-app label will be routed through the 10.1.2.1 IP when they reach out to any address in the 184.108.40.206/24 IP range, which is outside the cluster.
HA Egress Gateway in Cilium EE
While Egress Gateway in Open Source Cilium is a great step forward, most enterprise environments should not rely on a single point of failure for network routing. For this reason, Cilium Enterprise 1.11 introduced Egress Gateway High Availability (HA), which supports multiple egress nodes. The nodes acting as egress gateways will then load-balance traffic in a round-robin fashion, and will provide fallback nodes in case one or more egress nodes fail.
The multiple egress nodes can be configured using an egressGroups parameter in the CiliumEgressNATPolicy resource specification:
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-ha-sample
spec:
  egress:
  - podSelector:
      matchLabels:
        app: test-app
  destinationCIDRs:
  - 18.104.22.168/24
  egressGroups:
  - nodeSelector:
      matchLabels:
        egress-group: egress-group-1
    interface: eth1
    maxGatewayNodes: 2
In this example, all pods matching the app=test-app label and reaching out to the 18.104.22.168/24 IP range will be routed through a group of two cluster nodes bearing the egress-group=egress-group-1 label, using the first IP address associated with the eth1 network interface of these nodes.
Note that it is also possible to specify an egress IP address instead of an interface.
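To make the effect of a nodeSelector combined with maxGatewayNodes more concrete, here is a minimal Python sketch that narrows a set of cluster nodes down to a gateway group. The node names, the dictionary layout, and the name-sorted tie-break are assumptions made for illustration; the article does not describe how Cilium chooses among matching nodes, so this is only a model of the idea.

```python
def select_gateways(nodes, match_labels, max_gateway_nodes):
    """Return up to max_gateway_nodes node names whose labels all match."""
    matching = [
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in match_labels.items())
    ]
    # Assumption: deterministic ordering by name; the real ordering may differ.
    return sorted(matching)[:max_gateway_nodes]


# Hypothetical cluster nodes and their labels.
nodes = {
    "node-a": {"egress-group": "egress-group-1"},
    "node-b": {"egress-group": "egress-group-1"},
    "node-c": {"egress-group": "egress-group-1"},
    "node-d": {"role": "worker"},
}

gateways = select_gateways(nodes, {"egress-group": "egress-group-1"}, 2)
print(gateways)
```

Three nodes carry the label here, but only two are kept because of the maxGatewayNodes limit.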
To send traffic through egress nodes, Cilium relies on network-facing interfaces and IP addresses present on the designated gateway nodes. These interfaces and IP addresses must be provisioned and configured by the operator based on their networking environment. The process is highly dependent on that environment. For example, in AWS/EKS, and depending on the requirements, this may mean creating one or more Elastic Network Interfaces with one or more IP addresses and attaching them to the instances that serve as gateway nodes, so that AWS can correctly route traffic flowing to and from the instances. Other cloud providers have similar networking requirements and constructs.
Additionally, enabling the egress gateway feature requires both BPF masquerading and the kube-proxy replacement to be enabled in the Cilium configuration.
To achieve a highly available configuration, the operator needs to specify multiple gateway nodes in a policy. When multiple gateway nodes are specified and a given node is detected as unhealthy, Cilium removes it from the pool of gateway nodes so that traffic stops being forwarded to it. The health check timeout can be configured in the Cilium Helm values:
cilium:
  egressGateway:
    healthcheckTimeout: 1s # Defaults to '2s'.
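The behaviour described above (round-robin across gateways, with unhealthy nodes removed from the pool) can be modelled with a short Python sketch. This is not Cilium's implementation, which lives in the eBPF datapath; the class name, method names, and gateway names are all hypothetical and only illustrate the idea.

```python
class GatewayPool:
    """Round-robin over gateways, skipping any that are marked unhealthy."""

    def __init__(self, gateways):
        self.gateways = list(gateways)
        self.healthy = set(self.gateways)
        self._next = 0

    def mark_unhealthy(self, gateway):
        # A failed health check removes the gateway from rotation.
        self.healthy.discard(gateway)

    def mark_healthy(self, gateway):
        # A recovered gateway rejoins the rotation.
        if gateway in self.gateways:
            self.healthy.add(gateway)

    def pick(self):
        """Return the next healthy gateway, or None if none is left."""
        for _ in range(len(self.gateways)):
            candidate = self.gateways[self._next]
            self._next = (self._next + 1) % len(self.gateways)
            if candidate in self.healthy:
                return candidate
        return None


pool = GatewayPool(["gw-a", "gw-b", "gw-c", "gw-d"])
before = [pool.pick() for _ in range(4)]

pool.mark_unhealthy("gw-d")
after = [pool.pick() for _ in range(4)]

print(before)
print(after)
```

Once gw-d is marked unhealthy, its turn in the rotation is skipped and the remaining three gateways absorb the traffic.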
HA Egress Gateway Example Walk-through
For the following feature example walk-through, we set up a monitoring system on top of an AWS EKS cluster using Cilium Enterprise 1.11.
We are using two containerized apps to monitor traffic:
- An HTTP echo server which returns the caller’s IP address (similar to ifconfig.me/ip), deployed on an EC2 instance outside of the Kubernetes cluster;
- A monitoring server which sends HTTP requests to the echo server once every 50ms and provides Prometheus metrics on the requests, such as the HTTP load time, the number of HTTP errors, or whether the returned IP is one of the known egress node IPs.
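The last check performed by the monitoring server, deciding whether the echoed IP belongs to a known egress node, could look roughly like the Python sketch below. The function name and the returned category strings are hypothetical; the addresses are the egress IPs and pod IP used in this walk-through.

```python
import ipaddress

# Egress IPs used in this walk-through, one per availability zone.
EGRESS_IPS = {
    ipaddress.ip_address(ip)
    for ip in ("10.2.200.10", "10.2.201.10", "10.2.202.10", "10.2.203.10")
}


def classify_source_ip(body):
    """Classify the IP address echoed back in an HTTP response body."""
    try:
        ip = ipaddress.ip_address(body.strip())
    except ValueError:
        # The body did not contain a parseable IP address.
        return "invalid"
    return "egress" if ip in EGRESS_IPS else "non-egress"


print(classify_source_ip("10.2.201.10\n"))
print(classify_source_ip("10.2.3.165"))
```

A monitor built this way can expose the category as a metric label, which is one possible way to produce a per-outbound-IP breakdown.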
The example walk-through runs in the Oregon (us-west-2) AWS region, which features four availability zones, labeled us-west-2a, us-west-2b, us-west-2c, and us-west-2d.
For a proper HA setup, and since our EKS cluster is spread across all four zones, we set up one egress node per zone. Each of these nodes has one extra Elastic Network Interface attached to it, with an IP in the VPC’s subnet. Each of these IPs is thus specific to an availability zone:
- us-west-2a: 10.2.200.10
- us-west-2b: 10.2.201.10
- us-west-2c: 10.2.202.10
- us-west-2d: 10.2.203.10
The nodes have a user data script allowing them to retrieve and mount the IP assigned to their zone.
Note that this setup is specific to AWS. On other platforms, a different method would have to be used to assign IPs to the egress nodes.
The monitoring server is then scraped by Prometheus, and we built a Grafana dashboard from the metrics.
This dashboard displays:
- The stacked number of requests per second (about 20 req/s in total, with a 50ms delay between requests), per outbound IP;
- The load times, per outbound IP;
- The % of HTTP errors in total requests.
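The request rates shown on the dashboard follow from simple arithmetic, sketched below in Python. The calculation assumes that requests are spaced exactly 50ms apart and that round-robin spreads them evenly across gateways; it ignores the time each request takes, so real figures will be slightly lower.

```python
interval_ms = 50   # delay between two requests sent by the monitor
gateways = 4       # one egress node per availability zone

# Total request rate: 1000 ms per second divided by the interval.
total_rate = 1000 / interval_ms

# With an even round-robin, each gateway carries an equal share.
per_gateway_rate = total_rate / gateways

# If one gateway is removed from the pool, the others absorb its share.
per_gateway_rate_degraded = total_rate / (gateways - 1)

print(total_rate, per_gateway_rate, round(per_gateway_rate_degraded, 2))
```

This gives about 20 req/s in total, 5 req/s per egress IP with four healthy gateways, and roughly 6.67 req/s per egress IP when one gateway is down.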
Note that the graphs display rates over 1 minute, so they cannot be used to measure unavailability windows reliably. When calculating unavailability in the example walk-through, we relied on the monitor container logs instead.
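One way to derive an unavailability window from per-request logs is sketched below in Python. The record format (a timestamp in milliseconds plus a success flag) is a hypothetical stand-in for the monitor's real log format, and the sketch assumes the excerpt covers a single incident. The sample data is synthetic, not a measurement.

```python
def failure_window_ms(records):
    """Return the time between the first and last failed request, in ms.

    records: iterable of (timestamp_ms, ok) tuples covering one incident.
    """
    failures = [timestamp for timestamp, ok in records if not ok]
    if not failures:
        return 0
    return max(failures) - min(failures)


# Synthetic log: one request every 50 ms, round-robin over 4 gateways.
# Requests landing on the 4th gateway fail between t=1000 and t=4000.
records = [
    (t, not (1000 <= t < 4000 and (t // 50) % 4 == 3))
    for t in range(0, 5000, 50)
]

print(failure_window_ms(records))
```

Because only the requests routed to the failing gateway are lost, failures are interleaved with successes, which is why the window is measured from the first to the last failure rather than as a run of consecutive errors.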
To demonstrate the HA Egress Gateway feature, we are going to apply the following Cilium Egress NAT Policy to the cluster:
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-gw-nat-policy
spec:
  destinationCIDRs:
  - 10.2.0.0/16
  egress:
  - podSelector:
      matchLabels:
        app.kubernetes.io/name: egress-gw-monitor
        io.kubernetes.pod.namespace: cilium-egress-gateway
  egressGroups:
  - egressIP: 10.2.200.10
    nodeSelector:
      matchLabels:
        io.cilium/egress-gateway: 'true'
        topology.kubernetes.io/zone: us-west-2a
  - egressIP: 10.2.201.10
    nodeSelector:
      matchLabels:
        io.cilium/egress-gateway: 'true'
        topology.kubernetes.io/zone: us-west-2b
  - egressIP: 10.2.202.10
    nodeSelector:
      matchLabels:
        io.cilium/egress-gateway: 'true'
        topology.kubernetes.io/zone: us-west-2c
  - egressIP: 10.2.203.10
    nodeSelector:
      matchLabels:
        io.cilium/egress-gateway: 'true'
        topology.kubernetes.io/zone: us-west-2d
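The podSelector is still very similar to the previous example. The egressGroups parameter is a bit different, though: instead of a single group that selects nodes by a shared label, it defines one group per availability zone, each with its own nodeSelector and a fixed egressIP.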
With this policy in place, the pods with the app.kubernetes.io/name=egress-gw-monitor label in the cilium-egress-gateway namespace will go through one of the egress nodes when reaching out to any IP in the 10.2.0.0/16 IP range, which includes our echo server instance.
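The matching rule described above (pod labels plus destination CIDR) can be expressed as a small Python sketch. It only models the decision and is not how Cilium evaluates policies internally; the function name and the two destination addresses are hypothetical, while the labels and CIDR come from the policy.

```python
import ipaddress


def uses_egress_gateway(pod_labels, dest_ip, selector, dest_cidrs):
    """Return True if traffic from this pod to dest_ip matches the policy."""
    labels_match = all(
        pod_labels.get(key) == value for key, value in selector.items()
    )
    destination = ipaddress.ip_address(dest_ip)
    cidr_match = any(
        destination in ipaddress.ip_network(cidr) for cidr in dest_cidrs
    )
    return labels_match and cidr_match


selector = {
    "app.kubernetes.io/name": "egress-gw-monitor",
    "io.kubernetes.pod.namespace": "cilium-egress-gateway",
}
dest_cidrs = ["10.2.0.0/16"]
monitor_pod = dict(selector)  # a pod carrying exactly the selected labels

# Hypothetical destinations: one inside the CIDR, one outside it.
print(uses_egress_gateway(monitor_pod, "10.2.50.20", selector, dest_cidrs))
print(uses_egress_gateway(monitor_pod, "192.0.2.10", selector, dest_cidrs))
```

Only traffic that satisfies both conditions is steered to an egress node; everything else keeps its normal path.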
Let’s see what happens when adding the Cilium Egress NAT Policy to the cluster.
Before applying the policy, all traffic reached the echo server directly, using the pod’s IP (10.2.3.165, in red), since our cluster is using AWS ENI with direct routing. After applying it, traffic to the echo server goes through one of the four egress IPs set up on the egress nodes, in a round-robin fashion.
Load times depend on the number of hops and the distance. Direct connections (red) are the fastest, followed by connections through the us-west-2d instance (in the zone where the monitor pod is located) and the us-west-2a instance (in the zone where the echo server is located), and finally connections through the two other zones, which require a hop through a third availability zone.
Now is the time to test the resiliency of the HA setup, by rebooting one or more egress instances.
When rebooting a single egress node (in us-west-2d in this case), a few packets are lost, but the load is quickly re-balanced on the 3 remaining nodes. In our tests using a 1-second health check timeout setting, we recorded timeouts for a maximum period of 3 seconds when the node came down.
Traffic comes back to normal without losses after the node is back up.
When rebooting three (out of four) egress nodes at the same time, the timeout period was again 3 seconds if all nodes came down at the same time, and up to 6 seconds when they went down in multiple batches. After that, all traffic was re-balanced on the last node until the other nodes came back up. The same happens if we terminate the nodes and let AWS autoscaling spawn new nodes (though these take longer to come up).
This walk-through is just one example of a specific Egress Gateway HA implementation on AWS. Isovalent Cilium Egress Gateway HA can be designed and adapted to any enterprise environment, whether in the cloud or on-prem, to ensure that egress traffic from your Kubernetes clusters is deterministic and highly available.
Integrating Kubernetes in its surrounding environment can be a challenge. In addition to connecting clusters with Cluster Mesh for seamless inter-cluster communication, Isovalent Cilium Enterprise also allows you to configure highly available egress nodes. That way you can easily and safely filter the traffic coming out of your cluster and into your legacy network infrastructure.
You will find more information about Isovalent Cilium Enterprise or eBPF in our resources pages:
- Introduction to Isovalent Cilium Enterprise – Overview & Features
- List of Cilium & eBPF Resources & Reading Material
We also hold regular calls you can attend to discuss Cilium and related topics in more detail: