Raphaël is a Solutions Architect at Isovalent, the Cloud Native networking and security specialists and creators of Cilium, the eBPF-based networking project. He works on Cilium, Hubble & Tetragon and the future of Cloud Native networking & security using eBPF. An early adopter of DevOps principles, he has practiced Configuration Management and Agile methods in Operations for many years, with a particular involvement in the Puppet and Terraform communities.
Cilium Tech Talks – Egress Gateway
Integrating Kubernetes clusters into a legacy networking environment can be a challenge, especially when legacy firewalls are involved. Join us to learn how Cilium Enterprise lets you define highly-available groups of egress nodes and IP addresses, making it possible to fit Kubernetes egress traffic to virtually any security policy in place in your infrastructure.
So let’s talk about Cilium Egress Gateway, and in particular Cilium Egress Gateway High Availability. What is it? The egress gateway feature was introduced in Cilium 1.10, and its goal is to let you redirect traffic from specific pods to specific servers outside the cluster, so that it is routed through specific nodes of the cluster. When the egress gateway feature is activated in a cluster, you can use dedicated resources called egress NAT policies. These egress NAT policies let you specify how to route packets outside the cluster, so that they are masqueraded with selected, predictable IPs associated with the egress nodes.
To use this feature, you need network-facing interfaces and IP addresses on the nodes that will act as egress nodes, eBPF-based masquerading enabled in your kernel, and kube-proxy replacement enabled in your Cilium configuration. Once you have this, you can get started with the egress gateway.
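As a rough sketch, these prerequisites map to a handful of Helm values when installing Cilium. The flag names below reflect Cilium 1.10-era charts; verify them against the chart version you actually deploy:

```yaml
# Illustrative Helm values enabling the egress gateway prerequisites
# described above (names may differ slightly between chart versions).
egressGateway:
  enabled: true              # turn on the egress gateway feature
bpf:
  masquerade: true           # eBPF-based masquerading
kubeProxyReplacement: strict # run without kube-proxy
```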
So what does it look like? We have some pods here on a worker node in the cluster, and a node that we’ve designated as an egress gateway node. In Kubernetes, we have set up an egress NAT policy that says that for this pod to access a server outside the cluster, typically a legacy database for example, the traffic needs to go through this egress gateway node. So when the pod tries to reach this IP here, 10.24.0.5, the Cilium agent on the node knows it needs to route the traffic through the egress node. It sends the traffic to the egress gateway node, which recognizes that the traffic comes from the pod and applies the egress NAT policy. It masquerades the traffic by replacing the pod’s source IP with the source IP specified in the policy (either an IP given directly, or an interface whose IP Cilium will look up). Then it sends the traffic to the legacy DB. When the traffic returns, the destination address is rewritten to send the reply back to the pod.
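Conceptually, the egress gateway node is doing source NAT plus connection tracking, so that replies can be rewritten back to the pod. A toy Python sketch of that idea (illustrative only; Cilium implements this in eBPF, not like this, and all IPs and ports below are hypothetical):

```python
# Toy model of what an egress gateway node does: source-NAT outgoing
# packets to the egress IP, remember the connection, and reverse the
# rewrite on reply packets. Packets are modeled as plain dicts.

class EgressNat:
    def __init__(self, egress_ip):
        self.egress_ip = egress_ip
        # (dst_ip, dst_port, src_port) -> original pod IP
        self.conntrack = {}

    def egress(self, pkt):
        """Rewrite a pod->server packet on its way out of the cluster."""
        key = (pkt["dst_ip"], pkt["dst_port"], pkt["src_port"])
        self.conntrack[key] = pkt["src_ip"]         # remember the pod's IP
        return {**pkt, "src_ip": self.egress_ip}    # masquerade

    def reply(self, pkt):
        """Rewrite a server->pod reply on its way back in."""
        key = (pkt["src_ip"], pkt["src_port"], pkt["dst_port"])
        pod_ip = self.conntrack[key]                # look up the pod
        return {**pkt, "dst_ip": pod_ip}

nat = EgressNat("192.168.33.100")   # predictable egress IP (hypothetical)
out = nat.egress({"src_ip": "10.0.1.7", "src_port": 40000,
                  "dst_ip": "10.24.0.5", "dst_port": 5432})
back = nat.reply({"src_ip": "10.24.0.5", "src_port": 5432,
                  "dst_ip": "192.168.33.100", "dst_port": 40000})
```

The legacy database only ever sees the egress IP as the source of the connection, which is what makes IP-based firewalling possible.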
So, what does it look like in Kubernetes? The resource looks like this, and this is the open source version. We specify how to select a pod: here we have a pod that matches the label app=test-app. For this pod reaching out to this CIDR, we want the traffic to go out through this IP, and Cilium will figure out how to use this IP and which nodes to exit the cluster through. This makes it very easy to filter traffic leaving the cluster by IP when you need to, for example with traditional firewalls in your infrastructure.
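For reference, an open source policy along these lines looks roughly like the sketch below. The schema shown is the Cilium 1.10-era CiliumEgressNATPolicy CRD (later Cilium versions replaced it with CiliumEgressGatewayPolicy), and the CIDR and IP values are illustrative:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-sample              # illustrative name
spec:
  egress:
  - podSelector:
      matchLabels:
        app: test-app              # select pods by label, as in the talk
  destinationCIDRs:
  - 10.24.0.0/24                   # destination outside the cluster (illustrative)
  egressSourceIP: 192.168.33.100   # predictable IP used to masquerade the traffic
```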
Now, what we’re going to talk about here is the high availability version of this egress gateway feature, which is available in Cilium Enterprise. In Cilium Enterprise, we have the ability to specify several nodes: here we have three nodes for a NAT policy. If a node fails, the traffic keeps going through the other nodes; while all the nodes are healthy, the traffic is load balanced across the three of them.
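The behavior described here, spreading flows across the healthy gateway nodes and skipping a failed one, can be sketched as a per-flow hash over the healthy subset. This is illustrative only: it is not Cilium Enterprise's actual selection algorithm, which runs in eBPF, and all the IPs are hypothetical:

```python
import hashlib

def pick_gateway(flow, gateways):
    """Hash a flow tuple onto the healthy subset of gateway nodes."""
    healthy = sorted(g["ip"] for g in gateways if g["healthy"])
    if not healthy:
        raise RuntimeError("no healthy egress gateway")
    digest = hashlib.sha256(repr(flow).encode()).hexdigest()
    return healthy[int(digest, 16) % len(healthy)]

gateways = [
    {"ip": "192.168.33.100", "healthy": True},
    {"ip": "192.168.33.101", "healthy": True},
    {"ip": "192.168.33.102", "healthy": False},  # failed node is skipped
]

# 100 distinct flows: the failed node's IP never appears in the selection,
# and the remaining flows spread over the healthy nodes.
ips = {pick_gateway(("10.0.1.7", 40000 + i, "10.24.0.5", 80), gateways)
       for i in range(100)}
```

The key property is that a given flow keeps hashing to the same gateway as long as the set of healthy nodes is stable, and only flows through a failed node need to be rerouted.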
I have a little demo to show you. This is the NAT policy I will use, with the destination CIDR here. When a pod with the label app.kubernetes.io/name=egress-gw-monitor in this namespace tries to reach this destination CIDR, it should go through one of these nodes. I actually have four nodes, but they take too much space on the slide; you get the idea. We’ve got the egress IPs here and the zones in which they are located on Amazon: this is an Amazon Web Services demo.
So let’s look at the demo itself. We have a monitor application here, which is not part of Cilium; it’s just a pod running in the cluster, and it happens to be in zone D of my VPC. This monitor sends HTTP requests to an echo server that returns the IP the requests are coming from, so we can see where they come from. The traffic will be load balanced over the four egress nodes here. So let’s quickly look at the demo.
Here is the monitor running. We can see in the logs that the requests are going out, and the returned IPs show that they are going through the four different IPs. I am also collecting metrics for this pod and displaying them in Grafana, where we can see the traffic is clearly load balanced: the requests go through the four different IPs, exiting through different nodes, and for a total of almost twenty requests per second we get five requests per second through each node. The load times are a little bit different per node; that’s because some nodes are located in the same availability zone as the monitor pod or the echo server, whereas other nodes are located in other availability zones, so the traffic travels a little bit further, with a hop from another availability zone.
And what we see here is the cilium bpf egress list command running on the Cilium agent where the monitor pod is running. It shows that Cilium knows it needs to route the traffic coming from this IP, from this pod, to the CIDR through one of these four nodes.
So what happens now? I’m just going to show you quickly. If I reboot a node, I’m going to choose the node that is running in zone A here, and I will reboot it. There you go, let’s reboot it. Very quickly, Cilium should figure out that the node is down. When it goes down, see here, the node is down: Cilium takes it out of the load balancing, and we actually lose some connections. That’s expected right before Cilium finds out that the node is down, because we haven’t told Cilium the node was going down. It started at 10:58, towards the beginning of the 58th second, and lasted all the way to the end of 10:59, so about two seconds. For those two seconds, Cilium was finding out that the node was gone and rerouting the traffic. Before going down, the traffic was successfully going through this node, which is 210, and afterwards it doesn’t go through this node anymore. We don’t see any green anymore; instead the traffic goes through the three other nodes. And now we see in Grafana how the traffic is being load balanced again. The graph is still updating a little; it uses a rate over one minute, but it will quickly show the proper results. Here, the node should be coming back up. So again, this feature would let us say the echo server was instead a database, and we wanted to filter the IPs coming out of the cluster. We could set up a firewall and say, “I know exactly which IPs are coming out of the cluster to this database, so I can filter by IP to control the traffic coming out.”
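The firewall side of that closing point is simple: because the egress IPs are predictable, the database's firewall only needs an allow-list of those IPs. A minimal sketch, with hypothetical addresses:

```python
import ipaddress

# Hypothetical allow-list: the predictable egress IPs from the policy.
ALLOWED = {ipaddress.ip_address(ip) for ip in
           ("192.168.33.100", "192.168.33.101", "192.168.33.102")}

def firewall_allows(src_ip: str) -> bool:
    """Admit database connections only from the known egress IPs."""
    return ipaddress.ip_address(src_ip) in ALLOWED

via_gateway = firewall_allows("192.168.33.100")  # traffic via an egress node
direct_pod = firewall_allows("10.0.1.7")         # a pod IP is rejected
```

Without the egress gateway, pod IPs are ephemeral and drawn from the whole pod CIDR, so this kind of rule would be impossible to write.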