
BFD: A Networking Beacon for Highly Available Kubernetes Clusters

By Nico Vibert

Lighthouses emit a powerful, regular light that informs incoming ships of potential obstacles and dangers. If a vessel stops seeing the signal, it knows immediately that something might be obstructing the passage. The faster a ship reacts to the missing signal, the more likely it is to find an alternate, safe route.

The same principle applies to networking. We tend to rely on routing protocol heartbeats to act as the signals that indicate whether our destination is accessible. If several of them go missing in a row, it indicates the passage might be blocked. But by default these heartbeats are sent infrequently, and it can take minutes for a device to realize that its path is degraded.

We need a more effective networking beacon to ensure packets are not swept away and are redirected to a safe course.

With Isovalent Enterprise for Cilium 1.16, you can now enable Bidirectional Forwarding Detection (BFD) and detect link or neighbor loss faster, forcing traffic to take an alternate path and greatly reducing downtime.

Introducing BFD

According to its original RFC, BFD has the following goal:

The goal of Bidirectional Forwarding Detection (BFD) is to provide low-overhead, short-duration detection of failures in the path between adjacent forwarding engines, including the interfaces, data link(s), and, to the extent possible, the forwarding engines themselves.

RFC 5880

In other words, BFD detects network failures quickly so that an alternative path can be found for packets to take. BFD is commonly used with dynamic routing protocols such as OSPF and BGP and is ideal for scenarios where you have multiple paths: by quickly identifying a lossy link or an unresponsive peer, the routing table can be updated and traffic can switch over to a healthy path instead.

BFD works by establishing a peering session between two endpoints. It uses a keepalive system in which each endpoint sends heartbeats (BFD control packets, optionally complemented by Echo packets) at a configured frequency (the BFD interval) to verify that the path is still active. If one endpoint stops receiving these messages within a predefined time (the detection time), it considers the path down and informs the routing protocol so that routing convergence can occur.
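
As a rule of thumb (RFC 5880 defines the exact negotiation), a session is declared down after a number of consecutive missed packets equal to the detection multiplier, so the detection time is roughly the negotiated interval multiplied by that multiplier. With the 340 ms interval and the multiplier of 3 used later in this post, that works out to about one second:

Detection Time ≈ negotiated interval × detection multiplier
               ≈ 340 ms × 3 ≈ 1 second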

The BFD heartbeats work similarly to BGP KEEPALIVE messages. So why would we need BFD?

Firstly, the BGP timers are, by default, high (30 seconds for the keepalive interval and 90 seconds for the hold time). We previously explored in a Cilium BGP blog post how we can reduce these timers, but even with the minimum values, you might still experience a ~10 second outage.
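
For reference, here is a minimal sketch of what reduced timers look like with Cilium’s BGP control plane, using the open source CiliumBGPPeerConfig resource and illustrative values (how low you can safely go depends on your environment):

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: reduced-timers
spec:
  timers:
    holdTimeSeconds: 9
    keepAliveTimeSeconds: 3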

BFD’s lightweight design means it can detect outages much faster (subsecond failure detection) without putting a strain on resources.

Let’s take a look at some of the benefits achievable with BFD in a demo environment.

Demo Environment

For this lab, we will once again use the excellent containerized networking platform containerlab, with the following:

  • A Kubernetes cluster with 2 nodes running Isovalent Enterprise for Cilium 1.16
  • An FRR (FRRouting) router connected to the Kubernetes cluster
  • A Cisco Cloud Services Router 1000v (CSR 1000v) connected to the Kubernetes cluster

Our Kubernetes cluster is based on Kind, has 2 nodes, and runs Isovalent Enterprise for Cilium.

root@nico-ubuntu:~# kubectl get nodes
NAME                         STATUS   ROLES           AGE   VERSION
clab-bgp-bfd-control-plane   Ready    control-plane   20h   v1.31.1
clab-bgp-bfd-worker          Ready    <none>          20h   v1.31.1
root@nico-ubuntu:~# cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       disabled
    \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet              cilium-envoy       Desired: 2, Ready: 2/2, Available: 2/2
Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Running: 2
                       cilium-envoy       Running: 2
                       cilium-operator    Running: 1
Cluster Pods:          3/3 managed by Cilium
Helm chart version:    1.17.0-dev
Image versions         cilium             quay.io/isovalent-dev/cilium-ci:latest: 2
                       cilium-envoy       quay.io/cilium/cilium-envoy:v1.30.6-1729608965-1e298fad5ecff399849a689fb0730551afe42422@sha256:4bf4d4dfd23477666d9d2c05b701954df268902d9f31d691e1a0dc85661ac5ac: 2
                       cilium-operator    quay.io/isovalent-dev/operator-generic-ci:latest: 1

We have configured an external BGP (eBGP) session between a Cilium-managed Kubernetes cluster in Autonomous System (AS) 65001 and a Cisco CSR 1000V device in AS 65005. Cilium’s BGP daemon advertises the Pod CIDR network 10.244.0.0/24 to the CSR, to make our pods accessible to the rest of the network.
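
For context, the CSR side of that eBGP session boils down to a few lines of IOS-XE configuration along these lines (a simplified sketch of this lab’s baseline setup, before any BFD is added):

router bgp 65005
 bgp log-neighbor-changes
 neighbor 172.5.0.2 remote-as 65001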

csr#show ip bgp summary 
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 18, main routing table version 18
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 8/7 prefixes, 9/8 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d22h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001       5       6       18    0    0 00:01:08        1
      
csr#show ip route bgp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, m - OMP
       n - NAT, Ni - NAT inside, No - NAT outside, Nd - NAT DIA
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       H - NHRP, G - NHRP registered, g - NHRP registration summary
       o - ODR, P - periodic downloaded static route, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR
       & - replicated local route overrides by connected

Gateway of last resort is not set

      10.0.0.0/24 is subnetted, 1 subnets
B        10.244.0.0 [20/0] via 172.5.0.2, 00:01:15

Let’s now compare the observed behaviour when a link becomes degraded, first without BFD and then with it.

Without BFD

Using their default settings, the CSR and Cilium negotiate a 30-second BGP keepalive interval and a 90-second hold time – meaning they exchange keepalives every 30 seconds and if they don’t receive any over a period of 90 seconds, they will conclude that their peering session has failed.

csr#show ip bgp neighbors 
BGP neighbor is 172.5.0.2,  remote AS 65001, external link
  BGP version 4, remote router ID 172.18.0.2
  BGP state = Established, up for 00:29:39
  Last read 00:00:09, last write 00:00:22, hold time is 90, keepalive interval is 30 seconds

This means it could take well over a minute for a BGP peer to detect that the connection to its neighbor has failed.

Let’s introduce some packet loss on the link using containerlab’s native support for netem, a network emulator that can introduce packet loss and delay. Note that I will show the current time throughout the tutorial so that you can see for yourself the length of the outage with and without BFD.

root@nico-ubuntu:~# date
Fri Nov  1 12:35:11 UTC 2024
root@nico-ubuntu:~# containerlab tools netem set -n clab-bgp-bfd-csr -i eth1 --loss 100
+-----------+-------+--------+-------------+-------------+
| Interface | Delay | Jitter | Packet Loss | Rate (kbit) |
+-----------+-------+--------+-------------+-------------+
| eth1      | 0s    | 0s     | 100.00%     |           0 |
+-----------+-------+--------+-------------+-------------+

As soon as we start dropping all packets on the interface on the CSR device, BGP keepalives start going missing but, because of BGP’s hold time, the session takes about 2 minutes to go down and fall back into the Active state.

csr#show clock
*12:35:18.995 UTC Fri Nov 1 2024
csr#show ip bgp summary 
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 20, main routing table version 20
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 9/8 prefixes, 10/9 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d23h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001      12      14       20    0    0 00:04:36        1

csr#show ip bgp summary 
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 20, main routing table version 20
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 9/8 prefixes, 10/9 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d23h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001      13      18       20    0    0 00:06:29        1

csr#show ip bgp summary 
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 21, main routing table version 21
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d23h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001       0       0        1    0    0 00:00:11 Active

csr#show clock
*12:37:30.880 UTC Fri Nov 1 2024

Two minutes of downtime is obviously not acceptable for anyone running mission-critical applications. It’s possible to reduce the BGP timers (as described in the Cilium docs), but setting them too aggressively low can cause unwanted instability.

Let’s see what we can achieve with BFD instead.

With BFD

Let’s deploy BFD on our remote Cisco CSR end. We need to enable it on the interface (here, BFD packets are transmitted and expected every 340 ms, with a detection multiplier of 3) and tie it to the BGP session with the fall-over bfd command:

!
interface GigabitEthernet2
 no shutdown
 ip address 172.5.0.1 255.255.255.0
 ipv6 address b:5::1/64
 ipv6 enable
 bfd interval 340 min_rx 340 multiplier 3
!
router bgp 65005
 bgp log-neighbor-changes
 neighbor 172.5.0.2 remote-as 65001
 neighbor 172.5.0.2 fall-over bfd
!
ipv6 unicast-routing
ipv6 route A:B:C::1/128 GigabitEthernet2 B:5::2
ipv6 route static bfd GigabitEthernet2 B:5::2

Let’s now enable the BFD feature in Cilium and confirm it in the agent configuration:

root@nico-ubuntu:~# cilium config view | grep -e enterprise-bgp -e bfd
enable-bfd                                           true
enable-enterprise-bgp-control-plane                  true

BFD is configured with two resources: an IsovalentBFDNodeConfig, which lists the BFD peers of a given node, and IsovalentBFDProfiles, which specify how often the BFD heartbeats are sent and received (note that the timers are negotiated between the peers when the BFD session is being established). Notice how BFD support is available for both IPv4 and IPv6.

apiVersion: isovalent.com/v1alpha1
kind: IsovalentBFDNodeConfig
metadata:
  name: manual-clab-bgp-bfd-control-plane
spec:
  nodeRef: clab-bgp-bfd-control-plane
  peers:
    # FRR
    # IPv4 BFD peering is configured via BGP config
    - name: frr-ipv6
      peerAddress: b:2::1
      bfdProfileRef: frr
 
    # CSR
    - name: csr-ipv4
      peerAddress: 172.5.0.1
      bfdProfileRef: csr
    - name: csr-ipv6
      peerAddress: b:5::1
      bfdProfileRef: csr
---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBFDProfile
metadata:
  name: frr
spec:
  receiveIntervalMilliseconds: 310
  transmitIntervalMilliseconds: 310
  echoFunction:
    directions:
      - Transmit
    transmitIntervalMilliseconds: 60

---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBFDProfile
metadata:
  name: csr
spec:
  receiveIntervalMilliseconds: 340
  transmitIntervalMilliseconds: 340
  echoFunction:
    directions:
      - Receive
      - Transmit
    receiveIntervalMilliseconds: 90
    transmitIntervalMilliseconds: 90
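
Once created, these objects can be listed like any other custom resource (assuming the default plural names for these CRDs; output omitted here):

kubectl get isovalentbfdprofiles
kubectl get isovalentbfdnodeconfigs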

We then refer to the BFD profile in the BGP configuration:

apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPClusterConfig
metadata:
  name: cilium-bgp
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: clab-bgp-bfd-control-plane
  bgpInstances:
    - name: "65001"
      localASN: 65001
      peers:
        - name: "frr"
          peerASN: 65002
          peerAddress: "172.2.0.1"
          peerConfigRef:
            name: "frr-peer"
        - name: "csr"
          peerASN: 65005
          peerAddress: "172.5.0.1"
          peerConfigRef:
            name: "csr-peer"


---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPPeerConfig
metadata:
  name: frr-peer
spec:
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: "bgp"
  bfdProfileRef: frr

---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPPeerConfig
metadata:
  name: csr-peer
spec:
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: "bgp"
  bfdProfileRef: csr

---
apiVersion: isovalent.com/v1alpha1
kind: IsovalentBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "PodCIDR"
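
With the manifests above saved to a file, deploying them is a standard apply (the filename below is just an example):

kubectl apply -f bfd-bgp-config.yaml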

After deploying the configuration, the BGP peering between the Cisco CSR and Cilium works as expected, with the CSR once again receiving the Pod CIDR:

csr#show ip bgp ipv4 unicast summary 
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 22, main routing table version 22
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 10/9 prefixes, 11/10 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (2d00h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001      10      11       22    0    0 00:03:35        1

csr#
csr#show bgp ipv4 unicast  
BGP table version is 22, local router ID is 172.5.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.244.0.0/24    172.5.0.2                              0 65001 i

On the CSR, we can see the established BFD sessions (there’s one for IPv4 and one for IPv6):

csr#show bfd neighbors               

IPv4 Sessions
NeighAddr                              LD/RD         RH/RS     State     Int
172.5.0.2                            4097/1614084818 Up        Up        Gi2

IPv6 Sessions
NeighAddr                              LD/RD         RH/RS     State     Int
B:5::2                                  1/1948284173 Up        Up        Gi2

On the Cilium side, we can validate that BFD is also successfully configured by logging onto the Cilium agent:

root@nico-ubuntu:~# kubectl exec -it -n kube-system cilium-r64cb -- cilium-dbg bfd peers
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), clean-cilium-state (init), install-cni-binaries (init)
# PeerAddress   Interface   Discriminator   RemDiscriminator   State   LastStateChange   Multi   RemMulti   RxInt   RemRxInt   TxInt   RemTxInt   EchoRxInt   RemEchoRxInt   EchoTxInt   Diagnostic             RemDiagnostic
b:2::1          net1        3209926287      437407563          Up      48h56m17s         3       3          1s      300ms      310ms   300ms      0s          50ms           60ms        No Diagnostic          No Diagnostic
172.2.0.1       net1        2083603199      2072335566         Up      48h56m10s         3       3          1s      300ms      310ms   300ms      0s          50ms           60ms        No Diagnostic          No Diagnostic
b:5::1          net4        1948284173      1                  Up      3m19s             3       3          340ms   340ms      340ms   340ms      90ms        0s             90ms        No Diagnostic          No Diagnostic
172.5.0.1       net4        1614084818      4097               Up      1m37s             3       3          1s      1s         340ms   1s         90ms        340ms          90ms        No Diagnostic          No Diagnostic

When capturing the traffic with tcpdump, we can see our BFD packets being exchanged – in the BFD control packets, you can see some of the timer settings exchanged during the negotiation:

Wireshark capture of the BFD traffic
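
To reproduce such a capture, note that single-hop BFD control packets use UDP port 3784 and BFD Echo packets UDP port 3785 (per RFC 5881), so a filter along these lines, run from the node’s network namespace against the relevant interface (net4 in this lab), should show them:

tcpdump -ni net4 'udp port 3784 or udp port 3785'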

Let’s now introduce some loss again. This time, BFD notices the missing packets almost immediately and brings down the BGP session:

root@nico-ubuntu:~# cilium bgp peers
Node                         Local AS   Peer AS   Peer Address   Session State   Uptime     Family         Received   Advertised
clab-bgp-bfd-control-plane   65001      65002     172.2.0.1      established     49h17m9s   ipv4/unicast   1          1    
                             65001      65005     172.5.0.1      established     17s        ipv4/unicast   0          1    
root@nico-ubuntu:~# date
Fri Nov  1 14:28:39 UTC 2024
root@nico-ubuntu:~# containerlab tools netem set -n clab-bgp-bfd-csr -i eth1 --loss 100
+-----------+-------+--------+-------------+-------------+
| Interface | Delay | Jitter | Packet Loss | Rate (kbit) |
+-----------+-------+--------+-------------+-------------+
| eth1      | 0s    | 0s     | 100.00%     |           0 |
+-----------+-------+--------+-------------+-------------+
root@nico-ubuntu:~# 
root@nico-ubuntu:~# cilium bgp peers
Node                         Local AS   Peer AS   Peer Address   Session State   Uptime      Family         Received   Advertised
clab-bgp-bfd-control-plane   65001      65002     172.2.0.1      established     49h17m41s   ipv4/unicast   1          1    
                             65001      65005     172.5.0.1      idle            0s          ipv4/unicast   0          1    
root@nico-ubuntu:~# date
Fri Nov  1 14:28:59 UTC 2024

With BFD immediately noticing the defective link and the BGP session going down, traffic can be re-routed via an alternative path and downtime can be greatly reduced.
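
As a final check, you can confirm the effect on the CSR’s routing table with the same command we used earlier; once the BGP session is down, the 10.244.0.0/24 route learned from Cilium should be withdrawn:

csr#show ip route bgp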


Final Thoughts

Minimizing interruption to live traffic remains a priority for all infrastructure and platform engineers. BFD support in Isovalent Enterprise for Cilium provides another method to make platforms more robust and resilient in the event of a failure.

If you’d like to learn more, don’t hesitate to contact us to request a demo with our talented team of solution architects:

  • Request a Demo – Schedule a demo session with an Isovalent Solution Architect.