
Cilium 1.14 – Effortless Mutual Authentication, Service Mesh, Networking Beyond Kubernetes, High-Scale Multi-Cluster, and Much More

Thomas Graf

Cilium 1.14 has landed! This might be the biggest release yet. We are particularly excited to introduce support for a much-requested feature: mutual authentication. From the start, we focused on delivering an effortless user experience for mutual authentication. The result is simple: add 2 lines of YAML to your Cilium Network Policy, and that’s it – your workload communication is now authorized with a mutual TLS handshake. Try it and give us some feedback – we hope you love it!

To achieve your integrity and confidentiality requirements alongside the mutual authentication feature, enable our enhanced Transparent Encryption feature with WireGuard – it now supports Layer 7 Network Policies and Node-to-Node encryption. But that’s not all on service mesh security: our leading Gateway API implementation continues to get better with the introduction of TLS Passthrough in Cilium 1.14.

Of course, we continue to innovate in networking: in 1.13, we announced support for the groundbreaking BIG TCP technology for IPv6. In 1.14, we now support BIG TCP for IPv4 as well. As far as we know, Cilium is the only publicly available platform that supports BIG TCP.

And it’s not just for Kubernetes – Cilium’s reach is extending beyond.

Networking from Kubernetes to external networks is getting easier with native support for L2 Announcements: you no longer need to deploy MetalLB for this use case. We are also adding lots of new BGP features for those looking to connect their clusters to their existing data center networks.

Networking beyond Kubernetes comes with the arrival of our new Cilium Mesh project. Announced at KubeCon Europe 2023, Cilium Mesh is a natural evolution of Cilium to extend the reach of Cilium-based networking and security. And if you’re not into Kubernetes, Cilium is now available on the second most popular container orchestrator: Nomad.

The third major focus for this release is scale and operational improvements. We see more and more large-scale Cilium deployments across thousands of nodes, and we are introducing KVStoreMesh support for Cluster Mesh to improve the reliability of extra-large Cilium Cluster Mesh deployments. We are also introducing better tooling for not only managing your Cilium environment but also migrating to Cilium.

Before we go through some of the new features, we would like to personally thank all the contributors to this exciting release. Cilium now has over 600 contributors since its inception, and we look forward to reaching 1,000 and beyond.

Cilium 1.14 – New Features at a Glance

Service Mesh & Mutual Authentication

  • Mutual Authentication: improve your security posture with zero effort (more details)
  • Envoy DaemonSet: a new option to deploy Envoy as a DaemonSet instead of embedded inside the Cilium agent (more details)
  • WireGuard Improvements: encryption with Cilium is getting better – you can now encrypt the traffic from node-to-node and also use Layer 7 policies alongside WireGuard (more details)
  • Gateway API Update: our leading Gateway API implementation is updated with support for the latest Gateway API version, additional route type support and multiple labs (more details)

Networking beyond Kubernetes

  • Cilium Mesh: consistent networking across clouds and heterogeneous workloads (more details)
  • L2 Announcements: Cilium can now natively advertise External IPs to local networks over Layer 2, reducing the need to install and manage tools such as MetalLB (more details)
  • BGP Enhancements: introducing support for better operational tools and faster failover (more details)
  • Cilium on Nomad: you can now run Cilium on the second most popular container orchestrator (more details)

CNI Networking and Security

  • Multi-Pool IPAM: introducing support to allocate IPs to Pods from multiple IPAM pools. Multi-pool is a step towards Cilium Multi-homing (more details)
  • BIG TCP for IPv4: after the introduction of BIG TCP support for IPv6 in Cilium 1.13, here comes IPv4 support. Ready for a 50% throughput improvement? (more details)
  • Deny Policies Graduated to Stable: the Deny Policies have now been promoted to Stable (more details)

Day 2 Operations and Scale

  • Cluster Mesh Scale Improvements: improved stability for large-scale Cluster Mesh deployments with KVStoreMesh (more details)
  • Cilium CLI Helm Mode: consistent installation and configuration of Cilium with the new Cilium CLI Helm Mode (more details)
  • Migrating to Cilium: it’s never been easier to migrate to Cilium with the CiliumNodeConfig resource (more details)

Hubble & Observability

  • Mutual Authentication Observability: Hubble provides insight into whether mutual authentication with Cilium is successful or not (more details)
  • Grafana Network Observability + Hubble UI: Hubble and Grafana together can give you insight into your application performance golden signals (more details)
  • New and Updated Labs: Our new Golden Signals Lab covers the topic of HTTP observability, whilst the updated Connectivity Lab adds an exploration of Hubble Timescape which has just reached general availability. (more details)

Cilium in the Cloud

  • Cilium on AKS: Azure CNI powered by Cilium is now Generally Available on AKS! (more details)
  • Cilium on EKS: Cilium on EKS-Anywhere has some new features! (more details)
  • Cilium on GKE: Hubble is now available for GKE Dataplane V2 users! (more details)

Tetragon

  • Project update: Tetragon’s momentum has only grown stronger since its public release last year (more details)
  • New Tetragon Landing Page: Tetragon now has its own landing page, with detailed guides and tutorials (more details)
  • Tetragon at KubeCon Europe 2023: Tetragon was a popular topic in Amsterdam (more details)
  • Pass the Tetragon Lab and Get Your Tetragon Badge: Get to know Tetragon with some of the new labs! (more details)

Service Mesh & Mutual Authentication

Mutual Authentication

The flagship feature of Cilium 1.14 is one that we started over a year ago in this blog post: mutual authentication!

What we have learned over the past few years from talking to hundreds, if not thousands of users is that encryption of traffic between micro-services and mutual authentication between such services are simply must-have security features.

Achieving these requirements would have typically required the installation and management of additional tools like a service mesh and an identity platform.

We’ve had support for Transparent Encryption through either IPsec or WireGuard since early releases of Cilium. But while the Cilium Network Policies are powerful tools to impose a Zero-Trust model between workloads, the workload identity was not proven cryptographically, and that’s what mutual authentication adds to Cilium’s existing identity mechanisms.

Our objective for this feature was to avoid more tools sprawl and to provide a frictionless experience for users to achieve their security requirements.

First, we needed a framework for identity verification.

In Cilium’s new mutual auth support, that is provided through SPIFFE (Secure Production Identity Framework for Everyone) and SPIRE (a production-ready implementation of the SPIFFE APIs).

You can now deploy Cilium with a SPIRE server (with the authentication.mutual.spire.install.enabled=true flag) and enable Cilium mutual authentication (with the authentication.mutual.spire.enabled=true flag).
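
For example, a fresh installation with both settings enabled might look like this (a minimal sketch; adapt the version and any other Helm values to your environment):

cilium install --version 1.14.0 \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true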

Workloads will then have their identities created on the SPIRE server, which automatically manages and rotates your certificates.

To enforce mutual authentication, you can then amend your existing Cilium Network Policies (a must-have for clusters with any moderate security requirements) with authentication.mode: "required".

Let’s walk through a simple example and enforce mutual authentication between Pods such as tiefighter and deathstar (based on the popular Star Wars-inspired demo).

We’ve seen how Cilium Network Policies can restrict the traffic between 2 workloads, but preserving the integrity of the workloads is essential.

Preventing a compromised tiefighter from accessing the deathstar can only be done by verifying its identity. By enabling mutual authentication on the network policy, packets from tiefighter to deathstar will not flow until an mTLS handshake is completed.

This is what such Cilium Network Policy might look like – with just 2 lines of extra YAML, we enable mutual authentication between the workloads to which this policy applies:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "rule1"
spec:
  description: "Mutual authentication enabled L7 policy"
  endpointSelector:
    matchLabels:
      org: empire
      class: deathstar
  ingress:
  - fromEndpoints:
    - matchLabels:
        org: empire
    authentication:
      mode: "required"
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/request-landing"

As soon as traffic matches the rule, the Cilium agent retrieves the identity for tiefighter, connects to the node where the deathstar pod is running, and performs a mutual TLS authentication handshake. Traffic is dropped until the handshake is completed:

default/tiefighter:43412 (ID:10685) <> default/deathstar-f694cf746-5wxks:80 (ID:53245) Authentication required DROPPED (TCP Flags: SYN)

When the handshake is successful, mutual authentication is now complete, and packets from tiefighter to deathstar flow until the network policy is removed or the certificate expires.

Note that SPIRE will rotate certificates automatically halfway through their lifetime – and Cilium will seamlessly pick up the new ones.

default/tiefighter:57248 (ID:10685) -> default/deathstar-f694cf746-6nkxz:80 (ID:53245) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN; Auth: SPIRE)

You can read more about it in the official Cilium documentation or you could try out our new “Mutual Authentication with Cilium” lab. We now offer 25 fun interactive hands-on labs and this new release includes 3 new labs and 2 labs updated with the new 1.14 features.

Mutual Authentication Lab

Learn about Mutual Authentication in Cilium, the integration with SPIRE/SPIFFE and how to protect the Death Star against the Rebellion in this new lab!

Start Lab

Envoy DaemonSet

The Envoy proxy has been a key component of Cilium from early on (we presented the Cilium and Envoy integration as far back as EnvoyCon 2018). When specific Cilium features require Layer 7 processing (Ingress, Gateway API, Network Policies with L7 functionality, L7 Protocol Visibility), the Cilium agent starts an Envoy proxy as a separate process within the Cilium agent pod.

This means both the Cilium agent and the Envoy proxy not only share the same lifecycle but also the same blast radius in the event of a compromise.

In Cilium 1.14, we are introducing support for Envoy as a DaemonSet. This provides a number of potential benefits, such as:

  • Cilium Agent restarts (for example, for upgrades) do not impact the live traffic proxied via Envoy.
  • Envoy patch release upgrades do not impact the Cilium Agent.
  • Reduced blast radius in the (unlikely) event of a compromise.
  • Envoy application log isn’t mixed with the log of the Cilium Agent.
  • Dedicated health probes for the Envoy proxy.

Best of all, deploying it is as simple as running this command:

# cilium install --version 1.14.0 --set envoy.enabled=true
🔮 Auto-detected Kubernetes kind: kind
✨ Running "kind" validation checks
✅ Detected kind version "0.14.0"
ℹ️  Using Cilium version 1.14.0
🔮 Auto-detected cluster name: kind-kind
ℹ️  kube-proxy-replacement disabled
🔮 Auto-detected datapath mode: tunnel
🔮 Auto-detected kube-proxy has been installed
#
# kubectl get -n kube-system  daemonsets.apps cilium-envoy
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
cilium-envoy   3         3         3       3            3           kubernetes.io/os=linux   2m9s
# 
# cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       disabled
    \__/       ClusterMesh:        disabled

DaemonSet              cilium-envoy       Desired: 3, Ready: 3/3, Available: 3/3
Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet              cilium             Desired: 3, Ready: 3/3, Available: 3/3
Containers:            cilium             Running: 3
                       cilium-envoy       Running: 3
                       cilium-operator    Running: 1
Cluster Pods:          4/4 managed by Cilium
Helm chart version:    1.14.0
Image versions         cilium             quay.io/cilium/cilium:v1.14.0: 3
                       cilium-envoy       quay.io/cilium/cilium-envoy:v1.25.6-4350471813b173839df78f7a1ea5d77b5cdf714b@sha256:5d03695af25448768062fa42bffec7dbaa970f0d2b320d39e60b0a12f45027e8: 3
                       cilium-operator    quay.io/cilium/operator-generic:v1.14.0: 1

We expect this to become the default deployment model in the medium term.

WireGuard Node-to-Node encryption and Layer 7 Policies support

Encryption with WireGuard was introduced in Cilium 1.10 and enabled users to transparently encrypt traffic between Pods running on different nodes. It quickly became a favorite feature for many users requiring data confidentiality – mainly because of WireGuard’s simplicity. WireGuard is very opinionated, so there’s no need to pick an encryption algorithm or to manage and rotate keys – that’s all done in the background for you.

WireGuard on Cilium had a couple of limitations, though – it wasn’t compatible with Layer 7 policies and it only supported pod-to-pod encryption, not node-to-node – but both limitations are removed in the new Cilium 1.14 release. You can now encrypt traffic from pod to pod, from pod to node, and from node to node by using the encryption.nodeEncryption=true flag. Note that this feature is in Beta.
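
As a rough sketch of what this looks like with a Helm-based installation (encryption.enabled and encryption.type are the existing WireGuard settings from previous releases; only nodeEncryption is new):

helm upgrade cilium cilium/cilium --version 1.14.0 \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard \
  --set encryption.nodeEncryption=true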

We also introduce support for L7 network policies – based on HTTP, Kafka or DNS parameters for example – applied to traffic encrypted with WireGuard.

Watch the videos to learn more – first on Node-to-Node Encryption:

Then on Layer 7 Policies support with WireGuard:

If you’d like to try this feature, the Transparent Encryption lab has been updated with these new capabilities:

WireGuard Lab

Learn about WireGuard and transparent encryption on Cilium!

Start Lab

Gateway API Improvements

In Cilium 1.14, our leading implementation of Gateway API is getting even better.

In our previous blog posts, we explained the ‘why’ and the ‘how’ of Gateway API (the future of Kubernetes Ingress routing), and thousands of learners have learned about these features in our popular labs (introduction to Gateway API and advanced Gateway API use cases).

Gateway API support was introduced in the Cilium 1.13 release and included TLS Termination support. In this mode, HTTPS traffic would terminate at the Gateway and would be unencrypted from the Gateway to the Pods.

In Cilium 1.14, we are introducing support for TLS Passthrough through the use of the TLSRoute Resource.

With this feature, the Gateway lets the TLS stream pass through unchanged, using the TLS SNI (hostname) to route to the right service.

In other words:

                  Client ▶ Gateway   Gateway ▶ Pod
TLS Terminate     HTTPS 🔒           HTTP 🔓
TLS Passthrough   HTTPS 🔒           HTTPS 🔒
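
For illustration, a passthrough setup pairs a Gateway listener in TLS Passthrough mode with a TLSRoute that matches on the SNI. This is a hedged sketch – the hostname, Gateway name, and backend Service below are made up for the example:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: tls-gateway
spec:
  gatewayClassName: cilium
  listeners:
  - name: tls
    protocol: TLS
    port: 443
    hostname: "nginx.example.com"
    tls:
      mode: Passthrough
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: nginx-tlsroute
spec:
  parentRefs:
  - name: tls-gateway
  hostnames:
  - "nginx.example.com"
  rules:
  - backendRefs:
    - name: my-nginx-service
      port: 443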

Take the Advanced Gateway API Use Cases lab to try out this use case and more:

Advanced Gateway API Use Cases Lab

Try some advanced Gateway API Use Cases, such as Traffic Splitting, HTTP Header Manipulation and TLS Passthrough!

Start Lab

Cilium 1.14 also introduces support for the latest Gateway API version (v0.7.0).

Cilium contributors are also actively contributing to Gateway API, and we are aiming to support the forthcoming 1.0 Gateway API release as soon as we can.

Networking beyond Kubernetes

As popular and ubiquitous as Kubernetes is, we see clear signals that users want to leverage Cilium’s eBPF-based powers in non-Kubernetes environments and that users need to integrate their Kubernetes clusters with their traditional VM-based infrastructure. Let’s explore how we are addressing these requirements, starting with Cilium Mesh.

Cilium Mesh

The reception to the Cilium Mesh announcement at KubeCon Europe 2023 exceeded our expectations.

We announced it during the CiliumCon and we heard great feedback from the community.

Liz’s live demo session might have been the last scheduled session of the week but it was absolutely packed.

We talked to dozens of users who loved its concept: a universal networking platform to connect workloads, whether they are running on Kubernetes or traditional VMs, whatever infrastructure or cloud they are running on.

Testing of the preview Cilium Mesh is progressing well and we plan on offering a private beta later this year.

L2 Announcements (Beta)

In Cilium 1.13, we introduced support for LoadBalancer IP Address Management (LB-IPAM) and the ability to allocate IP addresses to Kubernetes Services of the type LoadBalancer.

Cloud providers natively provide this capability for their managed Kubernetes services, so this feature is primarily aimed at self-managed Kubernetes deployments or home labs. LB-IPAM works seamlessly with Cilium BGP: the IP addresses allocated by Cilium can be advertised to BGP peers to integrate your cluster with the rest of your network.
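
As a quick refresher, an LB-IPAM pool is declared with a resource along the following lines (a minimal sketch – the CIDR and the color: blue selector are illustrative, and the exact fields should be checked against the LB-IPAM documentation):

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: blue-pool
spec:
  cidrs:
  - cidr: "10.0.10.0/24"
  serviceSelector:
    matchLabels:
      color: blue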

For users who do not want to use BGP, or who just want to make these IP addresses accessible over the local network, we are introducing a new feature in Cilium 1.14 called L2 Announcements.

When you deploy an L2 Announcement Policy, Cilium will start responding to ARP requests from local clients for ExternalIPs and/or LoadBalancer IPs.

Typically, this would have required a tool like MetalLB, but Cilium now natively supports this functionality – and shortly after shipping the feature in the early 1.14 releases, we heard from users who had removed MetalLB entirely.

To enable this feature, set the l2announcements.enabled=true flag and apply an L2 Announcement policy such as:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  serviceSelector:
    matchLabels:
      color: blue
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: DoesNotExist
  interfaces:
  - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true

Look out for a blog post series coming soon that will cover the various ways to integrate your clusters with your existing network.

To try this feature, head over to the lab:

L2 Announcement

In this lab, learn about L2 Service Announcement with Cilium!

Start Lab

BGP Enhancements

With every release, we add more features to our BGP implementation. With Cilium 1.14, the focus is on BGP operations with the introduction of:

  • BGP commands in Cilium CLI
  • BGP Graceful Restart support
  • eBGP Multi-Hop support
  • Customized BGP Timers support

For the sake of brevity, we won’t cover all these features, but fans of BGP please rest assured: you’ll be able to learn more about each feature in an upcoming blog post. We’ll only cover Graceful Restart briefly as it’s a relatively simple change that can provide significant benefits.

As the BGP daemon is embedded within the Cilium agent, a Cilium agent restart would typically cause the BGP session to be reset. The peer would withdraw the PodCIDR or Service IP routes, and access to the cluster via BGP would be interrupted.

This differs from the usual Cilium agent behavior, where an agent restart is not expected to affect connectivity (in other words, the data plane should not be affected by a control plane restart).

With BGP Graceful Restart, the BGP data plane is not interrupted during a control plane restart and traffic keeps flowing.
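
As a sketch of how this is configured (the addresses and ASNs are made up, and field names should be verified against the 1.14 BGP documentation), graceful restart, eBGP multi-hop, and custom timers are all defined per neighbor in the CiliumBGPPeeringPolicy:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack0-peering
spec:
  nodeSelector:
    matchLabels:
      rack: rack0
  virtualRouters:
  - localASN: 65001
    exportPodCIDR: true
    neighbors:
    - peerAddress: "10.0.0.1/32"
      peerASN: 65000
      eBGPMultihopTTL: 4
      connectRetryTimeSeconds: 12
      holdTimeSeconds: 9
      keepAliveTimeSeconds: 3
      gracefulRestart:
        enabled: true
        restartTimeSeconds: 120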

To learn more, watch these videos:

Or try out this new lab:

Advanced BGP Lab

In this lab, you will learn why and how to use some advanced BGP features on Cilium!

Start Lab!

Cilium on Nomad

HashiCorp Nomad might be the most popular container scheduler after Kubernetes, and it is particularly popular for its simplicity and job scheduling capabilities. We are delighted that the WebAssembly startup Cosmonic has launched a tool, Netreap, that lets you run Cilium on Nomad. Netreap fills a role similar to that of the Cilium Operator, but for Nomad environments.

To learn more, head over to Netreap’s GitHub repo or watch the video below:

CNI Networking and Security

Multi-Pool IPAM (Beta)

We have frequently been hearing from Cilium users with requirements for more intricate Pod network configurations. A common request is for an IP Address Management (IPAM) mode that goes beyond standard Kubernetes behavior. So we’ve built a new feature – Multi-Pool IPAM – which allows a Pod to select an IP address from a specific pool.

Traditionally in Kubernetes, Pods select an IP address from a large pool of IPs. In enterprise environments, there’s often a requirement that a Pod needs to have a particular IP from a certain block (for example, to integrate Kubernetes with standard firewalls).

The first step to address this requirement is to provide multiple IPAM pools. This feature enables the Cilium operator to allocate PodCIDRs from multiple different IPAM pools, depending on properties of the workload defined by the user, for example with annotations.
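
For illustration – a sketch based on the beta documentation, with the pool name, CIDR, and image made up, and assuming Cilium is running with the multi-pool IPAM mode enabled – an additional pool and a Pod requesting an address from it could look like this:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumPodIPPool
metadata:
  name: mars
spec:
  ipv4:
    cidrs:
    - 10.20.0.0/16
    maskSize: 27
---
apiVersion: v1
kind: Pod
metadata:
  name: custom-workload
  annotations:
    ipam.cilium.io/ip-pool: mars
spec:
  containers:
  - name: app
    image: nginx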

To learn more, check out this demo below or the Cilium IPAM docs.

This feature lays the groundwork for Cilium to support multi-homing configurations. Pods can sometimes require multiple network interfaces, like Pods that need multiple interfaces for SR-IOV support (bypassing some of the container networking performance bottlenecks) or Pods in 5G environments that need multiple interfaces for VNF/CNF use cases.

Typically, this has been addressed by installing yet another tool such as Multus.

Expect more news on Cilium MultiHoming in the Isovalent Enterprise for Cilium 1.14 edition.

BIG TCP for IPv4 (beta)

In Cilium 1.13, we were excited to introduce support for an emerging networking technology: BIG TCP. In fact, we were so excited about it that we wrote a dedicated blog post on the topic.

To recap briefly – BIG TCP was created to reduce the TCP/IP stack overhead for networks running at 100 Gbit/s and beyond. By grouping packets into super-sized packets, BIG TCP can deliver huge throughput improvements and lower latencies in high-speed networking environments.

For once, this technology was available for IPv6 before it was available for IPv4 (IPv6 conveniently has an extension header in which a larger payload length can be specified, overcoming the existing packet size limits).

Fast-forward five months and we have exciting news to share: BIG TCP for IPv4 was introduced in Linux kernel 6.3, and Cilium 1.14 now supports BIG TCP for both IPv4 and IPv6 on hosts running that kernel version.

Cilium is, as far as we know, the only publicly available networking platform with BIG TCP support.

Enabling this functionality is as simple as setting the --set enableIPv4BIGTCP=true flag. Our performance testing with netperf provided a 50% throughput improvement.
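
For reference, enabling BIG TCP for both address families at install time can look roughly like this (a sketch; enableIPv6BIGTCP has been available since 1.13, and BIG TCP also has kernel and NIC driver requirements described in the documentation):

cilium install --version 1.14.0 \
  --set enableIPv4BIGTCP=true \
  --set enableIPv6BIGTCP=true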

Without BIG TCP:

# kubectl exec netperf-client -- netperf  -t TCP_RR -H ${NETPERF_SERVER} -- -r80000:80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.2.246 (10.0.2.) port 0 AF_INET : first burst 0
Minimum      90th         99th         Throughput 
Latency      Percentile   Percentile              
Microseconds Latency      Latency                 
             Microseconds Microseconds            
64           160          291          7698.72

With BIG TCP:

# kubectl exec netperf-client -- netperf  -t TCP_RR -H ${NETPERF_SERVER} -- -r80000:80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.2.27 (10.0.2.) port 0 AF_INET : first burst 0
Minimum      90th         99th         Throughput 
Latency      Percentile   Percentile              
Microseconds Latency      Latency                 
             Microseconds Microseconds            
47           106          192          11446.82

If you’d like to learn more, check out the Cilium BIG TCP documentation or watch the video below:

Alternatively, you can also try out the new lab:

BIG TCP lab

In this lab, you will learn about BIG TCP! BIG TCP – a revolutionary networking technology – is now available with Cilium to provide enhanced network performance for your nodes. Learn how BIG TCP can improve throughput by 40-50% in your network. Try it out!

Start Lab!

Deny Policy Graduated to Stable

Deny Policies were introduced in Cilium 1.9 to address specific use cases where users want to explicitly block traffic from entities (for example, in the event of an attack) or to/from IP address ranges (typically for regulatory reasons and to prevent Pods from accessing the IP addresses tied to a specific region). While most users will use a least-privilege “allowlist” model (where policies are based on the expected traffic between workloads), deny policies have found their uses and they have now been promoted to Stable.
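
As a reminder of what such a policy looks like (a minimal sketch – the label and CIDR are illustrative), deny rules such as egressDeny take precedence over any allow rules matching the same traffic:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "deny-restricted-cidr"
spec:
  endpointSelector:
    matchLabels:
      org: empire
  egressDeny:
  - toCIDR:
    - 192.0.2.0/24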

To learn more, check out the Cilium deny policies documentation.

Day 2 Operations and Scale

Cluster Mesh Scale Improvements with KVStoreMesh (Beta)

One of the many benefits of open source is how innovation and improvements are not restricted to the core maintainers of a project. Cilium has now had over 600 contributors since its inception over 7 years ago. One of the earlier large-scale users of Cilium is Trip.com, a Chinese provider of travel services including accommodation reservation, transportation ticketing, packaged tours and corporate travel management. Trip.com started using Cilium in 2018 (they called adopting Cilium a “10-year leap in terms of networking stack”) and has been expanding its use across huge environments with 10K+ nodes and 250K+ pods on its Kubernetes clusters.

They pushed the limits of Cilium and ran into shortcomings of Cluster Mesh at scale. As Arthur from Trip.com described in this video at the eBPF Summit 2022, Trip.com created their own fork of Cilium with an alternative solution called KVStoreMesh to address some of those limitations.

In the standard Cluster Mesh, each agent would pull the information from each remote cluster directly. At hyper scale, this can lead to reliability issues.

KVStoreMesh caches the information obtained from the remote clusters in a local kvstore (such as etcd), to which all local Cilium agents connect.

While the stability issues only apply to the very largest Cilium Cluster Mesh users, the overall design and principles of KVStoreMesh made perfect sense to us.

In Cilium 1.14, we have integrated the KVStoreMesh concept into the upstream Cluster Mesh implementation. Cilium Cluster Mesh in KVStoreMesh mode enables improved scalability and isolation, and targets large-scale Cluster Mesh deployments supporting up to 50k nodes and 500k pods.

Even smaller deployments of Cilium Cluster Mesh can benefit from KVStoreMesh, as it provides lower and more predictable control plane resource utilization.

To find out more, check out the Cilium Cluster Mesh documentation.

Cilium CLI Helm Mode

With Cilium adoption continuing to grow and dozens of new features added in each release, it’s essential we keep optimizing the user experience, especially for Day 2 Operations, including configuration and upgrades.

The Cilium CLI tool has become the go-to tool to install, test, and configure Cilium. Users particularly like the way it auto-detects environment settings and deploys Cilium in the right configuration (for example, deploying Cilium in tunnel or native routing mode where appropriate).

But users would sometimes run into inconsistent behaviours when using a tool such as Helm to configure Cilium. Previously, in what we now call classic mode, the Cilium CLI would directly call Kubernetes APIs to manage resources related to Cilium.

In the new helm mode, the CLI delegates all the installation state management to an embedded Helm client. This enables you to use the Cilium CLI and helm interchangeably to manage your Cilium installation, while taking advantage of the CLI’s advanced features such as Cilium configuration auto-detection. This increases configuration consistency and flexibility. As new features are added to the official Cilium Helm chart, they can be immediately used by the cilium-cli.
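
For example (a sketch – the Hubble values are just placeholders, and the exact Helm interoperability details are described in the Cilium CLI documentation), you can pass Helm values straight to the CLI and then inspect the resulting release with standard Helm tooling:

cilium install --version 1.14.0 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

helm get values cilium -n kube-system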

Helm mode is now the default from Cilium CLI v0.15 onwards, and we expect classic mode to be deprecated in an upcoming release.

To learn more, head over to the Cilium CLI documentation or check out the short demo below!

CiliumNodeConfig for Migration

We first gave you a glimpse of CiliumNodeConfig in the Cilium 1.13 release blog post. Its primary use case was to help users migrate gracefully from an existing CNI to Cilium.

In Cilium 1.14, we’ve made further changes to reduce disruption during migration.

We now have a documented migration workflow, a lab, and a blog post that showcase how to migrate from Flannel to Cilium. Expect similar content to help users move off of Calico and onto Cilium in the coming weeks.

The feedback from users has been phenomenal and we’ve already heard anecdotes from users that have migrated to Cilium “with near zero downtime“.

If you’d like to learn more, read this detailed tutorial:

Tutorial: How to Migrate to Cilium

Read how to migrate gracefully to Cilium!

Read Tutorial!

Or take the following lab:

Migrate to Cilium Lab

In this lab, you will learn a new approach to migrate to Cilium!

Start Lab!

Hubble & Observability

We cannot talk about operating Kubernetes and Cilium without talking about Hubble, the distributed networking and security observability platform. We recently started a blog post series re-introducing the Hubble platform (Cilium Hubble Part 1 and Cilium Hubble Part 2, Enterprise) and we are excited to see some of the recent Hubble improvements.

Mutual Authentication support

Our Mutual Authentication implementation is based on Cilium Network Policies. As Hubble is the best tool to observe and troubleshoot such policies, it was essential that Hubble support be there from day one.

Hubble CLI shows you when traffic between two endpoints that have not completed their mutual handshake is dropped:

$ hubble observe -t drop --from-pod default/tiefighter
default/tiefighter:43412 (ID:10685) <> default/deathstar-f694cf746-5wxks:80 (ID:53245) Authentication required DROPPED (TCP Flags: SYN)

And Hubble CLI also lets you know when traffic between 2 endpoints has been mutually authenticated:

$ hubble observe -t policy-verdict --from-pod default/tiefighter
default/tiefighter:57248 (ID:10685) -> default/deathstar-f694cf746-6nkxz:80 (ID:53245) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN; Auth: SPIRE)

You can also get the same level of insight in the Hubble UI: the padlock on network flows confirms a successful mutual authentication.

How to Troubleshoot Kubernetes Networking

One piece of feedback we consistently hear from users is that, while observability might not be their initial reason to adopt Cilium, it quickly becomes one of the platform’s most loved capabilities. The rich data provided by Cilium and eBPF, and visualized through Hubble, has enhanced users’ understanding of their platforms and their ability to troubleshoot them.

During the KubeCon session on “Surviving Day 2 – How to Troubleshoot Kubernetes Networking”, we covered some of the use cases that Hubble is uniquely built to address. These include:

  • Network Policy Troubleshooting
  • Ingress Troubleshooting
  • DNS Troubleshooting (it’s always DNS!)

Watch the session to learn more:

Grafana Network Observability with Hubble

One of the areas that Cilium and Hubble are designed to help with is tool sprawl. Therefore, it’s important to ensure that these platforms can integrate with existing, widely adopted software components. When we think about data visualization, it’s no surprise that Grafana is one of the most widely deployed tools out there. In the recorded KubeCon demo session below, you can hear from Anna, an Isovalent engineer, about the integration points and data visualizations that are possible with Cilium, Hubble, and Grafana.

New and Updated Labs!

If you’d like to try some of the features highlighted in Anna’s talk, head over to the new Golden Signals lab and receive a shiny new badge!

Golden Signals

With Hubble, you can monitor HTTP Golden Signals with minimal effort! Try it yourself in this Hubble & Grafana lab.

Start Lab

With the general availability release of Hubble Timescape, which provides historical flow information for your Cilium clusters, we have updated the existing Connectivity Visibility lab with a new challenge, allowing you to explore the feature through the Hubble UI and CLI.

Connectivity Visibility

Hubble Timescape provides the ability to store and access historical flows and process information! Try out the new features in the Connectivity Visibility Lab.

Start Lab

Cilium in the Cloud

All three major cloud providers (Microsoft, Google, and AWS) have singled out Cilium as the standard for Kubernetes networking and security for AKS, GKE, and EKS Anywhere respectively.

Cilium on AKS

We are delighted by the outcomes of our recently-announced partnership with Microsoft: Azure CNI powered by Cilium is now generally available and any Azure Kubernetes Service customer running Azure CNI powered by Cilium can seamlessly upgrade to Isovalent Enterprise for Cilium. It’s not the only way to run Cilium on AKS – you can also bring your own CNI to install Cilium in the configuration of your choice.

We have published several tutorials (Cilium on Azure, Part 1 and Cilium on Azure, Part 2) to explain how to get the most out of Isovalent Enterprise for Cilium on AKS.

To find out more, head to our Azure Partner page.

Cilium on EKS

Cilium has been the built-in default networking and security platform on EKS Anywhere since its launch nearly two years ago.

We work closely with AWS on improving support, documentation and capabilities of Cilium on EKS Anywhere. For example, we recently enabled limited support for integration with external routing daemons and for cluster health checking on EKS Anywhere.

For more information, head over to our new AWS partner page.

Cilium on GKE

For the past three years, Google Kubernetes Engine (GKE) has been relying on Cilium for networking and security. One thing that was not available when using GKE’s Dataplane V2 was Hubble.

That has recently changed with the introduction of Hubble support in Dataplane V2.

Follow this link to the Google documentation to learn more.

Tetragon

How time flies! It’s already been 15 months since Tetragon was released publicly at KubeCon Europe 2022. The container runtime security and observability tool has grown in popularity (already 2,500 GitHub stars!) and in adopters.

In fact, Tetragon is reaching a maturity milestone: we expect Tetragon to reach version 1.0 in the coming months. Let’s cover a few other Tetragon highlights.

New Features

Tetragon v0.10.0 has recently been released, and one of its most notable features is support for Kubernetes namespace and pod label filtering in tracing policies. If you’re not familiar with tracing policies, they are how Tetragon users specify which kernel events to trace and which action to take on a match (for example, monitor or enforce).

In Tetragon v0.10.0, users can apply tracing policies to only a subset of pods running on the system via two mechanisms: namespaced policies and pod-label filters. Tetragon implements both mechanisms in-kernel via eBPF. This is important for both observability and enforcement use cases.

For observability, copying only the relevant events from kernel-space to user-space reduces overhead. For enforcement, performing the enforcement action in the kernel avoids the race-condition of doing it in user-space.
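
For the namespaced variant, the policy kind changes and the object is created in a specific namespace, so it only applies to pods running there (a trimmed sketch – apart from the kind and the namespace, the body follows the usual TracingPolicy format):

apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: "lseek-namespaced"
  namespace: team-a
spec:
  kprobes:
  - call: "sys_lseek"
    syscall: true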

For example, we could use a policy tracing the lseek system call and filter based on a Pod label. The tracing policy below will kill the process that performs an lseek system call with a file descriptor of -1. You will notice how Tetragon Tracing Policies are beginning to look more and more like Cilium Network Policies!

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "lseek-podfilter"
spec:
  podSelector:
    matchLabels:
      app: "lseek-test"
  kprobes:
  - call: "sys_lseek"
    syscall: true
    args:
    - index: 0
      type: "int"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "-1"
      matchActions:
      - action: Sigkill

When deploying a Pod without the label, you will see that it is not affected by the Tracing Policy (Tetragon will let it execute this particular system call).

$ kubectl run test  --image=python -it --rm --restart=Never  -- python
If you don't see a command prompt, try pressing enter.
>>> import os
>>> os.lseek(-1, 0, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
>>>

However, the policy will apply to Pods with this label (Tetragon will immediately kill the process):

$ kubectl run test --labels "app=lseek-test" --image=python -it --rm --restart=Never  -- python
If you don't see a command prompt, try pressing enter.
>>> import os
>>> os.lseek(-1, 0, 0)
pod "test" deleted
pod default/test terminated (Error)

New Tetragon Landing Page

Check out the new Tetragon landing page, with new getting started guides and detailed documentation including use cases, tutorials and contribution guides.

New Tetragon Landing Page

Head over to the new Tetragon landing page to learn more about the container runtime observability tool!

Go

Tetragon at KubeCon Europe 2023

Tetragon was pretty popular at KubeCon Europe. We had over 300 attendees at the live Tetragon tutorial (more on that below) and heard some fantastic feedback from Natalia and John’s session on how eBPF could have detected and prevented Log4Shell – and perhaps your next CVE.

Get Your Tetragon Badge with the Tetragon lab!

Hands-on labs are an amazing way to learn new technologies, and we hope you find the Isovalent labs useful. Did you know we’ve had over 13,000 lab runs in this calendar year alone?

Our Getting Started with Tetragon lab has been revamped and now includes a badge for you to share on social media once you’ve completed the challenge at the end of the lab. Give it a try!

Tetragon Lab

Learn how Tetragon can detect and prevent attacks with this lab!

Start lab

We also offer advanced labs on Security Visibility and TLS Visibility features that are only available in our enterprise edition of Tetragon. Try them out!

Community

KubeCon + CloudNativeCon EU

Cilium had 16 talks, a project meeting, and a booth at KubeCon + CloudNativeCon EU in Amsterdam. The booth was buzzing the whole time and we can’t wait to see the community again in Chicago.

CiliumCon

The first ever CiliumCon was held at KubeCon + CloudNativeCon EU in Amsterdam on 18th April. It covered stories from end users sharing what they learned running Cilium in production and from contributors diving into Cilium’s technology and history.

The event was so popular that CiliumCon is coming back for a second, now full-day, edition at KubeCon + CloudNativeCon NA. The CFP is open until August 6th, so be sure to get your submission in today! If you want help, reach out to Bill Mulligan on the Cilium Slack.

Contributor Ladder

To help make it easier for people to grow within the Cilium community, we have now added a contributor ladder. The ladder lays out the different levels of involvement in the community, from organization member to committer, and also defines different roles within the community, such as the CI and security teams.

Graduation

The Cilium community has applied to become a CNCF graduated project by creating a PR in the cncf/toc repository. This is a major milestone for the Cilium community and users. The entire community is grateful to everyone who has helped get Cilium this far, and we are looking forward to working through the graduation process with the CNCF community. Add a 👍 to the PR to show your support!

User Stories

Since the 1.13 release, many new end users have stepped forward to tell their stories running Cilium in production:

  • Ascend – Reducing debugging from 4-16 hours down to 20 seconds with Hubble
  • Bloomberg – Building data sandboxes with Cilium Network Policies
  • ClickHouse – Secured 10+PiB of streaming data and 30+ trillion inserted records in the first months of deployment
  • Eficode – Using Cilium for every customer engagement
  • Form3 – How to build a cloud agnostic environment for developers while ensuring high throughput, reliability, and simplified maintenance.
  • Microsoft – Azure CNI Powered by Cilium in Azure Kubernetes Service now GA
  • Publishing Industry – Securing 100,000+ RPS in a Multi-Tenant Environment
  • Robinhood – Networking to improve pod density, scalability and cost efficiency
  • Tietoevry – Better policies and less tool sprawl

Getting Started

To get started with Cilium, use one of the resources below:

Previous Releases

Thomas Graf, CTO & Co-Founder Isovalent, Co-Creator Cilium, Chair eBPF Governing Board
