In part 1 of this series, reintroducing Cilium Hubble to the cloud-native world, we answered many questions about Hubble: what it is, who uses it, and what its main use cases are.
We focused on the widely adopted open-source version of Hubble. Let’s now review some of the scenarios enterprise users might encounter – such as role-based access control, troubleshooting with historical data, and integrations with other systems – and how Isovalent Cilium Enterprise addresses them.
As always, this blog post includes many demos and follow-up labs to take if you’d like to learn more!
Let’s get started.
How Do I Easily Improve the Security Posture of My Cluster With Network Policies?
One of the primary use cases for Cilium remains its support for Kubernetes Network Policies and the more granular L7-aware Cilium Network Policies. This step usually comes early on the “path to production”, as enterprises move Kubernetes from test to production.
You certainly don’t want a cluster with no internal traffic restrictions for critical workloads.
The challenge – as we also discussed in the Zero Trust Security Journey blog post – is no longer convincing users that Zero Trust is required (it’s widely understood that a perimeter-based approach alone is of little use). The real barrier to adoption is creating the network policies themselves.
The Enterprise edition of Hubble is there to help, with a built-in Network Policy editor that allows you to build new policies from the live view of your environment. The Network Policy editor shows you all the flows captured for your given namespace and allows you to dynamically create a policy that permits or denies the traffic you are interested in, providing a YAML manifest that can be applied against your cluster.
The Network Policy editor is intuitive by design. You can simply select an existing traffic flow to add it to a policy, or you can start building your policies by specifying the selectors and traffic directions. The video below shows you the Network Policy Editor interface within Hubble UI Enterprise.
This feature covers both the standard Kubernetes Network Policies and the more powerful Cilium Network Policies (which extend to include the ability to create rulesets for layer 7 network traffic). Migrating to a zero-trust model couldn’t be easier, with the ability to build network policies from live network flows, set your default ingress/egress rules, and use selectors based on Kubernetes labels to identify which endpoints the rule applies to.
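As an illustration, a generated L7-aware policy might look like the sketch below. The namespace, labels, port, and path are hypothetical placeholders; an actual policy built in the editor would reflect your own workloads:

```yaml
# Hypothetical example: allow only GET requests to /api/* from frontend
# pods to backend pods in the "shop" namespace; all other ingress to the
# backend endpoints is denied by this policy.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend-api
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"
```

The `rules.http` section is what distinguishes a Cilium Network Policy from a standard Kubernetes Network Policy: the latter stops at ports and protocols, while Cilium can filter on HTTP methods and paths.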
Isovalent Cilium Enterprise: Network Policies
In this hands-on demo we will walk through some of those challenges and their solutions.
Start Lab!
Zero Trust Visibility
In this lab, you will use Hubble metrics to build a Network Policy Verdict dashboard in Grafana showing which flows need to be allowed in your policy approach.
Start Lab!
I Need to Delegate Network Troubleshooting to Application Developers
In our conversations with Isovalent Enterprise for Cilium customers such as VSHN and TietoEvry, we heard a recurring goal – reducing the overall burden on the SRE team.
SRE teams tend to be relatively small—sometimes, an SRE might support dozens, if not hundreds, of developers. This only works if the SRE can hand over some of the troubleshooting and operational tasks to the developers themselves.
It’s not just that SREs are extremely busy—it’s also that app developers don’t want to raise a support ticket to investigate a connectivity issue.
While Hubble UI Open-Source provides a namespace-based view of network connectivity, the user’s view and privileges cannot be restricted.
In the Enterprise edition, multi-tenant self-service access is possible through the OpenID Connect (OIDC) integration, which lets you connect to your existing identity and authorization platforms, such as Okta, Auth0, and others.
Hubble Enterprise uses policy-based authorization settings, which allow you to control what resources and data (such as flows and metrics) can be accessed and by whom in your organization. See the example policy code below.
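The snippet below is an illustrative sketch of what such a policy could express – the field names are hypothetical and do not reflect the exact Hubble Enterprise schema, so consult the Isovalent documentation for the real format:

```yaml
# Illustrative sketch only: field names are hypothetical, not the actual
# Hubble Enterprise policy schema.
# Intent: members of the OIDC group "otel-demo-devs" may view flows
# only in the "otel-demo" namespace.
rbac:
  policies:
    - name: otel-demo-developers
      subjects:
        - group: otel-demo-devs   # group claim from the OIDC token
      permissions:
        - resource: flows
          verbs: [read]
          namespaces: [otel-demo]
```

The key idea is the mapping: identity comes from your OIDC provider, and the policy decides which Hubble resources (flows, metrics, views) that identity may access, scoped per namespace.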
This feature extends throughout all the components of the Hubble architecture, including the UI and Timescape.
In the short demo below, we cover the Hubble Enterprise UI; however, the Hubble Enterprise CLI offers the same RBAC features:
- The platform administrator logs in and can access all namespaces and views.
- The “Otel-Demo” developer logs in and can access their namespace’s service map, but cannot see any other namespaces.
- The “Tenant-Jobs” developer logs in and can access their namespace’s service map (and we can see the application is totally different).
- The demo finishes with an overview of the RBAC policies applied via Helm.
You can see a longer deep dive video on our YouTube Channel.
Taking ownership of troubleshooting not only empowers application owners but also promotes a healthy and productive work environment. It is vital for application owners to actively engage in troubleshooting their own applications to foster a culture of collaboration and eliminate the blame game during post-mortems.
We’ve Had an Application Outage and Need To Find the Root Cause
On the subject of post-mortems, there will come a time when something goes wrong, and a root cause analysis of the incident will be required.
This is an integral part of the roles of both SREs and application owners. It enables them to proactively address incidents, improve system reliability, drive continuous improvement, foster collaboration, and enhance their overall understanding of the system.
As useful as the Hubble Relay component is (it provides multi-node visibility across the Hubble peers), it only provides real-time information, which is of limited use hours, days, or weeks after an incident.
Hubble Timescape can take you back in time to the moment when the app started misbehaving, giving you full visibility into the network flow lifecycle for your application and its associated services.
The video below captures a quick overview of how Timescape furthers your ability to troubleshoot. You can find a longer, deeper video on our YouTube Channel.
Let’s dive into the architecture of Hubble Timescape. It can be deployed either into the same cluster it is monitoring or, for larger environments, into a separate Kubernetes cluster.
Hubble Enterprise is configured to export its data into S3-compatible object storage or public cloud storage, such as Google Cloud Storage or Azure Blob Storage.
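As an illustration, the export destination is typically set through Helm values; the field names below are hypothetical and the bucket is a placeholder – consult the Isovalent documentation for the exact schema:

```yaml
# Hypothetical Helm values sketch: export Hubble flow data to an
# S3-compatible bucket so that Timescape can ingest it later.
# Field names are illustrative, not the exact Hubble Enterprise schema.
export:
  s3:
    enabled: true
    bucket: my-hubble-flows        # placeholder bucket name
    region: eu-west-1
    endpoint: https://s3.eu-west-1.amazonaws.com
```

Decoupling export (object storage) from ingestion (Timescape) is what allows Timescape to run in a separate cluster from the one being monitored.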
Hubble Timescape is built on top of ClickHouse, an open-source columnar database that can be deployed in-cluster or run externally to the Kubernetes cluster, e.g., in ClickHouse Cloud. The Hubble Timescape Trimmer is an optional component that enforces a predefined limit on the number of flows ingested into the database, regardless of any time-based retention.
Hubble Timescape deploys an ingester to load the Hubble flows and Tetragon process events into the ClickHouse database. The Hubble Timescape server implements and serves the gRPC API, which can be accessed via the Hubble UI or Hubble CLI.
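As an illustration of how that historical data is consumed, a Timescape query from the CLI could look like the following – the flags shown exist in the open-source `hubble observe` command, while the server address is a hypothetical placeholder for your Timescape endpoint:

```shell
# Query historical flows around the incident window instead of live traffic.
# The server address is a placeholder; point it at your Timescape gRPC endpoint.
hubble observe \
  --server hubble-timescape.hubble-enterprise.svc:443 \
  --namespace tenant-jobs \
  --since 2024-01-15T09:00:00Z \
  --until 2024-01-15T10:00:00Z \
  --verdict DROPPED
```

The difference from a live `hubble observe` session is simply the time window: instead of watching flows as they happen, you replay the flows recorded around the incident.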
You can now get hands-on with Hubble Timescape in our recently updated Isovalent Cilium Enterprise: Connectivity Visibility lab, where you can use both the Hubble UI and the Hubble CLI to troubleshoot with live flows as well as historical flows from Timescape.
Isovalent Cilium Enterprise: Connectivity Visibility
This lab provides an introduction to Isovalent Cilium Enterprise capabilities related to connectivity observability using Hubble Enterprise.
Start Lab!
How Can I Visualise This Meaningful Data in My Existing Monitoring Platform?
We know that tool sprawl is a top consideration for users and is made worse when your different tools do not interoperate with one another. Last year, we announced a strategic partnership with Grafana Labs to provide infrastructure and developer teams deep insights into the connectivity, security, and performance of their applications. As part of this partnership, we developed the Hubble data source plugin (currently in beta in Grafana) to help monitor network and security events.
This plugin was created using the Grafana plugin development tools and integrates with three underlying data stores: Hubble Timescape, Prometheus (which stores Hubble networking metrics), and Grafana Tempo (which stores traces that can be correlated with different signals).
With this plugin, Hubble data can be widely adopted within your business through a common interface, reducing tool sprawl between platform and application teams. In the screenshot below, you can see a representation of our application’s service map and HTTP metrics from Hubble.
As a small preview of the next Grafana plugin release, we will also be adding a Tetragon dashboard, which will provide a process ancestry view in Grafana, similar to the process view capability in the Hubble Enterprise UI.
In the next blog post, we will explore the Hubble Enterprise and Grafana features further, but if you are itching to take an early look at the Grafana integration, you can start with the following resources: Anna’s guest blog post on the Grafana website or the Golden Signals with Hubble and Grafana lab.
In this lab, you will learn how Cilium can provide metrics for an existing application with and without tracing functionality and how you can use Grafana dashboards provided by Cilium to gain insight into how your application is behaving.
Golden Signals with Hubble and Grafana
Learn how to monitor the four Golden Signals with Cilium, Hubble & Grafana.
Start Lab!
Where Can I Learn More?
If you haven’t already checked out the Part 1 blog post of this series, I urge you to do so now! You can get started with Hubble today: head over to the official documentation and, once Cilium is up and running, you can have Hubble installed in a few minutes.
In the next blog post, Part 3, we will explore the new Grafana Hubble Data Source plugin and use cases that will enhance your Kubernetes troubleshooting for your platforms.
Alternatively, we have a number of labs that show the benefits of the observability and troubleshooting features of Hubble alongside features of Cilium, such as the “Observability with Hubble and Cilium service mesh” lab.
Observability with Hubble and Cilium Service Mesh
Cilium and Hubble provide observability for Service Mesh, without the overhead of sidecars. Start the lab to find out how.
Start Lab!
Dean Lewis is a Senior Technical Marketing Engineer at Isovalent – the company behind the open-source cloud native solution Cilium.
Dean has a varied background in technology, from support to operations to architectural design and delivery at IT solution providers in the UK, before moving to VMware to focus on cloud management and cloud native, which remains his primary focus today. You can find Dean speaking at various technology user groups and industry conferences, as well as on his personal blog.