File Monitoring with eBPF and Tetragon (Part 1)
This is the first in a series of blog posts where we deep dive into Tetragon’s file monitoring capabilities, focused on implementing low-overhead, highly scalable monitoring directly in the kernel with eBPF. We broadly refer to this set of features as File Integrity Monitoring (FIM), since FIM is an established term and is the main use-case these features address. As we will see in this and later posts, however, Tetragon’s FIM goes beyond traditional monitoring by supporting enforcement (e.g., blocking file operations), inode-based file monitoring, and extensive in-kernel filtering to keep overhead barely visible.
Generally speaking, file monitoring allows users to monitor accesses and modifications to files. This can be, for example, used by security teams that want to be informed whenever a file is read or modified. Security and platform teams implement FIM to adhere to, and prove compliance with, regulatory standards such as NIST, PCI-DSS, HIPAA, CIS, SOC, and more.
Tetragon implements its FIM functionality using eBPF for in-kernel filtering and policy enforcement. This blog starts with a brief summary of existing approaches to FIM that do not depend on eBPF, and continues on to illustrate why eBPF offers unique opportunities for file monitoring. We will present the basic functionality of Tetragon’s FIM, and cover how we address the technical challenges of monitoring files.
Subsequent posts in this series will cover advanced features such as monitoring pod files, inline enforcement, and controlling execution.
Why is the cloud-native community moving to eBPF for file monitoring?
Early approaches for file monitoring are based around periodically scanning the file-system and comparing the expected state with the actual state. This approach has a number of limitations, including that it can only be used to detect modifications and not reads to files. Moreover, periodic scanning is unreliable because a modification can go undetected if the file is modified, then returned back to its original state before the scanning occurs. A sufficiently quick attacker can read/modify the target file, and clean up their tracks before the periodic scan occurs.
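To make the limitation concrete, here is a minimal sketch of a hash-based periodic scanner. It is an illustration only, not how any particular FIM product works; the file name and the helper names (digest, scan, demo) are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(paths, baseline):
    """Return the paths whose current digest differs from the recorded baseline."""
    return [p for p in paths if digest(p) != baseline[p]]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "config"
        target.write_text("original\n")
        baseline = {target: digest(target)}

        # Between two scheduled scans: tamper with the file...
        target.write_text("tampered\n")
        # ...a scan would only notice if it happened to run right now...
        changed_mid_window = scan([target], baseline)
        # ...and then the attacker restores the original contents.
        target.write_text("original\n")

        # The next scheduled scan compares against the baseline and finds nothing.
        changed_at_next_scan = scan([target], baseline)
        return changed_mid_window, changed_at_next_scan


if __name__ == "__main__":
    mid, later = demo()
    print("scan during the window:", [p.name for p in mid])
    print("scan after the restore:", [p.name for p in later])
```

The scan that runs after the restore reports no change, and a read of the file would not show up in either scan, since reading leaves the digest untouched.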
Later approaches for file monitoring use specialized in-kernel mechanisms such as inotify. Applications can use inotify to register files and events that define a “watch-list” of what the kernel should monitor. Once an operation that matches the watch-list happens, the kernel will generate a notification event to the application (denoted as “agent” in the image below).
While inotify addresses the unreliability of the scanning approach by being executed inline with the operation, it still lacks expressiveness and flexibility. A major limitation is that there is no way to associate or filter operations using the execution context (e.g., pid or cgroup) of the process doing the operation. This means that there is no way to filter events based on which Kubernetes workload performed the file access.
Another limitation of inotify is the lack of flexibility in the actions taken when a file is accessed. When a monitored file is accessed, the kernel sends an event to user-space and it is up to the user-space agent to do the rest. Let’s consider the example where a user wants to monitor all files under /private. When monitoring the directory /private/data, the sequence of operations would be:
1. Agent adds /private into the directories to be watched
2. Application creates /private/data directory
3. inotify sends an event to the agent that a directory /private/data was created
4. Agent adds /private/data to the directories to be watched
If a file is created and/or accessed in /private/data between steps 2 and 4, there will be no inotify event for that access, because the directory has not yet been added to the watch list.
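The race can be replayed with a small toy model. This is a pure simulation under simplifying assumptions (it does not call inotify, and the step names "mkdir", "create", and "agent" are invented for the sketch): an operation is reported only if its parent directory is already watched, and new directories are added to the watch list only when the agent gets to run.

```python
from pathlib import PurePosixPath


def simulate(steps):
    """Replay (kind, path) steps and return (reported, missed) operations.

    kind is "mkdir" or "create" for an application's file-system operation,
    or "agent" (path ignored) for the moment the agent processes its queue.
    """
    watched = {"/private"}   # directories currently on the watch list
    pending_dirs = []        # new directories reported but not yet watched
    reported, missed = [], []

    for kind, path in steps:
        if kind == "agent":
            # The agent reacts to earlier notifications by adding watches.
            watched.update(pending_dirs)
            pending_dirs.clear()
            continue
        parent = str(PurePosixPath(path).parent)
        if parent not in watched:
            missed.append(path)      # no watch on the parent: no event at all
            continue
        reported.append(path)
        if kind == "mkdir":
            pending_dirs.append(path)
    return reported, missed


if __name__ == "__main__":
    steps = [
        ("mkdir", "/private/data"),          # application creates the directory
        ("create", "/private/data/secret"),  # file appears before the agent reacts
        ("agent", None),                     # agent adds the watch for /private/data
        ("create", "/private/data/later"),   # covered from now on
    ]
    reported, missed = simulate(steps)
    print("reported:", reported)
    print("missed:  ", missed)
```

In this run /private/data/secret ends up in the missed list, while /private/data/later is reported, which mirrors the window described above.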
One way to address these (and other) limitations would be to extend the inotify capabilities by modifying the kernel. For example, it would be possible to modify the kernel to add execution context to inotify events. Generally speaking, however, adding functionality to the Linux kernel is a long and arduous process, and even if it is successful it might take years until a newly released kernel reaches production and end-users.
This is where eBPF comes along. eBPF is a technology, first implemented in the Linux kernel, that allows user-space applications to define code to be executed in-kernel at specific points. A verifier component checks the code for safety before it is allowed to run. eBPF allows implementing kernel functionality outside of the kernel and delivering it to end-users at a much higher pace.
eBPF is extensively used in networking and security, and in our case it enables Tetragon to build FIM without the limitations of inotify. For example, eBPF allows Tetragon’s FIM implementation to correlate file access events with execution context such as process information (e.g., credentials) and its cloud native identity (e.g., k8s workload), perform inline updates to its internal state to avoid races, as well as implement inline enforcement on file operations. We discuss the specifics below.
How to monitor files with Tetragon generic tracing policies?
In this section, we discuss how to use Cilium Tetragon generic tracing policies that install eBPF hooks (kprobes, specifically) to track file operations and implement FIM.
One approach is to install these hooks into system calls. For example, a hook can be installed in the open system call to determine when a file is opened and for what access (read or write). Hooking into system calls, however, might lead to time-of-check to time-of-use (TOCTOU) issues. This is because the memory location with the pathname to be accessed belongs to user-space, and user-space can change it after the hook runs, but before the pathname is used to perform the actual in-kernel open operation. This is depicted in the image below, where the eBPF hook checks the “innocent” path, but the kernel operation actually happens with the “suspicious” path.
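The ordering problem can be modelled in a few lines. This is a toy simulation, not kernel code: a bytearray stands in for the user-space buffer, and the function names (run_open, swap_path) are hypothetical.

```python
def run_open(user_buffer, hook_at_syscall_entry, racer):
    """Model one open() and return (path the hook saw, path the kernel used)."""
    seen = None
    if hook_at_syscall_entry:
        # Hook at system call entry: it reads the path from user-space memory.
        seen = bytes(user_buffer).decode()
    # A second thread of the application rewrites the buffer.
    racer(user_buffer)
    # The kernel copies the path from user-space and performs the open.
    kernel_path = bytes(user_buffer).decode()
    if not hook_at_syscall_entry:
        # Hook placed after the copy: it reads kernel memory, which the
        # application cannot change.
        seen = kernel_path
    return seen, kernel_path


def swap_path(buf):
    """The racing thread: replace the path in place."""
    buf[:] = b"/suspicious"


if __name__ == "__main__":
    print(run_open(bytearray(b"/innocent"), True, swap_path))
    print(run_open(bytearray(b"/innocent"), False, swap_path))
```

With the hook at system call entry, the hook and the kernel disagree about the path; with the hook after the copy, they always see the same value.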
Hooking into a (kernel) function that happens after the path is copied from user-space to kernel-space avoids this problem since the hook operates on memory that the user-space application cannot change. Hence, instead of a system call we will install a hook into a security_ function. Specifically, we will hook into the security_file_permission function which is called on every file access (there is also security_file_open which is executed whenever a file is opened).
We can use the following Tetragon tracing policy, named file-all (spoiler: do not actually try this policy, and definitely not on a production system; as explained below, it will generate events for every file access):
This policy will generate an event for every file access in the system. An example event is:
Note that the generated event includes information about the execution context (indeed, that is the case for all Tetragon events): information about the process and its parent such as binary, arguments, credentials, and others. In cloud-native environments, the events also contain information about the container and the pod that this process belongs to. Using FIM in cloud-native environments will be the subject of a subsequent blog post, so stay tuned!
When the above policy is applied, Tetragon will generate an event for every file access in the system. There are many file accesses happening in a system at any point in time, and monitoring all of them is not a good practice because generating an event for each one incurs significant overhead. Another limitation of the file-all policy is that it does not inform users about what file was actually accessed.
To address the aforementioned limitations, we create a second version of the policy where sensitive ssh server private keys are monitored, allowing security teams to detect unwanted access to these files. We also configure Tetragon to include the path of the accessed file in the event so that users have the full information. To do so, we use the first argument of security_file_permission, which is of type struct file (this is an in-kernel type). Tetragon includes eBPF code for retrieving the filename from this kernel struct, which in turn enables it to filter events based on the file path in the kernel as well as providing the value to the user.
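As a rough sketch of what such a policy can look like (an illustration, not a definitive listing), the snippet below assembles a TracingPolicy manifest as a Python dictionary and prints it as JSON. The field names follow the public Tetragon TracingPolicy schema as we understand it (kprobes, args, selectors, matchArgs); the list of key files beyond /etc/ssh/ssh_host_rsa_key is an assumption based on common OpenSSH defaults.

```python
import json

# Assumed set of host key files; adjust for the system at hand.
SSH_KEY_PATHS = [
    "/etc/ssh/ssh_host_rsa_key",
    "/etc/ssh/ssh_host_ecdsa_key",
    "/etc/ssh/ssh_host_ed25519_key",
]


def ssh_keys_policy(paths):
    """Build a TracingPolicy manifest that reports accesses to the given paths."""
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": "file-ssh-keys"},
        "spec": {
            "kprobes": [
                {
                    "call": "security_file_permission",
                    "syscall": False,  # a kernel function, not a system call
                    # Argument 0 is the struct file pointer; the "file" type
                    # asks Tetragon to resolve it to a path.
                    "args": [{"index": 0, "type": "file"}],
                    # In-kernel filter: only the listed paths produce events.
                    "selectors": [
                        {
                            "matchArgs": [
                                {
                                    "index": 0,
                                    "operator": "Equal",
                                    "values": list(paths),
                                }
                            ]
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(ssh_keys_policy(SSH_KEY_PATHS), indent=2))
```

Dropping the args and selectors entries leaves an unfiltered hook on security_file_permission, which is the shape of the file-all policy discussed earlier.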
An example event of this policy is:
It is worth highlighting that filtering in-kernel (as performed by the above policy) is important because it minimizes the overhead. Deciding at the eBPF hook whether the event is of interest to the user means that no pointless events will be generated and processed by the agent. The alternative, filtering in user-space, tends to induce significant overhead for events that happen very frequently in a system (such as file accesses). For more details, see Tetragon’s 1.0 release blog post. You can see the difference between in-kernel and user-space filtering in the following figure.
There are different hook points that can be used to achieve different functionality. As discussed before, using security_file_permission means that the eBPF hook is called on every file access in the system. An alternative approach would be to use security_file_open and have the eBPF hook be executed whenever a file is opened. This has the advantage that the hook is executed only once for every file accessed (instead of for every access), which induces less overhead. On the other hand, it means that if a file is already opened before the hook is installed, the hook will not be called and certain accesses may be missed. Similarly, Tetragon can install hooks into other functions such as security_file_truncate or security_file_ioctl for other operations.
Another major benefit of in-kernel filtering is that Tetragon can go beyond observability and do inline enforcement by stopping an operation from happening. This is achieved by having the Tetragon eBPF program override the return value of a function so that it returns an error without being executed. An example is shown in the policy below, which prevents the /usr/bin/cat binary from accessing the ssh key files. A subsequent blog post will cover enforcement in more detail.
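A sketch of such an enforcement policy, again built as a Python dictionary and under the same assumptions about the TracingPolicy schema, could look as follows. The policy name is hypothetical, and whether the Override action is usable on a given function is assumed here to depend on the kernel and its configuration.

```python
import json

SSH_KEY_PATHS = ["/etc/ssh/ssh_host_rsa_key"]


def enforcement_policy(paths, binary="/usr/bin/cat"):
    """Build a TracingPolicy that fails accesses to `paths` made by `binary`."""
    selector = {
        # Execution-context filter: only this binary is affected.
        "matchBinaries": [{"operator": "In", "values": [binary]}],
        # Path filter on argument 0 (the struct file pointer).
        "matchArgs": [{"index": 0, "operator": "Equal", "values": list(paths)}],
        # Override the hooked function's return value with an error code;
        # -1 corresponds to EPERM.
        "matchActions": [{"action": "Override", "argError": -1}],
    }
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": "file-ssh-keys-block-cat"},  # hypothetical name
        "spec": {
            "kprobes": [
                {
                    "call": "security_file_permission",
                    "syscall": False,
                    "args": [{"index": 0, "type": "file"}],
                    "selectors": [selector],
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(enforcement_policy(SSH_KEY_PATHS), indent=2))
```

The binary filter, the path filter, and the action sit in the same selector, so the decision to fail the call is expressed entirely in terms that can be evaluated at the hook.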
Note that proper enforcement is impossible without in-kernel filtering: by the time an event reaches user-space, the operation may have already executed, and it is then too late to stop it.
To summarize, we have shown how using Tetragon kprobe tracing policies offers a lot of flexibility to implement FIM. They allow association and filtering based on the process execution identity as well as going beyond observability and performing enforcement. They also offer some features (e.g., prefix matching in the path, tracking return values) that we did not discuss above. More details can be found in the Tetragon documentation.
While the generic tracing policies we discussed in this section offer powerful mechanisms, they do come with some limitations. One issue, widely recognized by the Tetragon community, is that they are hard to use, requiring internal knowledge about kernel functions and structure. We plan to address this limitation in future Tetragon releases (see ticket #2185, “Add new user-friendly policies see”).
Another issue is that, because the aforementioned policies operate on the access path (i.e., they are path-based), they may miss accesses that use a different path to reach the same file. In the next two sections, we expand on this and present our alternative (inode-based) approach, which is available in Isovalent Enterprise for Tetragon.
What’s in a (path)name?
In the previous examples, the policies we used were path-based, i.e., they selected the subset of the monitored files based on the path extracted from struct file arguments of functions such as security_file_open. It is the case, however, that the same file can have multiple names in a Linux system. Path-based policies only work if the application accesses the files using the names defined in the policy. For example, if a policy monitors /etc/ssh/ssh_host_rsa_key but the same underlying file is accessed via a different name, the access will go unnoticed.
Concrete examples where the same file can have multiple names are hard links, bind mounts, and chroot. Hence, if we create a hard link to the file /etc/ssh/ssh_host_rsa_key named, for example, /mykey, accesses via /mykey will not be caught by policies such as file-ssh-keys. It is worth noting that all the above methods require certain privileges: creating hard links requires appropriate permissions (when fs.protected_hardlinks is set to 1, creating a link requires certain permissions on the target file), bind mount requires CAP_SYS_ADMIN, and chroot requires CAP_SYS_CHROOT. Certain use-cases, however, require the ability to monitor file accesses regardless of the name with which the file is accessed. To address these use-cases, we developed inode-based FIM which is discussed in the next section.
What is Inode-based file monitoring?
To be able to monitor accesses to a file regardless of the pathname that is used, we need a way to track the underlying file object. One way to do so is by using the inode number. Inodes uniquely identify the underlying file within a single filesystem. For example:
That is, as can be seen in the figure below, both /mykey and /mykey2 refer to the same underlying file that has the same inode number.
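The same observation can be reproduced with a short script. This sketch works in a temporary directory rather than touching /etc/ssh, so the file is a stand-in for a real key; the names mykey and mykey2 follow the example above.

```python
import os
import tempfile


def link_inodes():
    """Create one file plus two hard links and return each name's inode number."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "ssh_host_rsa_key")
        with open(original, "w") as f:
            f.write("not a real key\n")

        names = [original]
        for alias in ("mykey", "mykey2"):
            path = os.path.join(tmp, alias)
            os.link(original, path)  # hard link: a new name for the same inode
            names.append(path)

        return {os.path.basename(p): os.stat(p).st_ino for p in names}


if __name__ == "__main__":
    for name, ino in link_inodes().items():
        print(f"{ino}\t{name}")
```

All three names print the same inode number, although the number itself differs from run to run and from filesystem to filesystem.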
The approach to inode-based file monitoring, as implemented by Isovalent Enterprise for Tetragon, is to maintain a set of inodes that correspond to the filenames that were defined by users and perform filtering using those. To aid with the maintenance of this set, Tetragon deploys a user-space component called file-scanner. A high-level overview of how this works is provided below:
- Users provide FIM policies, that include a set of file patterns specifying what files should be monitored (❶)
- These patterns are passed to the file-scanner utility (❷)
- file-scanner scans the filesystem, and retrieves the inodes of the files defined by the user (❸)
- The inodes are inserted into an eBPF map so that they can be accessed by the eBPF programs (❹)
- When an application accesses a file (❺), the eBPF hook will trigger and the eBPF program will run. It will look up the inodes map, and determine whether the access is to a file covered by the policy (❼). If it is, it will take the appropriate action, e.g., generate an event to user-space (❽) that will be translated to a user event by the Tetragon agent and provided to the user (❾).
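The flow above can be approximated in user-space to show why matching on inodes catches accesses through other names. This is a simplified model, not the file-scanner implementation: a Python set stands in for the eBPF map, the lookup runs in user-space instead of at the hook, and pairing the inode with the device number is our assumption, made because inode numbers are only unique within one filesystem.

```python
import glob
import os
import tempfile


def scan(patterns):
    """Resolve file patterns to a set of (device, inode) pairs."""
    watch = set()
    for pattern in patterns:
        for path in glob.glob(pattern):
            st = os.stat(path)
            watch.add((st.st_dev, st.st_ino))
    return watch


def is_watched(path, watch):
    """Stand-in for the in-kernel lookup: is the accessed file's inode in the set?"""
    st = os.stat(path)
    return (st.st_dev, st.st_ino) in watch


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        key = os.path.join(tmp, "ssh_host_rsa_key")
        other = os.path.join(tmp, "unrelated")
        alias = os.path.join(tmp, "mykey")
        for path in (key, other):
            with open(path, "w") as f:
                f.write("x\n")
        os.link(key, alias)  # second name for the monitored file

        # The pattern only mentions the original name.
        watch = scan([os.path.join(tmp, "ssh_host_*")])
        return (
            is_watched(key, watch),
            is_watched(alias, watch),
            is_watched(other, watch),
        )


if __name__ == "__main__":
    print(demo())
```

The access through the hard link is matched even though its name never appears in the pattern, while the unrelated file is not.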
The astute reader will notice that the above approach will work only as long as the files are not modified. For example, if a file is removed and recreated, it will end up having a different inode number. This has two implications. First, the new inode will not be in the watch list, and, second, the old inode might be recycled to another file. Hence, Tetragon will also add eBPF programs to monitor operations that can change the inode watch list. If that happens, the watch list will be updated accordingly by adding or removing inodes.
The above illustrates another benefit of using eBPF compared to inotify (see the inotify architecture figure). With eBPF, the state of what is watched can be updated in the kernel inline with the file operation, whereas with inotify this has to happen in user-space, which introduces races.
It is worth noting that there are some cases, specifically with some instances of file operations using the rename system call, where the state cannot be directly updated from the eBPF program. Tetragon can handle these cases by either blocking these rename operations (more on enforcement in a subsequent post) or via the traditional approach of scanning the file-system in user-space to determine the new state.
How does Tetragon use inode-based FIM policies?
Isovalent Enterprise for Tetragon supports both path-based and inode-based FIM policies. This section shows an example of how inode-based policies overcome the issues of path-based policies described in the previous sections. In inode-based policies, we only specify the files and directories we care about. There is no need for the user to define the appropriate hooks as we showed in the previous sections. Tetragon handles all of these internally.
We will use the same example as before (i.e., we care about some sensitive files inside /etc/ssh/) and the inode-based policy is:
The field monitorHostFiles shows that we care about files in the host file system. This will be more relevant in the next blog posts about monitoring files inside Kubernetes workloads.
This tracing policy will generate events for all file-related operations that can happen on those files. These operations include not only read and write calls but also metadata operations like renaming, changing permissions/owner/group, deleting, or creating new files with the same name.
After applying this tracing policy we also run the following commands:
Some of the events that we get from running these commands are:
We can see that we get at least one read event for each of these cat commands. We can correlate the commands that the user runs by checking the .process.binary and .process.arguments fields. These events also provide more information about the operation. For example:
- The actual operation (i.e., the .action field).
- The path that is accessed (i.e., the .args.generic_arg.file.str field).
- The inode numbers of the actual file (i.e., the .args.generic_arg.inode field) and of the directory that contains the corresponding file (i.e., the .args.generic_arg.parent_inode field).
- The location of the file (i.e., the .args.generic_arg.file.location.type field; this will become more relevant in the next blog posts where we show how we can monitor files inside pods).
Summary and future of Tetragon FIM blogs
Tetragon’s approach to file monitoring through eBPF offers a more secure and scalable way to bolster security and compliance practices. By leveraging eBPF, Tetragon resolves some of the known limitations of traditional methods around periodic scanning and inotify, offering scalability and fine-grained control over sensitive files.
From inline enforcement to in-kernel filtering, stay tuned for the upcoming blogs on this topic that dive deeper into the advanced features around namespace policies, pod-label filters, and more as we expand on what is possible in the kernel with eBPF.
For a deeper dive into Tetragon’s capabilities, explore the documentation or try out Tetragon in one of our SecOps labs. Of course if you have any questions, reach out and schedule a time to connect or see a demo of Tetragon in action.
Feature | Tetragon (OSS) | Tetragon (Isovalent Enterprise) |
Path-based FIM | ✅ | ✅ |
Inode-based FIM | ➖ | ✅ |
Anastasios Papagiannis is a Senior Software Engineer at Isovalent – the company behind the open-source cloud-native solution Tetragon.
Anastasios is leading the file-monitoring aspects of Tetragon. Before that, he worked at Meta on the containerization of datastores. His main interests are in the general area of computer systems with emphasis on storage systems. Anastasios obtained his Ph.D. degree in the Computer Science Department at the University of Crete in 2021. During his studies, he was awarded the Meta Research PhD Fellowship (2019-2021) and the Maria Michail Manasaki Doctoral Fellowship (2018).