FUSE Passthrough is Coming to Linux: Why This is a Big Deal

One less bottleneck to worry about.

John Boero
5 min read · Feb 29, 2024


Anybody who knows my love of Linux knows my passion for presenting everything from hardware devices to REST APIs simply as a filesystem. I’ve often used FUSE (Filesystem in Userspace) to simplify browsable CRUD operations for a complex system or API. Linus Torvalds says FUSE is a toy and not practical for real-world filesystem usage, and I agree completely. It’s terribly inefficient at block storage, with its firehose of user-mode/kernel-mode switches on every block operation (4K by default). The good news is that this is all about to change: a passthrough option will eliminate the chaotic mode switching once and for all, even for existing FUSE clients. There is a good chance your storage access is about to get 10–15x faster.

FUSE is a super simple interface for any developer who wants to implement a filesystem that users can mount without elevated privileges, and it protects the kernel from crashes caused by poorly written code. In fact it’s secure to the extent that even root can’t access another user’s FUSE mounts by default. Developers can implement their FUSE hooks in the language of their choice, which is helpful for devs and libraries that don’t use the kernel’s de facto language of C. Critics of FUSE will say that a true FS implemented in a kernel module can do anything that FUSE can do, but that’s not true unless every library and framework you intend to use is already available in C in the kernel. You may be able to carry your own exceptions, but good luck having them mainlined into an upstream kernel. I often make use of C++ libraries for JSON, cURL, database connections, and more that I wouldn’t even want to exist in my kernel anyway. Fetching a small ~2KB REST API response via DIRECT_IO in a FUSE client requires only a single user-mode switch, so the performance impact is often negligible.
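To make that interface concrete, here is a minimal read-only filesystem sketched against the libfuse 3 high-level API. The hello names and the single /hello file are illustrative only, not taken from any real project.

```c
/* hellofs.c — a single read-only file served from user space.
 * Build (assuming libfuse 3 is installed):
 *   gcc hellofs.c $(pkg-config fuse3 --cflags --libs) -o hellofs */
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";
static const char *hello_body = "Hello from user space\n";

/* stat() results for "/" and "/hello"; everything else doesn't exist. */
static int hello_getattr(const char *path, struct stat *st,
                         struct fuse_file_info *fi)
{
    (void) fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = (off_t) strlen(hello_body);
        return 0;
    }
    return -ENOENT;
}

/* Directory listing for "/". */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                         off_t off, struct fuse_file_info *fi,
                         enum fuse_readdir_flags flags)
{
    (void) off; (void) fi; (void) flags;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    fill(buf, ".", NULL, 0, 0);
    fill(buf, "..", NULL, 0, 0);
    fill(buf, hello_path + 1, NULL, 0, 0);
    return 0;
}

/* Copy the requested slice of the file body into the kernel's buffer. */
static int hello_read(const char *path, char *buf, size_t size, off_t off,
                      struct fuse_file_info *fi)
{
    (void) fi;
    size_t len = strlen(hello_body);
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t) off >= len)
        return 0;
    if (off + size > len)
        size = len - (size_t) off;
    memcpy(buf, hello_body + off, size);
    return (int) size;
}

static const struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* Every request crosses the user/kernel boundary via /dev/fuse
     * before one of the callbacks above runs. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```

Mount it on an empty directory with ./hellofs /tmp/hello, cat /tmp/hello/hello, and unmount with fusermount3 -u /tmp/hello. No root, no custom kernel module, and a bug here only breaks the mount, never the kernel.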

https://source.android.com/docs/core/storage/fuse-passthrough

What about other projects that use FUSE? There are actually many projects in the community that use FUSE for block storage in a way that has always held them back. As Linus says, large-scale FUSE use is a toy. Back in 2013 I demonstrated how right he was, for a Red Hat customer, by writing a NOOP FUSE filesystem to test the theoretical maximum performance of FUSE on any given machine. At the time a high-end Xeon server or VM would max out at approximately 1.1GB/s read or write while doing no actual work: the read and write operations simply return success, leaving untouched buffers full of garbage and uncleared memory. All of this happens with a single CPU core pegged at 100%, in some cases overheating as it works. The same systems were capable of ~15GB/s+ native IO with a relatively idle CPU, thanks to DMA. At the time I remember wishing a passthrough or socket existed to avoid the mode switches, and now the community has built exactly that feature for us.
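For reference, the heart of such a NOOP filesystem is just a pair of handlers that acknowledge every request without doing any work. This is a reconstruction of the idea against libfuse 3, not the original 2013 code; the names and the fake 1 TiB file size are illustrative.

```c
/* noopfs.c — acknowledge every read/write without touching the data.
 * Build: gcc noopfs.c $(pkg-config fuse3 --cflags --libs) -o noopfs */
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <string.h>
#include <sys/stat.h>

/* Every path except "/" pretends to be a huge writable file. */
static int noop_getattr(const char *path, struct stat *st,
                        struct fuse_file_info *fi)
{
    (void) fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else {
        st->st_mode = S_IFREG | 0666;
        st->st_nlink = 1;
        st->st_size = (off_t) 1 << 40;   /* claim 1 TiB so reads never hit EOF */
    }
    return 0;
}

/* Report success without touching the buffer: whatever throughput you
 * measure is pure FUSE request handling and mode-switch overhead. */
static int noop_read(const char *path, char *buf, size_t size, off_t off,
                     struct fuse_file_info *fi)
{
    (void) path; (void) buf; (void) off; (void) fi;
    return (int) size;
}

static int noop_write(const char *path, const char *buf, size_t size,
                      off_t off, struct fuse_file_info *fi)
{
    (void) path; (void) buf; (void) off; (void) fi;
    return (int) size;
}

static int noop_truncate(const char *path, off_t size,
                         struct fuse_file_info *fi)
{
    (void) path; (void) size; (void) fi;
    return 0;   /* lets tools like dd "truncate" the fake file */
}

static const struct fuse_operations noop_ops = {
    .getattr  = noop_getattr,
    .read     = noop_read,
    .write    = noop_write,
    .truncate = noop_truncate,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &noop_ops, NULL);
}
```

Mount it somewhere and point dd at any path inside it; the numbers you get back are the ceiling that any real FUSE filesystem on that machine has to live under.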

Storage engineers will recall that Red Hat’s acquisition Gluster uses a FUSE client, meaning Gluster had a theoretical maximum of roughly 1.1GB/s on that class of hardware even before it actually read or wrote any storage, which adds more time to each operation. Essentially, Gluster was unusable in low-latency situations, and enough customers were disappointed that Red Hat went on to also acquire Inktank (Ceph), which offers block AND object storage with kernel-native drivers and an iSCSI option. Ceph was a far more capable distributed filesystem than Gluster in terms of performance and use cases. I’ve even seen HFTs and hedge funds recruiting for Gluster skills, which to me is a red flag; historically there has been nothing HFT or low latency about Gluster.

FUSE Passthrough is about to change this in an exciting way. Instead of bouncing every read and write between the FUSE kernel module and your FUSE client application, the kernel can perform data operations directly against a backing file that the FUSE daemon registers, cutting the user-space round trip out of the data path. This minor change will suddenly make options like Gluster attractive again. Existing FUSE clients will be able to operate at near kernel-native performance with the mode-switch bottleneck removed.
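As a rough sketch of the mechanism, based on my reading of the uapi in the 6.9 passthrough series (FUSE_DEV_IOC_BACKING_OPEN, FOPEN_PASSTHROUGH and the backing_id field in fuse_open_out): the FUSE server registers a real backing file with the kernel, then answers OPEN with a flag telling the kernel to route data operations straight to that file. The helper name and the surrounding server plumbing below are hypothetical, and most people will simply wait for libfuse to wrap this.

```c
/* Sketch only: requires kernel >= 6.9 headers exposing the FUSE
 * passthrough uapi in <linux/fuse.h>.  Error handling trimmed. */
#include <linux/fuse.h>
#include <sys/ioctl.h>

/* Called while handling FUSE_OPEN in a low-level server:
 *   fuse_dev_fd - the server's /dev/fuse connection
 *   backing_fd  - the real file that backs this FUSE file
 *   out         - the fuse_open_out reply being prepared            */
static int reply_open_passthrough(int fuse_dev_fd, int backing_fd,
                                  struct fuse_open_out *out)
{
    struct fuse_backing_map map = { .fd = backing_fd };

    /* Register the backing file; the kernel hands back an id. */
    int backing_id = ioctl(fuse_dev_fd, FUSE_DEV_IOC_BACKING_OPEN, &map);
    if (backing_id < 0)
        return backing_id;          /* kernel too old or feature disabled */

    /* Tell the kernel: for this open file, do data I/O directly
     * against the backing file instead of calling back into us.   */
    out->backing_id = backing_id;
    out->open_flags |= FOPEN_PASSTHROUGH;
    return 0;
}
```

Once the kernel sees that reply, reads and writes on the file stay inside the kernel; metadata operations and open/close still round-trip to the FUSE daemon.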

When and how will this benefit you?

Technically this feature was already included with Android 12, in a twist where mobile devices are the early adopters rather than servers. Since the feature will be mainlined in kernel 6.9, it will take a while to reach downstream distributions like RHEL and the other Enterprise Linux derivatives. It will land sooner in upstream projects like Fedora, which is the current basis for Amazon Linux and a few other cloud distributions. Leading-edge users should exercise caution, as it will take some time to vet the security and performance of this new feature. Bimodal organizations should give it a test drive as soon as possible because the benefits are significant. Uncached IOPS will shoot through the roof and CPU utilization will drop noticeably, along with the energy consumption of I/O-heavy FUSE systems. My small REST API FUSE clients don’t stand to gain much as they don’t perform large read/write operations.

List of FUSE Clients

If you aren’t sure whether you’re using FUSE filesystems, here are some notable clients commonly used when a kernel module isn’t available. If you’re using one of these on Linux, prepare for it to become a lot faster and more efficient.

  1. Lustre — used heavily in HPC and cloud environments.
  2. S3FS — mounts S3 object storage as a local FS.
  3. Gluster — distributed scale-out FS.
  4. google-drive-ocamlfuse — mounts Google Drive object storage.
  5. EncFS — locally encrypted files, used in KDE’s Plasma Vault, etc.
  6. fuse-overlayfs — FUSE option for container overlays.
  7. HFS — classic Mac filesystems.
There are many more in the wild, and some offer kernel modules in addition to FUSE clients. One way to check your Linux systems is to view the contents of /proc/filesystems. This file lists every filesystem your kernel supports natively, whether built in or via modules. If your VM is using something like Lustre and you don’t see it listed in /proc/filesystems, chances are you are using it via a sub-optimal FUSE client. Be sure to check your systems, as you may be able to scale a bit further thanks to this newly added efficiency.
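If you would rather script that check, here is a small sketch that lists the mounts currently going through FUSE by reading /proc/self/mounts; interactively, findmnt or mount piped through grep gives the same answer.

```c
/* fusecheck.c — list mounted filesystems whose type starts with "fuse".
 * Compare the types reported here against /proc/filesystems to spot
 * storage that is served through a FUSE client rather than natively. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    FILE *mounts = fopen("/proc/self/mounts", "r");
    if (!mounts) {
        perror("/proc/self/mounts");
        return 1;
    }

    puts("FUSE mounts in use:");
    while (fgets(line, sizeof(line), mounts)) {
        char dev[256], dir[1024], type[128];
        /* each line: device mountpoint fstype options dump pass */
        if (sscanf(line, "%255s %1023s %127s", dev, dir, type) == 3 &&
            strncmp(type, "fuse", 4) == 0)
            printf("  %-16s %-32s %s\n", type, dir, dev);
    }
    fclose(mounts);
    return 0;
}
```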

I for one will be sure to re-test my NOOP FS to see how big the hypothetical performance increase is. Though it won’t matter for practical purposes, it will be interesting to see the maximum theoretical FUSE performance of a passthrough system as a baseline.



John Boero

Field CTO for Terasky. American expat in London with 20 years of open source software experience.