Implementing a File Watcher (MacOS & Linux)
Table of Contents
High Level Goals
I created jp-watch
as the first layer above the operating system in a set of layers targeting higher level goals listed below as use cases.
Being a low level layer, jp-watch
is written in C for maximum performance and efficiency.
Use Cases
Create a Personal Cloud
I want to create a personal cloud for both data storage and fault tolerant service hosting over WANs.
I want the servers involved to be geographically independent to avoid problems such as natural disasters or large scale network outages. The geographical independence can potentially be leveraged to reduce latency to services located on servers closer to the user.
I want it to be company independent in case there is some problem with the company such as it going out of business.
Sync Local Development Files with a Server
When coding on projects, development and testing can be done locally. But ultimately it's good to test in an environment closer to the production environment including the same hardware, same network and same DNS. Having a fast and efficient way to sync local files with production servers aids this process.
Existing File Watching Tools Reviewed
There are a few good tools about. Two of the most popular being inotifywait
and fswatch
.
inotifywait
inotifywait
is a great tool but is not available on MacOS.
When using an inotifywait
or an inotify
based file watcher, the code needs to recursively mark each file/directory to be watched at startup. Each file descriptor needs to be stored giving it linear complexity O(n) for both startup time and memory usage. Granted inotify is very efficient, this is still a factor to take into consideration for some use cases where you want to watch very large amounts of files on a device with limited resources.
The code for inotify
implementations is also somewhat complicated as the
code needs to scan and watch the paths at startup and also add new paths
while running as new files and/or directories are added. This adds
further potential for coding bugs. It also adds to code maintenance
costs.
fswatch
fswatch
is a great tool also. But the limitations I found were that when
reporting changes to files or directories it would not distinguish
between them by placing a trailing slash at the end as inotifywait
would. I could get it to do what I wanted on MacOS with a few options:
fswatch --event-flags --event=IsDir --event=IsFile <directory>
However on Ubuntu 22 this command would not report any IsFile
events, making it not useable.
Implementations of the High Level Goals
Implementations of the high levels goals went through various stages and evolved through various languages over time. I recount a brief history here:
Bash
I originally used a bash script with inotifywait
. As the bash script got
more complex I decided to move to either Ruby or Node.js as large bash scripts are a nightmare to maintain.
Ruby
Ruby is my favourite language for coding command-line tools that are not performance critical. It's an amazing scripting language with an amazing progression from single line scripts to complex object-oriented tools. I had a Ruby version of the script I used for the second use case I described in the introduction, but I wanted to consolidate all my code into a form that made things easier. I wanted to simplify everything including my language choices so there was less maintenance for my projects. I was trying to finish 12 half done projects and wanted to consolidate them both structurally and language-wise.
Node.js
Since I could use Javascript on the frontend, backend and for scripting, in the end I thought it a better option. Particularly as my Javascript skills had now reached a good level and I could produce code quite rapidly. The Javascript tools now also being fairly mature and powerful. So finally, a good choice seemed to be Node.js and creating npm code packages. This is why I rewrote the tool in Node.js rather than Ruby.
I made a Node.js version of the tool and it worked well. It ran into some issues though such as if too many rsync
s were launched in parallel the program crashed. This would happen if I installed some npm packages, for instance, that would add a whole bunch of new files into a watched directory suddenly. I managed to overcome this by executing the rsync
s serially rather than in parallel.
Implementation of jp-watch
Rust vs C
After coding the initial version of the tool in Rust, I decided to go for a pure C implementation. The main reason being that the binary sizes are smaller. C has no runtime to include in the binary unlike Rust and C++. This may not be a huge deal but my intention is that this software becomes a small command line tool / library to be included in operating system packages and as such it should avoid bloat as much as possible.
Getting setup with C and the overall development flow was ultra smooth and fast. With Rust I had to install Cargo and it had to install some packages. Not a big deal but it was a smoother and faster experience with C.
Performance is debatable between C,C++ and Rust but C seems to have an edge and it feels much like driving a manual car vs automatic.
Algorithms
MacOS
The MacOS version just sets the relevant paths to be
watched from the CLI arguments. FSStream
handles the rest.
Linux
The Linux version uses fanotify
to watch the whole filesystem and filters out irrelevant paths as they are reported.
Previous file watchers were based on inotify
, kqueue
or epoll
where you needed to recursively add the contents of paths to be watched, and manually add new paths to be watched if they are added after the app starts. This added delays at startup, added code complexity, more bug potential and more maintenance costs.
APIs
FSStream (MacOS)
The MacOS FSStream
API allows monitoring of whole directory trees which
can lead to greater performance than some lower level APIs as it can be
optimised in the future without affecting the API endpoints and could incorporate MacOS specific or device specific optimisations as needed.
fanotify (Linux)
The fanotify API allows us to have instant startup and low, constant
memory usage of O(1). inotifywait
needs to add all the paths recursively that need to be watched and also uses memory for this process giving it O(n) memory usage.
On the downside, fanotify requires superuser (CAP_SYS_ADMIN) privileges.
How to Use
First, head to https://github.com/jdspugh/jp-watch-c and follow the "Installation" instructions. Once you have it installed you can try out some commands.
Watch the local directory:
$ jp-watch .
Watch the whole drive:
$ jp-watch /
Watch a few different files and directories with a combination of relative and absolute paths:
$ jp-watch README.md jp-watch.c /var/www /tmp
Code Review
I submitted the code to Code Review Stack Exchange for review and have already incorporated the suggested changes. You can read the reviews and comments there and submit your own reviews of it.
Future Improvements
The Linux version filters out unwatched paths, but the algorithm used is O(n) time complexity. This could potentially be be reduced to O(log(n)) if it’s found to be a bottleneck. At the moment for my use cases I watch a handful of large directory trees, so there is no problem.
Conclusion
jp-watch
is a fast and efficient file watching tool that produces consistent results on both MacOS and Linux. Just make sure you have superuser privileges for running the Linux version as the fanotify
API requires this. I hope you enjoy using jp-watch
and find some great use cases for it in your projects!