High Level Goals

I created jp-watch as the first layer above the operating system in a set of layers targeting higher level goals listed below as use cases.

Being a low level layer, jp-watch is written in C for maximum performance and efficiency.

Use Cases

Create a Personal Cloud

I want to create a personal cloud for both data storage and fault tolerant service hosting over WANs.

I want the servers involved to be geographically independent to avoid problems such as natural disasters or large scale network outages. The geographical independence can potentially be leveraged to reduce latency to services located on servers closer to the user.

I want it to be company independent in case there is some problem with the company such as it going out of business.

Sync Local Development Files with a Server

Figure: jp-watch syncing local files to a remote server

When coding on projects, development and testing can be done locally. But ultimately it's good to test in an environment closer to the production environment including the same hardware, same network and same DNS. Having a fast and efficient way to sync local files with production servers aids this process.

Existing File Watching Tools Reviewed

A few good tools already exist. Two of the most popular are inotifywait and fswatch.

inotifywait

inotifywait is a great tool but is not available on MacOS.

When using inotifywait, or any inotify based file watcher, the code needs to recursively add a watch for each directory (or individual file) to be watched at startup. Each watch descriptor needs to be stored, giving linear complexity, O(n), for both startup time and memory usage. Although inotify is very efficient, this is still a factor to consider for use cases where you want to watch a very large number of files on a device with limited resources.

The code for inotify implementations is also somewhat complicated as the code needs to scan and watch the paths at startup and also add new paths while running as new files and/or directories are added. This adds further potential for coding bugs. It also adds to code maintenance costs.
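To make the startup cost concrete, here is a minimal sketch of the recursive setup an inotify based watcher has to do. It is not the code inotifywait uses; the names watch_table, table_add and watch_tree are hypothetical and exist only for this illustration. Every directory costs one inotify_add_watch() call and one stored entry, which is where the O(n) startup time and memory come from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* Hypothetical bookkeeping: one entry per watched directory (O(n) memory). */
struct watch { int wd; char *path; };

struct watch_table {
    int ifd;               /* inotify instance, e.g. from inotify_init1(0) */
    struct watch *items;   /* grows with the number of directories */
    size_t count, cap;
};

/* Add a watch for one directory and remember which path it belongs to. */
static int table_add(struct watch_table *t, const char *dir) {
    int wd = inotify_add_watch(t->ifd, dir,
                               IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVE);
    if (wd < 0) return -1;
    if (t->count == t->cap) {
        size_t cap = t->cap ? t->cap * 2 : 64;
        struct watch *items = realloc(t->items, cap * sizeof *items);
        if (!items) return -1;
        t->items = items;
        t->cap = cap;
    }
    char *copy = strdup(dir);
    if (!copy) return -1;
    t->items[t->count].wd = wd;
    t->items[t->count].path = copy;
    t->count++;
    return 0;
}

/* Recursively watch dir and every directory below it.
   Returns 0 on success, -1 on the first error. */
int watch_tree(struct watch_table *t, const char *dir) {
    assert(t != NULL && dir != NULL);
    if (table_add(t, dir) != 0) return -1;
    DIR *d = opendir(dir);
    if (!d) return -1;
    int rc = 0;
    struct dirent *e;
    while (rc == 0 && (e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        char child[PATH_MAX];
        int n = snprintf(child, sizeof child, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof child) continue; /* too long: skip */
        struct stat sb;
        if (lstat(child, &sb) == 0 && S_ISDIR(sb.st_mode))
            rc = watch_tree(t, child);   /* symlinks are not followed */
    }
    closedir(d);
    return rc;
}
```

A real watcher would additionally have to call something like table_add() again whenever an IN_CREATE event reports a new directory, which is the extra runtime bookkeeping mentioned above.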

fswatch

fswatch is also a great tool. The limitation I found was that, when reporting changes, it does not distinguish between files and directories by appending a trailing slash to directory paths as inotifywait does. I could get it to do what I wanted on MacOS with a few options:

fswatch --event-flags --event=IsDir --event=IsFile <directory>

However, on Ubuntu 22 this command would not report any IsFile events, making it unusable.

Implementations of the High Level Goals

Implementations of the high level goals went through various stages and evolved through various languages over time. I recount a brief history here:

Bash

I originally used a bash script with inotifywait. As the bash script got more complex I decided to move to either Ruby or Node.js as large bash scripts are a nightmare to maintain.

Ruby

Ruby is my favourite language for coding command-line tools that are not performance critical. It's an amazing scripting language with an amazing progression from single line scripts to complex object-oriented tools. I had a Ruby version of the script I used for the second use case I described in the introduction, but I wanted to consolidate all my code into a form that made things easier. I wanted to simplify everything including my language choices so there was less maintenance for my projects. I was trying to finish 12 half done projects and wanted to consolidate them both structurally and language-wise.

Node.js

Since I could use JavaScript on the frontend, on the backend and for scripting, I decided in the end that it was the better option, particularly as my JavaScript skills had reached a good level and I could produce code quite rapidly. The JavaScript tooling was by then also fairly mature and powerful. Node.js, with the code split into npm packages, seemed a good choice, which is why I rewrote the tool in Node.js rather than continuing with Ruby.

I made a Node.js version of the tool and it worked well, though it ran into some issues. For example, the program crashed if too many rsyncs were launched in parallel. This would happen when, for instance, installing npm packages suddenly added a large number of new files to a watched directory. I overcame this by executing the rsyncs serially rather than in parallel.

Implementation of jp-watch

Rust vs C

After coding the initial version of the tool in Rust, I decided to go for a pure C implementation. The main reason is that the binaries are smaller: a C program can rely on the libc already present on the system, whereas Rust (and to a lesser extent C++) adds its own standard library and runtime support code to the binary. This may not be a huge deal, but my intention is for this software to become a small command line tool / library to be included in operating system packages, and as such it should avoid bloat as much as possible.

Getting set up with C and the overall development flow was ultra smooth and fast. With Rust I had to install Cargo, which in turn had to install some packages. Not a big deal, but it was a smoother and faster experience with C.

Performance is debatable between C, C++ and Rust, but C seems to have an edge, and it feels much like driving a manual car vs an automatic.

Algorithms

MacOS

The MacOS version just sets the relevant paths to be watched from the CLI arguments. FSEvents (the FSEventStream API) handles the rest.

Linux

The Linux version uses fanotify to watch the whole filesystem and filters out irrelevant paths as they are reported.

Previous file watchers were based on APIs such as inotify or kqueue, where you need to recursively add the contents of the paths to be watched and manually add new paths as they are created after the app starts. This adds delay at startup, code complexity, bug potential and maintenance cost.
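The filtering step in the Linux version can be pictured with a small sketch like the one below. This is an illustration under my own assumptions, not a copy of the jp-watch source: the function names path_is_under and is_watched are hypothetical, and all paths are assumed to be absolute and normalised, with no trailing slash on a watched root unless it is "/" itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if path equals root or lies underneath it.
   Assumes absolute, normalised paths (hypothetical helper). */
static bool path_is_under(const char *path, const char *root) {
    size_t n = strlen(root);
    if (strncmp(path, root, n) != 0) return false;
    if (n == 1 && root[0] == '/') return true;   /* "/" covers everything */
    /* Require a component boundary so "/var/www2" does not match "/var/www". */
    return path[n] == '\0' || path[n] == '/';
}

/* Linear scan over the watched roots: O(n) in the number of roots. */
bool is_watched(const char *path, const char *const *roots, size_t nroots) {
    assert(path != NULL && roots != NULL);
    for (size_t i = 0; i < nroots; i++)
        if (path_is_under(path, roots[i])) return true;
    return false;
}
```

Each event path reported for the filesystem would be passed through a check like is_watched() and dropped if it returns false. The component boundary test matters: a plain prefix comparison would wrongly report changes in /var/www2 when only /var/www is watched.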

APIs

FSEvents (MacOS)

The MacOS FSEvents API allows monitoring of whole directory trees. This can lead to greater performance than some lower level APIs, as Apple can optimise the implementation in the future without changing the API and could incorporate MacOS specific or device specific optimisations as needed.

fanotify (Linux)

Because a single fanotify mark can cover a whole mount or filesystem, startup is instant and memory usage is low and constant, O(1), regardless of how many files are being watched. An inotify based tool such as inotifywait must recursively add every path that needs to be watched and keep state for each one, giving it O(n) memory usage.

On the downside, fanotify requires superuser (CAP_SYS_ADMIN) privileges.
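As a rough sketch of what this looks like in code, the snippet below opens a fanotify group and marks the entire mount containing a path with a single call. The function names watch_mount and fd_to_path are hypothetical, and the choice of FAN_MARK_MOUNT with only FAN_MODIFY and FAN_CLOSE_WRITE events is an assumption made for illustration; jp-watch may use different flags and event masks. Without CAP_SYS_ADMIN the mark fails, typically with EPERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Open a fanotify group and mark the whole mount that contains path.
   Returns the fanotify fd, or -1 with errno set (EPERM when the
   process lacks CAP_SYS_ADMIN). One mark, however many files exist. */
int watch_mount(const char *path) {
    assert(path != NULL);
    int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC,
                           O_RDONLY | O_CLOEXEC);
    if (fd < 0) return -1;
    if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                      FAN_MODIFY | FAN_CLOSE_WRITE, AT_FDCWD, path) < 0) {
        int saved = errno;   /* keep the fanotify_mark error for the caller */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* In this mode each event carries an open fd for the affected file.
   Its path can be recovered through /proc (assumes /proc is mounted).
   Returns 0 on success, -1 on error. */
int fd_to_path(int fd, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, buf, len - 1);
    if (n < 0) return -1;
    buf[n] = '\0';           /* readlink does not terminate the string */
    return 0;
}
```

An event loop would then read struct fanotify_event_metadata records from the returned descriptor, call fd_to_path() on each record's fd member, filter the path, and close that fd.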

How to Use

First, head to https://github.com/jdspugh/jp-watch-c and follow the "Installation" instructions. Once you have it installed you can try out some commands.

Watch the local directory:

$ jp-watch .

Watch the whole drive:

$ jp-watch /

Watch a few different files and directories with a combination of relative and absolute paths:

$ jp-watch README.md jp-watch.c /var/www /tmp
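Since fanotify style APIs report absolute paths, relative arguments like the ones above have to be turned into absolute paths at some point before they can be compared with events. One way to do that is realpath(), sketched below. The helper name resolve_paths is hypothetical and this is an assumption about one possible approach, not necessarily how jp-watch handles its arguments.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

/* Resolve each of the n argument paths to an absolute, symlink-free path.
   On success returns 0 and out[i] holds a malloc'd string the caller frees.
   On failure (e.g. a path does not exist) returns -1 and frees any
   results already produced. */
int resolve_paths(int n, char *const args[], char **out) {
    assert(args != NULL && out != NULL);
    for (int i = 0; i < n; i++) {
        /* POSIX.1-2008: a NULL buffer makes realpath allocate the result. */
        out[i] = realpath(args[i], NULL);
        if (out[i] == NULL) {
            while (i-- > 0) free(out[i]);
            return -1;
        }
    }
    return 0;
}
```

Note that realpath() fails for paths that do not exist yet, so a tool that wants to watch a not yet created file would need a different normalisation step.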

Code Review

I submitted the code to Code Review Stack Exchange for review and have already incorporated the suggested changes. You can read the reviews and comments there and submit your own reviews of it.

Future Improvements

The Linux version filters out unwatched paths, but the algorithm used has O(n) time complexity in the number of watched paths. This could potentially be reduced to O(log(n)) if it's found to be a bottleneck. At the moment, for my use cases, I watch a handful of large directory trees, so there is no problem.
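One possible way to get a logarithmic lookup, sketched here as an idea rather than a planned implementation, is to sort the watched paths once at startup and then binary search for the event path and each of its ancestors. The names sort_roots and is_watched_sorted are hypothetical. Strictly speaking this costs O(d log(n)) string comparisons, where d is the depth of the event path, which is small in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for arrays of C strings, shared by qsort and bsearch. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the watched roots once at startup. */
void sort_roots(const char **roots, size_t n) {
    qsort(roots, n, sizeof *roots, cmp_str);
}

/* True if path, or any ancestor directory of path, is in the sorted
   roots array. Assumes absolute, normalised paths. */
bool is_watched_sorted(const char *path, const char *const *roots, size_t n) {
    assert(path != NULL && path[0] == '/');
    char buf[4096];
    size_t len = strlen(path);
    if (len >= sizeof buf) return false;     /* too long to check: reject */
    memcpy(buf, path, len + 1);
    for (;;) {
        const char *key = buf;
        if (bsearch(&key, roots, n, sizeof *roots, cmp_str)) return true;
        if (len == 1) return false;          /* reached "/" with no match */
        char *slash = strrchr(buf, '/');     /* never NULL: buf starts with '/' */
        if (slash == buf) {                  /* parent is the root directory */
            buf[1] = '\0';
            len = 1;
        } else {                             /* strip the last path component */
            *slash = '\0';
            len = (size_t)(slash - buf);
        }
    }
}
```

Whether this is worth doing depends on how many separate paths are watched; for a handful of roots the linear scan is likely just as fast.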

Conclusion

jp-watch is a fast and efficient file watching tool that produces consistent results on both MacOS and Linux. Just make sure you have superuser privileges for running the Linux version as the fanotify API requires this. I hope you enjoy using jp-watch and find some great use cases for it in your projects!

Implementing a File Watcher (MacOS & Linux)
