Goal

I want to create a personal cloud for both data storage and fault tolerant service hosting over WANs.

jp-watch cloud
Personal Cloud

I want the servers involved to be geographically independent to avoid problems such as natural disasters or large scale network outages. The geographical independence can potentially be leveraged to reduce latency to services located on servers closer to the user.

I want it to be company independent in case there is some problem with the company such as it going out of business.

Implementation Choices

I chose Node.js as the implementation language to interface between detected file changes (via jp-watch) and the efficient low level file synchronisation protocol, rsync. The interfacing is not speed critical as the bulk of the work is done by jp-watch and rsync, so Node.js works just fine for this. At present I’m keeping most of my software as Javascript as it is so ubiquitous and able to be used front or back end.

Configuration File Format

Several formats were considered for the configuration file: JSON, JSON5, YAML and XML. I will briefly go through the different choices and point out their characteristics.

Even though JSON is a trending technology, I feel it has limitations for configuration files, particularly:

  1. Not being able to have comments inside the file
  2. A cluttered and difficult-to-follow structure
{
  "jpSyncSettings": {
    "watch": [
      {
        "paths": [ "." ],
        "sync": {
          "server": [
            {
              "address": "example.com",
              "remotePath": "/tmp/test",
              "direction": "to"
            }
          ]
        }
      }
    ]
  }
}
Sample JSON Configuration File

JSON5 overcomes most of the limitations of JSON by allowing comments and having a more relaxed stance on using quotes. This makes the files more human friendly for reading, modifying and documenting. JSON5 still has these limitations though:

  1. Few tools for writing and editing JSON5 files
  2. JSON5 is not an official standard, so may be subject to changes
  3. As with JSON, a cluttered and difficult-to-follow structure
{
  jpSyncSettings: {
    watch: [
      {
        paths: [ '.' ],
        sync: {
          server: [
            {
              address: 'example.com',
              remotePath: '/tmp/test',
              direction: 'to'
            }
          ]
        }
      }
    ]
  }
}
Sample JSON5 Configuration File

YAML is easier to read for humans than JSON or JSON5, but I feel it’s still limited in this regards for more complex configurations:

  1. A somewhat unstructured format
jpSyncSettings:
  watch:
    - paths:
        - .
      sync:
        server:
          - address: example.com
            remotePath: /tmp/test
            direction: to
Sample YAML Configuration File

XML is well structured, with clearly defined objects and attributes. It has a powerful schema language. It has mature tools for editing XML files that conform to schemas, including autocompletion.

<jp-sync-settings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./jp-sync-settings-schema.xsd">
  <watch>
    <paths>
      <path>.</path>
    </paths>
    <sync>
      <server address="example.com" remotePath="/tmp/test" direction="to"/>
    </sync>
  </watch>
</jp-sync-settings>
Sample XML Configuration File

Configuration File Schema

Configuration consists of any number of <watch> sections. Each <watch> section contains a <paths> section that contains any number of paths that will be watched recursively for changes.

<watch> sections It also contains a <sync> section that enumerates remote servers to be synced and in which direction changes should flow.

Within the <sync> section there is also an <after> section that can contain one or more shell commands to be executed after a sync completes. This is useful, for instance, when you want to restart a service if its watched configuration files have changed.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="./jp-sync-settings-schema.xsd">

  <xs:simpleType name="IPAddressOrFQDN">
    <xs:restriction base="xs:string">
      <xs:pattern
        value="((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="uint16">
    <xs:restriction base="xs:unsignedInt">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="65535"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="syncDirectionType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="to"/>
      <xs:enumeration value="from"/>
      <xs:enumeration value="bidirectional"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="jp-sync-settings">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="watch" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="paths" minOccurs="1" maxOccurs="1">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="path" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="sync" minOccurs="1" maxOccurs="1">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="cloud" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="serversEnvVar" type="xs:string" use="required"/>
                        <xs:attribute name="port" type="uint16"/>
                        <xs:attribute name="username" type="xs:string"/>
                        <xs:attribute name="uid" type="uint16"/>
                        <xs:attribute name="gid" type="uint16"/>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="server" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="address" type="IPAddressOrFQDN" use="required"/>
                        <xs:attribute name="port" type="uint16"/>
                        <xs:attribute name="username" type="xs:string"/>
                        <xs:attribute name="uid" type="uint16"/>
                        <xs:attribute name="gid" type="uint16"/>
                        <xs:attribute name="remotePath" type="xs:string" use="required"/>
                        <xs:attribute name="direction" type="syncDirectionType" use="required"/>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="after" minOccurs="0" maxOccurs="1">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="shellCmd" minOccurs="0" maxOccurs="unbounded">
                            <xs:complexType>
                              <xs:simpleContent>
                                <xs:extension base="xs:string">
                                  <xs:attribute name="uid" type="uint16"/>
                                  <xs:attribute name="gid" type="uint16"/>
                                </xs:extension>
                              </xs:simpleContent>
                            </xs:complexType>
                          </xs:element>
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
The XML Configuration File's Schema

Linux Cloud Setup

If you wish to implement a cloud using jp-sync the code should be setup to be run as a service to ensure it will be running on automatically on startup or reboot.

Example for Ubuntu 22:

[Unit]
After=network.target

[Service]
User=root
WorkingDirectory=/home/user/jp-sync
EnvironmentFile=/etc/environment
ExecStart=/home/user/n/bin/node jp-sync.mjs

[Install]
WantedBy=multi-user.target
jp-sync.service

This service configuration file needs to be placed in the /etc/systemd/system directory.

It will be run at every system startup if you run:

$ sudo systemctl enable jp-sync

It can be restarted any time using:

$ sudo service jp-sync restart

In the example /home/user/n/bin/node is the location of the Node.js binary executable.

The jp-sync.mjs code is expected at /home/user/jp-sync/jp-sync.mjs.

Make sure any required environment variables have been saved on their own line in /etc/environment so that they will be available following a reboot, for example SERVERS=mydomain1.com,mydomain2.com.