Local-first database: remoteStorage.js

May 6, 2020 · local-first, offline-first

remoteStorage.js is an offline-first solution that's been around for quite some time, and stands out for having a formal protocol spec, first drafted in 2012. It would be really cool if it took off, but unfortunately there are only one or two commercial providers, and the only stable open-source server implementation is in PHP. There's one written in Rust that has been unmaintained since early 2018, and a Node.js one that has a big warning up top about being experimental & alpha-stage. Still, I figured it would be interesting to see how it stacks up according to my local-first database criteria. You can also look at the evaluation of gun-js to see how it compares.

Technically, "remoteStorage" is the protocol, and "remoteStorage.js" is the "reference client implementation". My goal with this series is to look at solutions that are immediately ready to be used to build apps, so my evaluation is of the JavaScript client (and a corresponding community-built JavaScript server). While many of the features of the client & server are inherent to the protocol, there are also many things that are more due to implementation details than the underlying spec. 🤷

Correctness

  • How are conflicts handled?

    Conflicts are not automatically handled -- they must be dealt with (or not) by the client using bespoke conflict merging code. If two clients change the same document, whichever client syncs to the server first wins, and the second one gets a "conflict event" when it tries to sync. A particularly clever client implementation could use a JSON CRDT to encode the data, and thereby automatically handle merge conflicts without developer or user intervention 🤔 but I haven't seen anyone try that yet.

  • How "bullet proof" is it? How easy is it to get it into a broken state (e.g. where different clients continue to see inconsistent data despite syncing)?

    With such a simple protocol I expected it to be pretty robust, but in my short time integrating it into my example app I managed to get into a state where neither refreshing the page nor logging out & in again showed me the right data (which suggests that logging out failed to clear my local data, and that is also concerning). Given that I was using the self-described "experimental & alpha-stage" Node.js server implementation though, the fault is most likely to be there.

  • Is there consistency verification built-in, to detect if you're in a broken state?

    Nope, the server is trusted to calculate etags correctly, and there's no verification that the data loaded is consistent. Often etag calculation is deterministic based on the content, but we're still trusting the server not to have bugs.

  • How well does sync preserve intent? In what cases would a user's work be "lost" unexpectedly?

    The out-of-the-box behavior is to not handle conflicts, so if two clients change the same document at the same time, one of those changes is lost. As described above, you can implement custom conflict resolution code to mitigate this.
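
To make the "first client to sync wins" behavior concrete, here's a minimal in-memory sketch. It doesn't use remoteStorage.js or armadietto at all: `FakeServer`, `saveWithMerge` and `mergeNotes` are hypothetical names, the rejection is modeled on HTTP's conditional writes (If-Match and a 412 response), and the merge rule (union the tags, keep the longer text) is just one bespoke strategy an app might pick.

```javascript
// Hypothetical in-memory stand-in for a server that keeps one version per document.
class FakeServer {
  constructor() {
    this.docs = new Map(); // path -> { etag, body }
    this.counter = 0;
  }

  get(path) {
    return this.docs.get(path);
  }

  // Accept the write only if the caller's expected etag matches the stored one.
  put(path, body, expectedEtag) {
    const current = this.docs.get(path);
    const currentEtag = current ? current.etag : undefined;
    if (currentEtag !== expectedEtag) {
      return { ok: false, status: 412, current }; // the second writer lands here
    }
    const etag = String(++this.counter);
    this.docs.set(path, { etag, body });
    return { ok: true, status: 200, etag };
  }
}

// One possible bespoke merge: union the tags, keep the longer text.
function mergeNotes(mine, theirs) {
  return {
    text: mine.text.length >= theirs.text.length ? mine.text : theirs.text,
    tags: [...new Set([...theirs.tags, ...mine.tags])].sort(),
  };
}

// Try to write; on conflict, merge with the server's copy and retry once.
function saveWithMerge(server, path, body, expectedEtag) {
  const first = server.put(path, body, expectedEtag);
  if (first.ok) return first;
  if (!first.current) return server.put(path, body, undefined); // document is gone remotely
  const merged = mergeNotes(body, first.current.body);
  return { ...server.put(path, merged, first.current.etag), merged };
}

const server = new FakeServer();
const base = server.put("/notes/1", { text: "hi", tags: ["a"] }, undefined);
// Both clients start from the same base version.
const clientA = saveWithMerge(server, "/notes/1", { text: "hi there", tags: ["a", "b"] }, base.etag);
const clientB = saveWithMerge(server, "/notes/1", { text: "hi", tags: ["a", "c"] }, base.etag);
console.log(clientA.ok, clientB.merged); // client B had to merge before its write was accepted
```

Without the merge step, client B's only options would be to overwrite client A's change or to throw away its own, which is exactly the data loss described above.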

Cost

Storage

  • How much data does the client need to store to fully replicate?

    The client stores each document in IndexedDB, without much in the way of metadata.

  • How much data does the server need to store?

    The server also stores the latest state of each document (no change history).

  • How complicated is the server logic?

    It seems fairly simple conceptually. The server needs to calculate etags for each collection (folder), and stores each document as a plain JSON file.

    Again, this is evaluating the specific implementation of armadietto + remoteStorage.js. The protocol itself doesn't place restrictions on what kinds of data can be stored.
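
As a rough illustration of what "calculate etags for each folder" could look like, here's a sketch that derives a document's etag from its content and a folder's etag from its children. This is not how armadietto does it (I haven't checked the details); `fnv1a`, `documentEtag` and `folderEtag` are hypothetical helpers, and the toy hash is only a stand-in for a real content hash.

```javascript
// 32-bit FNV-1a variant over UTF-16 code units; a toy stand-in for a real content hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Etag of a document: hash of its serialized content.
// Note: JSON.stringify is sensitive to key order, so this assumes stable key order.
function documentEtag(body) {
  return fnv1a(JSON.stringify(body));
}

// Etag of a folder: hash of its children's names and etags in sorted order,
// so changing any child also changes the folder's etag.
function folderEtag(children) {
  const listing = Object.keys(children)
    .sort()
    .map((name) => `${name}:${children[name]}`)
    .join("\n");
  return fnv1a(listing);
}

const before = folderEtag({
  "a.json": documentEtag({ n: 1 }),
  "b.json": documentEtag({ n: 2 }),
});
const after = folderEtag({
  "a.json": documentEtag({ n: 1 }),
  "b.json": documentEtag({ n: 3 }), // only this document changed
});
console.log(before, after);
```

Because a folder's etag depends on its children, a client can ask about one folder and skip the whole subtree when nothing underneath has changed.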

Code / implementation

  • remoteStorage.js

    • tests: looks like over 3000? All passing on master.

    • coverage: not tracked (although there is a 6-year-old issue about it)

    • community: 3 contributors in the past month

  • armadietto (the node.js server I evaluated)

    • tests: 105 tests, all passing on master

    • coverage: not reported

    • community: 1 change in the past 12 months.

Other notes

During the initial sync, the client makes 1 HTTP request per document, which, if you have a large number of documents, ends up being a ton of network requests and a long wait from cold start. (Note that HTTP/2 can lower, although not eliminate, the overhead associated with this.)

Also, synchronization is done via simple polling (every 10 seconds or so by default).
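
For a feel of what one-request-per-document means at cold start, here's a back-of-the-envelope sketch. The function name and all of the numbers are my own assumptions, not measurements, and it ignores bandwidth and server processing time entirely.

```javascript
// Rough cold-start estimate: requests go out in batches of `parallelRequests`,
// and each batch costs about one round trip. All inputs are assumptions.
function estimateInitialSyncMs(documentCount, roundTripMs, parallelRequests) {
  const batches = Math.ceil(documentCount / parallelRequests);
  return batches * roundTripMs;
}

// 2000 documents, 50 ms round trips, 6 requests in flight at a time
console.log(estimateInitialSyncMs(2000, 50, 6)); // 16700
// Same workload with 100 requests in flight at once (e.g. multiplexed)
console.log(estimateInitialSyncMs(2000, 50, 100)); // 1000
```

Polling adds its own delay on top of that: a change made on one device can take up to a full polling interval before another device even asks for it.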

Flexibility

  • How does it react to schema changes? If you need to add an attribute to an object, can you?

    With remoteStorage, the schema is defined by the client, and the server has no knowledge or opinions about data shape. So if you deploy a new version of the client with a new attribute, it can add that, but I haven't seen any accounting for data migration.

  • Is the shape of data restricted to anything less than full JSON? e.g. are nested objects, and arrays supported?

    All of JSON is supported.

  • Can it be used with an existing (server-side or client-side) database (sqlite, postgres, etc.) or do you have to use a whole new data storage solution?

    It certainly could, but the whole point of remoteStorage is that the app developer has zero control (or knowledge) over the backend.

  • Can it sync with Google Drive, Dropbox, etc. such that each user manages their own backend storage?

    Technically it can sync to Google Drive or Dropbox, but I've found the implementations to be extremely buggy. Certainly the whole point of remoteStorage is that each user manages their own backend storage, but the best-supported "backend" is the custom remoteStorage protocol, and the options for public remoteStorage providers are quite limited.

  • Does it require all data to live in memory, or can it work with mostly-persisted data? (such that large datasets are possible)

    Data is stored in IndexedDB on the client, so it doesn't all need to live in memory.

  • Does it support e2e encryption?

    There's nothing built-in for e2e encryption, but it could certainly be added on top (there's nothing in the protocol that would prohibit it).

  • Is multi-user collaboration possible, where some users only have access to a subset of the data? (think firebase access rules)

    Neither remoteStorage.js nor armadietto has any accommodation for multi-user editing scenarios. It's possible that something could be implemented on top of the protocol to allow for it, but that's very much an area of future research.

  • Is collaborative text editing supported?

    No.

  • Does it have the concept of "undo" built-in?

    No. Change history is not saved.

  • Does it support a fully p2p network setup (no central authority / server)?

    No.
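
Going back to the schema-change question at the top of this section: since the server has no opinion about data shape, any migration would have to live in the client. Here's one way that could look, upgrading documents as they are read. Nothing here is remoteStorage.js API; the `schemaVersion` field, the `migrations` table and `upgrade` are all hypothetical.

```javascript
// Hypothetical migrations, keyed by the version they upgrade FROM.
const migrations = {
  // v1 -> v2: add an empty `tags` array
  1: (doc) => ({ ...doc, schemaVersion: 2, tags: [] }),
  // v2 -> v3: rename `done` to `completed`
  2: ({ done, ...rest }) => ({ ...rest, schemaVersion: 3, completed: Boolean(done) }),
};
const CURRENT_VERSION = 3;

// Apply migrations one step at a time until the document is current.
function upgrade(doc) {
  // Documents written before versioning existed are treated as v1.
  let current = { schemaVersion: 1, ...doc };
  while (current.schemaVersion < CURRENT_VERSION) {
    const step = migrations[current.schemaVersion];
    if (!step) throw new Error(`no migration from v${current.schemaVersion}`);
    current = step(current);
  }
  return current;
}

console.log(upgrade({ title: "write post", done: true }));
```

One thing a sketch like this doesn't solve: an older version of the app may still be pointed at the same storage and will run into documents newer than it understands, so it would need to leave those alone rather than overwrite them.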

Production-ready

  • Is it being used in production?

    Anecdotally yes, but I haven't seen anyone publish a list of "here are currently-running production apps that are using it". The protocol is still in "draft" stage, which is one barrier to widespread adoption.

  • How well does it handle offline behavior?

    Quite well.

  • Does it correctly handle working on multiple tabs in the same browser session?

    Not when offline (there's an open bug from 2018 tracking this). When online, each tab syncs via polling separately, and so the tabs synchronize within 10 seconds of each other.

  • Does it bake in auth, or can you use an existing authentication setup?

    It bakes in auth, and you wouldn't use existing authentication because each user brings their own backend. If you want to ditch the "each user brings their own backend" part, you could implement a custom auth layer, but that negates many of the benefits of using this system.

Conclusion

remoteStorage occupies an interesting place in my mind. On the one hand, it's an 8-year-old project that still receives active maintenance, which is a pretty big achievement in and of itself. On the other hand, its simplicity means that it's lacking a lot of features that people have come to expect from modern web applications. Overall, given that it's still actively being developed, it could very well gain some of those features and become a strong solution for building modern local-first apps in the future.

The original spec author of remoteStorage is now working on SOLID, which is also working in this you-own-your-data space. Recently he's started engaging with the remoteStorage community again as well.

Please drop me a note on twitter if there's anything I should add or correct!