Local-first database: Hypermerge

September 30, 2020 · local-first, offline-first

I’m excited to review Hypermerge, as it was developed by the folks from Ink & Switch who wrote the Local-First Software article. It is designed as a fully peer-to-peer data storage system, and is Electron/Node.js-only (no browser support).

The underlying data structure implementation, Automerge, is based on the academic research paper A Conflict-Free Replicated JSON Datatype, and can be used independently – in fact there are a couple of different “networking” layers available – but Hypermerge is by far the most well-developed and “production ready”.

I’ll be using my local-first database criteria for this evaluation. You can also look at the evaluations of gun-js and remoteStorage.js to see how they compare.

Correctness

  • How are conflicts handled?

    Conflicts are automatically resolved by Automerge, although the exact behavior is undocumented. The paper that it’s based on has a couple of interesting failure modes (if one user clears an object while another user concurrently sets an attribute on it, the merged object can end up with only that one attribute, which could break an app’s assumptions about the shape of its data), but overall it’s very thorough and should generally result in seamless merging.

  • How “bullet proof” is it? How easy is it to get it into a broken state (e.g. where different clients continue to see inconsistent data despite syncing)?

    I haven’t seen any issues with it – the append-only log format is quite robust.

  • Is there consistency verification built-in, to detect if you’re in a broken state?

    Hypercore, the underlying persistence layer, uses hashing to verify consistency (from what I can tell).

  • How well does sync preserve intent? In what cases would a user’s work be “lost” unexpectedly?

    All objects contain the full history of changes to that object, and so all work is fully recoverable.
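To make the clear-versus-set failure mode concrete, here is a minimal Python sketch of an observed-remove map, assuming semantics like those the paper describes: a delete (or clear) only removes values the deleting replica has already seen, so a concurrent assignment survives the merge. `ORMap`, `put`, `clear` and `merge` are made up for illustration; this is not Automerge’s implementation or API.

```python
import copy
from dataclasses import dataclass, field

Tag = tuple[int, str]  # (counter, actor id): unique per assignment


@dataclass
class ORMap:
    """Toy observed-remove map: a delete only removes assignments this replica has seen."""
    actor: str
    counter: int = 0
    # Every assignment this replica knows about: tag -> (key, value)
    assignments: dict[Tag, tuple[str, object]] = field(default_factory=dict)
    # Tags removed by an overwrite or a clear
    tombstones: set[Tag] = field(default_factory=set)

    def put(self, key: str, value: object) -> None:
        # Overwriting a key removes the assignments we have observed for it.
        for tag, (k, _) in self.assignments.items():
            if k == key:
                self.tombstones.add(tag)
        self.counter += 1
        self.assignments[(self.counter, self.actor)] = (key, value)

    def clear(self) -> None:
        # "Clear" = tombstone every assignment observed so far, and nothing else.
        self.tombstones.update(self.assignments)

    def merge(self, other: "ORMap") -> None:
        # Plain set unions, so merging is commutative and idempotent.
        self.assignments.update(other.assignments)
        self.tombstones |= other.tombstones
        self.counter = max(self.counter, other.counter)

    def value(self) -> dict:
        live = {}
        # Iterate in tag order, so among concurrent live assignments the highest tag wins.
        for tag, (k, v) in sorted(self.assignments.items()):
            if tag not in self.tombstones:
                live[k] = v
        return live


base = ORMap("alice")
base.put("title", "Buy milk")
base.put("done", False)

alice = copy.deepcopy(base)
bob = copy.deepcopy(base)
bob.actor = "bob"

alice.clear()          # Alice clears the whole object...
bob.put("done", True)  # ...while Bob concurrently ticks the checkbox.

alice.merge(bob)
bob.merge(alice)
print(alice.value())   # {'done': True}
print(bob.value())     # {'done': True}
```

Neither replica did anything wrong, yet the merged object has lost `title` while keeping `done`, which is exactly the kind of shape an app might not expect.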

Cost

Storage

  • How much data does the client/server need to store to fully replicate?

    Both client and server store a full replica of all changes made, and from some preliminary testing, the per-change overhead looks rather high: the on-disk size grew by several kilobytes after a single change to one attribute.
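Keeping every change is what makes the history recoverable, and it is also why storage grows with each edit. The toy log below sketches both ideas in Python: each entry carries metadata (author, sequence number, hash of the previous entry) on top of the actual value, hashes chain the entries so tampering is detectable, and replaying a prefix rebuilds any past version. `ChangeLog` and its methods are hypothetical, and Hypercore’s real format is more elaborate (it uses a signed Merkle tree rather than a simple hash chain).

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ChangeLog:
    """Toy append-only change log: every change is kept and hash-chained to the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, key: str, value) -> None:
        body = {
            "seq": len(self.entries),
            "actor": actor,
            "key": key,
            "value": value,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

    def state_at(self, n: int) -> dict:
        """Rebuild the document as it was after the first n changes."""
        state = {}
        for entry in self.entries[:n]:
            state[entry["key"]] = entry["value"]
        return state

    def size_in_bytes(self) -> int:
        return len(json.dumps(self.entries).encode())


log = ChangeLog()
log.append("alice", "title", "Draft")
log.append("alice", "title", "Final")
print(log.state_at(1))      # {'title': 'Draft'}
print(log.state_at(2))      # {'title': 'Final'}
print(log.verify())         # True
print(log.size_in_bytes())  # mostly hashes and metadata, not the two short values

log.entries[0]["value"] = "Tampered"
print(log.verify())         # False
```

Even in this toy, the two 64-character hex hashes per entry dwarf a one-word value, which gives a feel for how per-change overhead adds up.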

Code / implementation

  • Automerge

    • tests: 412 tests passing

    • coverage: not measured

    • community: 2 contributors in the past 6 months

  • Hypermerge

    • tests: 213 passing, 1 failing

    • coverage: not measured

    • community: 2 contributors in the past 6 months

Flexibility

  • How does it react to schema changes? If you need to add an attribute to an object, can you?

    Hypermerge doesn’t do schema validation, so anything goes!

  • Is the shape of data restricted to anything less than full JSON? e.g. are nested objects, and arrays supported?

    All supported.

  • Can it be used with an existing (server-side or client-side) database (sqlite, postgres, etc.) or do you have to use a whole new data storage solution?

    Hypermerge is built on Hypercore, which doesn’t look like it’s built to integrate with “traditional” databases.

  • Can it sync with Google Drive, Dropbox, etc. such that each user manages their own backend storage?

    No.

  • Does it require all data to live in memory, or can it work with mostly-persisted data? (such that large datasets are possible)

    It can load documents as needed from the file-based persistence.

  • Does it support e2e encryption?

    It depends on how you define “end”. Hypermerge uses the Dat protocol under the hood, so data is encrypted in transit, but not at rest on a “storage peer” (remote server).

  • Is multi-user collaboration possible, where some users only have access to a subset of the data? (think firebase access rules)

    Not currently.

  • Is collaborative text editing supported?

    Yes! Automerge contains a Text CRDT that allows for collaborative text editing. I’ll be doing a more in-depth review of this compared to others (and the ways in which they might fail to preserve intent) in a future post.

  • Does it have the concept of “undo” built-in?

    Yes! Because Hypermerge retains a full history of all changes, undo and redo are supported out of the box.

  • Does it support a fully p2p network setup (no central authority / server)?

    Yes! Using Dat as the underlying synchronization & discovery protocol.
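One common way to build undo on top of a replicated history is to record an inverse change rather than rewinding the log, so the undo syncs to other peers like any other edit. Here is a minimal single-replica sketch of that technique in Python; `UndoableDoc` and its methods are hypothetical, not Hypermerge’s API, and the sketch illustrates the general approach rather than necessarily how Hypermerge does it.

```python
_MISSING = object()  # sentinel: the key did not exist before the change


class UndoableDoc:
    """Toy document where undo and redo are new changes appended to history, not rewinds."""

    def __init__(self) -> None:
        self.state: dict = {}
        self.history: list = []     # every change ever applied, including undos and redos
        self.undo_stack: list = []  # this user's changes that can be undone
        self.redo_stack: list = []

    def _apply(self, key, new):
        """Apply one change and record (key, old, new) in the history."""
        old = self.state.get(key, _MISSING)
        if new is _MISSING:
            self.state.pop(key, None)
        else:
            self.state[key] = new
        change = (key, old, new)
        self.history.append(change)
        return change

    def set(self, key, value) -> None:
        self.undo_stack.append(self._apply(key, value))
        self.redo_stack.clear()  # a fresh edit invalidates the redo stack

    def undo(self) -> None:
        key, old, _ = self.undo_stack.pop()
        # The inverse (restore `old`) is itself a new change, so it would sync to peers.
        self.redo_stack.append(self._apply(key, old))

    def redo(self) -> None:
        key, old, _ = self.redo_stack.pop()
        # `old` of an undo change is the value that the undo removed.
        self.undo_stack.append(self._apply(key, old))


doc = UndoableDoc()
doc.set("title", "Draft")
doc.set("title", "Final")
doc.undo()
print(doc.state)         # {'title': 'Draft'}
doc.redo()
print(doc.state)         # {'title': 'Final'}
print(len(doc.history))  # 4: nothing was erased
```

The undo and redo stacks are local to one user, so undo reverts your own last edit rather than whatever change happened to arrive last from a peer.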

Production-ready

  • Is it being used in production?

    Sort of. Pushpin is the main (only?) client application currently written using Hypermerge, and it is working & available to download, but I’m not sure what kind of usage it’s getting. Certainly Hypermerge (and Pushpin) are being written from a research perspective, as opposed to a commercial one.

  • How well does it handle offline behavior?

    Very well.

  • Does it correctly handle working on multiple tabs in the same browser session?

    N/A, as Hypermerge is Electron-only, with no browser support.

  • Does it bake in auth, or can you use an existing authentication setup?

    Auth w/ Hypermerge is an area of active research, but at the moment each “client” self-declares its name and is treated as a unique user. There’s no way to plug in custom auth at the moment.
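Offline support in this style of system largely falls out of the append-only design: edits only touch the local log, and when peers reconnect they exchange whatever entries the other side is missing. Below is a toy Python model of that exchange, assuming one append-only feed per writer (loosely in the spirit of Hypercore’s single-writer logs). `Peer` and `sync_with` are hypothetical, this is not the Dat wire protocol, and merging the exchanged changes into one document (the CRDT’s job) isn’t shown.

```python
class Peer:
    """Toy peer that keeps one append-only feed per writer, keyed by writer name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.feeds: dict[str, list] = {name: []}

    def edit(self, change) -> None:
        # Works fully offline: an edit only appends to this peer's own feed.
        self.feeds[self.name].append(change)

    def sync_with(self, other: "Peer") -> None:
        """On reconnect, copy over only the entries each side is missing."""
        for src, dst in ((self, other), (other, self)):
            for writer, feed in src.feeds.items():
                have = dst.feeds.setdefault(writer, [])
                # Feeds are append-only, so comparing lengths says what is missing.
                have.extend(feed[len(have):])

    def all_changes(self) -> list:
        return sorted(
            (writer, i, change)
            for writer, feed in self.feeds.items()
            for i, change in enumerate(feed)
        )


alice, bob = Peer("alice"), Peer("bob")
alice.edit("add card 1")   # both peers are offline here
bob.edit("add card 2")
bob.edit("move card 2")

alice.sync_with(bob)       # back online
print(alice.all_changes() == bob.all_changes())  # True
print(len(alice.all_changes()))                  # 3
```

Because nothing is ever rewritten, syncing again after reconnecting is a harmless no-op, which is a large part of why offline editing is so forgiving in this design.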

I asked about production usage in chat, and got the answer “it’s reasonably usable but still has some nasty bugs”, so in general it’s probably “not production ready”.

Conclusion

Hypermerge is doing some very cool things in the peer-to-peer space, and the papers that are coming out of this research are fascinating, but ultimately it is a research tool primarily, and not meant to be production ready. That said, it’s also undergoing very active development, so I wouldn’t be surprised to see it become a real contender in the future!

Please drop me a note on Twitter if there’s anything I should add or correct!