📖Merkle-CRDTs: Merkle-DAGs meet CRDTs

March 3, 2025

authors: Hector Sanjuan and Samuli Poyhtari and Pedro Teixeira and Ioannis Psaras
year: 2020
url: https://arxiv.org/abs/2004.00107

Merkle-DAGs provide causality information and can be used to build a logical clock (Merkle-Clock)
q
a Merkle-CRDT-based system scales well to the order of thousands of replicas which can opportunistically join and depart – a very common condition when working with mobile, browsers or IoT devices.
Operation-based CRDTs require reliable messaging layer and may require causal delivery.
ChronoSync, Named Data Networking architecture
Merkle-clocks decouple causality information from the number of replicas, so they can handle the case of replicas randomly joining and leaving
the one downside is that histories can get quite large, so syncing state from scratch may become slow
merkle clocks establish partial order. Total order could be established by adding additional rules for ordering concurrent nodes (e.g., sorting by depth + content hash)
merkle crdt is just merkle clocks with crdt payload
- provides causal consistenty and gap detection, so good for operation-based CRDTs
q
Merging two Merkle-Clocks requires comparing them to see if they are included in one another and finding differences. This may be a costly operation if DAGs have diverged significantly (or long ago).
- storing hashes in a local key-value store may help speed-up these checks
- embedding version vectors in the payload may help with inclusion check without the need for DAG-walking
q
Additional pointers in nodes:: One of the ways to work around the thin-DAG problem is to regularly introduce references to deeper parts of the DAG when issuing new nodes. This method is basically adding extra children to nodes. It allows more parallelism when fetching missing parts of the DAG by being able to jump to other sections of it, resulting in much faster traversals. The actual number of extra links and their destination will depend on the needs of the application.
- This helps to parallelize fetches of unknown nodes. As merkle-dag tends to be thin, fetching could devolve to fetch one children, look at its children and fetch them — requires a lot of back-and-forth and is thus slow