2025-2026
An Obsidian Sync Built Around the Merger I Already Had
VaultLink: self-hosted Obsidian sync. Edit in any editor, online or off, then come back to a converged vault. The app that justified reconcile-text.
I refuse to give up the editor. Obsidian on the phone, Vim on the laptop, VS Code at work, the occasional headless sed across the whole vault. None of them know about each other, none of them are going to learn to, and I’m not switching to whichever sync product picks a favourite. VaultLink is the architecture that falls out of that refusal: one Rust server, one TypeScript sync engine, an Obsidian plugin, a CLI, and two test harnesses. The merge primitive underneath it all is reconcile-text, which I wrote first. VaultLink is the question that made it worth writing, finally asked in earnest.
The constraint that picks the algorithm
The consequence of that refusal is that the server never sees keystrokes. It sees end states: a file as it stood when sync caught it. That kills CRDTs (which need every operation) and OT-as-it’s-usually-implemented (same). It leaves you with one primitive: 3-way merge given a parent, a left, and a right. Which is reconcile-text. Which I’d written exactly because no existing tool took three independently-edited file states and gave one back.
The other consequence is that path placement is its own problem. Two clients might both move the same file. A file might land on a slot another file already occupies. A rename and a content edit might race. That’s the part I underestimated.
Two loops, separate invariants
The sync engine is two loops, deliberately disentangled:
- Wire loop (syncer.ts). Drains the single-consumer FIFO of pending HTTP and WebSocket ops. Updates a document’s record fields (remoteRelativePath, parentVersionId, remoteHash) and writes content to whatever path the record currently holds. Never moves files for path placement.
- Path reconciler (reconciler.ts). Runs after every drained event. Best-effort pass that moves files on disk so localPath === remoteRelativePath. The move graph is topologically sorted. Records with pending local events are skipped; the reconciler only operates on settled ones. Failures (slot occupied by something untracked) are silent skips; the next pass retries.
The split is the load-bearing decision. It used to be one loop with both responsibilities, and the bug catalogue was a parade of slot-collision stashes, “conflict-uuid” hacks, and MoveOnConflict.NEW/EXISTING policy choices. Separating wire transport from path placement made most of that vanish: the wire loop can freely write remoteRelativePath to whatever the server returned, even if it disagrees with the file on disk, because the reconciler won’t move anything out from under a queued user rename.
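A minimal sketch of what one reconciler pass has to decide, under stated assumptions: the DocRecord shape, the hasPendingLocalEvents flag, and planMoves are hypothetical names for illustration, and the ordering here is a simple repeated pass that yields the order a topological sort would, not the engine's actual code. Cycles never become unblocked in this sketch and are left over for separate handling.

```typescript
// Hypothetical record shape; only localPath and remoteRelativePath are named in the post.
interface DocRecord {
  localPath: string;          // where the file currently sits on disk
  remoteRelativePath: string; // where the server says it should live
  hasPendingLocalEvents: boolean;
}

interface Move {
  from: string;
  to: string;
}

// Returns the moves that are safe to run now, ordered so that a slot is
// vacated before another file is moved into it. Blocked moves (target held by
// an untracked file, an unsettled record, or a cycle) are silently left out;
// a later pass can retry them.
function planMoves(records: DocRecord[], untrackedPaths: Set<string>): Move[] {
  // Every path that currently holds a file, tracked or not.
  const occupied = new Set<string>(untrackedPaths);
  for (const r of records) occupied.add(r.localPath);

  // Only settled records that are out of place are candidates.
  let remaining: Move[] = records
    .filter((r) => !r.hasPendingLocalEvents && r.localPath !== r.remoteRelativePath)
    .map((r) => ({ from: r.localPath, to: r.remoteRelativePath }));

  const planned: Move[] = [];
  let progressed = true;
  while (progressed) {
    progressed = false;
    const stillBlocked: Move[] = [];
    for (const move of remaining) {
      if (occupied.has(move.to)) {
        stillBlocked.push(move); // target slot is taken for now
        continue;
      }
      occupied.delete(move.from);
      occupied.add(move.to);
      planned.push(move);
      progressed = true;
    }
    remaining = stillBlocked;
  }
  return planned;
}
```

With a.md headed for b.md and b.md headed for c.md, the plan runs b.md→c.md first and a.md→b.md second; a record with queued local events is not moved at all, but its current path still counts as occupied.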
Cycles in the move graph (A→B, B→C, C→A) are resolved by reading every file in the cycle into memory and writing each back to its new slot; no tmp files. A write-ahead marker at .vaultlink/swap-<uuid>.json lists each leg. On startup the reconciler reads the marker, hashes each leg’s from path to determine which legs ran, and replays the rest. .vaultlink/** is hardcoded into the internal ignore pattern so the swap markers never themselves get synced.
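The happy path of that rotation can be sketched against an in-memory map standing in for the vault. The Leg shape, the marker's JSON layout, and rotateCycle are assumptions for illustration; crash recovery (hashing to find out which legs ran) is deliberately not shown.

```typescript
interface Leg {
  from: string;
  to: string;
}

// In-memory stand-in for the vault: path -> file content.
type Vault = Map<string, string>;

// Rotates the files in a move cycle without temp files: read everything
// first, then write each content to its new slot.
function rotateCycle(vault: Vault, legs: Leg[], swapId: string): void {
  // 1. Read every file in the cycle before any slot is overwritten.
  const writes = legs.map((leg) => {
    const content = vault.get(leg.from);
    if (content === undefined) throw new Error(`missing file: ${leg.from}`);
    return { to: leg.to, content };
  });

  // 2. Write-ahead marker listing each leg (assumed JSON layout).
  const markerPath = `.vaultlink/swap-${swapId}.json`;
  vault.set(markerPath, JSON.stringify({ legs }));

  // 3. Write each content into its destination slot.
  for (const write of writes) vault.set(write.to, write.content);

  // 4. All legs ran; the marker is no longer needed.
  vault.delete(markerPath);
}
```

Reading everything up front is what makes the order of the writes irrelevant: by the time any slot is overwritten, its old content is already held in memory.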
Pending creates are Promises, not strings
When the user creates a file locally and then immediately edits or renames it before the create has been acknowledged, the engine doesn’t know the document’s id yet; the server assigns it. So queued events for that doc carry a Promise<DocumentId> in their documentId slot, threaded back to the still-in-flight LocalCreate. When the server acks the create, resolveCreate fulfils the promise and replacePendingDocumentId walks the queue swapping the resolved string into every dependent event.
If you’re walking events[] and comparing docIds with ===, you’ll silently fail to match until the swap happens. There’s a comment in sync-event-queue.ts that warns about exactly that, in slightly more alarmed punctuation. The shape is unusual but the alternative (synchronously waiting for the create ack before letting the user type more) is the kind of thing that makes a notes app feel like a 1998 webform.
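Roughly, the shape looks like the sketch below. The names resolveCreate and replacePendingDocumentId come from the engine, but the signatures, the QueuedEvent fields, and the newPendingCreate helper are assumptions made for illustration.

```typescript
type DocumentId = string;

interface QueuedEvent {
  kind: "create" | "update" | "rename";
  // A plain id once the server has acked the create, a promise until then.
  documentId: DocumentId | Promise<DocumentId>;
}

interface PendingCreate {
  promise: Promise<DocumentId>;
  resolve: (id: DocumentId) => void;
}

// Hypothetical helper: a promise plus the function that fulfils it.
function newPendingCreate(): PendingCreate {
  let resolve!: (id: DocumentId) => void;
  const promise = new Promise<DocumentId>((r) => {
    resolve = r; // the executor runs synchronously, so this is set before return
  });
  return { promise, resolve };
}

// Walks the queue and swaps the resolved string into every dependent event.
// Matching is by promise identity, which is the only thing that works before
// the id exists.
function replacePendingDocumentId(
  events: QueuedEvent[],
  pending: Promise<DocumentId>,
  id: DocumentId,
): void {
  for (const event of events) {
    if (event.documentId === pending) event.documentId = id;
  }
}

// Called when the server acks the create.
function resolveCreate(events: QueuedEvent[], create: PendingCreate, id: DocumentId): void {
  create.resolve(id);
  replacePendingDocumentId(events, create.promise, id);
}
```

Until resolveCreate runs, comparing an event's documentId against an id string is always false, because the slot holds a Promise object rather than a string.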
MinCovered: the watermark that doesn’t lie
The catch-up handshake says “give me everything newer than lastSeenUpdateId.” If the client advances that id as it receives a stream of RemoteChange ids out of order, it’ll publish a too-high cursor, and the next reconnect will request from a point past events it never actually applied. Permanent gap. Replay-forever bug, with extra steps.
The fix is a small data structure called MinCovered: a contiguous-prefix tracker over a stream of integers. It advances the public min only when the next consecutive id has been processed. Out-of-order arrivals stash without bumping the cursor. Five files of tests, one screen of implementation, and an entire category of confusing data-loss bugs disappears.
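A minimal sketch of such a contiguous-prefix tracker, assuming update ids are dense consecutive integers and that the tracker starts from the last cursor the client persisted. The method names markProcessed and min are illustrative, not the engine's actual API.

```typescript
// Tracks the highest id N such that every id up to and including N has been
// processed. Ids that arrive ahead of the prefix are stashed until the gap
// before them closes.
class MinCovered {
  private covered: number;
  private readonly stashed = new Set<number>();

  constructor(lastSeenUpdateId: number) {
    this.covered = lastSeenUpdateId;
  }

  markProcessed(id: number): void {
    if (id <= this.covered) return; // already inside the covered prefix
    this.stashed.add(id);
    // Advance while the next consecutive id is available.
    // Set.delete returns true only if the id was present.
    while (this.stashed.delete(this.covered + 1)) {
      this.covered += 1;
    }
  }

  // The cursor that is safe to send as lastSeenUpdateId on reconnect.
  get min(): number {
    return this.covered;
  }
}
```

Starting from 10, processing 12 and 13 leaves the cursor at 10; processing 11 then moves it straight to 13, because the stashed ids now form an unbroken run.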
reconcile-text on the server
The merge sits on the server. When two clients submit edits against the same parent_version_id, the second submission triggers a 3-way merge against the parent and the freshly-committed first edit. Three strings in, one out. No conflict markers. The engine commits the merged result, increments the version, and broadcasts the new state to every connected client.
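The decision the server makes on each submission can be sketched as follows. This is an illustration in TypeScript, not the Rust server's code: the Version shape and commitEdit are hypothetical, and the merge is passed in as a function so the sketch assumes nothing about reconcile-text's actual API.

```typescript
interface Version {
  versionId: number;
  content: string;
}

// Three strings in, one out. In VaultLink this role is played by reconcile-text.
type Merge3 = (parent: string, left: string, right: string) => string;

// Commits a submitted edit against a linear version history (oldest first).
// If the submission was based on the current head it is stored as-is;
// otherwise it is merged with the head, using the shared parent as the base.
function commitEdit(
  history: Version[],
  parentVersionId: number,
  submitted: string,
  merge: Merge3,
): Version {
  const head = history[history.length - 1];
  const parent = history.find((v) => v.versionId === parentVersionId);
  if (!head || !parent) throw new Error("unknown parent version");

  const content =
    parent.versionId === head.versionId
      ? submitted // no concurrent edit: fast-forward
      : merge(parent.content, head.content, submitted);

  const next: Version = { versionId: head.versionId + 1, content };
  history.push(next);
  return next; // the caller would broadcast this to connected clients
}
```

The first client to submit against a parent takes the fast-forward branch; the second one, still pointing at the same parent, lands in the merge branch with the first client's content as the other side.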
Two restrictions, both honest:
- Only .md and .txt. Markdown that fails UTF-8 validation gets treated as binary, same as PNGs and PDFs.
- Last-write-wins for everything else. Concurrent edits to a .docx lose one of the writes. The right fix is “don’t edit binaries concurrently,” which is unsatisfying but true.
Merge quality is exactly what reconcile-text gives me. Word-level tokenisation turns most prose conflicts into two adjacent edits that coexist. If the merge looks slightly clumsy now and then, the alternative is a <<<<<<< HEAD block in my notes, and I’d take the clumsy sentence every time.
Two test harnesses, one workflow
Distributed-sync bugs are confusing the first time and impossible the second. The fix is two harnesses:
- test-client (fuzz). N parallel processes hammering random ops against a shared server for minutes at a time. Catches bugs nobody thought to write a test for. Reproductions are noisy.
- deterministic-tests. Scripted multi-client scenarios with a step grammar (pause-server, pause-websocket, barrier, assert-consistent) using an in-memory filesystem against a real server binary. Used to capture a fuzz-found bug as a minimal repro before fixing it.
The workflow: fuzz finds something, I sift logs for a root cause, write the minimal deterministic test that fails on it, fix until both that test and the fuzz pass. Without the deterministic harness, every bug fix would be vibes-based.
Smaller calls
- TS types are generated from Rust via ts-rs. The HTTP/WS API has one source of truth: the Serde types in the server. scripts/update-api-types.sh re-emits frontend/sync-client/src/services/types/. Hand-edits to those files are explicitly banned.
- sqlx::query! macros over a checked-in .sqlx cache. SQL is verified against the schema at compile time. Touching SQL means re-running cargo sqlx prepare --workspace; if you forget, CI catches it.
- One sync engine, four consumers. sync-client is the engine. Obsidian plugin, standalone CLI, fuzz harness, and deterministic harness all depend on it via file:../sync-client. Bugs are fixed once and inherited everywhere.
- record.localPath mutates in place across awaits. The watcher can rename a doc while a wire-loop handler is mid-HTTP. Snapshotting localPath into a local at function entry and reading it after the await reads a vacated slot. Read it live; only snapshot when you deliberately want to compare before and after the await.
- Watermark advancement is load-bearing both ways. Branches that skip a remote event without advancing lastSeenUpdateId create permanent gaps that re-deliver forever. Branches that advance without applying the content lose data. The rule that survives review is: advance only if you applied the event or deliberately discarded it.
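The localPath hazard from the list above is easy to reproduce in miniature. In this sketch the DocRecord shape, the fakeHttp stand-in, and both function names are hypothetical; the only point is the difference between a snapshot taken before the await and a live read after it.

```typescript
interface DocRecord {
  localPath: string;
}

// Stand-in for an HTTP round trip: yields to the event loop once.
async function fakeHttp(): Promise<void> {
  await Promise.resolve();
}

// Buggy pattern: the path is captured at function entry, so a rename that
// lands during the await goes unnoticed and the returned path may be vacated.
async function pathToWriteStale(record: DocRecord): Promise<string> {
  const path = record.localPath;
  await fakeHttp();
  return path;
}

// Safer pattern: read the field after the await, when it is actually needed.
async function pathToWriteLive(record: DocRecord): Promise<string> {
  await fakeHttp();
  return record.localPath;
}
```

If the watcher renames the record from a.md to b.md while both calls are suspended, the stale variant still resolves to a.md and the live variant resolves to b.md.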
The race I haven’t structurally fixed
Pause-or-disable-sync mid-flight is the one left. An HTTP request that committed server-side but whose response was dropped leaves the server holding a doc the client never recorded. On resume, the offline scan finds the file again, uploads it as a new create, and server-side dedupe merges the duplicate into the existing doc. If the merge produces a deconflict file (two real divergences), the user picks up an extra file in their vault. Not data loss, but a small ugliness.
The two-loop split doesn’t fix this and probably shouldn’t. The honest path is something like a persisted client-side “have I acked this op?” log, sitting in the same SQLite the engine already uses. It’s on my list, below several things I want more.
What I’d change
- Move the merge to the client. Right now reconcile-text runs on the server. Running the WASM build of reconcile-text on each client, and letting the server be a dumb commit log, would let the merge benefit from device-specific tokenisers (Markdown-aware on the desktop, word-level on mobile). It would also stop the server from needing to understand the file format at all.
- Property tests for the move graph. The cycle resolver is the part I trust least under crash. Snapshot tests can’t go where proptest can; I should be generating arbitrary move-graph + interruption combinations.
- A first-class “pause” with a write-ahead op log. See above.
- More than .md and .txt. A canvas-aware merge for Obsidian’s .canvas files is one reconcile-text tokeniser away. Not because anyone asked, but because the asymmetry annoys me.
The way I think about VaultLink now: reconcile-text was the bet. VaultLink is what I built once the bet looked like it might pay off. The interesting part of the bet was always that three independently-edited files can become one without anyone telling the system about the keystrokes that produced them. The interesting part of the application is everything you have to do around that merge to stop the rest of the system from undoing it.