Knowledge-Management on hippotion

I Added a Knowledge Graph to My Search. It Made It Worse.

Fri, 05 Jun 2026 00:00:00 +0000

I have a note in my second brain that I wrote months ago. It says, with the confidence of someone who hadn’t measured anything:

Combining lexical search (BM25) with vector similarity and graph expansion produces more robust recall than embeddings alone.

That sentence shipped into production. My vault of markdown notes gets indexed into a search database, and the search function fuses three signals: BM25 (classic keyword ranking), vector similarity (embeddings), and graph expansion — when a note matches, pull in its linked neighbours too, on the theory that the thing you want is often next to the thing you typed.

It sounds right. Graphs are having a moment in RAG. “Add a knowledge graph to your retrieval” is the kind of thing you can put on a slide and nobody pushes back. I believed it enough to make graph expansion a first-class signal with a weight of 0.5 — equal footing with keyword matching.

This week I finally wrote a benchmark. The graph wasn’t helping. It was the single biggest thing hurting my search.

The setup

30 gold queries against the live vault (63 notes), borrowing the harness shape from an eval framework I’d been reading. Each query has a hand-labelled “correct” note. I measured recall@5 (did the right note land in the top 5?) and MRR (how high did it rank?), across three retrievers:

grep — naive substring term-count. The dumb floor.
bm25 — pure keyword ranking, FTS5’s BM25. The honest baseline.
live — my production hybrid (BM25 + vector + graph).
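For readers who want the two metrics spelled out, here is a minimal sketch. It is not the harness used for this benchmark: `recall_at_k`, `reciprocal_rank`, `evaluate`, and the toy data are hypothetical names, and gold labels are modelled as a set so a query could carry more than one labelled note.

```python
from typing import Dict, List, Set, Tuple


def recall_at_k(ranked: List[str], gold: Set[str], k: int = 5) -> float:
    """Fraction of the gold notes that appear in the top-k results."""
    if not gold:
        return 0.0
    return len(gold & set(ranked[:k])) / len(gold)


def reciprocal_rank(ranked: List[str], gold: Set[str]) -> float:
    """1/rank of the first gold note, or 0.0 if none was retrieved."""
    for rank, note in enumerate(ranked, start=1):
        if note in gold:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5
) -> Tuple[float, float]:
    """Average recall@k and MRR over every gold query."""
    n = len(gold)
    recall = sum(recall_at_k(runs.get(q, []), g, k) for q, g in gold.items()) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), g) for q, g in gold.items()) / n
    return recall, mrr


# Toy data, not the real gold set.
gold = {"q1": {"note-a"}, "q2": {"note-b"}}
runs = {
    "q1": ["note-a", "hub"],                       # gold note at rank 1
    "q2": ["hub", "x", "y", "z", "w", "note-b"],   # gold note at rank 6
}
recall, mrr = evaluate(runs, gold)
print(round(recall, 3), round(mrr, 3))  # 0.5 0.583
```

The second query shows why both numbers matter: the note was retrieved, so it still earns some MRR, but at rank 6 it counts as a miss for recall@5.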

I expected a clean staircase: grep at the bottom, bm25 in the middle, my clever hybrid on top. That’s the whole reason you build the clever thing.

The scorecard

retriever	recall@5	MRR
grep	0.467	0.307
bm25	0.950	0.826
live (hybrid, `w_graph=0.5`)	0.650	0.520

Read that bottom row twice. My production “smart” search found the right note 65% of the time. Plain keyword search found it 95% of the time. The hybrid I’d been quietly proud of was worse than its own baseline — it broke 9 of 30 queries that BM25 got right. BM25 alone whiffed on exactly one.

The clever layer wasn’t adding intelligence. It was adding noise, confidently.

Why the graph backfired

Here’s the mechanism, and it’s almost funny once you see it.

Graph expansion pulls in a matched note’s neighbours. But in a real knowledge base, the most connected notes are hubs — my inbox of ideas, my project radar, my “things Claude noticed” log. Everything links to them, so they’re everyone’s neighbour. When I searched for something specific, the graph helpfully dragged these popularity-contest winners into the candidate set, and they elbowed the genuinely relevant note clean out of the top 5.

Concrete example. Query: “who owns this knowledge system?” The correct answer is my personal note. BM25 ranked it #5 — just barely in. The hybrid, drunk on graph neighbours, pushed it off the list entirely. The graph didn’t find a better answer. It buried a good one under hubs.
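The crowd-out is easy to reproduce on toy numbers. The sketch below assumes a simple additive fusion in which every hit passes `w_graph` times its own score to each linked neighbour. That rule, the `fuse` function, and the note names are illustrative assumptions, not the production scoring code.

```python
from collections import defaultdict
from typing import Dict, List


def fuse(bm25: Dict[str, float], links: Dict[str, List[str]], w_graph: float) -> List[str]:
    """Rank notes by BM25 score plus a graph bonus handed to each hit's neighbours."""
    scores = defaultdict(float, bm25)
    for note, score in bm25.items():
        for neighbour in links.get(note, []):
            # Assumed rule: a neighbour inherits a share of the hit's score.
            scores[neighbour] += w_graph * score
    # Highest score first; the name breaks ties so the order is deterministic.
    return sorted(scores, key=lambda n: (-scores[n], n))


# Toy scores: the wanted note sits at rank 5 on keywords alone.
bm25 = {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.7, "target": 0.6}
# Every stronger hit links to the same two hub notes.
links = {n: ["inbox", "radar"] for n in ("a", "b", "c", "d")}

print(fuse(bm25, links, 0.0)[:5])  # ['a', 'b', 'c', 'd', 'target']
print(fuse(bm25, links, 0.5)[:5])  # ['inbox', 'radar', 'a', 'b', 'c']
```

Neither hub matched the query at all. They collect a bonus from four hits each, which is enough to outrank every real match and push the rank-5 note out of the top 5.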

I swept the graph weight to confirm it wasn’t a fluke. The trend was monotonic: every increment of graph weight lowered MRR, and recall@5 never improved:

graph weight	recall@5	MRR
0.0 (off)	0.950	0.826
0.1	0.950	0.737
0.25	0.817	0.564
0.5 (what I shipped)	0.650	0.520

There’s no ambiguity to argue with. More graph, more harm, no exceptions. The value I’d been claiming in that confident note — I finally measured it, and it was negative.

The fix, and the actual lesson

The fix was one line: drop the default graph weight from 0.5 to 0.1. Recall snapped back to 0.95, tying pure BM25. (Turning the graph fully off is better still on MRR, 0.826 against 0.737; I kept a whisper of it as a tiebreaker, which is a taste call, not a data-driven one.)

But the one-line fix isn’t the point. The point is where graphs belong.

Graph expansion isn’t a bad idea — I aimed it at the wrong job. Precision retrieval (“find me the one note that answers this”) wants to be narrow and literal. Pulling in neighbours is the opposite of what you want; every neighbour is a chance to be wrong. But I have a different feature in this same system — a discovery mode that deliberately collides distant notes to surface unexpected connections. There, neighbour-pulling isn’t noise, it’s the entire product.

Same mechanism. One context it’s poison, the other it’s the point. I’d been running my discovery tool inside my lookup tool and calling it a hybrid.

A few honest caveats, because a benchmark you can’t poke holes in is usually lying: my gold set is self-authored v1, the corpus is small (63 notes), and the vector signal was actually dark during this run — I hadn’t built the embeddings yet, so “hybrid” here was really “BM25 + graph.” The vector half of my original claim is still untested. This is directional, not gospel.

But directional was enough. I’d shipped a claim, the claim got measured, and it didn’t survive contact with 30 queries. That’s the whole reason I keep my brain in git with everything reproducible: so the day I bother to measure, the measurement can actually win the argument against my own confident prose.

The slide-deck version of RAG says add a graph. The benchmark says know which question you’re answering first. I’ll take the benchmark.

I Run GitOps for My Brain

Fri, 01 May 2026 00:00:00 +0000

The pattern I didn’t know I had

This week an AI agent told me something about my own systems that I’d never noticed, and it was correct: I have one favorite architecture, and I’ve built it three times.

At work: git holds Terraform code → Terraform derives the S3 buckets. Nobody clicks around in the AWS console; the repo is the truth.
In the homelab: git holds Kubernetes manifests → ArgoCD derives the cluster. Every app on my rack is a folder in a repo.
In my second brain: a vault of markdown notes → an indexer derives the search database (SQLite FTS + a link graph) that my AI tools query.

Same shape everywhere: a plain-text source of truth in git, and a machine that builds the real thing from it. Master copy, derived state. I never decided this consciously — it’s just how my hands build things now.

GitOps isn’t the git part

Here’s the thing that the third copy got wrong, and it took me embarrassingly long to see because I teach this pattern at the infrastructure layer.

“Configuration in git” existed long before GitOps. What made GitOps an actual shift was the reconciler: ArgoCD doesn’t apply your manifests once and wish you luck. It watches, continuously. When the cluster drifts from the repo, you get an OutOfSync badge, and with selfHeal enabled it puts reality back where the repo says it should be. The loop is the product. Git is just where the loop points.

My vault had no loop. If I edited a note and forgot to rebuild the index, the search results my AI agents rely on were silently stale — no badge, no error, nothing. The only protection was a rule in the repo’s agent instructions: “if files and index disagree, the files win — run the indexer.”

A policy that agents must remember. In other words: I was running Kubernetes with a sticky note on the monitor that says please redeploy after editing the YAML. I would never accept that on my cluster. My brain ran on it for months.

The fix took an afternoon

Two pieces, both boring on purpose.

exo status — the OutOfSync badge. The indexer now stores a content hash per note; status re-hashes the vault and diffs:

{
  "status": "OutOfSync",
  "modified": ["vault/10-notes/interests-themes.md"],
  "new": [],
  "deleted": [],
  "repair": "exo index"
}

Exit code 0 when synced, 1 when not, so scripts and CI can ask the question too, much like argocd app diff, which returns a non-zero exit code when it finds a diff.
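A minimal sketch of that hash-and-diff check, assuming SHA-256 over each file's bytes and a stored mapping of relative path to hash. The function names, the choice of hash, and the `*.md` glob are assumptions for illustration; only the report shape mirrors the JSON above.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def hash_vault(root: Path) -> Dict[str, str]:
    """Map each markdown file (path relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.md"))
    }


def status(indexed: Dict[str, str], current: Dict[str, str]) -> dict:
    """Diff the hashes stored at index time against the vault as it is now."""
    modified = sorted(p for p in current if p in indexed and current[p] != indexed[p])
    new = sorted(p for p in current if p not in indexed)
    deleted = sorted(p for p in indexed if p not in current)
    synced = not (modified or new or deleted)
    report = {
        "status": "Synced" if synced else "OutOfSync",
        "modified": modified,
        "new": new,
        "deleted": deleted,
    }
    if not synced:
        report["repair"] = "exo index"
    return report


def exit_code(report: dict) -> int:
    """0 when synced, 1 when not, so shell scripts can branch on it."""
    return 0 if report["status"] == "Synced" else 1


# Demo in a throwaway directory: index, then edit one note and add another.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("one")
    indexed = hash_vault(root)
    (root / "a.md").write_text("two")
    (root / "b.md").write_text("new")
    demo = status(indexed, hash_vault(root))

print(demo["status"], demo["modified"], demo["new"])  # OutOfSync ['a.md'] ['b.md']
```

Hashing content rather than comparing modification times means a checkout or a pull that rewrites timestamps without changing text does not raise a false alarm.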

Git hooks — the selfHeal. Versioned hooks (core.hooksPath .githooks) on post-commit and post-merge rebuild the index after every commit and pull:

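# Do nothing on machines where exo is not installed.
command -v exo >/dev/null 2>&1 || exit 0
# Export the vault root; an unexported variable would never reach the exo process.
export EXO_ROOT="$(git rev-parse --show-toplevel)"
exo index >/dev/null 2>&1 && echo "exo: index reconciled (Synced)"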

Now every git commit in the vault prints exo: index reconciled (Synced) on its way out. The rule didn’t change — files win — but it stopped being something agents must remember and became something a machine enforces. That’s the entire difference between configuration management and GitOps, replayed at the knowledge layer.

The part where it gets a little strange

The reason I’m writing this post at all: I didn’t have this idea. A scheduled agent did, on what I can only describe as an idle walk.

My vault has a weekly cron job — we call it the Wanderer — that samples pairs of notes that are far apart: different folders, different months, almost no shared vocabulary. A headless Claude gets the pairs with exactly one task: read both notes in full and say whether anything genuinely connects. “Nothing connects” is a successful run. That last sentence is load-bearing — the run always reports its result either way, so the agent never needs to manufacture a finding to have done its job.
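One way to pick "far apart" pairs is sketched below. It assumes distance means a different top-level folder, a different month, and a low Jaccard overlap between word sets. The `Note` type, the 0.1 threshold, and the sample notes are hypothetical; the real sampler may define distance differently.

```python
import itertools
import random
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Note:
    path: str   # e.g. "work/terraform-s3.md"
    month: str  # e.g. "2026-01"
    text: str


def vocabulary(text: str) -> FrozenSet[str]:
    """Lower-cased word set of a note."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared words divided by all words; 0.0 means no common vocabulary."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_distant(x: Note, y: Note, max_overlap: float = 0.1) -> bool:
    """Different folder, different month, and almost no shared vocabulary."""
    different_folder = x.path.split("/")[0] != y.path.split("/")[0]
    different_month = x.month != y.month
    low_overlap = jaccard(vocabulary(x.text), vocabulary(y.text)) <= max_overlap
    return different_folder and different_month and low_overlap


def sample_distant_pairs(notes: List[Note], k: int, seed: int = 0) -> List[Tuple[Note, Note]]:
    """Pick up to k distant pairs; the seed makes a run reproducible."""
    candidates = [(x, y) for x, y in itertools.combinations(notes, 2) if is_distant(x, y)]
    return random.Random(seed).sample(candidates, min(k, len(candidates)))


notes = [
    Note("work/terraform-s3.md", "2026-01", "terraform derives s3 buckets from git"),
    Note("meta/architecture.md", "2026-04", "vault markdown indexer builds search database"),
    Note("work/terraform-iam.md", "2026-01", "terraform derives iam roles from git"),
]
for x, y in sample_distant_pairs(notes, k=5):
    print(x.path, "<->", y.path)
```

The two Terraform notes are never paired with each other: they share a folder, a month, and most of their words, which is exactly the kind of obvious neighbour this sampler is meant to skip.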

On its very first walk, it collided a work note about Terraform-driven S3 provisioning with the architecture map of the vault itself, and wrote: same sentence in different clothes — and the brain copy is missing its reconciler. Then it listed the two fixes you just read about.

Retrieval answers the questions you ask. Distant collisions surface the questions you didn’t know you had. It turns out my second brain didn’t need to get better at remembering — it needed to occasionally interrupt me.

If you keep a vault

Whatever your stack — Obsidian, org-mode, a folder of markdown — if anything derives from your notes (an index, embeddings, a published site), then you have source of truth and derived state, and the GitOps question applies: who notices when they drift? If the answer is “I do, hopefully,” you’re running the sticky-note era. Give it a badge and a loop. It’s an afternoon.

🌱 My Second Brain Weeds Itself Now

Fri, 27 Feb 2026 00:00:00 +0000

A few weeks ago I rebuilt my second brain as a folder of markdown in git — vault is the source of truth, everything else (search index, graph, 3D viewer) is a derived layer I can delete and rebuild. I love it. But a knowledge base has a dirty secret: it rots.

Not the files — those are fine. The connections rot. You capture a note at 11pm and never link it to anything, so it becomes an orphan floating off the graph. A project note’s one-line summary describes what the project was three weeks ago. Two notes are obviously about the same thing and neither knows the other exists. Do this for a few months and you don’t have a second brain, you have a junk drawer with good search.

The honest fix is to weed the garden regularly. The honest truth is that nobody does, including me.

So I stopped relying on myself and built a gardener.

What it actually does

Every night at 3am, on my homelab box, a script runs:

Detect — exo garden, a plain query over the index, produces a report: here are the orphans, here are notes that should probably link to each other, here are summaries that look stale. No AI in this step. It’s SQL and graph traversal. Deterministic, boring, trustworthy.
Decide and write — that report gets piped to claude -p (Claude Code in headless mode). Claude reads the vault’s operating contract, makes only high-confidence edits — add a [[wikilink]] between two genuinely related notes, refresh a stale summary — caps itself at ~10 notes a night, and writes a dated log note explaining exactly what it changed and what it deliberately skipped.
Commit — the wrapper reindexes and lands everything as a single garden: 2026-06-09 … git commit, then pushes. My 3D graph viewer picks it up on the next sync.
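The Detect step can be as plain as one query. The sketch below assumes a hypothetical two-table schema, `notes(path)` and `edges(src, dst)`, and defines an orphan as a note with no inbound or outbound link. The actual index schema and orphan rule are not shown in this post.

```python
import sqlite3

# In-memory stand-in for the derived index; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notes (path TEXT PRIMARY KEY);
    CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO notes VALUES ('10-notes/a'), ('10-notes/b'), ('90-meta/README');
    INSERT INTO edges VALUES ('10-notes/a', '10-notes/b');
    """
)

# A note is an orphan when no edge starts or ends at it.
ORPHANS_SQL = """
SELECT n.path
FROM notes AS n
WHERE NOT EXISTS (
    SELECT 1 FROM edges AS e WHERE e.src = n.path OR e.dst = n.path
)
ORDER BY n.path
"""


def find_orphans(db: sqlite3.Connection) -> list:
    """Return the paths of notes that have no links in either direction."""
    return [row[0] for row in db.execute(ORPHANS_SQL)]


print(find_orphans(conn))  # ['90-meta/README']
```

Because the query is deterministic, the same vault always yields the same report, which keeps the model's job limited to deciding how to fix what the query found.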

The first real run, it found one orphan (90-meta/README), linked it into the notes it actually indexes, and then — this is the part I liked — declined to touch the 12 “stale summary” candidates because, on inspection, every one of them was already accurate. It wrote: “flagged by length, not staleness; churning them would add noise.” A gardener that knows when not to prune is the one you can leave alone.

“Isn’t this a solved problem?”

Mostly, no — but partly, yes, and I want to be straight about it. AI-assisted note-linking exists: Obsidian plugins like Smart Connections suggest related notes, and apps like Mem and Reflect auto-organize as you write. They’re good.

Three things make this different enough to build:

Every change is a reviewable git diff, authored by a named agent. Not silent magic that rearranges your notes while you’re not looking. git log -p shows you exactly what the gardener did last night; git revert undoes a bad night in one command. For something as personal as a knowledge base, “show me the diff” beats “trust me.”
It’s mine, end to end. Runs on my hardware, on my schedule, with a model I point at. No SaaS holds my brain hostage.
The detection is deterministic; the model only acts. The LLM never decides what’s wrong — a boring query does that. The model only decides how to fix the things already found. That split keeps the whole thing auditable and cheap.

If you already live in a tool that does this and you trust it, great. I wanted the git-diff trail and the local control.

The part I actually want to tell you about

The plan was tidy: I run n8n on the same cluster, so n8n would be the scheduler — fire nightly, SSH into the node, run the gardener. Clean, visual, one workflow.

n8n could not reach the node. At all. Every port: ECONNREFUSED.

This sent me down a genuinely interesting hole, because the homelab runs Cilium for networking, and Cilium has opinions about your own node that plain Kubernetes does not.

First instinct: a NetworkPolicy allowing egress to the node’s IP. Wrote it, synced it, still refused. The reason is a Cilium subtlety worth knowing: the node isn’t a CIDR, it’s an identity. Cilium classifies your cluster’s own node as the special host identity, and ordinary ipBlock CIDR rules do not match it unless you flip a cluster-wide setting (policy-cidr-match-mode: nodes). My 192.168.0.109/32 rule was a no-op.

So I switched to the Cilium-native tool: a CiliumNetworkPolicy with toEntities: [host]. Confirmed it applied — I could see reserved:host allowed right there in the datapath’s BPF policy map. I confirmed the node’s IP really does resolve to identity 1 (host). I confirmed the host firewall was disabled. Everything said “allowed.”

Still ECONNREFUSED.

That’s the wall. The packet leaves the pod with Cilium’s blessing, hits the host’s own network stack, and something there sends a reset — and I couldn’t see what, because inspecting the host firewall needs root, and this automation deliberately doesn’t have it. I could have kept digging with a password. But I stopped and asked a better question: why am I making a pod reach back into the host it’s running on at all?

That’s an awkward direction. The work has to happen on the host (that’s where the vault, git creds, and Claude live). A pod straining to SSH into its own node is fighting the grain of the platform.

So I inverted it. The node schedules itself — a plain cron entry, rock-solid, no network gymnastics. And n8n, instead of triggering the job, receives it: at the end of each run the node POSTs a summary to an n8n webhook. Node→n8n works perfectly (it’s just an outbound HTTPS call to a URL). n8n keeps the run history and is the place I’ll later wire a phone notification.
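The reporting half of that inversion fits in a few lines of standard library. This is a sketch under assumptions: the webhook URL and the summary fields are placeholders, and the real payload is not described in this post.

```python
import json
import urllib.request


def build_summary_request(webhook_url: str, summary: dict) -> urllib.request.Request:
    """Build a JSON POST carrying the run summary (does not send it)."""
    body = json.dumps(summary).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_summary(webhook_url: str, summary: dict, timeout: float = 10.0) -> int:
    """Send the summary and return the HTTP status code."""
    request = build_summary_request(webhook_url, summary)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL and fields; nothing is sent by this demo.
request = build_summary_request(
    "https://n8n.example.invalid/webhook/garden",
    {"run": "garden", "notes_edited": 1, "skipped": 12},
)
print(request.get_method(), request.full_url)
```

Calling `post_summary` as the last step of the cron wrapper keeps the traffic strictly outbound from the node, so no inbound policy or SSH key is needed.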

I lost nothing that mattered. n8n is still my dashboard; the schedule just lives where the work lives. And I deleted the SSH key and the network-policy hole I’d opened — the cleanup felt better than the original plan would have.

The lesson, such as it is

Two, actually.

One: when you’re automating something to run unattended, the bug you want to find is the one that shows up in a dry run at 2pm, not at 3am three weeks from now. I almost shipped a version where a brand-new note (untracked by git) was invisible to my change-detection and would’ve been silently wiped each night. The dry run caught it. Always build the dry run.
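The untracked-file blind spot is worth a concrete look. A diff against the last commit lists only tracked files, while `git status --porcelain` also reports untracked paths with a `??` prefix. The sketch below shows one way to surface both. The function names are hypothetical, and this is not necessarily how the gardener's change detection was fixed.

```python
import subprocess
from typing import List, Tuple


def parse_porcelain(output: str) -> Tuple[List[str], List[str]]:
    """Split `git status --porcelain` output into (tracked changes, untracked paths).

    Each line is a two-character status code, a space, then the path.
    Renames and quoted paths are not handled in this sketch.
    """
    tracked: List[str] = []
    untracked: List[str] = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        code, path = line[:2], line[3:]
        (untracked if code == "??" else tracked).append(path)
    return tracked, untracked


def changed_paths(repo: str) -> Tuple[List[str], List[str]]:
    """Ask git for every changed path, listing untracked files individually."""
    result = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain", "--untracked-files=all"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_porcelain(result.stdout)


# Sample output: one edited tracked note and one brand-new note.
sample = " M 10-notes/existing.md\n?? 10-notes/brand-new.md\n"
tracked, untracked = parse_porcelain(sample)
print(tracked, untracked)  # ['10-notes/existing.md'] ['10-notes/brand-new.md']
```

A dry run that prints both lists before touching anything makes a new note visible at 2pm instead of missing at 3am.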

Two, the bigger one: I spent an hour trying to make a pod punch into its host because that was my plan, and the platform kept saying no in increasingly specific ways. The fix wasn’t a cleverer NetworkPolicy. It was noticing I was pushing against the design and turning around. The node scheduling itself and reporting up to n8n is simpler, safer, and more honest about where the work actually lives.

My brain weeds itself now. Every morning there’s maybe one small, sensible commit waiting — a link I’d have never made, a summary nudged back to true — and I can read exactly what changed before my coffee’s done. That’s the whole dream of a second brain that isn’t a junk drawer: it stays a garden, and I barely have to touch it.

🧠 A Second Brain You Can `git clone`

Fri, 16 Jan 2026 00:00:00 +0000

The graveyard of second brains

I had a second brain once. Obsidian vault, a CouchDB LiveSync backend, even a weekly agent that summarised my notes. It worked — for a while. Then the sync started fighting itself across my laptop, the homelab, and my phone, and the day syncing becomes a chore is the day you stop opening the thing. The notes were still there. I just never looked at them again.

That’s how most second brains die. Not from bad notes — from the plumbing. The sync breaks, or the upkeep outpaces the payoff, or the whole thing is trapped in one app’s database and moving it feels like surgery. The knowledge was never the problem. The container was.

So when I rebuilt it, I started from the failure modes, not the features.

What I actually wanted

Three things, none of them “more notes”:

Memory I share with my AIs. Every time I open a fresh Claude session, it starts from zero — I re-explain my homelab, my projects, what we decided last week. I wanted a place both of us read and write, so the context survives the session.
Something that outlives any tool. No lock-in. If the app of the month dies, my brain shouldn’t die with it.
Sync that can’t rot. The thing that killed v1.

The one decision that matters

The store and the intelligence are different layers, and only the store is sacred.

The store is a folder of plain markdown in git. That’s it. Human-readable, diffable, greppable, yours. Everything clever sits above it and is fully rebuildable:

L5  Visualisation   3D graph, Obsidian, whatever reads markdown
L4  Automation      scheduled "gardener" runs
L3  Agent interface MCP servers — search, graph, note CRUD
L2  Index           SQLite: full-text + vectors + materialised edges
L1  Structure       typed frontmatter + [[wikilinks]]
L0  Substrate       markdown files in git   ← the only thing that's truth

Delete L1–L5 and nothing is lost — you rebuild them from L0 with one command. That property is the whole design. The index can corrupt, the embedding model can change, the viewer can break (mine did, spectacularly — that’s another post), and the knowledge doesn’t care. It’s text in git.
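To make "rebuild from L0" concrete, here is a sketch of deriving the link graph from nothing but note text. It assumes Obsidian-style wikilinks, including the `[[note|alias]]` and `[[note#heading]]` forms. The regex, the function names, and the sample notes are illustrative, not the indexer's actual code.

```python
import re
from typing import Dict, List, Set, Tuple

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_links(markdown: str) -> List[str]:
    """Return link targets in the order they appear in the text."""
    return [match.group(1).strip() for match in WIKILINK.finditer(markdown)]


def build_edges(notes: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Derive (source, target) edges, ignoring self-links and links to missing notes."""
    edges: Set[Tuple[str, str]] = set()
    for name, body in notes.items():
        for target in extract_links(body):
            if target in notes and target != name:
                edges.add((name, target))
    return edges


# Toy vault: note name mapped to its markdown body.
notes = {
    "map": "See [[hybrid-retrieval]] and [[gardener|the gardener]].",
    "hybrid-retrieval": "Back to [[map#Search]].",
    "gardener": "Mentions [[missing]], which does not exist.",
}
print(sorted(build_edges(notes)))
# [('hybrid-retrieval', 'map'), ('map', 'gardener'), ('map', 'hybrid-retrieval')]
```

Nothing here reads from a database, so the edge table can be dropped and regenerated from the files at any time.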

And sync is just git pull. No LiveSync daemon to wedge itself, no proprietary replication. The exact thing that killed v1 is now the most boring, battle-tested part of the stack. Three devices, one git pull, done.

Search that explains itself

The retrieval layer is deliberately not “throw it all at embeddings.” It fuses three signals — keyword (BM25), vector similarity, and graph expansion (pull in the neighbours of strong hits) — and every result reports which signals fired.

exo search "hybrid retrieval"
→ hybrid-retrieval   matched_on: [bm25, graph]

That matched_on matters more than it looks. An embeddings-only system gives you a ranked list and no reason — you can’t tell a real match from a vibe. For a brain I’m supposed to trust over years, “why did this surface?” is a feature, not a nicety.
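Carrying that provenance through fusion costs almost nothing. The sketch below assumes each signal returns a mapping of note to score and that fusion is a weighted sum. The `merge_signals` function, the weights, and the scores are hypothetical stand-ins for whatever the search actually does.

```python
from typing import Dict, List


def merge_signals(
    signals: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[dict]:
    """Fuse per-signal scores and record which signals fired for each note."""
    merged: Dict[str, dict] = {}
    for name, hits in signals.items():
        for note, score in hits.items():
            entry = merged.setdefault(note, {"note": note, "score": 0.0, "matched_on": []})
            entry["score"] += weights.get(name, 1.0) * score
            entry["matched_on"].append(name)
    # Highest fused score first; the note name breaks ties.
    return sorted(merged.values(), key=lambda e: (-e["score"], e["note"]))


# Toy scores for one query; the vector signal returns nothing here.
signals = {
    "bm25": {"hybrid-retrieval": 0.9, "search-notes": 0.4},
    "vector": {},
    "graph": {"hybrid-retrieval": 0.3, "map": 0.3},
}
weights = {"bm25": 1.0, "vector": 1.0, "graph": 0.1}

for result in merge_signals(signals, weights):
    print(result["note"], "matched_on:", result["matched_on"])
# hybrid-retrieval matched_on: ['bm25', 'graph']
# search-notes matched_on: ['bm25']
# map matched_on: ['graph']
```

A result whose only entry is `graph` never matched the query text, which is worth knowing before trusting it.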

The AI is a librarian, not a hoarder

Here’s the part I care about most. The AI doesn’t just read the brain — it writes to it. Through an MCP server it can search, walk the graph, and author notes. But under a hard rule: every write is a reviewable git diff.

It searches before it writes (extend a note, don’t spawn a duplicate). It links instead of piling. A scheduled “gardener” pass finds orphaned notes and stale summaries and proposes fixes — as commits I can read and git revert if it gets something wrong. No black-box mutation of my memory. Just a librarian that files things while I’m asleep and leaves a paper trail.

So now “what am I building?” is a question with an instant, honest answer: a single map note, kept current, that every project links into. I ask, the AI pulls it, and neither of us has to remember.

Why not just…

Obsidian alone? It’s a lovely viewer — and I still use it as one. But it can’t give an agent structured read/write or explainable retrieval, and its sync is what burned me. Here Obsidian reads the same markdown; it’s a window, not the house.
Embeddings RAG? Opaque and one-directional. It can rank, but it can’t tell you why, and it can’t write back. This is transparent and bidirectional.
Notion / a SaaS brain? Lock-in by design. git clone is my backup and any text editor is my fallback.
A graph database? Unnecessary infra. The graph lives in the wikilinks; SQLite just materialises it. I’ll add Neo4j the day my queries actually outgrow a single file, and not a day sooner.

What it changes

The vault is small still — that’s fine; it grows by use. But the loop already pays off: I work, the AI checkpoints decisions into markdown, and the next session — fresh model, no memory of its own — searches the brain and is caught up in seconds. The knowledge stopped living only in my head and in dead chat logs.

I’m a team of one. There’s no colleague who remembers why I made a call six months ago, no handover doc someone else maintains. Continuity isn’t a nice-to-have; it’s the whole job. A second brain that the AI helps keep alive — and that I can git clone onto any machine in thirty seconds — is the first version of this idea that I actually trust to still be here in five years.

The notes from v1? They’re sitting in a folder, waiting to be triaged into v2. This time I’ll still be opening it.