Exocortex on hippotion

Two Birds That Read the Web for Me: One Hoards, One Scatters

Fri, 12 Jun 2026 00:00:00 +0000

I have a vault of markdown notes that I treat as a second brain, and I run GitOps over it like it’s production infrastructure. It already has agents that work on it from the inside: a nightly gardener that weeds orphans and suggests links, and a Wanderer that collides random pairs of my own notes looking for connections I missed.

The obvious next move is to point an agent at the outside — let it read the web and tell me what matters. That move is also a small landmine, and most “AI reads the internet for you” tooling steps right on it. So this week I built two of them instead of one, named them after corvids, and the reason there are two is the entire point of this post.

Meet the Magpie and the Blue Jay.

The same fear, twice

Before either bird got a name, both inherited a single non-negotiable rule, and it’s worth saying plainly because it’s the part everyone skips:

An agent that reads the internet and writes to your notes is a prompt-injection pipeline aimed straight at your trust root.

My vault isn’t just storage. Every other agent (the gardener, the Wanderer, the search that answers “what am I building?”) reads it as trusted context. So the moment one agent ingests a GitHub README or a news headline (attacker-influenceable text) and is allowed to write a note, a stranger on the internet gets to whisper instructions into the thing my whole system believes. “Structured API” narrows that surface. It does not close it.

Both birds are built on the same chassis as the gardener, and that chassis enforces the fear rather than trusting the model to behave:

Two phases, hard split. A wrapper-owned FETCH step pulls the external text in plain Bash — Claude is not in the loop, can’t be talked into anything, because it isn’t running yet. Then a COLLIDE step starts claude -p with the fetched text handed in as inline data, and that process gets only Read / Glob / Grep / Write. No Bash, no git, no network, no MCP. While untrusted text is in the context window, the agent has no tool that can reach the outside world or rewrite history.
Allowlist, not the open web. Each bird reads a short, named list of sources. Nothing else.
Quarantine, not the vault. Findings land in quarantine//, which lives outside vault/. The indexer never sees it. Nothing it writes is ever auto-wikilinked into the graph. Promotion to a real note is a thing I do, by hand, after reading it.
Blast radius is checked, not assumed. A run may modify only its quarantine directory. Anything written anywhere else is discarded and reported as a violation.
“Nothing found” is a successful run. Neither bird has a quota. This is the honesty contract I stole from the Wanderer — an agent under pressure to produce N findings will manufacture N findings, and manufactured insight is worse than silence.
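
The blast-radius rule above is the easiest of the five to show in code. What follows is a minimal sketch, not the actual wrapper: the function name check_blast_radius, the assumption that the wrapper already holds a list of paths the run touched, and the directory names in the usage note are all hypothetical.

```python
from pathlib import Path


def check_blast_radius(changed_paths, quarantine_dir, repo_root="."):
    """Split a run's changed paths into allowed writes and violations.

    A path is allowed only if, once resolved, it sits inside quarantine_dir.
    Resolving first means '..' segments and symlinks cannot dodge the check.
    All names here are illustrative assumptions, not the real wrapper's API.
    """
    root = Path(repo_root).resolve()
    allowed_root = (root / quarantine_dir).resolve()
    allowed, violations = [], []
    for raw in changed_paths:
        resolved = (root / raw).resolve()
        if resolved.is_relative_to(allowed_root):  # Python 3.9+
            allowed.append(raw)
        else:
            violations.append(raw)
    return allowed, violations
```

Called with a hypothetical quarantine directory such as "quarantine/magpie", a write to "vault/note.md" comes back as a violation, and so does "quarantine/magpie/../../vault/x.md", because the path is resolved before it is compared. What the wrapper then does with violations (discard and report, per the rule above) is left out of the sketch.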

That’s the shared spine. Now the interesting part: given the same security model, the two birds do almost opposite things, and trying to make one bird do both jobs would have quietly ruined it.

The Magpie hoards what’s already shiny

A magpie collects shiny objects and keeps them close. Mine watches my own GitHub stars.

The premise is slow public signal × private context. I starred some repo three weeks ago, forgot about it, and moved on. Meanwhile my projects shifted. The Magpie runs weekly, pulls my starred repos through one allowlisted endpoint (gh api user/starred), and collides each one against what I’m actively building right now — the live projects, the open hubs.

Its output contract is a tight one: it is a relevance filter. It fires only when a star actually touches live work, and every finding has to name three concrete things — the repo, the project it connects to, and one “so what.” A vague “these are thematically related” doesn’t count as a hit. It’s a watchdog on the dials, not a newsletter.
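
The three-field contract can be made mechanical. This is a sketch under assumptions: the Finding class and is_reportable are hypothetical names, and a shape check like this can only confirm that all three fields are present, not that the “so what” is actually concrete. That judgment stays with the reader of the quarantine folder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one Magpie finding."""
    repo: str      # the starred repository
    project: str   # the live project it touches
    so_what: str   # the one concrete consequence


def is_reportable(finding: Finding) -> bool:
    """A finding counts only if all three fields are non-empty after trimming."""
    return all(
        value.strip()
        for value in (finding.repo, finding.project, finding.so_what)
    )
```

A finding with a repo and a project but a blank so_what fails the check, which is the “thematically related” case the contract is meant to reject.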

The supervised proof run, over 28 stars, surfaced exactly two real hits and refused to invent a third:

supertonic (on-device multilingual TTS) × my Hungarian-audiobook voice-cloning project — a possible escape from a TTS fight I’d been losing. I checked: it genuinely supports Hungarian. That’s a hit with a so-what.
agentmemory × the exocortex itself — prior art for persistent AI memory, notably with benchmarks my own notes lacked. (And if you’ve read about the time I benchmarked my own search and it lost, you’ll know how much I needed that nudge.)

The other 26 stars mapped to tidy thematic clusters and were correctly not reported. That restraint is the feature.

The Blue Jay scatters acorns and forgets where

Here’s the bird that explains why there are two.

Blue jays don’t hoard close like magpies. They cache acorns far and wide and forget where they buried some — and the forgotten ones grow into oak trees. Ecologists think blue jays are why oak forests spread north after the last ice age. Seed dispersal, by way of a bad memory. That is exactly the job I wanted for the second bird, and the metaphor was too good to pass up.

The Blue Jay reads an allowlist of eight RSS feeds, picked so tech and science cross-pollinate:

Tech: Hacker News (high-score front page), lobste.rs, Ars Technica
Science & ideas: phys.org, Quanta, Aeon, Nautilus
Wildcard: Medium — but scoped to specific tag feeds, never the raw firehose of crypto and self-help

Quanta, Aeon, and Nautilus are on that list on purpose: they’re the connective tissue, the feeds where “huh, that’s weirdly similar to…” happens before my vault even gets involved.

And its output contract is the opposite of the Magpie’s. The Blue Jay is a serendipity filter. Its job is to surface the connection that isn’t in my projects yet — the distant idea, the acorn worth burying. If I ran it through the Magpie’s “only fire on a live-work hit” rule, I would strangle the one thing it exists to do. Relevance and serendipity pull in opposite directions, and you can’t tune a single agent to maximize both.

One more load-bearing detail, half design and half security: the Blue Jay collides on the RSS summary only — title, abstract, link. It never pulls the full article body into context. That’s simultaneously the lower-injection path and the right cognitive shape (a headline is a seed; I click through myself from quarantine if the seed is interesting). The narrow input is doing double duty.
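
The summary-only rule amounts to reading three child elements per item and never looking at the rest. Here is a minimal sketch assuming plain RSS 2.0 feeds; the function name summaries_only is hypothetical, Atom feeds would need different element names, and the standard-library parser used here is not hardened against maliciously constructed XML, so a real FETCH step might prefer a hardened parser such as defusedxml.

```python
import xml.etree.ElementTree as ET


def summaries_only(rss_xml: str):
    """Return title, abstract, and link for each <item>, ignoring every other child.

    Full-body elements such as content:encoded are never read, so their text
    cannot reach the agent's context. Assumes RSS 2.0 element names.
    """
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "abstract": (item.findtext("description") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items
```

One caveat on the design: some feeds put the whole article into description, so a length cap on the abstract would be a reasonable extra guard. That cap is not shown here.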

Why two birds and not one with a flag

I genuinely considered making this one agent with a --mode=relevance|serendipity switch. I’m glad I didn’t, and the reasoning generalizes past birds:

	Magpie	Blue Jay
Source	my GitHub stars (structured API)	8 RSS feeds (open prose)
Injection risk	low	the highest frontier
Fires when	a star hits live work	a summary sparks a distant idea
Output	relevance: repo → project → so-what	serendipity: the not-yet-relevant connection
Failure mode it guards against	noise / false relevance	being strangled into silence

Two things made the split non-negotiable. First, the output contracts are too different to share one brain — “only speak on a hit” and “speak about the thing that isn’t a hit yet” are contradictory prompts, and a single agent told to do both does neither well. Second, open news is a higher injection frontier than a structured stars API, so the riskier bird deserves its own enforced blast-radius wrapper, not a code path bolted onto the safe one. When two jobs disagree on both what good output is and how dangerous the input is, that’s not a flag. That’s two programs.

So now my vault has two more agents reading the world on a cron. The Magpie runs Saturday at 06:00 and tells me when something I bookmarked finally became relevant. The Blue Jay runs Saturday at 07:00 and buries acorns in a quarantine folder, most of which I’ll ignore — but I only need one of them to grow into an oak.

Both are on probation for their first few runs, because I don’t trust a thing that reads the internet until I’ve watched it behave. But the part I’m actually happy about isn’t the agents. It’s that building the second one forced me to say out loud what the first one was secretly assuming — and the names made the difference impossible to forget. A magpie hoards. A blue jay scatters. You want both, and you do not want them to be the same bird.

I Added a Knowledge Graph to My Search. It Made It Worse.

Fri, 05 Jun 2026 00:00:00 +0000

I have a note in my second brain that I wrote months ago. It says, with the confidence of someone who hadn’t measured anything:

Combining lexical search (BM25) with vector similarity and graph expansion produces more robust recall than embeddings alone.

That sentence shipped into production. My vault of markdown notes gets indexed into a search database, and the search function fuses three signals: BM25 (classic keyword ranking), vector similarity (embeddings), and graph expansion — when a note matches, pull in its linked neighbours too, on the theory that the thing you want is often next to the thing you typed.

It sounds right. Graphs are having a moment in RAG. “Add a knowledge graph to your retrieval” is the kind of thing you can put on a slide and nobody pushes back. I believed it enough to make graph expansion a first-class signal with a weight of 0.5 — equal footing with keyword matching.

This week I finally wrote a benchmark. The graph wasn’t helping. It was the single biggest thing hurting my search.

The setup

30 gold queries against the live vault (63 notes), borrowing the harness shape from an eval framework I’d been reading. Each query has a hand-labelled “correct” note. I measured recall@5 (did the right note land in the top 5?) and MRR (how high did it rank?), across three retrievers:

grep — naive substring term-count. The dumb floor.
bm25 — pure keyword ranking, FTS5’s BM25. The honest baseline.
live — my production hybrid (BM25 + vector + graph).
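
For anyone who wants to run the same comparison, the two metrics are a few lines each. This is a sketch of the standard definitions, assuming one gold note per query and a retriever that returns a ranked list of note ids; the function names and the retriever signature are illustrative, not the harness used here.

```python
def recall_at_k(ranked_ids, gold_id, k=5):
    """1.0 if the gold note appears in the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, gold_id):
    """1/rank of the gold note (ranks start at 1), or 0.0 if it is missing."""
    for rank, note_id in enumerate(ranked_ids, start=1):
        if note_id == gold_id:
            return 1.0 / rank
    return 0.0


def evaluate(retriever, gold, k=5):
    """Average both metrics over a {query: gold_note_id} mapping.

    `retriever` is any callable taking a query string and returning a
    ranked list of note ids (an assumption for this sketch).
    """
    recalls, rrs = [], []
    for query, gold_id in gold.items():
        ranked = retriever(query)
        recalls.append(recall_at_k(ranked, gold_id, k))
        rrs.append(reciprocal_rank(ranked, gold_id))
    n = len(gold)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rrs) / n}
```

Each retriever gets wrapped as a callable and passed to evaluate with the same gold mapping, so the only thing that varies between rows of a scorecard is the ranking itself.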

I expected a clean staircase: grep at the bottom, bm25 in the middle, my clever hybrid on top. That’s the whole reason you build the clever thing.

The scorecard

retriever	recall@5	MRR
grep	0.467	0.307
bm25	0.950	0.826
live (hybrid, `w_graph=0.5`)	0.650	0.520

Read that bottom row twice. My production “smart” search found the right note 65% of the time. Plain keyword search found it 95% of the time. The hybrid I’d been quietly proud of was worse than its own baseline — it broke 9 of 30 queries that BM25 got right. BM25 alone whiffed on exactly one.

The clever layer wasn’t adding intelligence. It was adding noise, confidently.

Why the graph backfired

Here’s the mechanism, and it’s almost funny once you see it.

Graph expansion pulls in a matched note’s neighbours. But in a real knowledge base, the most connected notes are hubs — my inbox of ideas, my project radar, my “things Claude noticed” log. Everything links to them, so they’re everyone’s neighbour. When I searched for something specific, the graph helpfully dragged these popularity-contest winners into the candidate set, and they elbowed the genuinely relevant note clean out of the top 5.

Concrete example. Query: “who owns this knowledge system?” The correct answer is my personal note. BM25 ranked it #5 — just barely in. The hybrid, drunk on graph neighbours, pushed it off the list entirely. The graph didn’t find a better answer. It buried a good one under hubs.
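
The crowding effect is easy to reproduce on toy data. The sketch below is not the production fusion: the scoring rule (keyword score plus w_graph times the summed scores of matched notes that link in), the note names, and the numbers are all invented for illustration. It only shows how hubs that many matches link to can outscore a note that matched on its own merits.

```python
def fuse(bm25_scores, links, w_graph):
    """Toy fusion rule (an assumption): every matched note passes
    w_graph * its own score to each note it links to."""
    fused = dict(bm25_scores)
    for source, score in bm25_scores.items():
        for neighbour in links.get(source, []):
            fused[neighbour] = fused.get(neighbour, 0.0) + w_graph * score
    return fused


def top_k(scores, k=5):
    """Note ids sorted by descending score, truncated to k."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [note for note, _ in ranked[:k]]


# Invented data: "owner" is the right answer and sits at rank 5 on keywords alone.
BM25 = {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.7, "owner": 0.6}
# Every stronger match links to at least one hub note.
LINKS = {
    "a": ["hub-inbox", "hub-radar"],
    "b": ["hub-inbox"],
    "c": ["hub-inbox"],
    "d": ["hub-radar"],
}

for w in (0.0, 0.1, 0.5):
    print(w, top_k(fuse(BM25, LINKS, w)))
```

In this toy, "owner" holds fifth place at weights 0.0 and 0.1, and at 0.5 the two hubs collect enough borrowed score to push it out of the top five. The hubs never matched the query; they are simply everyone’s neighbour.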

I swept the graph weight to confirm it wasn’t a fluke. It wasn’t. The trend was monotonic, with every increment of graph weight lowering MRR and none improving recall:

graph weight	recall@5	MRR
0.0 (off)	0.950	0.826
0.1	0.950	0.737
0.25	0.817	0.564
0.5 (what I shipped)	0.650	0.520

There’s no ambiguity to argue with. More graph, more harm, no exceptions. The value I’d been claiming in that confident note — I finally measured it, and it was negative.

The fix, and the actual lesson

The fix was one line: drop the default graph weight from 0.5 to 0.1. Recall snapped back to 0.95, tying pure BM25. (Turning the graph fully off is better still on MRR, 0.826 against 0.737; I kept a whisper of it as a tiebreaker, which is a taste call, not a data-driven one.)

But the one-line fix isn’t the point. The point is where graphs belong.

Graph expansion isn’t a bad idea — I aimed it at the wrong job. Precision retrieval (“find me the one note that answers this”) wants to be narrow and literal. Pulling in neighbours is the opposite of what you want; every neighbour is a chance to be wrong. But I have a different feature in this same system — a discovery mode that deliberately collides distant notes to surface unexpected connections. There, neighbour-pulling isn’t noise, it’s the entire product.

Same mechanism. One context it’s poison, the other it’s the point. I’d been running my discovery tool inside my lookup tool and calling it a hybrid.

A few honest caveats, because a benchmark you can’t poke holes in is usually lying: my gold set is self-authored v1, the corpus is small (63 notes), and the vector signal was actually dark during this run — I hadn’t built the embeddings yet, so “hybrid” here was really “BM25 + graph.” The vector half of my original claim is still untested. This is directional, not gospel.

But directional was enough. I’d shipped a claim, the claim got measured, and it didn’t survive contact with 30 queries. That’s the whole reason I keep my brain in git with everything reproducible: so the day I bother to measure, the measurement can actually win the argument against my own confident prose.

The slide-deck version of RAG says add a graph. The benchmark says know which question you’re answering first. I’ll take the benchmark.

I Run GitOps for My Brain

Fri, 01 May 2026 00:00:00 +0000

The pattern I didn’t know I had

This week an AI agent told me something about my own systems that I’d never noticed, and it was correct: I have one favorite architecture, and I’ve built it three times.

At work: git holds Terraform code → Terraform derives the S3 buckets. Nobody clicks around in the AWS console; the repo is the truth.
In the homelab: git holds Kubernetes manifests → ArgoCD derives the cluster. Every app on my rack is a folder in a repo.
In my second brain: a vault of markdown notes → an indexer derives the search database (SQLite FTS + a link graph) that my AI tools query.

Same shape everywhere: a plain-text source of truth in git, and a machine that builds the real thing from it. Master copy, derived state. I never decided this consciously — it’s just how my hands build things now.

GitOps isn’t the git part

Here’s the thing that the third copy got wrong, and it took me embarrassingly long to see, given that I teach this pattern at the infrastructure layer.

“Configuration in git” existed long before GitOps. What made GitOps an actual shift was the reconciler: ArgoCD doesn’t apply your manifests once and wish you luck. It watches, continuously. When the cluster drifts from the repo, you get an OutOfSync badge, and with selfHeal enabled it puts reality back where the repo says it should be. The loop is the product. Git is just where the loop points.

My vault had no loop. If I edited a note and forgot to rebuild the index, the search results my AI agents rely on were silently stale — no badge, no error, nothing. The only protection was a rule in the repo’s agent instructions: “if files and index disagree, the files win — run the indexer.”

A policy that agents must remember. In other words: I was running Kubernetes with a sticky note on the monitor that says please redeploy after editing the YAML. I would never accept that on my cluster. My brain ran on it for months.

The fix took an afternoon

Two pieces, both boring on purpose.

exo status — the OutOfSync badge. The indexer now stores a content hash per note; status re-hashes the vault and diffs:

{
  "status": "OutOfSync",
  "modified": ["vault/10-notes/interests-themes.md"],
  "new": [],
  "deleted": [],
  "repair": "exo index"
}

Exit code 0 when synced, 1 when not, so scripts and CI can ask the question too, much like argocd app diff, which exits 1 when it finds a diff.
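
The hash-and-diff idea behind the badge fits in a short function. This is a sketch of the technique, not exo’s implementation: the function names, the choice of SHA-256, and the restriction to *.md files are assumptions, and the "repair" field from the output above is left out.

```python
import hashlib
from pathlib import Path


def hash_vault(vault_dir):
    """Map each markdown file (path relative to the vault) to a SHA-256 content hash."""
    vault = Path(vault_dir)
    return {
        p.relative_to(vault).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(vault.rglob("*.md"))
    }


def status(stored_hashes, vault_dir):
    """Compare hashes saved at index time with the vault as it is on disk now.

    Returns (report, exit_code): exit_code is 0 when synced, 1 when not.
    """
    current = hash_vault(vault_dir)
    modified = sorted(
        p for p in current
        if p in stored_hashes and current[p] != stored_hashes[p]
    )
    new = sorted(p for p in current if p not in stored_hashes)
    deleted = sorted(p for p in stored_hashes if p not in current)
    synced = not (modified or new or deleted)
    report = {
        "status": "Synced" if synced else "OutOfSync",
        "modified": modified,
        "new": new,
        "deleted": deleted,
    }
    return report, (0 if synced else 1)
```

The indexer side would call hash_vault once per rebuild and persist the mapping; status only ever reads it. Hashing content rather than comparing modification times means a checkout that rewrites timestamps without changing bytes still reports Synced.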

Git hooks — the selfHeal. Versioned hooks (core.hooksPath .githooks) on post-commit and post-merge rebuild the index after every commit and pull:

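# Skip quietly when exo is not installed, so the hook never blocks a commit.
command -v exo >/dev/null 2>&1 || exit 0
# Export the repo root (assuming exo reads EXO_ROOT); a bare assignment never reaches child processes.
EXO_ROOT="$(git rev-parse --show-toplevel)" && export EXO_ROOT
exo index >/dev/null 2>&1 && echo "exo: index reconciled (Synced)"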

Now every git commit in the vault prints exo: index reconciled (Synced) on its way out. The rule didn’t change — files win — but it stopped being something agents must remember and became something a machine enforces. That’s the entire difference between configuration management and GitOps, replayed at the knowledge layer.

The part where it gets a little strange

The reason I’m writing this post at all: I didn’t have this idea. A scheduled agent did, on what I can only describe as an idle walk.

My vault has a weekly cron job — we call it the Wanderer — that samples pairs of notes that are far apart: different folders, different months, almost no shared vocabulary. A headless Claude gets the pairs with exactly one task: read both notes in full and say whether anything genuinely connects. “Nothing connects” is a successful run. That last sentence is load-bearing — the run always reports its result either way, so the agent never needs to manufacture a finding to have done its job.
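
“Far apart” can be pinned down with three cheap tests. The sketch below is one way to do it, under loud assumptions: the note fields (path, folder, month, text), the Jaccard overlap measure, the 0.05 threshold, and the four-letter stopword shortcut are all invented for illustration and are not the Wanderer’s actual sampler.

```python
import random
import re
from itertools import combinations


def vocabulary(text):
    """Lowercased word set; words under 4 letters are dropped as a crude stopword filter."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}


def jaccard(a, b):
    """Shared-vocabulary ratio between two word sets (0.0 means nothing in common)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distant_pairs(notes, max_overlap=0.05):
    """All pairs in different folders and different months with little shared vocabulary.

    `notes` is a list of dicts with hypothetical keys: path, folder, month, text.
    """
    vocab = {n["path"]: vocabulary(n["text"]) for n in notes}
    return [
        (a["path"], b["path"])
        for a, b in combinations(notes, 2)
        if a["folder"] != b["folder"]
        and a["month"] != b["month"]
        and jaccard(vocab[a["path"]], vocab[b["path"]]) <= max_overlap
    ]


def sample_pairs(notes, n, seed=None):
    """Pick up to n distant pairs at random; fewer, or none, is a valid outcome."""
    candidates = distant_pairs(notes)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))
```

Returning fewer pairs than requested, rather than relaxing the thresholds to fill a quota, keeps the sampler consistent with the “nothing connects is a successful run” contract.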

On its very first walk, it collided a work note about Terraform-driven S3 provisioning with the architecture map of the vault itself, and wrote: same sentence in different clothes — and the brain copy is missing its reconciler. Then it listed the two fixes you just read about.

Retrieval answers the questions you ask. Distant collisions surface the questions you didn’t know you had. It turns out my second brain didn’t need to get better at remembering — it needed to occasionally interrupt me.

If you keep a vault

Whatever your stack — Obsidian, org-mode, a folder of markdown — if anything derives from your notes (an index, embeddings, a published site), then you have source of truth and derived state, and the GitOps question applies: who notices when they drift? If the answer is “I do, hopefully,” you’re running the sticky-note era. Give it a badge and a loop. It’s an afternoon.