[{"content":"Kubernetes: init container crash loop leaves dirty emptyDir\nWhen a pod\u0026rsquo;s init container crashes, Kubernetes restarts only the init container — not the whole pod. The emptyDir volume survives between retries. If your init container does a git clone into a fixed path, the second attempt fails with \u0026ldquo;destination path already exists.\u0026rdquo;\nFix: rm -rf the target dir before cloning.\nrm -rf /git/repo\ngit clone --depth=10 --branch=main https://... /git/repo\nEven after many restarts, no manual cleanup is needed: events expire after ~1h, and the Deployment controller replaces old pods automatically. Check recovery with:\nkubectl get events -n \u0026lt;namespace\u0026gt; --sort-by=\u0026#39;.lastTimestamp\u0026#39; | tail -10\nA \u0026ldquo;CPU spike\u0026rdquo; that was actually memory thrashing (adding GA4 to Hugo)\nWanted Google Analytics on this blog. PaperMod already calls a google_analytics.html partial in head.html, but it\u0026rsquo;s gated behind hugo.IsProduction | or (eq site.Params.env \u0026quot;production\u0026quot;). My blog pod runs hugo server, which uses the development environment by default — so the partial never fires. I \u0026ldquo;fixed\u0026rdquo; that by setting env = \u0026quot;production\u0026quot;.\nThat was the wrong lever. env = production flips on PaperMod\u0026rsquo;s whole production path — minification, OpenGraph, Twitter cards, schema JSON across every page. The next full rebuild blew past the pod\u0026rsquo;s 128Mi memory limit and got OOMKilled (exit 137). Server load jumped.\nThe right way to add GA without touching the build mode: drop the tag in layouts/_partials/extend_head.html. PaperMod includes that partial unconditionally, above the production guard — so it loads under hugo server too.\nBut here\u0026rsquo;s the part that fooled me. After reverting env, load was still climbing — to ~14 on a single node — and ps showed hugo at \u0026ldquo;500% CPU\u0026rdquo;. Looked like a runaway compute loop. 
It wasn\u0026rsquo;t:\n%Cpu(s): 2.1 us, 41.0 sy, 6.9 id, 50.0 wa \u0026lt;- 50% iowait, 2% userspace\nPID ... S %CPU COMMAND\n... D 333 hugo \u0026lt;- state D, RES pinned at 127MiB (the 128Mi cgroup limit)\nTwo lessons:\nps %CPU is a lifetime average, not instantaneous. A process that ran hot for 1s then blocked still shows a big number for a while. Use top for what\u0026rsquo;s happening now.\nHigh load + high %wa + a D-state process sitting at its cgroup memory limit = memory thrashing, not CPU. Hugo wasn\u0026rsquo;t computing — it was wedged against the 128Mi ceiling, and every allocation triggered kernel reclaim/swap. A sub-second build dragged out for minutes in uninterruptible I/O sleep, and all those blocked tasks are what inflate load average (Linux counts D-state in load).\nThe actual fix was boring: 128Mi was always marginal for hugo-extended + PaperMod. Bumped the limit to 512Mi and the thrash vanished.\nTakeaway: when load spikes, read %wa and process state before blaming the CPU. And don\u0026rsquo;t flip env=production on a long-lived hugo server just to ungate one partial — use extend_head.html.\nSelf-hosting Supabase (lean) on k3s: the gotcha checklist\nRan the community supabase/supabase chart on a 16Gi single node — enabled db, rest, auth, meta, studio, kong + the log pipeline (analytics/Logflare + vector); left realtime, storage, imgproxy, edge-functions off. The deploy is easy; these are the things that actually bit:\nStudio shows \u0026ldquo;no tables\u0026rdquo;. Supabase is single-database by design — Studio, PostgREST and auth all use the database named postgres. App tables in a separate database are invisible to all of it. Put your schema in postgres\u0026rsquo;s public schema.\nStudio won\u0026rsquo;t schedule with edge-functions disabled. Its Deployment mounts the functions PVC unconditionally. Either run functions, or create the PVC yourself and leave functions off.\n
edge-functions crashloops if you keep it: it boots by fetching a Deno module from the internet, which a deny-all egress policy blocks. You usually only want the PVC it leaves behind anyway.\nvector (log collector) stays silent under a deny-all policy. It discovers pods via the Kubernetes API, so it needs API egress, not just app ports (allowEgressToKubeApi). A log shipper that can\u0026rsquo;t reach the API collects nothing and doesn\u0026rsquo;t say why.\nsecretRef must contain every key the chart maps — including non-secret ones like database and openAiApiKey. Miss one and pods sit in CreateContainerConfigError.\nESO ExternalSecret shows perpetual OutOfSync in Argo CD unless you spell out the remoteRef defaults (conversionStrategy: Default, decodingStrategy: None, metadataPolicy: None) — ESO writes them back, and the compact form drifts.\npostgres is not a superuser. CREATE DATABASE … OWNER app fails with \u0026ldquo;must be member of role\u0026rdquo;. Supabase keeps the real superuser (supabase_admin) to itself; GRANT app TO postgres first.\nLogflare needs no BigQuery. It runs on the self-hosted Postgres backend (the _supabase database, _analytics schema) — logs land in _analytics.log_events_*.\nNone of this is in the README. It\u0026rsquo;s the gap between \u0026ldquo;I deployed Supabase\u0026rdquo; and \u0026ldquo;I run it.\u0026rdquo;\n","permalink":"https://blog.hippotion.com/posts/dev-notes/","summary":"Running notes on things I\u0026rsquo;ve hit, fixed, or found worth remembering.","title":"📝 Dev Notes"},{"content":"I have a vault of markdown notes that I treat as a second brain, and I run GitOps over it like it\u0026rsquo;s production infrastructure. It already has agents that work on it from the inside: a nightly gardener that weeds orphans and suggests links, and a Wanderer that collides random pairs of my own notes looking for connections I missed.\nThe obvious next move is to point an agent at the outside — let it read the web and tell me what matters. 
That move is also a small landmine, and most \u0026ldquo;AI reads the internet for you\u0026rdquo; tooling steps right on it. So this week I built two of them instead of one, named them after corvids, and the reason there are two is the entire point of this post.\nMeet the Magpie and the Blue Jay.\nThe same fear, twice\nBefore either bird got a name, both inherited a single non-negotiable rule, and it\u0026rsquo;s worth saying plainly because it\u0026rsquo;s the part everyone skips:\nAn agent that reads the internet and writes to your notes is a prompt-injection pipeline aimed straight at your trust root.\nMy vault isn\u0026rsquo;t just storage. Every other agent — the gardener, the Wanderer, the search that answers \u0026ldquo;what am I building?\u0026rdquo; — reads it as trusted context. So the moment one agent ingests a GitHub README or a news headline (attacker-influenceable text) and is allowed to write a note, a stranger on the internet gets to whisper instructions into the thing my whole system believes. \u0026ldquo;Structured API\u0026rdquo; narrows that surface. It does not close it.\nBoth birds are built on the same chassis as the gardener, and that chassis enforces the fear rather than trusting the model to behave:\nTwo phases, hard split. A wrapper-owned FETCH step pulls the external text in plain Bash — Claude is not in the loop and can\u0026rsquo;t be talked into anything, because it isn\u0026rsquo;t running yet. Then a COLLIDE step starts claude -p with the fetched text handed in as inline data, and that process gets only Read / Glob / Grep / Write. No Bash, no git, no network, no MCP. While untrusted text is in the context window, the agent has no tool that can reach the outside world or rewrite history.\nAllowlist, not the open web. Each bird reads a short, named list of sources. Nothing else.\nQuarantine, not the vault. Findings land in quarantine/\u0026lt;bird\u0026gt;/, which lives outside vault/. The indexer never sees it. 
Nothing it writes is ever auto-wikilinked into the graph. Promotion to a real note is a thing I do, by hand, after reading it.\nBlast radius is checked, not assumed. A run may modify only its quarantine directory. Anything written anywhere else is discarded and reported as a violation.\n\u0026ldquo;Nothing found\u0026rdquo; is a successful run. Neither bird has a quota. This is the honesty contract I stole from the Wanderer — an agent under pressure to produce N findings will manufacture N findings, and manufactured insight is worse than silence.\nThat\u0026rsquo;s the shared spine. Now the interesting part: given the same security model, the two birds do almost opposite things, and trying to make one bird do both jobs would have quietly ruined it.\nThe Magpie hoards what\u0026rsquo;s already shiny\nA magpie collects shiny objects and keeps them close. Mine watches my own GitHub stars.\nThe premise is slow public signal × private context. I starred some repo three weeks ago, forgot about it, and moved on. Meanwhile my projects shifted. The Magpie runs weekly, pulls my starred repos through one allowlisted endpoint (gh api user/starred), and collides each one against what I\u0026rsquo;m actively building right now — the live projects, the open hubs.\nIts output contract is a tight one: it is a relevance filter. It fires only when a star actually touches live work, and every finding has to name three concrete things — the repo, the project it connects to, and one \u0026ldquo;so what.\u0026rdquo; A vague \u0026ldquo;these are thematically related\u0026rdquo; doesn\u0026rsquo;t count as a hit. It\u0026rsquo;s a watchdog on the dials, not a newsletter.\nThe supervised proof run, over 28 stars, surfaced exactly two real hits and refused to invent a third:\nsupertonic (on-device multilingual TTS) × my Hungarian-audiobook voice-cloning project — a possible escape from a TTS fight I\u0026rsquo;d been losing. I checked: it genuinely supports Hungarian. 
That\u0026rsquo;s a hit with a so-what.\nagentmemory × the exocortex itself — prior art for persistent AI memory, notably with benchmarks my own notes lacked. (And if you\u0026rsquo;ve read about the time I benchmarked my own search and it lost, you\u0026rsquo;ll know how much I needed that nudge.)\nThe other ~22 stars mapped to tidy thematic clusters and were correctly not reported. That restraint is the feature.\nThe Blue Jay scatters acorns and forgets where\nHere\u0026rsquo;s the bird that explains why there are two.\nBlue jays don\u0026rsquo;t hoard close like magpies. They cache acorns far and wide and forget where they buried some — and the forgotten ones grow into oak trees. Some ecologists credit blue jays with spreading oak forests north after the last ice age. Seed dispersal, by way of a bad memory. That is exactly the job I wanted for the second bird, and the metaphor was too good to pass up.\nThe Blue Jay reads an allowlist of eight RSS feeds, picked so tech and science cross-pollinate:\nTech: Hacker News (high-score front page), lobste.rs, Ars Technica\nScience \u0026amp; ideas: phys.org, Quanta, Aeon, Nautilus\nWildcard: Medium — but scoped to specific tag feeds, never the raw firehose of crypto and self-help\nQuanta, Aeon, and Nautilus are on that list on purpose: they\u0026rsquo;re the connective tissue, the feeds where \u0026ldquo;huh, that\u0026rsquo;s weirdly similar to\u0026hellip;\u0026rdquo; happens before my vault even gets involved.\nAnd its output contract is the opposite of the Magpie\u0026rsquo;s. The Blue Jay is a serendipity filter. Its job is to surface the connection that isn\u0026rsquo;t in my projects yet — the distant idea, the acorn worth burying. If I ran it through the Magpie\u0026rsquo;s \u0026ldquo;only fire on a live-work hit\u0026rdquo; rule, I would strangle the one thing it exists to do. 
Relevance and serendipity pull in opposite directions, and you can\u0026rsquo;t tune a single agent to maximize both.\nOne more load-bearing detail, half design and half security: the Blue Jay collides on the RSS summary only — title, abstract, link. It never pulls the full article body into context. That\u0026rsquo;s simultaneously the lower-injection path and the right cognitive shape (a headline is a seed; I click through myself from quarantine if the seed is interesting). The narrow input is doing double duty.\nWhy two birds and not one with a flag\nI genuinely considered making this one agent with a --mode=relevance|serendipity switch. I\u0026rsquo;m glad I didn\u0026rsquo;t, and the reasoning generalizes past birds:\n | Magpie | Blue Jay\nSource | my GitHub stars (structured API) | 8 RSS feeds (open prose)\nInjection risk | low | the highest frontier\nFires when | a star hits live work | a summary sparks a distant idea\nOutput | relevance: repo → project → so-what | serendipity: the not-yet-relevant connection\nFailure mode it guards against | noise / false relevance | being strangled into silence\nTwo things made the split non-negotiable. First, the output contracts are too different to share one brain — \u0026ldquo;only speak on a hit\u0026rdquo; and \u0026ldquo;speak about the thing that isn\u0026rsquo;t a hit yet\u0026rdquo; are contradictory prompts, and a single agent told to do both does neither well. Second, open news is a higher injection frontier than a structured stars API, so the riskier bird deserves its own enforced blast-radius wrapper, not a code path bolted onto the safe one. When two jobs disagree on both what good output is and how dangerous the input is, that\u0026rsquo;s not a flag. That\u0026rsquo;s two programs.\nSo now my vault has two more agents reading the world on a cron. The Magpie runs Saturday at 06:00 and tells me when something I bookmarked finally became relevant. 
The Blue Jay runs Saturday at 07:00 and buries acorns in a quarantine folder, most of which I\u0026rsquo;ll ignore — but I only need one of them to grow into an oak.\nBoth are on probation for their first few runs, because I don\u0026rsquo;t trust a thing that reads the internet until I\u0026rsquo;ve watched it behave. But the part I\u0026rsquo;m actually happy about isn\u0026rsquo;t the agents. It\u0026rsquo;s that building the second one forced me to say out loud what the first one was secretly assuming — and the names made the difference impossible to forget. A magpie hoards. A blue jay scatters. You want both, and you do not want them to be the same bird.\n","permalink":"https://blog.hippotion.com/posts/two-birds-read-the-web/","summary":"I gave my second brain two agents that read the outside world and collide it against my notes. A Magpie watches my GitHub stars and only speaks when something hits live work. A Blue Jay reads a handful of RSS feeds and surfaces the distant, not-yet-relevant connection. They share a security spine — and they have deliberately opposite jobs. Here\u0026rsquo;s why the split is the whole design.","title":"Two Birds That Read the Web for Me: One Hoards, One Scatters"},{"content":"I have a note in my second brain that I wrote months ago. It says, with the confidence of someone who hadn\u0026rsquo;t measured anything:\nCombining lexical search (BM25) with vector similarity and graph expansion produces more robust recall than embeddings alone.\nThat sentence shipped into production. My vault of markdown notes gets indexed into a search database, and the search function fuses three signals: BM25 (classic keyword ranking), vector similarity (embeddings), and graph expansion — when a note matches, pull in its linked neighbours too, on the theory that the thing you want is often next to the thing you typed.\nIt sounds right. Graphs are having a moment in RAG. 
\u0026ldquo;Add a knowledge graph to your retrieval\u0026rdquo; is the kind of thing you can put on a slide and nobody pushes back. I believed it enough to make graph expansion a first-class signal with a weight of 0.5 — equal footing with keyword matching.\nThis week I finally wrote a benchmark. The graph wasn\u0026rsquo;t helping. It was the single biggest thing hurting my search.\nThe setup\n30 gold queries against the live vault (63 notes), borrowing the harness shape from an eval framework I\u0026rsquo;d been reading. Each query has a hand-labelled \u0026ldquo;correct\u0026rdquo; note. I measured recall@5 (did the right note land in the top 5?) and MRR, mean reciprocal rank (how high did it rank?), across three retrievers:\ngrep — naive substring term-count. The dumb floor.\nbm25 — pure keyword ranking, FTS5\u0026rsquo;s BM25. The honest baseline.\nlive — my production hybrid (BM25 + vector + graph).\nI expected a clean staircase: grep at the bottom, bm25 in the middle, my clever hybrid on top. That\u0026rsquo;s the whole reason you build the clever thing.\nThe scorecard\nretriever | recall@5 | MRR\ngrep | 0.467 | 0.307\nbm25 | 0.950 | 0.826\nlive (hybrid, w_graph=0.5) | 0.650 | 0.520\nRead that bottom row twice. My production \u0026ldquo;smart\u0026rdquo; search found the right note 65% of the time. Plain keyword search found it 95% of the time. The hybrid I\u0026rsquo;d been quietly proud of was worse than its own baseline — it broke 9 of 30 queries that BM25 got right. BM25 alone whiffed on exactly one.\nThe clever layer wasn\u0026rsquo;t adding intelligence. It was adding noise, confidently.\nWhy the graph backfired\nHere\u0026rsquo;s the mechanism, and it\u0026rsquo;s almost funny once you see it.\nGraph expansion pulls in a matched note\u0026rsquo;s neighbours. But in a real knowledge base, the most connected notes are hubs — my inbox of ideas, my project radar, my \u0026ldquo;things Claude noticed\u0026rdquo; log. Everything links to them, so they\u0026rsquo;re everyone\u0026rsquo;s neighbour. 
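A toy sketch of that failure mode follows, and it rests on a few stated assumptions. The post does not show the real fusion formula, so this sketch assumes the hybrid score is a plain weighted sum: a normalized BM25 score with a fixed weight of 0.5 (the equal footing described above), plus a graph bonus equal to the summed BM25 of a note's linked neighbours. The note names, scores, and links are invented. recall_at_k and mrr follow the recall@5 and MRR definitions from the benchmark, with k lowered to 1 to suit a four-note corpus.

```python
from collections import defaultdict

# Hypothetical fusion, not the author's formula: BM25 keeps a fixed weight of 0.5
# and the graph weight is the knob being swept.
W_BM25 = 0.5


def neighbours(links):
    # Treat wikilinks as undirected: a link either way makes two notes neighbours.
    nbrs = defaultdict(set)
    for src, targets in links.items():
        for dst in targets:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    return nbrs


def fuse(bm25, links, w_graph):
    # Score = weighted BM25 + weighted graph bonus, where the bonus is the summed
    # BM25 of the note's neighbours (unmatched neighbours contribute nothing).
    nbrs = neighbours(links)
    scores = {}
    for note in set(bm25) | set(nbrs):
        bonus = sum(bm25.get(n, 0.0) for n in nbrs[note])
        scores[note] = W_BM25 * bm25.get(note, 0.0) + w_graph * bonus
    # Highest score first; the note name breaks ties so the order is deterministic.
    return sorted(scores, key=lambda n: (-scores[n], n))


def recall_at_k(rankings, golds, k):
    # Fraction of queries whose gold note appears in the top k results.
    hits = sum(1 for ranking, gold in zip(rankings, golds) if gold in ranking[:k])
    return hits / len(golds)


def mrr(rankings, golds):
    # Mean reciprocal rank: 1/rank of the gold note, 0 when it is missing.
    total = 0.0
    for ranking, gold in zip(rankings, golds):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)
    return total / len(golds)


# Invented example: three matched notes, all linking to one hub.
bm25 = {'personal': 1.0, 'project-a': 0.6, 'project-b': 0.5, 'inbox': 0.1}
links = {'personal': ['inbox'], 'project-a': ['inbox'], 'project-b': ['inbox']}

for w_graph in (0.0, 0.5):
    ranking = fuse(bm25, links, w_graph)
    print(w_graph, ranking, recall_at_k([ranking], ['personal'], 1), mrr([ranking], ['personal']))
```

With the graph weight at 0.0, the personal note ranks first. At 0.5, the hub collects a bonus from every matched note that links to it and takes the top slot, which halves MRR for that query. The same arithmetic plays out at top 5 in a larger vault with several hubs.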
When I searched for something specific, the graph helpfully dragged these popularity-contest winners into the candidate set, and they elbowed the genuinely relevant note clean out of the top 5.\nConcrete example. Query: \u0026ldquo;who owns this knowledge system?\u0026rdquo; The correct answer is my personal note. BM25 ranked it #5 — just barely in. The hybrid, drunk on graph neighbours, pushed it off the list entirely. The graph didn\u0026rsquo;t find a better answer. It buried a good one under hubs.\nI swept the graph weight to confirm it wasn\u0026rsquo;t a fluke. It was perfectly monotonic — no increment of graph weight helped either metric, and every one cost MRR:\ngraph weight | recall@5 | MRR\n0.0 (off) | 0.950 | 0.826\n0.1 | 0.950 | 0.737\n0.25 | 0.817 | 0.564\n0.5 (what I shipped) | 0.650 | 0.520\nThere\u0026rsquo;s no ambiguity to argue with. More graph, more harm, no exceptions. The value I\u0026rsquo;d been claiming in that confident note — I finally measured it, and it was negative.\nThe fix, and the actual lesson\nThe fix was one line: drop the default graph weight from 0.5 to 0.1. Recall snapped back to 0.95, tying pure BM25. (Turning the graph fully off is still clearly better on MRR, 0.826 vs 0.737; I kept a whisper of it as a tiebreaker, which is a taste call, not a data-driven one.)\nBut the one-line fix isn\u0026rsquo;t the point. The point is where graphs belong.\nGraph expansion isn\u0026rsquo;t a bad idea — I aimed it at the wrong job. Precision retrieval (\u0026ldquo;find me the one note that answers this\u0026rdquo;) wants to be narrow and literal. Pulling in neighbours is the opposite of what you want; every neighbour is a chance to be wrong. But I have a different feature in this same system — a discovery mode that deliberately collides distant notes to surface unexpected connections. There, neighbour-pulling isn\u0026rsquo;t noise, it\u0026rsquo;s the entire product.\nSame mechanism. In one context it\u0026rsquo;s poison; in the other it\u0026rsquo;s the point. 
I\u0026rsquo;d been running my discovery tool inside my lookup tool and calling it a hybrid.\nA few honest caveats, because a benchmark you can\u0026rsquo;t poke holes in is usually lying: my gold set is self-authored v1, the corpus is small (63 notes), and the vector signal was actually dark during this run — I hadn\u0026rsquo;t built the embeddings yet, so \u0026ldquo;hybrid\u0026rdquo; here was really \u0026ldquo;BM25 + graph.\u0026rdquo; The vector half of my original claim is still untested. This is directional, not gospel.\nBut directional was enough. I\u0026rsquo;d shipped a claim, the claim got measured, and it didn\u0026rsquo;t survive contact with 30 queries. That\u0026rsquo;s the whole reason I keep my brain in git with everything reproducible: so the day I bother to measure, the measurement can actually win the argument against my own confident prose.\nThe slide-deck version of RAG says add a graph. The benchmark says know which question you\u0026rsquo;re answering first. I\u0026rsquo;ll take the benchmark.\n","permalink":"https://blog.hippotion.com/posts/graph-hurt-my-search/","summary":"My second brain searches over a vault of markdown using BM25 + vectors + graph expansion. I\u0026rsquo;d been telling people the graph improved recall. Then I finally benchmarked it, and plain keyword search beat my fancy hybrid — the graph was actively dragging the right answers out of the results. Here\u0026rsquo;s the scorecard and what it taught me about where graphs actually belong.","title":"I Added a Knowledge Graph to My Search. It Made It Worse."},{"content":"The silence\nMy house runs on quiet little robots. A tracker watches my kombucha ferment. A job narrates kids\u0026rsquo; books in Hungarian. A media stack pulls and files things. Home Assistant minds the sensors. A dozen services, all doing their jobs, all completely mute. When a batch finished or an import failed, I found out the same way every time: by going to look.\nThen the silence got expensive. 
Claude Code stopped dead in the middle of a task because I\u0026rsquo;d burned through my plan\u0026rsquo;s usage window — no warning, no countdown, just a wall. The information existed; a dashboard in my own cluster was already polling it. It just had no way to reach my pocket.\nSo I built one thing: a push bus. One place anything in the cluster can POST to, and that actually buzzes my phone. And the first job I gave it was to warn me before my AI assistant goes dark.\nThe boring part (said honestly)\nThe bus is ntfy — a self-hosted pub/sub notifier. Picking it took about five minutes, because self-hosting ntfy for a homelab is a thoroughly solved problem. There are at least three off-the-shelf bridges from Prometheus Alertmanager to ntfy. I\u0026rsquo;m not going to pretend the bus is the clever bit.\nWhat I did do deliberately:\n📦 Deployed it GitOps-native — one entry in my app-of-apps, reconciled by Argo CD, no docker run anywhere.\n🔒 Locked it to deny-all auth with bearer tokens. Security alerts ride this bus; a world-readable topic on a public URL was a non-starter. (Which also means it sits outside my usual OAuth gate — the phone app can\u0026rsquo;t do an interactive login flow, so ntfy does its own token auth.)\n🏷️ Topics by severity: hl-crit, hl-warn, hl-info, hl-event. Subscribe and mute by how much I care.\nThen the interesting parts showed up at the edges, where they always do.\nEdge one: my own firewall 403\u0026rsquo;d me\nFirst test, the usage producer POSTing to https://ntfy.hippotion.com:\nHTTP 403 Forbidden\nerror code: 1010\nThat 1010 looks like ntfy rejecting my token. It isn\u0026rsquo;t. It\u0026rsquo;s Cloudflare. 
Error 1010 means \u0026ldquo;your browser signature is banned\u0026rdquo; — Cloudflare\u0026rsquo;s bot protection took one look at a Python script\u0026rsquo;s urllib User-Agent and slammed the door.\nMy own producer couldn\u0026rsquo;t reach my own bus, because the request left the cluster, went all the way out to my own edge, and got flagged as a bot on the way back in.\nThe fix is the architecture I should\u0026rsquo;ve had from the start: in-cluster producers POST to the internal service address and never touch the public internet at all.\n# wrong: out to Cloudflare and back, gets bot-blocked\nhttps://ntfy.hippotion.com/hl-warn\n# right: stays inside the cluster\nhttp://ntfy.web-ntfy.svc.cluster.local/hl-warn\nThe phone still uses the public URL happily — the real ntfy app carries a signature Cloudflare trusts. Only scripts trip 1010. Lesson: your own edge is not your friend when you\u0026rsquo;re a script. Keep cluster traffic in the cluster.\nEdge two: the obvious data source was lying\nTo warn me about Claude usage, the naïve move is to parse Claude Code\u0026rsquo;s local logs — they sit right there in ~/.claude/projects/.../*.jsonl, token counts and all.\nDon\u0026rsquo;t. Those counts are unreliable for accounting — known to undercount, wildly, in some cases by ~100x. Every tool that parses that JSONL inherits the bug.\nThe number that\u0026rsquo;s actually true lives in the claude.ai usage API — the same five_hour and seven_day windows your plan enforces against. And I already had a service polling exactly that. So the producer is just a tiny sidecar on that existing pod, reading its /api/usage over localhost (same pod — no network policy to negotiate, no second credential, nothing else hammering claude.ai):\n📈 ≥80% of a window → hl-warn (high).\n🚨 ≥95% → hl-crit (urgent).\n🔁 One ping per window per reset cycle, escalating warn→crit, keyed on the reset timestamp so it never spams.\n
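Here is a minimal sketch of that alert logic. It assumes a producer written in Python using only the standard library. The thresholds, topic names, priorities, window names, and in-cluster URL come from the post. TOKEN, the function names, and the exact dedup bookkeeping are hypothetical. ntfy reads a message's priority from a Priority header and accepts bearer-token auth in an Authorization header.

```python
import urllib.request

# In-cluster address from the post; TOKEN is a placeholder, not a real credential.
NTFY_BASE = 'http://ntfy.web-ntfy.svc.cluster.local'
TOKEN = 'tk_placeholder'

WARN_AT = 80.0  # percent of a usage window
CRIT_AT = 95.0


def level_for(utilization):
    # Map a window's utilization to (topic, ntfy priority), or None below 80%.
    if utilization >= CRIT_AT:
        return ('hl-crit', 'urgent')
    if utilization >= WARN_AT:
        return ('hl-warn', 'high')
    return None


def decide(window, utilization, resets_at, sent):
    # sent maps (window, resets_at) to the highest topic already sent in that cycle.
    # Each cycle gets at most one warn and one crit, and never a warn after a crit.
    level = level_for(utilization)
    if level is None:
        return None
    key = (window, resets_at)
    previous = sent.get(key)
    if previous == 'hl-crit' or previous == level[0]:
        return None
    sent[key] = level[0]
    return level


def build_request(topic, priority, message):
    # Build, but do not send, the POST; urllib.request.urlopen(req) would deliver it.
    return urllib.request.Request(
        f'{NTFY_BASE}/{topic}',
        data=message.encode('utf-8'),
        headers={'Authorization': f'Bearer {TOKEN}', 'Priority': priority},
        method='POST',
    )


# Invented reset timestamp for one five_hour cycle.
sent = {}
for pct in (82.0, 85.0, 96.0, 99.0):
    print(pct, decide('five_hour', pct, '2025-06-07T12:00:00Z', sent))
```

Because sent is keyed on the reset timestamp, a new cycle starts clean, and the first crossing of 80 percent in the next window pings again. Everything inside a cycle is suppressed except the single warn-to-crit escalation.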
The first time it mattered, my phone buzzed at 80%, with hours of runway left, instead of me hitting a brick wall mid-task.\nWhat I\u0026rsquo;d tell past me\nThree things, none of them about ntfy:\nReuse the signal you already have. I didn\u0026rsquo;t build a usage poller — I bolted a sidecar onto the one already running. The smallest producer is one that reads localhost.\nYour own edge can betray you. A firewall that protects you from bots will happily block your own automation. In-cluster talks in-cluster.\nCheck whether your data source is telling the truth before you build an alert on it. An alert you don\u0026rsquo;t trust is worse than no alert — you\u0026rsquo;ll learn to ignore it, and then it\u0026rsquo;ll be right once.\nNext, the high-leverage move: point Prometheus Alertmanager at the same bus, and every infra alert I have — plus every one I\u0026rsquo;ll ever add — lands on the phone through one bridge. The kombucha ping can wait. The disk-full one can\u0026rsquo;t.\nThe house is still full of quiet robots. The difference is that now they know my number.\n","permalink":"https://blog.hippotion.com/posts/every-robot-texts-me/","summary":"My house is full of automation that never told me anything — until I gave it one push bus. The first thing I taught it to do was warn me before Claude Code cuts out mid-task.","title":"Every Robot in My House Can Text Me Now"},{"content":"The question I actually had\nIt started as a nervous-Sunday kind of question: is a third party trying to get into my server — over SSH, or some other way? I run a single-node Kubernetes homelab that hosts a couple dozen little apps, some of them public. You read about credential-stuffing bots and you start to wonder who\u0026rsquo;s been rattling the handle while you slept.\nSo I did the audit. The good news came first, and it\u0026rsquo;s worth saying plainly because it\u0026rsquo;s the part most homelabs get wrong: the front door is solid. 
Nothing is reachable from the internet except through a Cloudflare Tunnel — an outbound-only connection, zero open inbound ports on my router. Almost every service sits behind OAuth. The cluster has 140 network policies doing real east-west segmentation. And the login history? Eleven straight weeks where every single shell login came from one IP — my own workstation on the LAN. No strangers. No 3 a.m. logins from a VPS in another hemisphere.\nI could have stopped there feeling good. That would have been a mistake.\nThe scary finding wasn\u0026rsquo;t an attacker\nThe useful question turned out not to be \u0026ldquo;is someone knocking?\u0026rdquo; but \u0026ldquo;if someone got in, would anything tell me?\u0026rdquo; And when I traced that wire, it ended in the dark.\nI have a full monitoring stack — Prometheus, Grafana, Alertmanager, the works. Alertmanager was running. It was also configured to notify exactly no one: no receivers, and upstream, no alert rules at all. It was a smoke detector with the battery taken out and, for good measure, no smoke sensor either. If an attacker had walked in, the alarm would have stayed perfectly, silently green.\nThat reframed the whole job. Three gaps, in priority order.\nGap 1 — an alarm with no one to call\nI built the missing chain end to end. A small exporter on the host parses the SSH journal and fail2ban state and writes metrics into node_exporter\u0026rsquo;s textfile collector — so it rides the monitoring I already had instead of adding a new moving part. On top sit the alert rules that were never there. The one that matters most is blunt:\nA shell login succeeded from a non-LAN IP.\nThat should be impossible in normal life, so if it ever fires, I want it shouting. 
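The post does not show the exporter itself. Below is a minimal sketch of the non-LAN login check. It assumes sshd journal lines in the usual Accepted ... for USER from IP port N form and a hypothetical 192.168.1.0/24 LAN. The sketch counts successful logins by origin and renders the counts in the Prometheus text format, which node_exporter's textfile collector picks up from *.prom files. The metric names and the file path are invented for illustration.

```python
import ipaddress
import os

# Hypothetical LAN range; replace with your own.
LAN = ipaddress.ip_network('192.168.1.0/24')


def accepted_login_ip(line):
    # sshd logs a successful login as: Accepted <method> for <user> from <ip> port <n> ...
    tokens = line.split()
    if 'Accepted' not in tokens or 'from' not in tokens:
        return None
    i = tokens.index('from')
    if i + 1 >= len(tokens):
        return None
    try:
        return ipaddress.ip_address(tokens[i + 1])
    except ValueError:
        return None  # e.g. a hostname when sshd resolves client names


def textfile_metrics(journal_lines):
    # Count successful logins by origin and render them in the Prometheus text format.
    lan = non_lan = 0
    for line in journal_lines:
        ip = accepted_login_ip(line)
        if ip is None:
            continue
        if ip in LAN:
            lan += 1
        else:
            non_lan += 1
    return [
        '# TYPE ssh_accepted_logins_lan gauge',
        f'ssh_accepted_logins_lan {lan}',
        '# TYPE ssh_accepted_logins_non_lan gauge',
        f'ssh_accepted_logins_non_lan {non_lan}',
    ]


def write_atomically(lines, path):
    # Write to a temp name the collector ignores, then rename over the .prom file.
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        for line in lines:
            print(line, file=f)
    os.replace(tmp, path)


sample = [
    'sshd[101]: Accepted publickey for me from 192.168.1.20 port 50122 ssh2',
    'sshd[102]: Failed password for root from 203.0.113.9 port 40000 ssh2',
    'sshd[103]: Accepted password for me from 203.0.113.7 port 51515 ssh2',
]
for metric in textfile_metrics(sample):
    print(metric)
```

An alert rule on ssh_accepted_logins_non_lan > 0 would then express the one alert that should never fire. Writing to a temporary file and renaming it keeps node_exporter from scraping a half-written file.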
That alert now emails me the instant it fires, alongside quieter alerts for brute-force spikes, distributed scans, fail2ban going down, and — the meta-alert I\u0026rsquo;m fondest of — the watchdog itself going stale, because a security monitor that silently dies is worse than none. And fail2ban now actually bans the bots, with escalating ban times and my LAN permanently on the allow-list.\nThe honest lesson: I\u0026rsquo;d been treating \u0026ldquo;I have Prometheus\u0026rdquo; as if it meant \u0026ldquo;I have monitoring.\u0026rdquo; Dashboards you have to remember to look at are not monitoring. Monitoring is the thing that interrupts you. Until an alert can reach your phone, you don\u0026rsquo;t have a security alarm — you have a security museum.\nGap 2 — there was a web terminal on the open internet\nThis is the one that made me wince. Among my public hostnames was ttyd — a browser-based shell. A full terminal on my server, reachable from anywhere, sitting behind a single OAuth proxy. One misconfiguration, one OAuth bypass, and that\u0026rsquo;s not \u0026ldquo;an app is compromised,\u0026rdquo; that\u0026rsquo;s root on the box from a browser tab.\nThe fix here isn\u0026rsquo;t more locks. It\u0026rsquo;s the realization that the strongest control is not exposing the thing at all. I deleted the web terminal entirely — app, manifests, dashboard tile, all of it. Then I went down the public hostname list and pulled everything with no business being public off the tunnel: the secrets UI, the ingress dashboard, Prometheus, Alertmanager, the network-observability console, the DNS admin. They still work — on my LAN, over the same wildcard cert — they\u0026rsquo;re just not the internet\u0026rsquo;s business anymore. A service that isn\u0026rsquo;t exposed has no public attack surface to harden.\nGap 3 — no floor under the blast radius\nThe network policies limit how far a compromised pod can talk sideways. 
But nothing stopped a workload from running as root, mounting the host filesystem, or grabbing the host network in the first place. So I turned on Kubernetes' built-in Pod Security Admission: every namespace now at least reports baseline violations, and the clean app namespaces enforce baseline — meaning a compromised app there simply cannot request privileged mode, a hostPath mount, or the host network. (Baseline still allows running as root; blocking that takes the restricted profile.) It\u0026rsquo;s a floor. Floors are underrated.\nWhat the audit was really about\nI went looking for an intruder and didn\u0026rsquo;t find one — the logs were clean, the front door held. What I found instead was that I\u0026rsquo;d built something secure at the perimeter and then never asked the uncomfortable follow-up: what happens after the perimeter? The answer had been \u0026ldquo;nothing happens, and no one is told,\u0026rdquo; and I just hadn\u0026rsquo;t looked.\nThree principles I\u0026rsquo;m taking with me:\nAn alarm that can\u0026rsquo;t reach you is decoration. Wire the notification first; the rules are easy once something is listening.\n\u0026ldquo;Don\u0026rsquo;t expose it\u0026rdquo; beats \u0026ldquo;add more auth.\u0026rdquo; Every hostname you take off the public internet is a class of attack you no longer have to be clever about.\nGive the blast radius a floor. Assume one thing gets popped, and decide in advance how far it gets.\nThe best part: all of it is GitOps. The intrusion alerts, the un-exposing, the pod-security floor — every change is a commit, reviewable and revertible, and my cluster reconciles itself to match. The audit didn\u0026rsquo;t just make the homelab safer. It wrote down why it\u0026rsquo;s safer, in a form the next version of me can read.\nNow if someone knocks, I\u0026rsquo;ll know. And the web terminal isn\u0026rsquo;t answering the door anymore — because it\u0026rsquo;s gone.\n","permalink":"https://blog.hippotion.com/posts/is-anyone-knocking/","summary":"I set out to answer a simple worry — is someone trying to get into my server? 
— and found the scarier question underneath it: if they did, would I even know? My front door was solid. The inside had an alarm with the wires cut, a web terminal sitting on the open internet, and no floor under the blast radius. Here\u0026rsquo;s the audit, and the three things I fixed.","title":"Is Anyone Knocking? A Security Pass on My Homelab"},{"content":"Open data nobody opens Every vote in the European Parliament and the Slovak National Council is public. The EU even ships it as a clean API. And almost nobody reads it, because the raw record is unreadable: \u0026ldquo;Návrh poslanca… ktorým sa dopĺňa zákon č. 581/2004 Z. z. … (tlač 1259) — tretie čítanie, hlasovanie o návrhu zákona ako o celku.\u0026rdquo; Multiply that by a few hundred votes a sitting. Transparency that no human can parse is transparency on paper only.\nSo I built VoteWatch — a small site on my homelab that turns the record into something a citizen can actually use: what was decided, who voted, and do you agree?\nVoteWatch SK: each decision summarised in plain language, which parties voted how, and a Yes/No question whose live citizen tally sits next to how parliament actually voted — labelled agree or gap.\nTwo halves, one lopsided The EU half was easy. HowTheyVote.eu already did the hard work and publishes roll-call votes as a clean, open-licensed API. You consume it; you don\u0026rsquo;t scrape it.\nThe Slovak half is where the real work lives — and the real value. nrsr.sk has no API. The HTML is the contract: a results listing, and per-vote pages where each MP appears next to a one-letter code ([Z] za, [P] proti, [?] zdržal sa). So the national half is a genuine scraper — the unglamorous kind that nobody maintains, which is exactly why a gap exists to fill. The unglamorous part is the moat.\nFrom ten votes to one question A single bill generates a pile of procedural roll-calls — shorten the debate, move to third reading, amendment block A, amendment block B, the bill as a whole. 
Ten rows that are really one decision. Nobody wants ten rows.\nSo the pipeline groups votes by bill, then asks an LLM (llama-3.3-70b on NVIDIA NIM) to do exactly one job: turn the bureaucratic titles into a plain headline, two sentences of summary, and one neutral Yes/No question a person can actually answer. Seven votes on the health-insurer bill collapse into: \u0026ldquo;Changes to the health-insurance law\u0026rdquo; → \u0026ldquo;Do you agree with the health-insurance bill?\u0026rdquo;\nThe rule that keeps it honest Here\u0026rsquo;s the line I won\u0026rsquo;t cross, and it\u0026rsquo;s the whole reason I trust the result: the AI writes the prose, but it never decides a fact.\nWhich votes belong to one bill? Deterministic — parsed from the bill number. Did it pass? Deterministic — read from the result row. Which parties voted for, against, abstained? Deterministic — tallied from the per-MP record, shown as Za: SMER-SD, HLAS-SD, SNS · Zdržali sa: PS, KDH, SaS. The model only touches language: the headline, the summary, the question. If it hallucinates, you get an awkward sentence — never a wrong vote count. And if the model fails entirely, the card falls back to the raw title. The facts come from the record; the model just makes the record legible. For civic data, that separation isn\u0026rsquo;t a nice-to-have — it\u0026rsquo;s the difference between a tool and a liability. (Every card says so out loud: summaries are AI-generated; the raw record prevails.)\nThe part that closes the loop Showing people how their representatives voted is only half a feedback loop. The other half is letting them answer.\nEach decision carries its one distilled question and two buttons — Áno / Nie. You vote, and the site shows the citizen tally next to how parliament actually decided, with the honest verdict on top: \u0026quot;✓ Citizens and Parliament agree\u0026quot; or \u0026quot;⚖ Gap between citizens and Parliament.\u0026quot; That gap is the entire point. 
It\u0026rsquo;s the thesis behind a side project of mine called veracracy — governance measured against verified knowledge and the actual will of the governed — made concrete enough to click.\nThe same loop on the European Parliament — dossiers consolidated, political-group stances (EPP, S\u0026amp;D, PfE…), and the citizen poll under each topic.\nThe backend is deliberately boring. The site is static (git-synced nginx, same as this blog). Votes can\u0026rsquo;t POST to a static page, so they go to a public n8n webhook that records to a data table and returns live tallies — no new service, no database, just the automation box I already run. Vote keys are namespaced so EU and Slovak polls share one store without colliding.\nThe honest caveat Dedup is browser-local. It stops casual double-voting, but behind a Cloudflare tunnel every request shares one IP, so this is an indicative signal, not a secured ballot. That\u0026rsquo;s the right altitude for \u0026ldquo;let people express an opinion.\u0026rdquo; The day it needs to mean more than that, it needs real identity first — and I\u0026rsquo;d rather ship the honest version than fake the robust one.\nIt\u0026rsquo;s live at votewatch.hippotion.com — the EU parliament and the Slovak NR SR, every MEP and every poslanec, in plain language, with a button that asks the only question that matters after a vote: would you have voted the same way?\nA neutral record — what was decided and who decided it — not a villain list. Data © HowTheyVote.eu (ODbL) and nrsr.sk.\n","permalink":"https://blog.hippotion.com/posts/votewatch/","summary":"Parliamentary roll-call votes are public, machine-readable, and almost completely unread. I built a thing that scrapes them, distills each decision into one plain-language question, shows which party voted which way, and lets you register whether you agree — then puts your answer next to how parliament actually voted. 
The rule that keeps it honest: the AI writes the summary, but it never decides a fact.","title":"VoteWatch: How Your Representatives Voted — and Whether You'd Agree"},{"content":"The missing question Democracies ask what do we want? Markets ask what will we pay? Both are good questions, and between them they run most of the world. But there\u0026rsquo;s a third question that almost no system asks before it acts, and it\u0026rsquo;s the one that decides whether the first two produce anything good:\nWhat do we actually know — and how do we know it?\nI gave the idea of governing as if that question came first a name — veracracy, from veritas (truth) and kratos (rule). Not rule by experts in a back room. Rule by evidence that anyone can inspect, deliberated by the people it binds. It lives at veracracy.hippotion.com, and this is the honest account of what it is and why an infrastructure engineer ended up building a shrine to an idea.\nThe clock reads ~2051 — computed, not wished, from one published assumption. The sun\u0026rsquo;s height above the horizon is how far the weighted dials have risen; the tag tracks the beacon (Taiwan) against the world.\nWhat it actually means Strip the romance and veracracy is five fairly concrete commitments:\nTruth as civic infrastructure. Verified, open evidence maintained like roads and water — with provenance, versioning, and repair crews. A public utility, not a content feed. Radical transparency. Binding decisions carry their evidence trail by default. A law without its sources is a claim, not a law. Decentralised trust. No ministry of truth. Verification is plural, adversarial, and bridging — many checkers, no single owner, consensus that has to span camps to count. Ethical AI as auditor and advocate. Machines that trace claims, surface contradictions, and argue against the powerful reading of the data — never as oracle, always as instrument. Participatory epistemocracy. 
Citizens not as voters once every four years, but as standing jurors of what is true enough to act on, where weight accrues to evidence rather than volume. If you squint, none of that is a political program. It\u0026rsquo;s the same instinct I bring to a cluster — provenance, versioning, a reconciler, no single point of trust — pointed at the question of how a society decides what\u0026rsquo;s real. That\u0026rsquo;s the only way I know how to think, so that\u0026rsquo;s the lens I used.\nWhy a clock, and why it can run backward Here\u0026rsquo;s where most \u0026ldquo;vision\u0026rdquo; projects lose me, and where I tried not to lose myself. A manifesto is cheap. Anyone can declare a better world and feel moral. So instead of a manifesto, the site is a measurement.\nIt shows a single year — first light, the year the first place on Earth might plausibly govern this way. Right now it reads around 2051. But that number isn\u0026rsquo;t a wish; it\u0026rsquo;s computed, from one assumption stated in plain sight: the infrastructure of verified governance has been built since the world went online in 1991, and continues at the average pace it has held since. A set of dials — open data, civic tech, verification at platform scale — each scored 0–1 from a named source, weighted, extrapolated. Change the assumption and the number moves: pace the same dials from Athens in 508 BC instead, and dawn lands near the year 3860. So the assumption is the lever, which is exactly why it\u0026rsquo;s published.\nAnd the clock can run backward. There\u0026rsquo;s a Watch — a standing log of what moved the year — that deliberately files the evidence against: every transparency rollback, every deliberation experiment that failed, every force pushing dawn further out. Because a sunrise that only ever gets closer is a marketing widget. The honesty is the product. 
If I can\u0026rsquo;t show you the thing that would move the number the wrong way, you shouldn\u0026rsquo;t believe the number.\nThe first brick Ideas this size are easy to admire and easy to never touch. So the rule I set myself is that veracracy has to cash out in things that actually run.\nThe first one shipped this week: VoteWatch — every roll-call vote in the European Parliament and the Slovak National Council, scraped from the public record, distilled into plain language, showing which party voted which way, with a button that asks you whether you\u0026rsquo;d have voted the same. It\u0026rsquo;s the second and fifth pillars made clickable: binding decisions carrying their evidence trail, and citizens as standing jurors rather than spectators. The gap it surfaces — between how parliament voted and how the people who answered would have — is veracracy in miniature, on real data, today.\nIt\u0026rsquo;s small. It\u0026rsquo;s one person\u0026rsquo;s homelab. The voting is an indicative signal, not a secured ballot, and I say so on the page. But it\u0026rsquo;s the difference between a belief and a brick, and I would rather lay one honest brick than write a beautiful manifesto.\nWhere this comes from I\u0026rsquo;ll be straight about the shape of this: it\u0026rsquo;s idealistic, it\u0026rsquo;s personal, and I don\u0026rsquo;t expect to see first light. I\u0026rsquo;m a solo operator in a small town who runs a rack of servers and thinks too much about how systems stay trustworthy when no one\u0026rsquo;s watching them. Veracracy is what happened when that instinct refused to stay inside the server room.\nThe version of this I can defend isn\u0026rsquo;t the dream — it\u0026rsquo;s the discipline around the dream. Publish your assumption. Let the clock run backward. Cash the idea out in something real. Credit your sources. Ship the honest version, not the robust-sounding one.\nA measurement with one stated assumption — not a prophecy. 
The clock\u0026rsquo;s at veracracy.hippotion.com; disagree with a dial and you\u0026rsquo;ve understood the point.\n","permalink":"https://blog.hippotion.com/posts/veracracy/","summary":"I built a clock that counts down to a form of government that doesn\u0026rsquo;t exist yet — legitimacy grounded in verified knowledge rather than power, wealth, or whoever shouts loudest. The only reason I\u0026rsquo;m not embarrassed to have built it: the clock can run backward, the assumption behind it is published in plain sight, and the first concrete brick already ships real parliamentary data. A measurement, not a prophecy.","title":"Veracracy: The Question We Forget to Ask When We Govern"},{"content":"The pattern I didn\u0026rsquo;t know I had This week an AI agent told me something about my own systems that I\u0026rsquo;d never noticed, and it was correct: I have one favorite architecture, and I\u0026rsquo;ve built it three times.\nAt work: git holds Terraform code → Terraform derives the S3 buckets. Nobody clicks around in the AWS console; the repo is the truth. In the homelab: git holds Kubernetes manifests → ArgoCD derives the cluster. Every app on my rack is a folder in a repo. In my second brain: a vault of markdown notes → an indexer derives the search database (SQLite FTS + a link graph) that my AI tools query. Same shape everywhere: a plain-text source of truth in git, and a machine that builds the real thing from it. Master copy, derived state. I never decided this consciously — it\u0026rsquo;s just how my hands build things now.\nGitOps isn\u0026rsquo;t the git part Here\u0026rsquo;s the thing that the third copy got wrong, and it took me embarrassingly long to see because I teach this pattern at the infrastructure layer.\n\u0026ldquo;Configuration in git\u0026rdquo; existed long before GitOps. What made GitOps an actual shift was the reconciler: ArgoCD doesn\u0026rsquo;t apply your manifests once and wish you luck. It watches, continuously. 
When the cluster drifts from the repo, you get an OutOfSync badge, and with selfHeal enabled it puts reality back where the repo says it should be. The loop is the product. Git is just where the loop points.\nMy vault had no loop. If I edited a note and forgot to rebuild the index, the search results my AI agents rely on were silently stale — no badge, no error, nothing. The only protection was a rule in the repo\u0026rsquo;s agent instructions: \u0026ldquo;if files and index disagree, the files win — run the indexer.\u0026rdquo;\nA policy that agents must remember. In other words: I was running Kubernetes with a sticky note on the monitor that says please redeploy after editing the YAML. I would never accept that on my cluster. My brain ran on it for months.\nThe fix took an afternoon Two pieces, both boring on purpose.\nexo status — the OutOfSync badge. The indexer now stores a content hash per note; status re-hashes the vault and diffs:\n{ \u0026#34;status\u0026#34;: \u0026#34;OutOfSync\u0026#34;, \u0026#34;modified\u0026#34;: [\u0026#34;vault/10-notes/interests-themes.md\u0026#34;], \u0026#34;new\u0026#34;: [], \u0026#34;deleted\u0026#34;: [], \u0026#34;repair\u0026#34;: \u0026#34;exo index\u0026#34; } Exit code 0 when synced, 1 when not — so scripts and CI can ask the question too, exactly like argocd app diff.\nGit hooks — the selfHeal. Versioned hooks (core.hooksPath .githooks) on post-commit and post-merge rebuild the index after every commit and pull:\ncommand -v exo \u0026gt;/dev/null 2\u0026gt;\u0026amp;1 || exit 0 EXO_ROOT=\u0026#34;$(git rev-parse --show-toplevel)\u0026#34; exo index \u0026gt;/dev/null 2\u0026gt;\u0026amp;1 \u0026amp;\u0026amp; echo \u0026#34;exo: index reconciled (Synced)\u0026#34; Now every git commit in the vault prints exo: index reconciled (Synced) on its way out. The rule didn\u0026rsquo;t change — files win — but it stopped being something agents must remember and became something a machine enforces. 
That\u0026rsquo;s the entire difference between configuration management and GitOps, replayed at the knowledge layer.\nThe part where it gets a little strange The reason I\u0026rsquo;m writing this post at all: I didn\u0026rsquo;t have this idea. A scheduled agent did, on what I can only describe as an idle walk.\nMy vault has a weekly cron job — we call it the Wanderer — that samples pairs of notes that are far apart: different folders, different months, almost no shared vocabulary. A headless Claude gets the pairs with exactly one task: read both notes in full and say whether anything genuinely connects. \u0026ldquo;Nothing connects\u0026rdquo; is a successful run. That last sentence is load-bearing — the run always reports its result either way, so the agent never needs to manufacture a finding to have done its job.\nOn its very first walk, it collided a work note about Terraform-driven S3 provisioning with the architecture map of the vault itself, and wrote: same sentence in different clothes — and the brain copy is missing its reconciler. Then it listed the two fixes you just read about.\nRetrieval answers the questions you ask. Distant collisions surface the questions you didn\u0026rsquo;t know you had. It turns out my second brain didn\u0026rsquo;t need to get better at remembering — it needed to occasionally interrupt me.\nIf you keep a vault Whatever your stack — Obsidian, org-mode, a folder of markdown — if anything derives from your notes (an index, embeddings, a published site), then you have source of truth and derived state, and the GitOps question applies: who notices when they drift? If the answer is \u0026ldquo;I do, hopefully,\u0026rdquo; you\u0026rsquo;re running the sticky-note era. Give it a badge and a loop. 
It\u0026rsquo;s an afternoon.\n","permalink":"https://blog.hippotion.com/posts/gitops-for-my-brain/","summary":"An AI agent on a scheduled idle walk through my notes pointed out that I\u0026rsquo;d built the same architecture three times — at work, in my homelab, and in my second brain — and that the third copy was missing the part that makes GitOps work. It was right. So we shipped the missing piece the same day.","title":"I Run GitOps for My Brain"},{"content":"The thing I was actually building I wanted a small web page on my homelab that shows my Claude usage — the 5-hour session window, the weekly limits, the per-model split. There\u0026rsquo;s a nice Electron widget out there that does this on the desktop, but I don\u0026rsquo;t want a desktop app; I want a URL behind my own OAuth that I can glance at from my phone.\nThe mechanics are unremarkable. The claude.ai web app reads those numbers from a couple of undocumented endpoints using your logged-in session cookie. So a self-hosted version does the same thing server-side: hold the session token as a secret, replay the same calls, cache the result, render some bars. An afternoon\u0026rsquo;s work. I was pairing with Claude Fable 5 on it — Anthropic\u0026rsquo;s newest model, and the one that ships with extra safety measures around dual-use capability.\nThen, partway through, I got the message: Fable 5 flagged something in this session and switched to a more conservative model. It dropped me to Opus 4.8 for the rest of the conversation. Safe conversations sometimes trip it, the notice said. Send feedback.\nI wasn\u0026rsquo;t doing anything wrong. That\u0026rsquo;s the interesting part. My first reaction was the obvious one — what did I say? But I knew exactly what I\u0026rsquo;d built, and none of it was sketchy. 
It was my account, my usage data, my hardware, my OAuth in front of it.\nSo I went looking at the request the way a classifier would — not \u0026ldquo;what did he mean\u0026rdquo; but \u0026ldquo;what does this look like.\u0026rdquo; And from that angle it\u0026rsquo;s a different picture entirely. Stack up the surface features:\n🔑 capturing a session token and storing it to replay later 🌐 sending it to an undocumented API that isn\u0026rsquo;t meant for third parties 🕵️ spoofing a browser User-Agent so the request blends in 🧱 detecting and working around a Cloudflare bot challenge Read that list cold, with no context. That\u0026rsquo;s not a usage dashboard. That\u0026rsquo;s the exact signature of credential theft and scraping tooling. Every individual move is one a malicious script would also make. The only thing separating my afternoon project from the bad version is whose account it touches and why — and intent is precisely the part that doesn\u0026rsquo;t show up in the tokens.\nSurface vs. intent This is the part worth sitting with, because it\u0026rsquo;s not a Claude quirk — it\u0026rsquo;s the shape of every content classifier, every WAF rule, every fraud model I\u0026rsquo;ve ever run in production.\nA detector scores what it can see. It cannot see intent; it sees features. And the features of \u0026ldquo;monitor my own usage\u0026rdquo; and \u0026ldquo;harvest someone else\u0026rsquo;s session\u0026rdquo; overlap almost completely, because the technique is identical — the difference lives entirely in context the model has been deliberately built not to over-trust. You can\u0026rsquo;t tune that gap away. You can only pick where to sit on the precision/recall curve, and Fable 5 — being the high-capability model with the extra dual-use measures bolted on — sits where it catches the pattern even when it costs some false positives, then hands off to Opus 4.8. I was the false positive. 
The system did roughly the right thing for roughly the right reason; it just doesn\u0026rsquo;t feel that way when it\u0026rsquo;s pointed at you.\nThe honest engineering takeaway is the one I keep relearning: if a benign task has the silhouette of an abusive one, expect to get treated like the silhouette. Not just by AI — by rate limiters, by bot detection, by the fraud team. The fix isn\u0026rsquo;t to be offended. It\u0026rsquo;s to recognize the silhouette, and where it matters, make the legitimate context legible up front.\nWhat I\u0026rsquo;d do differently Practically, very little — the project was fine, and it downshifted to a model that finished the job. But the framing changed how I built it. I leaned harder into the parts that make intent visible in the design: the session token never leaves the server, it lives in Vault and arrives as an injected secret, the whole thing sits behind OAuth, and it polls on a leash instead of hammering. Not because a classifier made me, but because those are the same choices that make it obviously a personal dashboard and not a harvesting bot — to a reviewer, to future-me, and yes, to a model reading over my shoulder.\nThe widget rides your credential on your desktop. Mine keeps it server-side behind my own front door. Turns out building it the trustworthy way and building it the legibly trustworthy way are the same work — and getting flagged is what made me notice the difference.\n","permalink":"https://blog.hippotion.com/posts/when-claude-flagged-my-own-dashboard/","summary":"I asked Claude Fable 5 to help me self-host a dashboard for my own Claude usage. Halfway through, its dual-use safety measures flagged the conversation and downshifted me to Opus 4.8. Nothing I did was wrong — the request just had the shape of something that is. 
That gap, between what a thing looks like and what it\u0026rsquo;s for, turns out to be the whole story.","title":"🚩 I Built a Usage Dashboard and Tripped Claude Fable 5's Safety Net"},{"content":"A while back I applied for a senior platform role at n8n and didn\u0026rsquo;t land it. Fair enough — but \u0026ldquo;fair enough\u0026rdquo; isn\u0026rsquo;t actionable. Rejections come with no logs, no metrics, no trace. For someone who runs thirty-odd services with full observability, having vibes as the only instrumentation on my own career felt architecturally embarrassing.\nSo I built mind-the-gap: a pipeline that measures what the market demands, diffs it against what I can prove, and renders the gap as a private dashboard on my cluster. The job hunt is now a monitored system. This post is about the non-obvious decisions.\nDemand: an LLM reads job listings so I don\u0026rsquo;t have to I already had a job poller — an n8n workflow that polls the public ATS APIs (Greenhouse / Lever / Ashby) of ~33 companies plus a broad remote-jobs feed every six hours. A sibling workflow now re-fetches the same boards and, for every listing that passes the role+location gate, asks a small hosted LLM (Llama-3.1-8B) for a structured extraction:\n{\u0026#34;seniority\u0026#34;: \u0026#34;senior\u0026#34;, \u0026#34;skills\u0026#34;: [{\u0026#34;name\u0026#34;: \u0026#34;kubernetes\u0026#34;, \u0026#34;importance\u0026#34;: \u0026#34;must\u0026#34;}, ...]} One row per (job, skill) lands in an n8n Data Table. Decisions that mattered:\nOne LLM call per job, not one batch. Free-tier inference times out on batches; per-job calls are slower but fail independently. A lesson the poller already paid for. Insert doubles as the processed-marker. A job whose extraction fails to parse produces no rows — so it\u0026rsquo;s retried next run, for free. No status column, no second table. Canonicalization in code, not in the prompt. 
The model says \u0026ldquo;K8s\u0026rdquo;, \u0026ldquo;k3s\u0026rdquo;, \u0026ldquo;EKS\u0026rdquo; on different days regardless of instructions. A dumb alias map (k8s→kubernetes, eks→aws) beats prompt engineering for consistency. 8B is good enough — with a guard. It occasionally echoed the seniority enum back literally (\u0026quot;junior|mid|senior|staff|lead|unspecified\u0026quot;). The fix is one line of validation, not a bigger model. Supply: no artifact, no credit The other side of the diff is a skills registry — markdown in my knowledge vault, with a machine-parseable YAML block. Every skill has a state, and the rule that keeps the whole thing honest is brutal: a skill counts as proven only if an artifact exists — a public repo, a blog post, documented production experience. Otherwise it\u0026rsquo;s claimed, and claimed earns half credit.\nThat rule immediately produced the most useful insight of the project: \u0026ldquo;invisible skill\u0026rdquo; is a real category. Python turned out to be the market\u0026rsquo;s #5 ask. I use it constantly — and could point to nothing public that shows it. The cheapest score increase isn\u0026rsquo;t learning something new; it\u0026rsquo;s a weekend making an existing skill visible. No gut-feeling gap analysis would have ranked \u0026ldquo;write about what you already do\u0026rdquo; above \u0026ldquo;learn the shiny thing.\u0026rdquo;\nThe score: distinct companies, not mentions First naive aggregation: Canonical\u0026rsquo;s listings mention Ubuntu nine times, all marked must-have — suddenly Ubuntu looks like the hottest skill in Europe. Employer skew is the noise floor of small samples. The fix: demand weight = distinct companies naming the skill, not total mentions. One enthusiastic employer can\u0026rsquo;t move the radar.\nTwo more scoring rules I\u0026rsquo;d defend in review:\nSkills named by fewer than two companies don\u0026rsquo;t count at all — single-listing noise stays out. 
Demand the registry hasn\u0026rsquo;t classified yet shows up as \u0026ldquo;unreviewed\u0026rdquo; and counts fully against the score. An unreviewed market signal is a gap until proven otherwise; the dashboard nags me to triage it. Rendering: the page is a git commit The dashboard is a single static HTML file, and the pipeline that produces it never touches the cluster. render.js lives in the project repo as the single source of truth; a nightly n8n workflow fetches it raw from GitLab, eval()s it against the Data Table rows and the registry, and — only if the result differs from what\u0026rsquo;s committed (timestamps stripped, or every night is a \u0026ldquo;change\u0026rdquo;) — PUTs the new index.html back via the GitLab API.\nServing is the same pattern as this blog: nginx plus a git-pull sidecar, deployed by Argo CD, behind the cluster\u0026rsquo;s OAuth middleware. The renderer has no kubeconfig, no SSH, no cluster access of any kind. GitLab stays the only source of truth — even for a page that rewrites itself nightly. If the workflow goes rogue, the worst it can do is a reviewable commit.\nDay-one verdict First run: 2,297 postings fetched, 25 in scope, 257 skill rows. Coverage score: 63%. Kubernetes and AWS tied at the top of demand — which means the AWS gap-closing project already in flight stopped being a hunch and became the measured top of the market. Go is the only top-ten skill by demand with zero supply. The dashboard doesn\u0026rsquo;t get anyone a job; it just makes sure every learning Saturday is pointed where the data says, not where the hype does.\nThe job board rejected me. The data didn\u0026rsquo;t.\nWorkflows, render.js, and setup: github.com/janos-gyorgy/mind-the-gap.\n","permalink":"https://blog.hippotion.com/posts/mind-the-gap-skill-radar/","summary":"A rejection isn\u0026rsquo;t actionable data. 
So an n8n workflow now extracts skill demand from live job listings, diffs it against what I can prove, and renders the gap as a dashboard — deployed like everything else here: via git push.","title":"Mind the gap: I pointed monitoring at my own skill set"},{"content":"I saw a clip of an autonomous farm robot — TRIC Robotics — driving strawberry beds in total darkness, killing pathogens with UV light instead of spraying them. Zero chemicals, zero runoff. My first reaction was \u0026ldquo;that\u0026rsquo;s a marketing robot.\u0026rdquo; My second, after reading, was \u0026ldquo;no, the science is real — and the robot is the least interesting part.\u0026rdquo;\nThe interesting part is why it works at night.\nThe trick is the darkness, not the light UV-C light (254 nm) shreds the DNA of fungal pathogens like powdery mildew. Nothing new there — it\u0026rsquo;s the same wavelength that sterilises water and hospital rooms. The problem is that in daylight those pathogens repair the damage, using a light-activated enzyme (photoreactivation). Zap them at noon and they patch themselves up by evening.\nSo you do it in the dark. With the repair pathway switched off, a tiny dose sticks. Cornell\u0026rsquo;s Gadoury lab spent years on this: nighttime UV-C at doses around 85 J/m² once a week gave season-long powdery mildew control on strawberries that beat the best available fungicides. Grapes, cucumbers, roses — same story. Applied about 30 minutes after sunset, finished within a couple of hours.\nThat\u0026rsquo;s a genuinely beautiful result. Not a new chemical, not a stronger lamp — just the same old light, applied when the enemy can\u0026rsquo;t fix itself.\nWhat it is, and what it absolutely isn\u0026rsquo;t Before anyone rips out their whole garden routine: this is not a general pesticide replacement. 
The evidence is strong for one specific class of problem — surface fungal pathogens, mostly powdery and downy mildew on susceptible plants (strawberry, grape, cucurbits, roses). It does nothing for slugs, most insects, or anything in the soil.\nSo the honest pitch is narrow: if you fight recurring mildew every summer, this is a chemical-free tool that genuinely works. If your real enemy is aphids, don\u0026rsquo;t build this — you\u0026rsquo;d be solving the wrong problem with a dangerous toy.\nWhich brings me to the toy being dangerous.\nThe part where I tell you not to blind yourself UV-C is not mood lighting. Seconds of direct exposure burn your eyes (welder\u0026rsquo;s flash) and skin, and it\u0026rsquo;s a long-term cancer risk. This is the single reason a home version has to be designed carefully — and the reason I\u0026rsquo;d never run an exposed source in a garden where my kids play.\nAny home rig needs, non-negotiably:\nA physical enclosure or skirt so the light only hits the bed, never a person. A hard interlock — a motion sensor or door contact that cuts power instantly if anything moves into range. A schedule that only ever runs in the dead of night, when everyone\u0026rsquo;s inside and asleep. You can also overdose the plants — too much UV-C scorches leaves. The whole point is that the effective dose is tiny, so more is not better.\nThe build (the home version of \u0026ldquo;while you sleep\u0026rdquo;) You don\u0026rsquo;t need TRIC\u0026rsquo;s autonomous navigation. A home garden has fixed beds — so the robot problem collapses into a much simpler one: get a shielded lamp over a known bed, for a known number of seconds, at night. That\u0026rsquo;s not robotics. That\u0026rsquo;s a timer and a rail.\nHere\u0026rsquo;s the plan I\u0026rsquo;d build:\nThe lamp. A low-pressure UV-C tube (254 nm — not the \u0026ldquo;UV-C LED\u0026rdquo; novelties, and not ozone-generating 185 nm lamps). 
Mounted in a hooded reflector so the light points down and is blocked from the sides. The geometry. Fix it at a set height over the bed — on a simple cart that rolls a track, or just a static fixture over a raised bed. Fixed height = repeatable dose. The dose, measured not guessed. This is the one place you can\u0026rsquo;t wing it: borrow or buy a UV-C meter, measure the irradiance (W/m²) at canopy height, then time = 85 ÷ irradiance. If the lamp delivers, say, 5 W/m² at the leaves, that\u0026rsquo;s ~17 seconds of exposure. Seventeen seconds, once a week. That tiny number is the whole reason this is plant-safe and low-energy — and why a slow-moving robot pass is enough on a farm. The brain. This is the bit that\u0026rsquo;s actually in my wheelhouse: an ESP32 + a relay, on the homelab. Fires at 2 a.m. for N seconds, once a week. A PIR sensor wired as a kill-switch. A mind-the-gap-style cron and a log line to my phone when it ran. The \u0026ldquo;autonomous robot working while you sleep\u0026rdquo; headline, minus the $100k of autonomy I don\u0026rsquo;t need for four raised beds. Verdict I haven\u0026rsquo;t built this yet — it\u0026rsquo;s a someday project, parked here so I stop losing the idea. But it\u0026rsquo;s the rare someday project where the science is settled, the materials are cheap, and the only real engineering is safety and dose control, both of which are squarely the kind of problem I like.\nThe farm robot\u0026rsquo;s pitch is \u0026ldquo;pesticide-free at scale.\u0026rdquo; The home version\u0026rsquo;s pitch is smaller and more honest: if mildew is your summer tax, you can pay it in seventeen seconds of midnight light instead of a spray bottle. I\u0026rsquo;ll take that trade.\nWhen I build it, the failure log gets its own post.\n","permalink":"https://blog.hippotion.com/posts/killing-mildew-in-the-dark/","summary":"A farm robot is replacing pesticides with UV light at night. 
The clever part isn\u0026rsquo;t the robot — it\u0026rsquo;s the darkness. Here\u0026rsquo;s the home version, and the honest scope of what it can and can\u0026rsquo;t do.","title":"🌙 Killing Mildew in the Dark"},{"content":"The problem nobody sells a fix for My kid loves audiobooks. The commercial platforms barely carry Hungarian children\u0026rsquo;s books, and none of them carry the one narrator my kid actually prefers: me. I can\u0026rsquo;t read aloud every evening — but my homelab doesn\u0026rsquo;t have that excuse.\nThe platform half (ebook → M4B → Audiobookshelf on k3s) is a story for another post. This one is about the voice: how to go from a phone recording to an audiobook narrated in your own voice, step by step, on hardware with no GPU.\nThe short version: XTTS-v2 does zero-shot voice cloning from a ~20-second sample. No training, no fine-tuning, no dataset. One clean recording and a flag.\nWhy XTTS-v2, in 2026? It\u0026rsquo;s not the best open TTS model anymore. Chatterbox beats ElevenLabs in blind tests; F5-TTS sounds cleaner. But model selection for a small language is constraint-first, not leaderboard-first: Chatterbox has no Hungarian, NVIDIA\u0026rsquo;s TTS NIMs have no Hungarian, Kokoro — no Hungarian. XTTS-v2 speaks Hungarian and clones voices and runs on CPU. That intersection has exactly one resident.\nI run it via ebook2audiobook, which wraps XTTS with Calibre ingestion and M4B chaptering.\nStep 1 — Record ~25 seconds of yourself Phone voice-memo app, quiet room, ~20 cm from your mouth. Mine came out as 28 seconds of stereo 48 kHz AAC. Two rules that matter more than gear:\nRead the way you want the books narrated. The clone copies prosody — energy, pacing, warmth — not just timbre. A flat recital clones into a flat narrator. I read a children\u0026rsquo;s tale the way I\u0026rsquo;d read it at bedtime. Don\u0026rsquo;t peak the mic. My sample hit −0.1 dB max volume — right at the clipping ceiling. It worked, but quieter is safer. 
Check yours: ffmpeg -i janos.m4a -af volumedetect -f null - 2\u0026gt;\u0026amp;1 | grep volume # mean_volume: -21.4 dB ← fine # max_volume: -0.1 dB ← living dangerously Step 2 — Normalize to what XTTS wants XTTS expects a mono WAV; 24 kHz matches its internal rate. Trim the silence off both ends while you\u0026rsquo;re at it:\nffmpeg -i janos.m4a \\ -af \u0026#34;silenceremove=start_periods=1:start_threshold=-45dB:start_silence=0.2,\\ areverse,silenceremove=start_periods=1:start_threshold=-45dB:start_silence=0.2,\\ areverse\u0026#34; \\ -ar 24000 -ac 1 janos.wav (The double-areverse is the classic trick: silenceremove\u0026rsquo;s start_* options only trim the front, so you flip the audio, trim the front again, flip it back.)\nDrop the result where your TTS stack looks for voices. In ebook2audiobook that\u0026rsquo;s the voices/ tree, organised by language:\nvoices/hun/adult/male/janos.wav Step 3 — Synthesize One flag does the cloning. Headless run on the k3s pod:\nkubectl exec -n web-audiobooks deploy/ebook2audiobook -- sh -c \\ \u0026#39;cd /app \u0026amp;\u0026amp; python app.py --headless \\ --ebook \u0026#34;/app/ebooks/tale.txt\u0026#34; \\ --language hun \\ --tts_engine xtts \\ --device cpu \\ --voice /app/voices/hun/adult/male/janos.wav \\ --output_format m4b \\ --output_dir /app/audiobooks\u0026#39; On my 12-core CPU node this runs at roughly 4× slower than real time — a 2-minute tale takes ~8 minutes, a full children\u0026rsquo;s book is an overnight job. The first run computes speaker latents from your WAV; after that it\u0026rsquo;s ordinary synthesis with your voice as the reference.\nStep 4 — A/B before you batch Render one short book twice — stock narrator and cloned voice — and put both in front of the household jury. Cloning quality is personal in the most literal sense: MOS scores won\u0026rsquo;t tell you whether it sounds like you. 
My benchmark has strong opinions and goes to bed at eight.\nOnly after the clone passes do you re-render the library with --voice.\nThe manual steps that earn the word \u0026ldquo;manual\u0026rdquo; Things the tutorials skip, learned the slow way:\nLong conversions die with the browser tab. Gradio-style web UIs tie the job to the open page; close the laptop and you get \u0026ldquo;Conversion cancelled\u0026rdquo; half a book in. Anything longer than ~15 minutes of audio runs headless under nohup. CPU synthesis leaks memory over hours. My pod has a hard 6 Gi limit on a 16 Gi node, and a 6-hour run will hit it. Keep the cap (it protects the other 30 namespaces), and rely on the tool\u0026rsquo;s --session \u0026lt;id\u0026gt; resume — it picks up at the exact sentence. One catch: headless resume still asks an interactive Resume? [y]es — pipe echo y | into it. The per-chapter FLACs survive a crash. If the final M4B muxing step OOMs, don\u0026rsquo;t re-synthesize: the chapters are sitting in the session\u0026rsquo;s tmp directory, and ffmpeg will assemble them into a chaptered M4B with a hand-written FFMETADATA file in about two minutes, at near-zero memory. None of this is hard. It\u0026rsquo;s just undocumented — which is the gap between \u0026ldquo;there\u0026rsquo;s a model for that\u0026rdquo; and your kid pressing play.\nPostscript: the jury came back The clone failed. Recognizably my timbre, nowhere near natural — I wouldn\u0026rsquo;t play it to my kid, which is the only metric that exists for this project.\nWorth being precise about what failed: the stock XTTS-v2 narrator passed the ear test and the library keeps growing with it. Zero-shot cloning is the part that fell short — a 2023 model conditioning on 26 seconds of a voice it has never seen, in a language that was never its strong suit. 
The pipeline above is still the right pipeline; the model isn\u0026rsquo;t there yet on CPU-class options.\nThe next experiment is already picked: F5-TTS Hungarian, a 2026 fine-tune on 280 hours of actual Hungarian speech, built precisely for short-sample cloning. It needs CUDA, which my node doesn\u0026rsquo;t have — but a rented spot GPU tests it for the price of an espresso. If it passes the bedtime jury, that\u0026rsquo;ll be its own post.\nNegative results are results. The jury reconvenes when the GPU shows up.\n","permalink":"https://blog.hippotion.com/posts/clone-your-voice-hungarian-audiobooks/","summary":"Zero-shot voice cloning with XTTS-v2 on a CPU-only k3s node: 26 seconds of phone audio in, a cloned-voice audiobook out — and an honest verdict from the bedtime jury. Every manual step, including the ones that went wrong.","title":"🎙️ Cloning My Own Voice for My Kid's Audiobooks"},{"content":"A few weeks ago I rebuilt my second brain as a folder of markdown in git — the vault is the source of truth, everything else (search index, graph, 3D viewer) is a derived layer I can delete and rebuild. I love it. But a knowledge base has a dirty secret: it rots.\nNot the files — those are fine. The connections rot. You capture a note at 11pm and never link it to anything, so it becomes an orphan floating off the graph. A project note\u0026rsquo;s one-line summary describes what the project was three weeks ago. Two notes are obviously about the same thing and neither knows the other exists. Do this for a few months and you don\u0026rsquo;t have a second brain, you have a junk drawer with good search.\nThe honest fix is to weed the garden regularly. 
The honest truth is that nobody does, including me.\nSo I stopped relying on myself and built a gardener.\nWhat it actually does Every night at 3am, on my homelab box, a script runs:\nDetect — exo garden, a plain query over the index, produces a report: here are the orphans, here are notes that should probably link to each other, here are summaries that look stale. No AI in this step. It\u0026rsquo;s SQL and graph traversal. Deterministic, boring, trustworthy. Decide and write — that report gets piped to claude -p (Claude Code in headless mode). Claude reads the vault\u0026rsquo;s operating contract, makes only high-confidence edits — add a [[wikilink]] between two genuinely related notes, refresh a stale summary — caps itself at ~10 notes a night, and writes a dated log note explaining exactly what it changed and what it deliberately skipped. Commit — the wrapper reindexes and lands everything as a single garden: 2026-06-09 … git commit, then pushes. My 3D graph viewer picks it up on the next sync. The first real run, it found one orphan (90-meta/README), linked it into the notes it actually indexes, and then — this is the part I liked — declined to touch the 12 \u0026ldquo;stale summary\u0026rdquo; candidates because, on inspection, every one of them was already accurate. It wrote: \u0026ldquo;flagged by length, not staleness; churning them would add noise.\u0026rdquo; A gardener that knows when not to prune is the one you can leave alone.\n\u0026ldquo;Isn\u0026rsquo;t this a solved problem?\u0026rdquo; Mostly, no — but partly, yes, and I want to be straight about it. AI-assisted note-linking exists: Obsidian plugins like Smart Connections suggest related notes, and apps like Mem and Reflect auto-organize as you write. They\u0026rsquo;re good.\nThree things make this different enough to build:\nEvery change is a reviewable git diff, authored by a named agent. Not silent magic that rearranges your notes while you\u0026rsquo;re not looking. 
git log -p shows you exactly what the gardener did last night; git revert undoes a bad night in one command. For something as personal as a knowledge base, \u0026ldquo;show me the diff\u0026rdquo; beats \u0026ldquo;trust me.\u0026rdquo; It\u0026rsquo;s mine, end to end. Runs on my hardware, on my schedule, with a model I point at. No SaaS holds my brain hostage. The detection is deterministic; the model only acts. The LLM never decides what\u0026rsquo;s wrong — a boring query does that. The model only decides how to fix the things already found. That split keeps the whole thing auditable and cheap. If you already live in a tool that does this and you trust it, great. I wanted the git-diff trail and the local control.\nThe part I actually want to tell you about The plan was tidy: I run n8n on the same cluster, so n8n would be the scheduler — fire nightly, SSH into the node, run the gardener. Clean, visual, one workflow.\nn8n could not reach the node. At all. Every port: ECONNREFUSED.\nThis sent me down a genuinely interesting hole, because the homelab runs Cilium for networking, and Cilium has opinions about your own node that plain Kubernetes does not.\nFirst instinct: a NetworkPolicy allowing egress to the node\u0026rsquo;s IP. Wrote it, synced it, still refused. The reason is a Cilium subtlety worth knowing: the node isn\u0026rsquo;t a CIDR, it\u0026rsquo;s an identity. Cilium classifies your cluster\u0026rsquo;s own node as the special host identity, and ordinary ipBlock CIDR rules do not match it unless you flip a cluster-wide setting (policy-cidr-match-mode: nodes). My 192.168.0.109/32 rule was a no-op.\nSo I switched to the Cilium-native tool: a CiliumNetworkPolicy with toEntities: [host]. Confirmed it applied — I could see reserved:host allowed right there in the datapath\u0026rsquo;s BPF policy map. I confirmed the node\u0026rsquo;s IP really does resolve to identity 1 (host). I confirmed the host firewall was disabled. 
Everything said \u0026ldquo;allowed.\u0026rdquo;\nStill ECONNREFUSED.\nThat\u0026rsquo;s the wall. The packet leaves the pod with Cilium\u0026rsquo;s blessing, hits the host\u0026rsquo;s own network stack, and something there sends a reset — and I couldn\u0026rsquo;t see what, because inspecting the host firewall needs root, and this automation deliberately doesn\u0026rsquo;t have it. I could have kept digging with a password. But I stopped and asked a better question: why am I making a pod reach back into the host it\u0026rsquo;s running on at all?\nThat\u0026rsquo;s an awkward direction. The work has to happen on the host (that\u0026rsquo;s where the vault, git creds, and Claude live). A pod straining to SSH into its own node is fighting the grain of the platform.\nSo I inverted it. The node schedules itself — a plain cron entry, rock-solid, no network gymnastics. And n8n, instead of triggering the job, receives it: at the end of each run the node POSTs a summary to an n8n webhook. Node→n8n works perfectly (it\u0026rsquo;s just an outbound HTTPS call to a URL). n8n keeps the run history and is the place I\u0026rsquo;ll later wire a phone notification.\nI lost nothing that mattered. n8n is still my dashboard; the schedule just lives where the work lives. And I deleted the SSH key and the network-policy hole I\u0026rsquo;d opened — the cleanup felt better than the original plan would have.\nThe lesson, such as it is Two, actually.\nOne: when you\u0026rsquo;re automating something to run unattended, the bug you want to find is the one that shows up in a dry run at 2pm, not at 3am three weeks from now. I almost shipped a version where a brand-new note (untracked by git) was invisible to my change-detection and would\u0026rsquo;ve been silently wiped each night. The dry run caught it. 
Always build the dry run.\nTwo, the bigger one: I spent an hour trying to make a pod punch into its host because that was my plan, and the platform kept saying no in increasingly specific ways. The fix wasn\u0026rsquo;t a cleverer NetworkPolicy. It was noticing I was pushing against the design and turning around. The node scheduling itself and reporting up to n8n is simpler, safer, and more honest about where the work actually lives.\nMy brain weeds itself now. Every morning there\u0026rsquo;s maybe one small, sensible commit waiting — a link I\u0026rsquo;d have never made, a summary nudged back to true — and I can read exactly what changed before my coffee\u0026rsquo;s done. That\u0026rsquo;s the whole dream of a second brain that isn\u0026rsquo;t a junk drawer: it stays a garden, and I barely have to touch it.\n","permalink":"https://blog.hippotion.com/posts/an-ai-gardener-for-your-second-brain/","summary":"I gave my markdown knowledge base a nightly gardener — an AI that finds orphan notes and missing links and fixes them, every change a reviewable git commit. The fun part was the Kubernetes wall I hit on the way.","title":"🌱 My Second Brain Weeds Itself Now"},{"content":"You don\u0026rsquo;t have to be about to change jobs to want to know the landscape. What\u0026rsquo;s being built, what it pays, where you\u0026rsquo;d actually fit — staying current on the market (and your own worth) is just good professional hygiene. The trouble is that checking is tedious, so most of us don\u0026rsquo;t, until we\u0026rsquo;re already job-hunting and starting cold.\nSo I automated mine. An n8n workflow on my homelab polls job boards every six hours, scores each new posting against my profile with an LLM, and emails me only the strong matches — the ones scoring 80%+. When it\u0026rsquo;s quiet, it\u0026rsquo;s silent. When something genuinely fits, I know the same day. Here\u0026rsquo;s what I learned building it. 
Repo at the bottom.\nThree APIs cover most of the market Company career pages look bespoke, but underneath, the vast majority run on one of three ATS — and all three hand you the jobs as unauthenticated JSON:\nGreenhouse — boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true Lever — api.lever.co/v0/postings/{token}?mode=json Ashby — api.ashbyhq.com/posting-api/job-board/{token}?includeCompensation=true No scraping, no headless browser. You poll the API the page itself calls, normalize the three shapes into one { company, title, location, remote, url, posted_at, description, external_id }, and you\u0026rsquo;re done with the hard part.\n\u0026ldquo;Resolve the token\u0026rdquo; is half the battle The naive assumption — the token is the company name, and everyone\u0026rsquo;s on one of the three — is half right. When I probed my initial wishlist, roughly half 404\u0026rsquo;d everywhere: HashiCorp (now under IBM → Workday), SUSE (SuccessFactors), Aiven (Teamtailor), Hugging Face. They\u0026rsquo;re on a fourth or fifth system entirely. The honest move was to ship the ~33 that actually resolve and leave the rest as disabled config stubs. Verify before you trust a slug.\nDedup without a database I didn\u0026rsquo;t want to stand up Postgres just to remember which jobs I\u0026rsquo;d already seen. n8n\u0026rsquo;s Data Tables handle it natively: a seen_jobs table, an external_id namespaced {ats}:{company}:{id}, and the rowNotExists operation drops anything already recorded. State lives inside n8n, backed up with it. Zero extra infrastructure.\nThe ordering matters: notify first, mark seen second. The insert only happens after the email sends, so a failed send retries next run instead of silently swallowing a posting.\nThe location filter is a trap My first version kept everything that wasn\u0026rsquo;t explicitly US-based. 
The inbox filled with \u0026ldquo;Senior Platform Engineer — Spain (Remote)\u0026rdquo; and \u0026quot;… — United Kingdom (Remote)\u0026quot;. Those aren\u0026rsquo;t remote-for-me — they\u0026rsquo;re remote if you live in Spain. Useless from where I sit.\nThe fix was to invert the logic. Keep only three things:\nglobally remote / worldwide / anywhere, pan-EU (EMEA / Europe / EU / EEA), my own country. …and drop single-country remote, even EU ones. Region and home matches win over the country deny-list; ambiguous locations are kept (a missed match is worse than one extra line to skim). That one change cut the noise more than anything else.\nLet an LLM read the actual job Keyword + location filtering gets you a candidate list, but it can\u0026rsquo;t tell a \u0026ldquo;Platform Engineer\u0026rdquo; who herds Kubernetes from a \u0026ldquo;Platform Engineer\u0026rdquo; who owns a Figma design system. The job description can.\nSo the last step scores each new posting against my CV. My first version batched all of them into one big LLM call — which promptly timed out on the free tier. The fix was the opposite: one small call per job, which also means a single slow or rate-limited job never sinks the batch. Each call asks an NVIDIA NIM model (Llama 3.1 8B, OpenAI-compatible) for one number and a reason:\nScore this job 0–100 for fit against my profile. Return {score, reason}.\nThat score is what lets me widen the net instead of narrowing it. On top of the curated company list I pull a broad remote-jobs feed (every company, all categories); the cheap keyword + location filters do the first pass, then I only email the roles scoring 80%+. Casting wide is fine when a model is the bar at the door. 
A line ends up looking like:\n92% — Grafana Labs — Senior Platform Engineer (Remote, EMEA) — strong k8s/GitOps overlap — link\nScoring is fail-safe: if a call hiccups, that job is just skipped, and every posting gets marked seen either way — so nothing re-scores forever, and a rare bad run never floods or stalls the inbox.\nThe unglamorous bits that make it trustworthy One bad source can\u0026rsquo;t kill the run — every fetch is wrapped; failures become a ⚠️ N sources failing footer so a company quietly changing ATS is visible, not invisible. A prime run seeds the table silently the first time, so I\u0026rsquo;m not buried under every currently-open role on day one. Everything tunable lives in one Config node — companies, keywords, location lists, the profile, the model — so adding a company is a one-line edit, not a graph safari. Takeaways The \u0026ldquo;scrape job boards\u0026rdquo; problem mostly isn\u0026rsquo;t a scraping problem — it\u0026rsquo;s three public APIs and a normalizer. For personal automation, reach for the boring-but-correct primitive: native dedup state beats a database you have to operate. An LLM works best here as the bar at the door: cheap deterministic filters keep the candidate set (and the cost) small, then the model gates on real fit — which is what lets you cast a wide net without drowning in it. Workflow JSON, the full node-by-node breakdown, and setup notes: github.com/janos-gyorgy/ats-job-poller.\n","permalink":"https://blog.hippotion.com/posts/ats-job-poller/","summary":"You don\u0026rsquo;t have to be job-hunting to want to know your market — what\u0026rsquo;s out there, what it pays, where you\u0026rsquo;d fit. So I built an n8n workflow: it polls the public ATS APIs (Greenhouse/Lever/Ashby) plus a broad remote-jobs feed, filters for remote-EU infra roles, scores each posting against my CV with an LLM, and emails me only the 80%+ matches. 
No database, no scraping.","title":"🎯 Know the Market Without Job-Hunting: An LLM-Scored Job Poller in n8n"},{"content":"The confession first There are, at last count, a small army of tools that list your Claude Code sessions and let you jump back into one. tmux wrappers (claude-tmux, claunch), keyword resumers (tmux-claude-code), fleet managers (claude-manager), and a whole macOS menu-bar genre (claude-control, cmux, and friends). They\u0026rsquo;re good. Several are better-engineered than mine.\nI built one more anyway.\nNot because the others are wrong — because none of them were shaped like my day, and the cost of hand-rolling a 300-line script turned out to be smaller than the cost of bending my workflow around someone else\u0026rsquo;s defaults. That\u0026rsquo;s the whole pitch, and it\u0026rsquo;s a boring one. The interesting part is what I had to understand to build it, because it corrected a mental model I\u0026rsquo;d had backwards for months.\nMy day, concretely I work off a single Linux box over SSH, from a few different machines. A session might be a homelab change, a side project, a blog post. I drop one mid-thought, my laptop sleeps, I pick it up that evening from a different terminal. The thing I kept doing was running claude --resume and squinting at a list of UUIDs trying to remember which 7f3a… was the one about the broken redirect.\nI wanted one command — wt — that shows me every session with a human summary and tells me, truthfully, which ones are still alive. Then lets me pick one.\nSimple ask. It sent me reading the on-disk format, and that\u0026rsquo;s where it got educational.\nWhat I had backwards: tmux is not how you keep a session Every tmux-first tool sells the same promise: run Claude inside tmux so your session survives a disconnect. 
I\u0026rsquo;d internalized that as \u0026ldquo;tmux is how Claude sessions persist.\u0026rdquo;\nThat\u0026rsquo;s wrong, and realizing it deleted half the code I thought I\u0026rsquo;d need.\nA Claude Code session is one claude process, keyed by a sessionId UUID. Its entire transcript — every message, every tool call and result — is appended to a file:\n~/.claude/projects/\u0026lt;cwd-slug\u0026gt;/\u0026lt;sessionId\u0026gt;.jsonl It\u0026rsquo;s append-only, and it has no \u0026ldquo;end\u0026rdquo; marker. When you --resume, Claude reopens that same file and replays it. One of my session files spans three calendar days across half a dozen resumes — same file, same UUID, the whole conversation reconstructed from disk each time.\nWhich means: the history is durable independent of any running process. You do not need tmux to land exactly where you left off. claude --resume \u0026lt;id\u0026gt; does that from the transcript alone, on a box with no tmux installed at all.\nSo what is tmux for, then? Exactly one thing: keeping a process running while you\u0026rsquo;re disconnected — a long job, an agent grinding away, or re-attaching the same live process from your phone. That\u0026rsquo;s real, but it\u0026rsquo;s the exception, not the default. So in my tool, plain resume is the default and tmux is an opt-in flag. The inversion fell straight out of reading the format honestly.\nThe other thing the transcript doesn\u0026rsquo;t tell you: is it alive? Here\u0026rsquo;s the subtle bit. The transcript tells you the history of a session. It does not tell you whether a claude process is running right now. There\u0026rsquo;s no \u0026ldquo;closed\u0026rdquo; record — the file for a long-dead session looks identical to one you left open thirty seconds ago.\nLiveness lives somewhere else:\n~/.claude/sessions/\u0026lt;pid\u0026gt;.json → { pid, sessionId, cwd, procStart, ... } A session is alive if that pid is actually running. 
But you can\u0026rsquo;t just trust the file\u0026rsquo;s existence — it can linger after a crash — and you can\u0026rsquo;t just kill -0 the pid either, because the kernel recycles pids and you might be poking a process that reused the number. So the honest check is two-factor:\ndef alive(pid, procstart): try: os.kill(pid, 0) # exists and signalable? except (ProcessLookupError, OSError): return False # ...and is it the SAME process, not a pid-recycle? stat = Path(f\u0026#34;/proc/{pid}/stat\u0026#34;).read_text() starttime = stat[stat.rindex(\u0026#34;)\u0026#34;) + 2:].split()[19] return starttime == str(procstart) That /proc/\u0026lt;pid\u0026gt;/stat start-time comparison is the difference between \u0026ldquo;I think it\u0026rsquo;s live\u0026rdquo; and \u0026ldquo;it\u0026rsquo;s live.\u0026rdquo; It\u0026rsquo;s the kind of detail you only get right by caring about the boring case.\nWith that, every session resolves to a real state:\n● live — a process is running now ⧗ waiting — no process; you left mid-conversation (last line was Claude) · idle — no process; stale And the payoff for getting liveness right: if you try to resume a session that\u0026rsquo;s still live in another terminal, the tool refuses to double-open it — two processes appending to one transcript is how you corrupt your own history — and offers a clean --fork-session instead.\nThe summaries were free the whole time The feature I assumed I\u0026rsquo;d have to build — a short, human description of each session — I didn\u0026rsquo;t build at all. Claude Code already writes one. Buried in the transcript is a record type:\n{\u0026#34;type\u0026#34;: \u0026#34;ai-title\u0026#34;, \u0026#34;aiTitle\u0026#34;: \u0026#34;Investigate nested o directories\u0026#34;, \u0026#34;sessionId\u0026#34;: \u0026#34;...\u0026#34;} Claude titles your sessions for you. The \u0026ldquo;summary\u0026rdquo; column in my tool is just that field, with a fallback to your last prompt. 
The best line of code is the one you delete after noticing the platform already did the work.\nSo what did I actually build Not much, and that\u0026rsquo;s the point. wt is one Python file, standard library only, no daemon. It globs the transcripts, reads each one\u0026rsquo;s title and last-activity, joins that against the pid-verified live registry, sorts live-first, and prints a numbered list. Pick a number and it execs into claude --resume. There\u0026rsquo;s a -t for tmux when I genuinely need it, a d to archive old sessions (a file move, fully reversible), and a guarded hook that turns it into an SSH login greeting so the box tells me what\u0026rsquo;s on it the moment I land.\nwatchtower · 5 session(s) 1) ● live 16s homelab 595e931d Investigate nested o directories 2) · idle 1d07h notes-app 6565b121 Migrate to server components [#]resume [t#]tmux [d#]archive [n]ew [Enter]shell [q]uit ▸ If you want it, it\u0026rsquo;s on GitHub, MIT. But honestly, I\u0026rsquo;d rather you take the three things I had to learn than the tool:\nYour Claude history lives in a plain append-only JSONL on disk, not in tmux. --resume works without any wrapper. Back up ~/.claude/projects/ and you\u0026rsquo;ve backed up every conversation you\u0026rsquo;ve had. Liveness is a separate fact from history, and checking it honestly means verifying the pid is the same process — not just that something answers to the number. The platform probably already did the boring work (here: the titles). Read the format before you write the feature. The flooded-market thing turns out not to matter. A tool that fits your own hands is worth building even when fifty others exist — especially when it\u0026rsquo;s small enough that \u0026ldquo;build\u0026rdquo; and \u0026ldquo;understand the system underneath\u0026rdquo; are the same afternoon.\n","permalink":"https://blog.hippotion.com/posts/hand-rolled-claude-session-switcher/","summary":"The web is flooded with Claude Code session managers. 
I built one more anyway — and the part worth sharing isn\u0026rsquo;t the tool, it\u0026rsquo;s what I had to learn about where Claude actually keeps your sessions.","title":"🪟 I Built Yet Another Claude Code Session Switcher"},{"content":"The graveyard of second brains I had a second brain once. Obsidian vault, a CouchDB LiveSync backend, even a weekly agent that summarised my notes. It worked — for a while. Then the sync started fighting itself across my laptop, the homelab, and my phone, and the day syncing becomes a chore is the day you stop opening the thing. The notes were still there. I just never looked at them again.\nThat\u0026rsquo;s how most second brains die. Not from bad notes — from the plumbing. The sync breaks, or the upkeep outpaces the payoff, or the whole thing is trapped in one app\u0026rsquo;s database and moving it feels like surgery. The knowledge was never the problem. The container was.\nSo when I rebuilt it, I started from the failure modes, not the features.\nWhat I actually wanted Three things, none of them \u0026ldquo;more notes\u0026rdquo;:\nMemory I share with my AIs. Every time I open a fresh Claude session, it starts from zero — I re-explain my homelab, my projects, what we decided last week. I wanted a place both of us read and write, so the context survives the session. Something that outlives any tool. No lock-in. If the app of the month dies, my brain shouldn\u0026rsquo;t die with it. Sync that can\u0026rsquo;t rot. The thing that killed v1. The one decision that matters The store and the intelligence are different layers, and only the store is sacred.\nThe store is a folder of plain markdown in git. That\u0026rsquo;s it. Human-readable, diffable, greppable, yours. 
Everything clever sits above it and is fully rebuildable:\nL5 Visualisation 3D graph, Obsidian, whatever reads markdown L4 Automation scheduled \u0026#34;gardener\u0026#34; runs L3 Agent interface MCP servers — search, graph, note CRUD L2 Index SQLite: full-text + vectors + materialised edges L1 Structure typed frontmatter + [[wikilinks]] L0 Substrate markdown files in git ← the only thing that\u0026#39;s truth Delete L1–L5 and nothing is lost — you rebuild them from L0 with one command. That property is the whole design. The index can corrupt, the embedding model can change, the viewer can break (mine did, spectacularly — that\u0026rsquo;s another post), and the knowledge doesn\u0026rsquo;t care. It\u0026rsquo;s text in git.\nAnd sync is just git pull. No LiveSync daemon to wedge itself, no proprietary replication. The exact thing that killed v1 is now the most boring, battle-tested part of the stack. Three devices, one git pull, done.\nSearch that explains itself The retrieval layer is deliberately not \u0026ldquo;throw it all at embeddings.\u0026rdquo; It fuses three signals — keyword (BM25), vector similarity, and graph expansion (pull in the neighbours of strong hits) — and every result reports which signals fired.\nexo search \u0026#34;hybrid retrieval\u0026#34; → hybrid-retrieval matched_on: [bm25, graph] That matched_on matters more than it looks. An embeddings-only system gives you a ranked list and no reason — you can\u0026rsquo;t tell a real match from a vibe. For a brain I\u0026rsquo;m supposed to trust over years, \u0026ldquo;why did this surface?\u0026rdquo; is a feature, not a nicety.\nThe AI is a librarian, not a hoarder Here\u0026rsquo;s the part I care about most. The AI doesn\u0026rsquo;t just read the brain — it writes to it. Through an MCP server it can search, walk the graph, and author notes. But under a hard rule: every write is a reviewable git diff.\nIt searches before it writes (extend a note, don\u0026rsquo;t spawn a duplicate). 
It links instead of piling. A scheduled \u0026ldquo;gardener\u0026rdquo; pass finds orphaned notes and stale summaries and proposes fixes — as commits I can read and git revert if it gets something wrong. No black-box mutation of my memory. Just a librarian that files things while I\u0026rsquo;m asleep and leaves a paper trail.\nSo now \u0026ldquo;what am I building?\u0026rdquo; is a question with an instant, honest answer: a single map note, kept current, that every project links into. I ask, the AI pulls it, and neither of us has to remember.\nWhy not just… Obsidian alone? It\u0026rsquo;s a lovely viewer — and I still use it as one. But it can\u0026rsquo;t give an agent structured read/write or explainable retrieval, and its sync is what burned me. Here Obsidian reads the same markdown; it\u0026rsquo;s a window, not the house. Embeddings RAG? Opaque and one-directional. It can rank, but it can\u0026rsquo;t tell you why, and it can\u0026rsquo;t write back. This is transparent and bidirectional. Notion / a SaaS brain? Lock-in by design. git clone is my backup and any text editor is my fallback. A graph database? Unnecessary infra. The graph lives in the wikilinks; SQLite just materialises it. I\u0026rsquo;ll add Neo4j the day my queries actually outgrow a single file, and not a day sooner. What it changes The vault is small still — that\u0026rsquo;s fine; it grows by use. But the loop already pays off: I work, the AI checkpoints decisions into markdown, and the next session — fresh model, no memory of its own — searches the brain and is caught up in seconds. The knowledge stopped living only in my head and in dead chat logs.\nI\u0026rsquo;m a team of one. There\u0026rsquo;s no colleague who remembers why I made a call six months ago, no handover doc someone else maintains. Continuity isn\u0026rsquo;t a nice-to-have; it\u0026rsquo;s the whole job. 
A second brain that the AI helps keep alive — and that I can git clone onto any machine in thirty seconds — is the first version of this idea that I actually trust to still be here in five years.\nThe notes from v1? They\u0026rsquo;re sitting in a folder, waiting to be triaged into v2. This time I\u0026rsquo;ll still be opening it.\n","permalink":"https://blog.hippotion.com/posts/a-second-brain-you-can-git-clone/","summary":"My first second brain died the way most do — on multi-device sync. The rebuild: plain markdown as the source of truth, every clever layer derived and disposable, and an AI that tends it through reviewable git diffs.","title":"🧠 A Second Brain You Can `git clone`"},{"content":"I brew kombucha If you haven\u0026rsquo;t fallen down this hole: kombucha is sweet tea fermented by a SCOBY (a rubbery pancake of yeast and bacteria) into something tart and fizzy. It\u0026rsquo;s a living hobby — the culture is alive, every batch is a little different, and the only way to get good is to pay attention and remember what you did.\nI was not remembering what I did. Brew dates lived in my head, taste notes lived nowhere, and \u0026ldquo;which jar was the ginger one again?\u0026rdquo; was a genuine question I asked myself out loud, to a fridge.\nSo I built a tracker. It\u0026rsquo;s called HipPotion — same family as everything else I run here. The brewing turned out to be the easy part. Modeling it was where it got interesting.\nWhy a simple list doesn\u0026rsquo;t fit My first instinct was \u0026ldquo;a batch is a row, log some notes.\u0026rdquo; That falls apart fast, because kombucha isn\u0026rsquo;t linear. It has two stages:\nF1 (first ferment): the big jar of sweet tea + SCOBY, fermenting sour over a week or two. One vessel, one culture. F2 (second ferment): you split that sour base into bottles and flavor each one differently — ginger in this one, blackberry in that one, hibiscus in the next — then seal them to build carbonation. 
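That two-stage shape is what the schema has to express. Here is a minimal sketch of the F1-to-F2 split plus the observation time series. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL (the real app uses Drizzle ORM). The table names follow the model described below, but the columns are simplified and hypothetical, and the readings are made-up values.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('PRAGMA foreign_keys = ON')  # SQLite only enforces foreign keys when asked

db.executescript('''
CREATE TABLE batches (                  -- F1: one jar, one culture
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    status TEXT NOT NULL
        CHECK (status IN ('planned', 'active', 'conditioning', 'finished'))
);
CREATE TABLE f2_variant_batches (       -- F2: many bottles split off one parent
    id INTEGER PRIMARY KEY,
    parent_batch_id INTEGER NOT NULL REFERENCES batches(id),
    flavor TEXT NOT NULL
);
CREATE TABLE fermentation_log_entries ( -- time series: one row per observation
    id INTEGER PRIMARY KEY,
    batch_id INTEGER NOT NULL REFERENCES batches(id),
    observed_on TEXT NOT NULL,
    ph REAL,
    brix REAL,
    notes TEXT
);
''')

db.execute('INSERT INTO batches (id, name, status) VALUES (1, ?, ?)', ('2026-04-09-A', 'active'))
# One parent batch fans out into several flavored F2 bottles.
db.executemany('INSERT INTO f2_variant_batches (parent_batch_id, flavor) VALUES (1, ?)',
               [('ginger',), ('blackberry',), ('hibiscus',)])
# Made-up readings for illustration, not real measurements.
db.executemany('INSERT INTO fermentation_log_entries (batch_id, observed_on, ph, brix, notes) '
               'VALUES (1, ?, ?, ?, ?)',
               [('2026-04-12', 3.6, 7.0, 'too sweet still'),
                ('2026-04-15', 3.2, 6.0, 'getting tart')])

variants = db.execute(
    'SELECT b.name, v.flavor FROM f2_variant_batches v '
    'JOIN batches b ON b.id = v.parent_batch_id ORDER BY v.id').fetchall()
print(variants)
```

The foreign key means a variant cannot exist without a parent batch. Because each variant is still its own row, it can carry its own flavor and, in a fuller model, its own outcome.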
So one batch becomes many bottles, each with its own flavor, its own carbonation, its own outcome. A flat \u0026ldquo;batch = row\u0026rdquo; model can\u0026rsquo;t express that. And on top of the branching, every jar and bottle produces a stream of observations over time: pH today, Brix tomorrow, \u0026ldquo;tastes too sweet still\u0026rdquo; the day after.\nThat\u0026rsquo;s three different shapes at once — a lifecycle, a one-to-many split, and a time series — for what looks from the outside like \u0026ldquo;I made some tea.\u0026rdquo;\nThe model I landed on Six tables, each earning its place:\nrecipes — the templates. Tea blend, sugar ratio, target numbers. A batch points at one. batches — an actual F1 brew, with a lifecycle (planned → active → conditioning → finished) and a reference to its recipe. fermentation_log_entries — the time series. One row per observation per batch: pH, Brix, temperature, taste/smell notes, what I did. This is where the \u0026ldquo;pay attention and remember\u0026rdquo; lives. f2_variant_batches — the branch. Each is a flavored bottle split off a parent batch, tracked on its own. starter_log — SCOBY lineage. Cultures have parents; you grow new ones from old ones, and a sick culture ruins a batch, so the lineage matters. botanical_infusions — the flavoring ingredients, managed per recipe. The shape that took the longest to get right was the F1 → F2 split: a variant has to belong to its parent batch but live its own life. Once that relationship was clean, the whole thing clicked — the app finally matched how brewing actually works instead of how it\u0026rsquo;s easy to store.\nThe stack (and where it runs) Nothing exotic: React + Vite + TypeScript on the front (TanStack Query, shadcn/ui, Tailwind), a Hono + Drizzle ORM API on the back, PostgreSQL underneath. 
Built with AI coding tools — I leaned on them hard for the React/shadcn front-end, less so for the schema, which I argued out by hand because it\u0026rsquo;s the part that had to be right.\nIt runs on my k3s homelab like everything else: a Helm chart deploys the nginx frontend, the Hono API, and a Postgres StatefulSet, all reconciled by Argo CD from Git. Default-deny networking, secrets out of Git — the usual platform defaults. It\u0026rsquo;s a hobby app, but it gets treated like a real one, because the platform doesn\u0026rsquo;t know the difference and I don\u0026rsquo;t want it to.\nIt became an API for something else The unexpected payoff: because the data model was clean and the API was just a set of plain REST endpoints, it made a perfect target for an experiment. I later pointed an AI agent at it from n8n — \u0026ldquo;what\u0026rsquo;s fermenting right now?\u0026rdquo;, \u0026ldquo;log that this batch tastes tart\u0026rdquo; — and the agent just called the same endpoints the UI does. A good schema is reusable in ways you don\u0026rsquo;t plan for. The kombucha tracker quietly became a little knowledge base I can talk to.\nHonest notes This is a personal hobby app for an audience of one (me). It\u0026rsquo;s AI-assisted, it has no tests, and the UI has rough edges. I\u0026rsquo;m not pretending it\u0026rsquo;s a product.\nBut the thing I keep coming back to: the hard, valuable part wasn\u0026rsquo;t the framework or the deployment — it was sitting with a messy real-world process long enough to find the shape of it. The branching ferment, the time series, the lineage. Get the model honest and the rest is just typing. Get it wrong and no amount of nice UI saves you.\nAlso, the kombucha\u0026rsquo;s been better since I started writing things down. 
Turns out the fridge wasn\u0026rsquo;t a great database.\n","permalink":"https://blog.hippotion.com/posts/kombucha-tracker/","summary":"Brewing kombucha looks simple until you try to model it: one batch splits into many flavored bottles, every jar generates a stream of pH and taste readings, and a SCOBY has a lineage. Here\u0026rsquo;s the little app I built to keep track — and why the schema, not the code, was the real work.","title":"🫙 I Built a Tracker for My Kombucha. The Data Model Was the Hard Part."},{"content":"The question \u0026ldquo;You\u0026rsquo;re running n8n for multiple customers on the same Kubernetes cluster. What stops Customer A from reading Customer B\u0026rsquo;s API keys, calling Customer B\u0026rsquo;s services, or starving Customer B\u0026rsquo;s workflows by burning the whole node?\u0026rdquo;\nThree different walls, three different mechanisms. Most articles I\u0026rsquo;ve read on K8s multi-tenancy list the primitives — namespaces, NetworkPolicies, ResourceQuotas, RBAC — without showing what each one actually catches when you try to cross it. This post does the second part. The receipts are the point.\nThe setup: two namespaces, web-tenant-acme and web-tenant-globex, each running their own n8n instance on the same node. The only thing keeping them apart is the walls we build around each namespace.\nThe mental model: subtractive isolation Kubernetes is a flat network with shared everything by default. You don\u0026rsquo;t add isolation by writing allow rules. You subtract trust by adding default-deny rules, and then carefully allow back only the connections each tenant actually needs.\nA tenant doesn\u0026rsquo;t have access to another tenant because there is no rule allowing it. 
The absence of an allow rule is the wall.\nThree of these absences make up the picture:\nWall Primitive Failure mode when crossed Network Cilium NetworkPolicy, default-deny egress Connection times out (silent drop) Secret Vault Kubernetes-auth, per-tenant policy 403 permission denied from Vault itself Resource ResourceQuota + LimitRange Pod rejected at admission time Different layers, different error messages. That\u0026rsquo;s how you can tell what stopped you.\nWall 1 — Network: Cilium NetworkPolicy n8n in web-tenant-acme can reach whoami.web-tenant-acme.svc.cluster.local (its own service in its own namespace) but not whoami.web-tenant-globex.svc.cluster.local. The same DNS shape, the same cluster, the same node. One succeeds, the other hangs.\nThe primitive is a default-deny egress policy applied to every pod in the namespace, with two narrow exceptions: intra-namespace traffic (so n8n can still reach its own service) and DNS to kube-system (otherwise nothing resolves anything).\n# Effective policy on every pod in web-tenant-acme: spec: podSelector: {} policyTypes: [Egress, Ingress] egress: - to: # intra-namespace traffic OK - podSelector: {} - to: # DNS to kube-dns OK - namespaceSelector: matchLabels: kubernetes.io/metadata.name: kube-system ports: [{port: 53, protocol: UDP}] There is no rule for web-tenant-globex. Cilium\u0026rsquo;s eBPF datapath drops the SYN packet on the way out.\nThe receipt — an n8n HTTP node configured to GET http://whoami.web-tenant-globex.svc.cluster.local/. It hangs for the full timeout, then errors with AxiosError: timeout of 5000ms exceeded / code: ECONNABORTED.\nThe interesting bit: DNS still works. kube-dns is allowed, so the cross-namespace Service still resolves. The TCP handshake is what gets dropped. 
That\u0026rsquo;s a useful signal in real incident response — \u0026ldquo;DNS resolves but the connection hangs\u0026rdquo; almost always means a NetworkPolicy is the cause.\nWall 2 — Secret: Vault Kubernetes-auth + ESO Now imagine Acme\u0026rsquo;s n8n misbehaves: somebody pushes a workflow that tries to read Globex\u0026rsquo;s API keys via an ExternalSecret. The network isn\u0026rsquo;t the issue — both tenants need to reach Vault, so they both have an egress rule for sys-vault. The wall has to be at the identity layer.\nEach tenant gets three things:\nA dedicated ServiceAccount (n8n-acme, n8n-globex). A Vault Kubernetes-auth role bound to that SA in that namespace, mapped to a Vault policy that grants read on only its own KV path. A namespaced External Secrets SecretStore that authenticates as the SA via the Kubernetes TokenRequest API. # Vault policy: tenant-acme can read its own secrets, nothing else. path \u0026#34;secret/data/web-tenant-acme\u0026#34; { capabilities = [\u0026#34;read\u0026#34;] } path \u0026#34;secret/metadata/web-tenant-acme\u0026#34; { capabilities = [\u0026#34;read\u0026#34;] } vault write auth/kubernetes/role/tenant-acme \\ bound_service_account_names=n8n-acme \\ bound_service_account_namespaces=web-tenant-acme \\ policies=tenant-acme \\ ttl=1h When Acme\u0026rsquo;s n8n tries an ExternalSecret pointing at secret/web-tenant-globex/..., ESO authenticates fine (the SA is valid), Vault recognises the caller, looks up the tenant-acme policy, and answers with the most satisfying line in this whole demo:\nURL: GET http://sys-vault.sys-vault.svc.cluster.local:8200/v1/secret/data/web-tenant-globex Code: 403. Errors: * permission denied This is the bit that separates \u0026ldquo;namespace isolation\u0026rdquo; from real multi-tenant secret isolation. 
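The two checks in that exchange can be modelled, very loosely, in a few lines: first who the caller is, then what that identity may read. This is a minimal sketch, not how Vault is implemented. It handles exact path matches only and ignores globs, segment wildcards and deny rules. The role and policy data are hypothetical in-memory stand-ins for the tenant-acme example above, and tenant-globex is assumed to mirror it.

```python
# Hypothetical stand-ins for the Vault role and policy shown above;
# tenant-globex is assumed to mirror tenant-acme.
ROLES = {
    'tenant-acme': {'sa': 'n8n-acme', 'namespace': 'web-tenant-acme', 'policy': 'tenant-acme'},
    'tenant-globex': {'sa': 'n8n-globex', 'namespace': 'web-tenant-globex', 'policy': 'tenant-globex'},
}
POLICIES = {
    'tenant-acme': {
        'secret/data/web-tenant-acme': {'read'},
        'secret/metadata/web-tenant-acme': {'read'},
    },
    'tenant-globex': {
        'secret/data/web-tenant-globex': {'read'},
        'secret/metadata/web-tenant-globex': {'read'},
    },
}

def login(role, service_account, namespace):
    '''Step 1, identity: return the bound policy, or None if the binding does not match.'''
    bound = ROLES.get(role)
    if bound and bound['sa'] == service_account and bound['namespace'] == namespace:
        return bound['policy']
    return None

def read(policy, path):
    '''Step 2, authorisation: exact-path match only; anything not granted is a 403.'''
    granted = POLICIES.get(policy, {}).get(path, set())
    return 200 if 'read' in granted else 403

policy = login('tenant-acme', 'n8n-acme', 'web-tenant-acme')  # authentication succeeds
print(read(policy, 'secret/data/web-tenant-acme'))    # 200: own path
print(read(policy, 'secret/data/web-tenant-globex'))  # 403: no rule grants it
```

Authentication succeeding says nothing about authorisation. The valid identity still lands on a policy that simply has no entry for the other tenant's path.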
Plain Kubernetes Secrets + RBAC stop a tenant from listing another tenant\u0026rsquo;s Secret objects, but the moment you go upstream — to Vault, to a cloud KMS, to an SSM Parameter Store — the secret store needs to enforce identity itself. The network said yes; the secret store still says no.\nWall 3 — Resource: ResourceQuota + LimitRange The third concern is the noisy neighbour: Acme\u0026rsquo;s runaway workflow allocating a 4Gi pod and OOM-killing everything else on the node. The network policy doesn\u0026rsquo;t catch this (no network call), and Vault doesn\u0026rsquo;t catch this (no secret request). The kernel will, eventually — but you don\u0026rsquo;t want eventually. You want admission-time rejection.\nTwo primitives:\napiVersion: v1 kind: ResourceQuota metadata: { name: tenant-quota, namespace: web-tenant-acme } spec: hard: requests.cpu: \u0026#34;1\u0026#34; requests.memory: 1Gi limits.cpu: \u0026#34;2\u0026#34; limits.memory: 2Gi pods: \u0026#34;10\u0026#34; --- apiVersion: v1 kind: LimitRange metadata: { name: tenant-limits, namespace: web-tenant-acme } spec: limits: - type: Container default: { cpu: 500m, memory: 512Mi } defaultRequest: { cpu: 50m, memory: 128Mi } max: { cpu: \u0026#34;2\u0026#34;, memory: 1Gi } ResourceQuota caps the namespace total. LimitRange bounds any individual container and supplies defaults so pods that don\u0026rsquo;t declare requests/limits still get reasonable ones. That matters because once a quota covers requests and limits, the API server rejects any pod that omits them; the defaults keep such pods admissible, and max keeps any single container below the namespace total.\nThe receipt — a server-side dry-run of a single 4Gi pod, which never gets created:\n$ kubectl apply -n web-tenant-acme --dry-run=server -f noisy-neighbor.yaml Error from server (Forbidden): error when creating \u0026#34;STDIN\u0026#34;: pods \u0026#34;noisy-neighbor\u0026#34; is forbidden: maximum memory usage per Container is 1Gi, but limit is 4Gi Not a kernel OOMKill. Not a pod stuck in Pending.
A flat refusal from the API server before the scheduler even sees the request.\nWhat this does not prove A homelab demo on one node with two synthetic tenants is not n8n Cloud. The honest gaps:\nExecution sandboxing. A workflow can still run arbitrary code via the Code node or shell-outs. These walls stop infrastructure leakage; they don\u0026rsquo;t sandbox what n8n itself executes. Real n8n Cloud needs more than namespace walls for that — gVisor / Firecracker / per-tenant worker pools are the usual answers, and n8n\u0026rsquo;s queue mode lends itself to the last. Pooled worker queues. Queue mode runs main/webhook/worker as separate deployments backed by Redis + Postgres. Two tenants sharing a worker pool need additional checks at the job-routing layer to keep workflows from accessing the wrong tenant\u0026rsquo;s binary data. Out of scope for the homelab demo. Control plane. Both tenants reach the same API server. A cluster-admin-equivalent compromise breaks everything. This is the assumption every shared K8s setup makes. Node-level. Same kernel. Container escape, CPU side channels, the usual list — all apply. For paranoid tenants the answer is dedicated nodes via taints/tolerations or separate clusters entirely. The demo proves the namespace-shaped walls hold. It does not prove the whole stack is safe against a determined attacker already running code inside a tenant. That\u0026rsquo;s a different post.\nPart of a Kubernetes-on-the-homelab series — previously: preventing a compromised pod from calling your database, GitOps secrets.\n","permalink":"https://blog.hippotion.com/posts/n8n-multitenant/","summary":"Multi-tenant isolation is easy to assert and hard to verify. 
Three walls — network, secret, resource — and the actual 403s, timeouts, and admission rejections that prove each one holds.","title":"🧱 How Do You Isolate Two n8n Tenants on Kubernetes — and Prove Each Wall Holds?"},{"content":"Effective date: 2026-05-26\nVia Stoica: A Year of Stoic Practice (\u0026ldquo;the app\u0026rdquo;) is published by Hippotion.\nData collection The app does not collect, transmit, or share any personal data. No analytics, no crash reporting, no advertising SDKs.\nData stored on your device The app stores two values locally on your device using Android\u0026rsquo;s SharedPreferences:\nWhether you have unlocked the Pro tier Your current reading position in the Pro sequence This data never leaves your device.\nIn-app purchases Purchases are processed by Google Play. The app receives only a confirmation of the purchase status. No payment information is handled by the app.\nContact Questions: gyorgy.jani@gmail.com\n","permalink":"https://blog.hippotion.com/posts/via-stoica-privacy/","summary":"Privacy policy for the Via Stoica Android app.","title":"Via Stoica: Privacy Policy"},{"content":"The question I run n8n on my k3s homelab. Not docker-compose on a NUC — the full treatment: GitOps-reconciled, Vault-backed secrets, default-deny networking. The same boring platform everything else here runs on.\nBut \u0026ldquo;I have n8n running\u0026rdquo; proves nothing. I wanted to know if I actually understood it as an agent platform, and to answer a question I kept dodging: for agent work, do I need a cloud model, or is my local one good enough?\nSo I built a real agent and gave it two brains.\nWhat I built A chat assistant over brew-buddy, my homemade kombucha-tracking app (React + a small API + Postgres). You ask it things in plain language; it calls the app\u0026rsquo;s API and answers. 
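Under the n8n nodes, each agent turn amounts to a request to an OpenAI-compatible chat-completions endpoint that carries the tool schemas. This is a minimal sketch that assumes the standard /chat/completions path and the OpenAI tools format. The model IDs, the NAMESPACE placeholder in the local URL, and the empty parameter schemas are hypothetical. Only the NVIDIA base URL and the tool names come from this post. The sketch builds one request per brain and gives the write tools to the cloud side only, matching the wiring described next.

```python
import json

READ_TOOLS = ['get_all_batches', 'get_batch_detail', 'brewing_statistics']
WRITE_TOOLS = ['add_batch_log', 'create_batch']

def tool_schema(name):
    # Placeholder schema: real tools would declare their parameters here.
    return {'type': 'function',
            'function': {'name': name,
                         'description': 'brew-buddy API call: ' + name,
                         'parameters': {'type': 'object', 'properties': {}}}}

def build_request(base_url, model, question, allow_writes):
    '''Return (url, JSON body) for one agent; write tools only if allowed.'''
    names = READ_TOOLS + (WRITE_TOOLS if allow_writes else [])
    body = {'model': model,
            'messages': [{'role': 'user', 'content': question}],
            'tools': [tool_schema(n) for n in names]}
    return base_url.rstrip('/') + '/chat/completions', json.dumps(body)

question = 'What batches do I have?'
cloud = build_request('https://integrate.api.nvidia.com/v1', 'CLOUD_MODEL_ID',
                      question, allow_writes=True)
local = build_request('http://llama-server.NAMESPACE.svc:8080/v1', 'LOCAL_MODEL_ID',
                      question, allow_writes=False)
print(cloud[0])
print(local[0])
```

Building the tool list per agent is what makes the boundary structural. The local request never contains add_batch_log or create_batch, so the model cannot call them whatever it decides.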
The twist: the same question runs through two agents in parallel — one backed by NVIDIA\u0026rsquo;s hosted Llama-3.3-70B, one by a local Phi-3.5-mini on CPU — and the workflow prints both answers side by side.\nChat ──▶ Agent (cloud: NVIDIA 70B) ──┐ tools (shared): └─▶ Agent (local: Phi-3.5) ──┤ • get_all_batches │ • get_batch_detail │ • brewing_statistics (Merge) ──▶ both replies, labeled • add_batch_log ⟵ write • create_batch ⟵ write Both agents share the same read tools. The two write tools are wired to the cloud agent only — more on that below.\nThe nice part: I didn\u0026rsquo;t write a line of glue. n8n\u0026rsquo;s stock OpenAI Chat Model node talks to anything OpenAI-compatible if you override the credential\u0026rsquo;s Base URL — so one node points at https://integrate.api.nvidia.com/v1, the other at http://llama-server.\u0026lt;ns\u0026gt;.svc:8080/v1 for the local server. Same node, two endpoints.\nThe infra that keeps it honest I won\u0026rsquo;t re-explain the platform here — it\u0026rsquo;s in earlier posts: GitOps, Vault-backed secrets, default-deny networking, dual-path TLS ingress. But building the agent made one of them tangible.\nn8n is, by design, a thing that makes arbitrary HTTP calls on a schedule. That\u0026rsquo;s exactly what you want behind a default-deny network policy. n8n couldn\u0026rsquo;t reach the brew-buddy API at all until I declared it — one line:\n# n8n\u0026#39;s namespace allowEgressToNamespaces: [web-ai-engine, web-brew-buddy] # ^ added this for the agent (plus a matching ingress-allow on brew-buddy\u0026rsquo;s side). That\u0026rsquo;s the posture working as intended: the blast radius of a workflow tool is whatever I\u0026rsquo;ve explicitly granted, and not one namespace more. Adding a capability is a reviewable one-liner in Git; Argo reconciles it. No kubectl, no guessing what n8n can reach.\nThe A/B: same agent, same tools, two brains Plain \u0026ldquo;hi\u0026rdquo;. Cloud answers in ~0.5s. 
Local takes noticeably longer — because even for \u0026ldquo;hi\u0026rdquo;, the agent feeds the model the full system prompt plus the JSON schemas for every tool, and Phi-3.5 has to chew through all of it on CPU before it can say a word. So far, the boring expected result: local is slower.\nThen I asked a real question, and the result flipped in a way I didn\u0026rsquo;t expect.\n\u0026ldquo;What batches do I have?\u0026rdquo;\nCloud (70B) called get_all_batches, got the real rows, and answered:\nYou have two batches: 2026-04-09-A (cold-crash, 3L) and 2026-04-09-W (cold-crash, 3L).\nLocal (Phi-3.5) never called the tool. It didn\u0026rsquo;t seem to realise it had tools. Instead it confidently explained how I could go find the data myself:\nTo list all batches: 1. Access the brew-buddy app. 2. Look for a button labeled \u0026ldquo;List Batches\u0026rdquo;… def get_all_batches(): … … Remember, I\u0026rsquo;m unable to directly interact with apps or databases.\nFake instructions. Fake code. A polite apology. Everything except the actual answer it was sitting on top of.\nWriting data. I asked both to log an observation. Cloud called add_batch_log and wrote a real row to Postgres (\u0026ldquo;I have recorded the observation…\u0026rdquo;). Local bluffed again — \u0026ldquo;here\u0026rsquo;s how you can log it yourself.\u0026rdquo;\nWhy it matters: capability, not latency The interesting finding isn\u0026rsquo;t \u0026ldquo;the big model is better.\u0026rdquo; It\u0026rsquo;s how the small one fails.\nWith a ~3.8B model on CPU, the bottleneck for agent work isn\u0026rsquo;t speed — it\u0026rsquo;s capability. Phi-3.5 couldn\u0026rsquo;t reliably emit tool calls, so n8n\u0026rsquo;s tools never fired, and the model degraded into a chatbot that hallucinates a plausible answer instead of fetching the real one. 
That failure mode is worse than an error: an error you catch, a confident wrong answer you ship.\nA couple of measurements that sharpened it:\nNVIDIA 70B, plain chat: ~0.5s. NVIDIA 70B, function-calling (with tool schemas): ~8.6s per round-trip — and an agent makes several round-trips per answer. That\u0026rsquo;s real latency you have to budget a timeout for. (It\u0026rsquo;s also why the cloud side initially timed out in n8n until I raised the model node\u0026rsquo;s timeout — the model was fine, n8n was cutting it off.) So the snappy-vs-slow comparison flips depending on whether the question triggers tools. Plain chat: cloud wins on speed. Tool use: the local model is \u0026ldquo;fast\u0026rdquo; only because it skips the tools and makes something up. Speed was never the real axis.\nThe honest caveat: this is this small general model in a multi-tool agent loop. Purpose-built small models with tool-calling fine-tunes do better at narrow tasks — I run a 1.7B one elsewhere that emits a single structured tool call just fine. But for \u0026ldquo;pick the right tool from several and chain them,\u0026rdquo; 70B was in a different league.\nThe trust boundary I gave the write tools (add_batch_log, create_batch) to the cloud agent only. The local agent is read-only — not by instruction, by wiring. Even if Phi-3.5 did decide to call a write tool, the connection isn\u0026rsquo;t there. The reliable model is the only one allowed to mutate real data, and that\u0026rsquo;s enforced structurally, not by trusting a prompt.\nWhat\u0026rsquo;s toy and what\u0026rsquo;s real Worth being straight: this is a single-node homelab. The agent and both model paths share one box. Running n8n on Kubernetes and swapping models isn\u0026rsquo;t novel — n8n\u0026rsquo;s own docs cover queue mode, where a main instance fans work out to a pool of worker pods you scale horizontally, with external Postgres for state. That\u0026rsquo;s the real production shape. 
Mine is one replica with an emptyDir\u0026rsquo;s worth of ambition.\nWhat I think is worth sharing is the finding (the capability cliff, and that its failure mode is confident fabrication) and the boring thing underneath it: because the platform is default-deny and GitOps-reconciled, running this experiment cost me one reviewable egress line and zero risk to anything else.\nThe boring part is the point The AI was the fun bit. But the reason I could bolt an agent onto a live cluster, point it at a real app, give it write access to one model and not the other, and tear it all down again — without worrying what it might touch — is that the infrastructure was already boring. Default-deny. Secrets out of Git. git push, Argo reconciles.\nThe model picks the tools. The platform decides what the tools can reach. Keep those two honest about each other and self-hosting an agent stops being scary and starts being just another app.\n","permalink":"https://blog.hippotion.com/posts/n8n-agent-cloud-vs-local/","summary":"I built an AI agent in self-hosted n8n over my kombucha-tracking app, then gave it two brains — NVIDIA\u0026rsquo;s 70B and a local Phi-3.5 — sharing the same tools. The cloud model called the tools and answered from real data. The local one couldn\u0026rsquo;t, so it made things up.","title":"🍵 I A/B-Tested Cloud vs Local LLMs in One n8n Agent. The Local One Faked It."},{"content":"\nSome names are chosen. This one grew.\nI came up with hippotion when I was building a kombucha brand — a real one, with labels and bottles and a business plan. It\u0026rsquo;s dormant now. Maybe it becomes something again after I retire. But the name outlasted the business plan, which is usually a sign that the name was the real thing all along.\nThe English layer Hip potion. A trendy brew. Something you\u0026rsquo;d see on a craft label at a farmers market with a logo that takes itself slightly too seriously.\nThat was the joke. 
Kombucha is, objectively, a hip potion — fermented, alive, slightly weird, loved by people who care too much about gut health and not enough about explaining themselves. The name didn\u0026rsquo;t need explaining. That was the point.\nThe Hungarian layer Here\u0026rsquo;s where it gets layered.\nIn Hungarian, the word doesn\u0026rsquo;t read as \u0026ldquo;hip\u0026rdquo; + \u0026ldquo;potion.\u0026rdquo; It reads as two animals.\nHippó — hippopotamus. That\u0026rsquo;s the first half, and it\u0026rsquo;s obvious.\nTion — to a Hungarian ear, this sounds like sün. Hedgehog. (The ü is a u with dots — \u0026ldquo;sewn\u0026rdquo; but softer. If you\u0026rsquo;ve never heard it, imagine a shy vowel that lives in the middle of a forest and avoids eye contact.)\nSo: hippotion = hippó + sün = hippo + hedgehog.\nThis was not planned on a whiteboard. It arrived. And when it did, the brand suddenly had two mascots — two animals that shouldn\u0026rsquo;t make sense together, but somehow do.\nThe totem logic A hippo is large, slow, unexpectedly dangerous, and deeply underestimated. It doesn\u0026rsquo;t perform strength. It just has it.\nA hedgehog is small, quiet, armed with spines it never has to explain, and entirely content with its own company. It doesn\u0026rsquo;t need to win. It just needs to not be eaten.\nThey protect each other. The hippo protects the sün. The sün protects the hippo. Not because they\u0026rsquo;re the same — because they\u0026rsquo;re different in complementary ways.\nThese animals also have meaning in my personal life that I won\u0026rsquo;t explain here. Some things are allowed to be yours.\nThe creed The site at hippotion.com has one page. No nav, no portfolio, no \u0026ldquo;hire me\u0026rdquo; section. Just this:\nModern luxury is the ability to think clearly, sleep deeply, move slowly,\nand live quietly in a world designed to prevent all four.\nThat\u0026rsquo;s the operating principle. 
Everything built under the hippotion name — the kombucha, the software, the blog — is an attempt to live closer to that line.\nThe easter egg I never wanted to explain this publicly. The name was designed to work on the surface — hip potion, fine, move on — and reward the people curious enough to sit with it.\nA Hungarian reader might catch it. A kombucha person might catch the drink angle. Someone who looks at the design and notices the hedgehog-and-hippo motif recurring across different projects might wonder. That wondering is the point.\nIf you found this post because you were curious about the name: that\u0026rsquo;s exactly who this was for.\n","permalink":"https://blog.hippotion.com/posts/what-is-hippotion/","summary":"A name that works in two languages, hides two animals, and started as a kombucha label.","title":"🦛 What Is Hippotion"},{"content":"The problem everyone hits You\u0026rsquo;ve got a Kubernetes cluster. Now you need to describe what should run in it. You write some YAML, apply it, it works.\nThen you need a second environment. Or a second service. Or someone else joins the project and asks \u0026ldquo;how do I add an app to this?\u0026rdquo; and you don\u0026rsquo;t have a good answer.\nThis is the manifest management problem, and there are five common solutions — ranging from \u0026ldquo;this works until it doesn\u0026rsquo;t\u0026rdquo; to \u0026ldquo;this is what production platforms actually look like.\u0026rdquo;\nApproach 1: Raw manifests The starting point for almost everyone. Write a YAML file, kubectl apply -f, done.\napiVersion: apps/v1 kind: Deployment metadata: name: myapp namespace: myapp spec: replicas: 1 selector: matchLabels: app: myapp template: metadata: labels: app: myapp spec: containers: - name: myapp image: myapp:v1.2.3 Where it works: one service, one environment, learning Kubernetes. The feedback loop is immediate — write YAML, see what happens.\nWhere it breaks:\nNo templating. 
Want to change the image tag across ten services? Ten files, ten edits, ten chances to get it wrong. Live state leaks in. If you export existing resources with kubectl get -o yaml, you get resourceVersion, generation, creationTimestamp, and managedFields in the output. Commit that to Git and you\u0026rsquo;ve created a permanent source of conflicts — ArgoCD compares what\u0026rsquo;s in Git against what\u0026rsquo;s in the cluster, sees stale version counters, and the diff never clears. Copy-paste hell. A Deployment, a Service, an IngressRoute, a ServiceAccount, a NetworkPolicy — five files per app. Add a new app, copy five files, change the names, forget to update one. This is how environments drift apart silently. The fix for the live-state problem is: only commit desired state. Strip every field that Kubernetes manages internally back to its clean spec. It\u0026rsquo;s tedious and easy to forget, which is exactly why people move on from raw manifests.\nApproach 2: Kustomize Kustomize is built into kubectl (kubectl apply -k) and natively supported by ArgoCD. The idea: you have a base/ with your raw manifests, and overlays that patch on top of them for different environments.\napp/ ├── base/ │ ├── deployment.yaml │ ├── service.yaml │ └── kustomization.yaml └── overlays/ ├── staging/ │ ├── kustomization.yaml # patches replicas to 1, image to :staging └── production/ └── kustomization.yaml # patches replicas to 3, image to :v1.2.3 # overlays/production/kustomization.yaml resources: - ../../base patches: - patch: |- - op: replace path: /spec/replicas value: 3 target: kind: Deployment Where it works: multi-environment setups where the difference between environments is mostly configuration values, not structure. Kustomize is good at this — you write the base once and patch only what differs.\nWhere it breaks:\nNo real parameterization. Kustomize patches are surgical edits, not templates. 
If your base structure needs to vary (different resource shapes per environment, conditional blocks), you\u0026rsquo;re fighting the tool. Patching deep structures is ugly. JSON patches on nested YAML are verbose and hard to read. You end up writing more patch YAML than it would take to just copy the file. Still repetitive across apps. Each app still gets its own base directory. You\u0026rsquo;re not abstracting the shared patterns across apps, only the differences between environments of the same app. Kustomize is a significant step up from raw manifests for multi-environment setups. For complex templating or platform-level abstractions, it runs out of power quickly.\nApproach 3: Helm Helm adds real templating. Charts are parameterized bundles — templates with variables, conditionals, and loops — and values files supply the parameters.\n# templates/deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: {{ .Values.name }} namespace: {{ .Release.Namespace }} spec: replicas: {{ .Values.replicas | default 1 }} selector: matchLabels: app: {{ .Values.name }} template: metadata: labels: app: {{ .Values.name }} spec: containers: - name: {{ .Values.name }} image: {{ .Values.image.repository }}:{{ .Values.image.tag }} {{- if .Values.resources }} resources: {{ .Values.resources | toYaml | nindent 12 }} {{- end }} # values-production.yaml name: myapp replicas: 3 image: repository: myorg/myapp tag: v1.2.3 Helm renders the templates at deploy time. What lands in the cluster is clean rendered YAML — no internal state, no conflicts.\nWhere it works: almost everywhere. Artifact Hub (which replaced the old Helm Hub) lists charts for most common software already. For custom apps, writing a chart once and parameterizing per-environment is straightforwardly better than copying YAML.\nWhere it breaks:\nChart authoring is verbose. Writing a Helm chart from scratch involves a lot of Go templating boilerplate. For a simple app, it can feel like more scaffolding than application. Debugging rendered output is annoying. helm template is your friend, but errors in templates produce unhelpful messages.
The indentation rules (nindent, indent, toYaml) have sharp edges. Values files still pile up. If every app has its own values file and there\u0026rsquo;s no shared structure between them, you\u0026rsquo;re back to copy-paste but now in YAML-that-configures-YAML. Helm is the right tool for most Kubernetes deployments. The ecosystem support alone (upstream charts for Postgres, Redis, Vault, most CNCF projects) makes it the pragmatic default.\nApproach 4: Jsonnet / CUE For teams that need programmatic config generation — actual code, not templates — Jsonnet and CUE are the serious alternatives.\n// deployment.jsonnet local k = import \u0026#34;k.libsonnet\u0026#34;; local deployment(name, image, replicas=1) = k.apps.v1.deployment.new(name, replicas, [ k.core.v1.container.new(name, image) ]); { \u0026#34;deployment.yaml\u0026#34;: deployment(\u0026#34;myapp\u0026#34;, \u0026#34;myorg/myapp:v1.2.3\u0026#34;, replicas=3) } Where it works: large platforms where configuration is genuinely complex — many environments, many apps, deep interdependencies. Jsonnet lets you write real functions, share libraries, compose abstractions properly.\nWhere it breaks:\nSteep learning curve. Jsonnet is a full language. CUE even more so — it has types, schemas, and a constraint system that takes time to internalise. Small community. Excellent tooling, but you\u0026rsquo;re solving problems that have fewer Stack Overflow answers. Overkill for most setups. If you\u0026rsquo;re not managing hundreds of services across multiple clusters, Helm is simpler and has everything you need. Jsonnet is used seriously by Google-scale infrastructure teams and in some CNCF projects. For a homelab or a small-to-medium platform, it\u0026rsquo;s the right answer to a question you probably aren\u0026rsquo;t asking yet.\nApproach 5: App-of-apps with generated Application CRDs This is the ArgoCD-native meta-layer.
Instead of managing manifests, you manage Application resources — and potentially use a chart or tool to generate those too.\nA naive version: commit a folder of Application YAML files to Git, one per service. ArgoCD watches the folder and deploys each app.\nA more sophisticated version: one \u0026ldquo;root app\u0026rdquo; that points to a chart, which generates all the other Application resources dynamically from a single config file.\nWhere it works: at the platform level, not the individual app level. App-of-apps is how you manage what ArgoCD manages, not how you write the service manifests themselves. Combined with Helm, it gives you centralized control over the entire cluster\u0026rsquo;s structure.\nWhere it breaks:\nManual Application CRDs are painful. If you\u0026rsquo;re maintaining a folder of hand-written Application YAML files — one per service — you\u0026rsquo;ve traded manifest copy-paste for Application copy-paste. Each app needs its own CRD with its repo URL, path, sync policy, project reference. Sync ordering matters. The root app must exist before children can sync. Get the wave ordering wrong and apps try to deploy before their namespaces exist. How this homelab compares My setup sits at the far end of approach 5, using Helm throughout.\nThere\u0026rsquo;s a single applications.yml file that describes every service in the cluster. A root Helm chart reads it and generates all the ArgoCD Application and AppProject CRDs automatically. 
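The root chart is written in Helm, but the fan-out it performs fits in a few lines of Python. This is a hypothetical re-implementation for illustration, not the chart this setup actually uses. It takes one catalog entry with the same keys as the applications.yml entry in this post (namespace, applications, applicationCode, path, autoSync). From that it returns a Namespace, an AppProject and one Argo CD Application per app. REPO_URL, the argocd namespace, targetRevision HEAD and the prune/selfHeal options are assumptions. The NetworkPolicies and ServiceAccount are left out to keep it short.

```python
# Hypothetical sketch of the root chart's fan-out, in Python for illustration.
# REPO_URL, the argocd namespace and the sync options are assumptions;
# NetworkPolicies and the ServiceAccount are omitted.
REPO_URL = 'https://git.example.com/homelab.git'  # placeholder repo
IN_CLUSTER = 'https://kubernetes.default.svc'


def render_entry(entry: dict) -> list[dict]:
    ns = entry['namespace']
    objects = [
        {'apiVersion': 'v1', 'kind': 'Namespace', 'metadata': {'name': ns}},
        {
            'apiVersion': 'argoproj.io/v1alpha1',
            'kind': 'AppProject',
            'metadata': {'name': ns, 'namespace': 'argocd'},
            # Each project may only deploy into its own namespace.
            'spec': {
                'sourceRepos': [REPO_URL],
                'destinations': [{'server': IN_CLUSTER, 'namespace': ns}],
            },
        },
    ]
    for app in entry.get('applications', []):
        spec = {
            'project': ns,
            'source': {'repoURL': REPO_URL, 'path': app['path'],
                       'targetRevision': 'HEAD'},
            'destination': {'server': IN_CLUSTER, 'namespace': ns},
        }
        if app.get('autoSync'):
            # autoSync: true becomes an automated sync policy.
            spec['syncPolicy'] = {'automated': {'prune': True, 'selfHeal': True}}
        objects.append({
            'apiVersion': 'argoproj.io/v1alpha1',
            'kind': 'Application',
            'metadata': {'name': app['applicationCode'], 'namespace': 'argocd'},
            'spec': spec,
        })
    return objects


catalog = [{
    'namespace': 'web-vaultwarden',
    'networkPolicies': {'profile': 'web-app'},
    'applications': [{'applicationCode': 'web-vaultwarden',
                      'path': 'helm-charts/extra-objects',
                      'autoSync': True}],
}]

for entry in catalog:
    for obj in render_entry(entry):
        print(obj['kind'], obj['metadata']['name'])
# Namespace web-vaultwarden
# AppProject web-vaultwarden
# Application web-vaultwarden
```

Every app goes through the same function, so every app gets the same project scoping and sync behaviour. Adding a service changes only the input list, never the function.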
Adding a service means adding an entry to that file — not touching five different places across five different files.\n# applications.yml — this is the entire service catalog - namespace: web-vaultwarden networkPolicies: profile: web-app applications: - applicationCode: web-vaultwarden path: helm-charts/extra-objects autoSync: true That one entry generates: a Namespace, an ArgoCD AppProject, an ArgoCD Application, a set of Cilium NetworkPolicies (deny-all with ingress from Traefik and DNS/HTTPS egress), and a ServiceAccount. Nothing is written by hand.\nThe actual service manifests live in an extra-objects chart — a thin wrapper that renders raw YAML from values files. No templating in the service manifests themselves (they\u0026rsquo;re simple enough not to need it), but the infrastructure scaffolding around each app is entirely generated.\nThe result: every service gets the same operational properties. Same GitOps workflow, same secret management, same network isolation, same TLS termination. The platform work was done once. Adding a new app is writing manifests for the app\u0026rsquo;s specific behavior, not recreating the scaffolding.\nThe honest spectrum\nApproach | Templating | Abstraction | Ecosystem | Complexity\nRaw manifests | None | None | None | Low\nKustomize | Patches only | Overlays | Medium | Low-medium\nHelm | Full | Per-chart | Large | Medium\nJsonnet/CUE | Full + typed | Libraries | Small | High\nApp-of-apps | Depends | Platform-level | ArgoCD-native | High\nMost setups should start at Helm. Kustomize if you\u0026rsquo;re multi-environment and comfortable with patching. App-of-apps when you\u0026rsquo;re managing the platform layer, not individual services. Jsonnet/CUE when you know you\u0026rsquo;ve outgrown Helm — which is a specific and relatively rare problem to have.\nRaw manifests are fine for learning.
They\u0026rsquo;re the wrong answer for anything you intend to maintain.\nMore on how the homelab is structured: My Homelab Runs on GitOps.\n","permalink":"https://blog.hippotion.com/posts/gitops-manifest-approaches/","summary":"Raw YAML, Kustomize, Helm, Jsonnet — there\u0026rsquo;s more than one way to describe what you want running in a cluster. Here\u0026rsquo;s what each actually looks like in practice and where each one breaks.","title":"📦 Five Ways to Manage Kubernetes Manifests (and Why They're Not All Equal)"},{"content":"The problem with cloud LLM access Running a local model is great for privacy. But local models hit a ceiling — for the heavy lifting, you want a cloud API like NVIDIA NIM with Llama 3.3 70B.\nThe moment you open that channel, you have a new risk: what if someone (or some automation) accidentally pastes a password, a private key, or someone\u0026rsquo;s personal data into the chat? It leaves the cluster. It\u0026rsquo;s logged somewhere you don\u0026rsquo;t control.\nThe standard answer is \u0026ldquo;train your users.\u0026rdquo; I\u0026rsquo;d rather have a technical control.\nThe architecture Open WebUI → ai-guard proxy │ ┌────────┴────────┐ │ │ llama-server if SAFE: (classify) forward to NVIDIA NIM │ if SENSITIVE: block + explain Every request to NVIDIA NIM goes through ai-guard first. ai-guard pulls the user message, sends it to the local llama.cpp server with a classification prompt, and makes a binary decision:\nSAFE → forward to NVIDIA NIM with the real API key (which ai-guard holds, not the client) SENSITIVE: \u0026lt;reason\u0026gt; → return HTTP 400, log the block, nothing leaves the cluster The local model is already running for inference — this reuses it as a privacy gatekeeper at zero extra infrastructure cost.\nThe implementation The proxy is ~150 lines of FastAPI. The classifier call:\nCLASSIFIER_PROMPT = \u0026#34;\u0026#34;\u0026#34;You are a data security classifier. 
Check if the text below contains sensitive information: passwords, API keys, tokens, credentials, personal identifiable information (names, emails, phone numbers, SSNs, addresses), financial data (card numbers, bank accounts), or private keys. Reply with ONLY one of: SAFE SENSITIVE: \u0026lt;one-line reason\u0026gt; Text to check: \u0026#34;\u0026#34;\u0026#34; async def classify(text: str) -\u0026gt; tuple[bool, str]: async with httpx.AsyncClient(timeout=60) as client: resp = await client.post( f\u0026#34;{LLAMA_BASE}/chat/completions\u0026#34;, json={ \u0026#34;model\u0026#34;: \u0026#34;phi-3.5-mini\u0026#34;, \u0026#34;messages\u0026#34;: [{\u0026#34;role\u0026#34;: \u0026#34;user\u0026#34;, \u0026#34;content\u0026#34;: CLASSIFIER_PROMPT + text[:3000]}], \u0026#34;max_tokens\u0026#34;: 30, \u0026#34;temperature\u0026#34;: 0, \u0026#34;stream\u0026#34;: False, }, headers={\u0026#34;Authorization\u0026#34;: \u0026#34;Bearer sk-no-key\u0026#34;}, ) answer = resp.json()[\u0026#34;choices\u0026#34;][0][\u0026#34;message\u0026#34;][\u0026#34;content\u0026#34;].strip() if answer.upper().startswith(\u0026#34;SENSITIVE\u0026#34;): reason = answer.split(\u0026#34;:\u0026#34;, 1)[1].strip() if \u0026#34;:\u0026#34; in answer else \u0026#34;sensitive content detected\u0026#34; return True, reason return False, \u0026#34;\u0026#34; temperature=0 and max_tokens=30 keep the response deterministic and fast. 
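The classifier only ever answers with one line, so the parsing half of classify can be lifted into a pure function and unit-tested without a running llama-server. parse_verdict is a hypothetical name and not part of ai-guard. The logic mirrors the code above, with one change: a bare SENSITIVE: with nothing after the colon falls back to the default reason instead of an empty one. Any reply that does not start with SENSITIVE counts as safe, so malformed classifier output fails open, just as classifier errors do in the handler.

```python
# Hypothetical helper: the verdict parsing from classify(), pulled out so it
# can be tested without calling llama-server.
def parse_verdict(answer: str) -> tuple[bool, str]:
    answer = answer.strip()
    if answer.upper().startswith('SENSITIVE'):
        # Keep the model's one-line reason when there is one after the colon.
        reason = answer.split(':', 1)[1].strip() if ':' in answer else ''
        return True, reason or 'sensitive content detected'
    # Anything else (SAFE, or unexpected output) is treated as safe,
    # which is how the original classify() behaves too.
    return False, ''


print(parse_verdict('SENSITIVE: contains an AWS access key'))
# (True, 'contains an AWS access key')
print(parse_verdict('SAFE'))
# (False, '')
```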
The model only needs to output one word or one line.\nThe main handler:\n@app.post(\u0026#34;/v1/chat/completions\u0026#34;) async def proxy_chat(request: Request): body = await request.json() user_text = extract_user_text(body.get(\u0026#34;messages\u0026#34;, [])) if user_text.strip(): try: is_sensitive, reason = await classify(user_text) except Exception as exc: log.error(\u0026#34;classifier error: %s — allowing request through\u0026#34;, exc) is_sensitive = False if is_sensitive: return JSONResponse(status_code=400, content={ \u0026#34;error\u0026#34;: { \u0026#34;message\u0026#34;: f\u0026#34;Request blocked by ai-guard: {reason}. Remove sensitive content before sending to external models.\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;content_policy_violation\u0026#34;, } }) # Safe — forward to upstream with streaming support ... Fail-open: if the classifier itself errors (llama-server down, timeout), the request goes through and the error is logged. Fail-closed would be safer for high-stakes environments, but this is a homelab and I\u0026rsquo;d rather not block all cloud LLM access because the local model is warming up.\nKubernetes deployment ai-guard runs in the same namespace as llama-server and Open WebUI (web-ai-engine). Intra-namespace traffic is already allowed by this cluster\u0026rsquo;s Cilium policies, so no new network policy is needed.\nOpen WebUI uses semicolon-separated lists for multiple API backends:\n- name: OPENAI_API_BASE_URLS value: \u0026#34;http://llama-server.web-ai-engine.svc:8080/v1;http://ai-guard.web-ai-engine.svc:8080/v1\u0026#34; - name: OPENAI_API_KEYS value: \u0026#34;sk-no-key;sk-no-key\u0026#34; The second entry is ai-guard. Open WebUI passes sk-no-key as the API key — ai-guard ignores it and uses its own UPSTREAM_API_KEY from a Kubernetes Secret (pulled from Vault via External Secrets Operator). The real NVIDIA API key never touches the client.\nThe latency tradeoff The classification step adds 5–15 seconds on CPU inference.
That\u0026rsquo;s the cost of keeping the check fully private — the classifier never sends data anywhere.\nFor a personal homelab assistant, this is fine. For a high-throughput production setup, you\u0026rsquo;d want the classifier on a GPU or a dedicated smaller model purpose-built for classification.\nWhat it catches The classifier prompt targets:\nPasswords, API keys, tokens, credentials PII: names, emails, phone numbers, SSNs, addresses Financial data: card numbers, bank accounts Private keys False negatives are possible — no classifier is perfect. This is a first line of defense, not a compliance control. The value is catching the obvious, accidental leaks.\nSource github.com/janos-gyorgy/ai-guard — MIT licensed, Kubernetes manifests included.\n","permalink":"https://blog.hippotion.com/posts/ai-pii-guardrail-proxy/","summary":"A local model classifies every prompt before it leaves the cluster. If it\u0026rsquo;s sensitive, it\u0026rsquo;s blocked. If it\u0026rsquo;s clean, it goes to NVIDIA NIM. 150 lines of FastAPI, deployed on k3s.","title":"🔒 Building a PII Guardrail Proxy for Cloud LLM Calls"},{"content":"The problem with blocking The PII guardrail proxy I built last week works by classifying prompts and blocking the sensitive ones. That\u0026rsquo;s fine for a chat interface where a human can rephrase. It doesn\u0026rsquo;t work for automated pipelines.\nIf a Jira ticket contains someone\u0026rsquo;s name and an internal hostname, you don\u0026rsquo;t want the agent to fail — you want it to process the ticket without exposing that data. Blocking is the wrong primitive for pipelines. 
Anonymization is the right one.\nThe pattern Input text → anonymizer: extract PII, replace with semantic fakes → \u0026#34;Nathan Chen from DataSoft LLC needs ProjectX fixed on dev.internal.net\u0026#34; + mapping: {\u0026#34;Nathan Chen\u0026#34; → \u0026#34;John Smith\u0026#34;, \u0026#34;DataSoft LLC\u0026#34; → \u0026#34;ACME\u0026#34;, ...} → cloud LLM: processes coherent text, never sees real values → \u0026#34;Nathan Chen should check the ProjectX docs with the DataSoft LLC team\u0026#34; → string substitution with reverse mapping → \u0026#34;John Smith should check the OAuth docs with the ACME team\u0026#34; Two things that make this work:\nDeanonymization needs no LLM. Once you have the mapping, restoring is pure string substitution. The model call only happens on the way in.\nSemantic fakes beat placeholder tokens. An earlier version of this used [PERSON_1], [ORG_1] tokens. The problem: cloud models see bracketed text and subtly change behaviour — shorter responses, hedging, dropped context. When the cloud model sees Nathan Chen from DataSoft LLC, it treats it as real text and responds naturally. Quality is noticeably better.\nPrior art — what already exists This is a well-established pattern. Worth knowing what\u0026rsquo;s out there:\nLLM Guard (Protect AI) — the most complete open-source implementation. Anonymize + Deanonymize scanner pair with a Vault for the mapping. Production-grade, actively maintained. Start here if you\u0026rsquo;re building this for anything serious.\nMicrosoft PII Shield — session-based proxy. Returns a session ID with the anonymized text, uses it to deanonymize the response.\nanonLLM — uses GLiNER (a proper NER model) + Faker for realistic replacements. Better accuracy than a general chat model.\nREDACT — IEEE paper describing a system using Ollama for PII redaction in documents.\nHuggingFace Anonymizer SLM series — purpose-built models (0.6B/1.7B/4B) fine-tuned specifically for anonymization. 
9.20/10 quality score for 1.7B, close to GPT-4.1\u0026rsquo;s 9.77.\nThat last one is what this implementation actually uses.\nThe model: Anonymizer-1.7B eternisai/Anonymizer-1.7B is a Qwen3-1.7B fine-tune trained on ~30k anonymization samples using GRPO with GPT-4.1 as judge. It outputs structured tool calls instead of free text:\n{ \u0026#34;name\u0026#34;: \u0026#34;replace_entities\u0026#34;, \u0026#34;arguments\u0026#34;: { \u0026#34;replacements\u0026#34;: [ {\u0026#34;original\u0026#34;: \u0026#34;John Smith\u0026#34;, \u0026#34;replacement\u0026#34;: \u0026#34;Nathan Chen\u0026#34;}, {\u0026#34;original\u0026#34;: \u0026#34;ACME Corp\u0026#34;, \u0026#34;replacement\u0026#34;: \u0026#34;DataSoft LLC\u0026#34;}, {\u0026#34;original\u0026#34;: \u0026#34;auth.acme.internal\u0026#34;, \u0026#34;replacement\u0026#34;: \u0026#34;dev.internal.net\u0026#34;} ] } } No prompt engineering needed. The model knows exactly what it\u0026rsquo;s doing and outputs a structured contract. Compare that to the first version of this service, which sent a long JSON-format prompt to Phi-3.5-mini and hoped the output parsed correctly.\nThe model runs via Ollama (which handles the Qwen3 chat template and tool calling natively), pointed at the GGUF version from HuggingFace: hf.co/gabriellarson/Anonymizer-1.7B-GGUF.\nThe implementation llm-anonymizer is a FastAPI service with two endpoints.\nPOST /anonymize — calls Ollama with the tool definition, parses the response:\nTOOLS = [{ \u0026#34;type\u0026#34;: \u0026#34;function\u0026#34;, \u0026#34;function\u0026#34;: { \u0026#34;name\u0026#34;: \u0026#34;replace_entities\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;Replace PII entities with anonymized versions\u0026#34;, \u0026#34;parameters\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;object\u0026#34;, \u0026#34;properties\u0026#34;: { \u0026#34;replacements\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;array\u0026#34;, \u0026#34;items\u0026#34;: { 
\u0026#34;type\u0026#34;: \u0026#34;object\u0026#34;, \u0026#34;properties\u0026#34;: { \u0026#34;original\u0026#34;: {\u0026#34;type\u0026#34;: \u0026#34;string\u0026#34;}, \u0026#34;replacement\u0026#34;: {\u0026#34;type\u0026#34;: \u0026#34;string\u0026#34;}, }, \u0026#34;required\u0026#34;: [\u0026#34;original\u0026#34;, \u0026#34;replacement\u0026#34;], }, } }, \u0026#34;required\u0026#34;: [\u0026#34;replacements\u0026#34;], }, }, }] resp = await client.post(f\u0026#34;{OLLAMA_BASE}/api/chat\u0026#34;, json={ \u0026#34;model\u0026#34;: MODEL, \u0026#34;messages\u0026#34;: [ {\u0026#34;role\u0026#34;: \u0026#34;system\u0026#34;, \u0026#34;content\u0026#34;: SYSTEM_PROMPT}, {\u0026#34;role\u0026#34;: \u0026#34;user\u0026#34;, \u0026#34;content\u0026#34;: text + \u0026#34;\\n/no_think\u0026#34;}, # skip Qwen3 thinking mode ], \u0026#34;tools\u0026#34;: TOOLS, \u0026#34;stream\u0026#34;: False, }) tool_calls = resp.json()[\u0026#34;message\u0026#34;][\u0026#34;tool_calls\u0026#34;] replacements = tool_calls[0][\u0026#34;function\u0026#34;][\u0026#34;arguments\u0026#34;][\u0026#34;replacements\u0026#34;] # Build reverse mapping: replacement → original (for deanonymization) anonymized = text mapping = {} for pair in replacements: anonymized = anonymized.replace(pair[\u0026#34;original\u0026#34;], pair[\u0026#34;replacement\u0026#34;]) mapping[pair[\u0026#34;replacement\u0026#34;]] = pair[\u0026#34;original\u0026#34;] The /no_think suffix tells the model to skip its chain-of-thought — faster response, same accuracy for this task.\nPOST /deanonymize — no model call, just substitution:\nfor replacement, original in sorted(mapping.items(), key=lambda x: len(x[0]), reverse=True): text = text.replace(replacement, original) Sorted by length descending so longer tokens don\u0026rsquo;t get partially overwritten by shorter ones.\nThe Kubernetes stack Ollama runs as a separate deployment in the same namespace as everything else (web-ai-engine). 
Intra-namespace traffic is always allowed — no new network policies.\nllm-anonymizer (FastAPI) → Ollama (port 11434) → Anonymizer-1.7B GGUF One-time model pull after first deploy:\nkubectl exec -n web-ai-engine deploy/ollama -- \\ ollama pull hf.co/gabriellarson/Anonymizer-1.7B-GGUF Ollama caches it on a 10Gi PVC, so pod restarts don\u0026rsquo;t re-download.\nThe n8n pipeline Five-node chain triggered by webhook:\nWebhook → /anonymize → NVIDIA NIM → /deanonymize → Respond The NVIDIA NIM call includes a system prompt instructing it to treat the text as normal input. No mention of tokens, no special handling — because the text looks like real text.\nWire any upstream source to the webhook: Jira event, Slack slash command, a scheduled job that processes internal docs. The pipeline is source-agnostic.\nThe caveats 1.7B isn\u0026rsquo;t GPT-4.1. The model scores 9.20/10 on the benchmark, but that is an average quality score, not a per-case success rate: expect some inputs to come back with missed or incorrectly replaced entities. Test with real examples from your domain before depending on it.\nDeanonymization breaks on heavy rephrasing. If the cloud model restructures a sentence enough that the fake value no longer appears verbatim, the substitution silently misses it. The prompt helps but doesn\u0026rsquo;t eliminate the risk.\nOllama adds a deployment. It\u0026rsquo;s a ~500MB image plus the model weights (~1GB at Q4). On a constrained single-node cluster that\u0026rsquo;s real overhead. llama-server already covers general chat; Ollama is purely for this model\u0026rsquo;s tool-calling support.\nSource github.com/janos-gyorgy/llm-anonymizer — MIT licensed, Kubernetes manifests and n8n workflow included.\n","permalink":"https://blog.hippotion.com/posts/llm-anonymizer-privacy-pipeline/","summary":"Replace PII with semantically realistic fakes before sending to a cloud LLM, then restore the originals from the response.
Started with a general model and prompt engineering — then upgraded to a purpose-built 1.7B fine-tune via Ollama.","title":"🕵️ Privacy-Preserving LLM Pipelines: Anonymize Before You Send"},{"content":"What \u0026ldquo;operating an LLM\u0026rdquo; actually means Running a local model is easy. Understanding what it\u0026rsquo;s doing is less so.\nAfter deploying llama.cpp + Open WebUI on k3s (previous post), I had a chat interface backed by a local model. What I didn\u0026rsquo;t have: any visibility into how the model was behaving — whether requests were queuing, how fast tokens were being generated, how much of the context window was in use.\nThe instinct for this kind of problem is usually \u0026ldquo;add a proxy layer.\u0026rdquo; There are several tools in this space — LiteLLM being the most popular — that sit between the client and the inference server and record token counts, latency, and spend. I tried this first. LiteLLM OOMed at startup on a node already at 76% memory. Heavy Python import tree, not a lot of headroom.\nThe thing I\u0026rsquo;d missed: llama.cpp ships a Prometheus metrics endpoint. No proxy required.\n--metrics One additional argument to the inference server:\nargs: - -m - /models/Phi-3.5-mini-instruct-Q4_K_M.gguf - --host - \u0026#34;0.0.0.0\u0026#34; - --port - \u0026#34;8080\u0026#34; - --ctx-size - \u0026#34;4096\u0026#34; - --n-predict - \u0026#34;1024\u0026#34; - --parallel - \u0026#34;1\u0026#34; - --metrics # ← this - --log-disable After restart, GET /metrics on port 8080 returns valid Prometheus exposition format:\n# HELP llamacpp:tokens_predicted_total Number of generation tokens processed. # TYPE llamacpp:tokens_predicted_total counter llamacpp:tokens_predicted_total 0 # HELP llamacpp:predicted_tokens_seconds Average generation throughput in tokens/s. # TYPE llamacpp:predicted_tokens_seconds gauge llamacpp:predicted_tokens_seconds 0 # HELP llamacpp:requests_processing Number of requests processing. 
# TYPE llamacpp:requests_processing gauge llamacpp:requests_processing 0 The full set of metrics:\nMetric | Type | What it measures\nllamacpp:prompt_tokens_total | counter | Input tokens processed (cumulative)\nllamacpp:tokens_predicted_total | counter | Output tokens generated (cumulative)\nllamacpp:prompt_tokens_seconds | gauge | Average prompt throughput (tok/s)\nllamacpp:predicted_tokens_seconds | gauge | Average generation throughput (tok/s)\nllamacpp:tokens_predicted_seconds_total | counter | Total time spent generating (seconds)\nllamacpp:prompt_seconds_total | counter | Total time spent on prompts (seconds)\nllamacpp:requests_processing | gauge | Requests currently being processed\nllamacpp:requests_deferred | gauge | Requests queued, waiting for a slot\nllamacpp:n_decode_total | counter | Total llama_decode() calls\nllamacpp:n_busy_slots_per_decode | counter | Average busy slots per llama_decode() call\nThese cover the metrics that matter for a personal inference server: throughput, latency (derivable from total time / total tokens), and queue depth.\nPrometheus scrape config Adding a static scrape target in the existing Prometheus configuration:\nextraScrapeConfigs: | - job_name: llama-server static_configs: - targets: - llama-server.web-ai-engine.svc:8080 metrics_path: /metrics The only non-obvious thing here is the network policy: Prometheus lives in dashboard-homelab, and llama-server lives in web-ai-engine. With Cilium network policies enforcing namespace isolation, the AI engine namespace has to allow inbound connections from the dashboard namespace. In applications.yml:\n- namespace: web-ai-engine networkPolicies: allowIngressFromNamespaces: [dashboard-homelab] Without this, Prometheus scrape attempts fail silently with a timeout.\nGrafana dashboard via ConfigMap Rather than importing a dashboard JSON manually through the Grafana UI, the Grafana sidecar handles it automatically.
Any ConfigMap with the label grafana_dashboard: \u0026quot;1\u0026quot; is picked up, loaded, and available in Grafana — across all namespaces by default.\nThe dashboard ConfigMap lives in web-ai-engine, not dashboard-homelab. The sidecar finds it regardless:\napiVersion: v1 kind: ConfigMap metadata: name: grafana-dashboard-llm namespace: web-ai-engine labels: grafana_dashboard: \u0026#34;1\u0026#34; data: llm-metrics.json: | { \u0026#34;title\u0026#34;: \u0026#34;LLM Metrics\u0026#34;, \u0026#34;uid\u0026#34;: \u0026#34;llm-metrics\u0026#34;, ... } Argo CD reconciles the ConfigMap. The Grafana sidecar picks it up. The dashboard appears. No manual steps, no Grafana UI interaction, no state outside Git.\nThis means the dashboard is version-controlled, reproducible on cluster rebuild, and consistent across environments. The same YAML that describes the app\u0026rsquo;s Kubernetes resources also describes what the monitoring looks like.\nWhat the dashboard shows After sending a few messages through Open WebUI:\nGeneration throughput — the llamacpp:predicted_tokens_seconds gauge drops to 0 between requests and spikes during generation. On this hardware (Intel N100, CPU-only inference, Phi-3.5-mini Q4_K_M), it reads 3–5 tok/s during active generation. This is the number to watch if you\u0026rsquo;re comparing models or quantisation levels.\nCumulative tokens — llamacpp:prompt_tokens_total and llamacpp:tokens_predicted_total both increase monotonically. The ratio between them is roughly the input/output ratio of your usage pattern. For conversational use it\u0026rsquo;s typically 3:1 prompt to generation; for summarisation tasks it flips.\nQueue depth — llamacpp:requests_deferred is 0 almost always, which is expected with --parallel 1. 
If it\u0026rsquo;s consistently above 0, you have more concurrent users than the server can handle with the current slot configuration.\nms/token — derived from rate(llamacpp:tokens_predicted_seconds_total[5m]) / rate(llamacpp:tokens_predicted_total[5m]) * 1000. This is the per-token latency, which is the number that governs whether the response feels fast or slow. 200–300ms/token feels instant; above 400ms you start noticing.\nWhat\u0026rsquo;s missing compared to a proxy layer LiteLLM and similar proxies give you things this setup doesn\u0026rsquo;t:\nPer-model routing — if you\u0026rsquo;re running multiple models, a proxy can route requests to the right one. With a single model, irrelevant. Virtual API keys — per-user or per-application key scoping. Not needed when the whole thing is behind SSO. Spend tracking — meaningful when you\u0026rsquo;re paying per token. For a local model, the cost is electricity, which Prometheus already covers through the power monitoring dashboard. For a single-model homelab, the native metrics are sufficient. If I add more models later or need per-user attribution, a proxy layer becomes worth the RAM.\nThe pattern The broader point is that the observable unit here isn\u0026rsquo;t the proxy — it\u0026rsquo;s the inference server itself. Scraping llama.cpp directly means the metrics survive proxy changes, backend swaps, or routing redesigns. The inference server is the thing doing the work; it\u0026rsquo;s the right place to measure.\nStarter manifests with the metrics configuration included: homelab-ai-inference-starter\n","permalink":"https://blog.hippotion.com/posts/llm-observability-llamacpp-prometheus/","summary":"llama.cpp\u0026rsquo;s inference server ships a /metrics endpoint. 
One flag, Prometheus scraping, a Grafana dashboard loaded via ConfigMap sidecar — AI observability without a proxy layer.","title":"📈 Observing Local LLM Inference: llama.cpp's Built-in Prometheus Metrics"},{"content":"The GPU assumption Most write-ups about self-hosting LLMs start with a GPU. A 3090, an A100, at minimum something with CUDA. The implication is that without one you\u0026rsquo;re wasting your time — inference will be too slow to be useful.\nThat\u0026rsquo;s not been my experience.\nI\u0026rsquo;ve been running a local LLM stack on a ThinkCentre mini PC (Intel N100, 16 GB RAM, no discrete GPU) for a few months. The model is Phi-3.5-mini-instruct, 3.8 billion parameters, 4-bit quantised. Generation speed is 3–6 tokens per second on CPU — slow enough that you notice it, fast enough that you use it. For the things I actually reach for a local model to do — rephrase something, summarise a document, explain a config option without sending it to an external API — the latency is fine.\nThe point isn\u0026rsquo;t that CPU inference beats GPU inference. It\u0026rsquo;s that \u0026ldquo;good enough for personal use\u0026rdquo; is a much lower bar than \u0026ldquo;production LLM serving\u0026rdquo;, and the hardware you already have probably clears it.\nThe stack Two components:\nllama.cpp (ghcr.io/ggml-org/llama.cpp:server) — an inference server that loads a GGUF model file and exposes an OpenAI-compatible REST API. No Python, no framework overhead, minimal memory footprint beyond the model itself.\nOpen WebUI (ghcr.io/open-webui/open-webui) — a polished chat interface that speaks OpenAI API format.
It points at the llama-server endpoint as its backend, handles conversation history, and supports RAG file uploads out of the box.\nThe architecture is simple on purpose:\nBrowser → Open WebUI (:80)\n│\n│ OpenAI-compatible API\n▼\nllama-server (:8080)\n│\n│ reads GGUF model file\n▼\nhostPath /srv/ai-models\nOpen WebUI doesn\u0026rsquo;t know or care that the backend is llama.cpp running on CPU. It sees an OpenAI-compatible API. This matters: if I swap llama-server for Ollama, vLLM, or a cloud endpoint, the frontend doesn\u0026rsquo;t change. The interface is the standard.\nModel choice GGUF models on Hugging Face are available at multiple quantisation levels. The trade-off is quality vs. RAM:\nModel | Quant | Size | RAM at runtime | Notes\nLlama-3.2-3B | Q4_K_M | ~2 GB | ~3 GB | Fastest, lowest quality\nPhi-3.5-mini | Q4_K_M | ~2.4 GB | ~3–4 GB | Good balance — what I use\nMistral-7B-Instruct | Q4_K_M | ~4.1 GB | ~5–6 GB | Noticeably better, needs more RAM\nLlama-3.1-8B | Q4_K_M | ~4.7 GB | ~6–8 GB | High quality, stretches 16 GB with other workloads\nOn 16 GB RAM with a full k3s stack running alongside (Argo CD, Traefik, Vault, Prometheus, etc.), Phi-3.5-mini leaves enough headroom that the cluster stays stable. Mistral-7B works too, but it\u0026rsquo;s tighter.\nModels live in /srv/ai-models on the node, mounted into the pod as a hostPath volume. Single-node homelab, so there\u0026rsquo;s no scheduling concern. Download once with wget, done.\nKey configuration choices Context size (--ctx-size 4096): How many tokens the model holds in its attention window. Larger context = more RAM + slower inference. 4096 is fine for conversational use. If you\u0026rsquo;re summarising long documents, bump to 8192 and watch your RAM usage.\nMax output tokens (--n-predict 1024): Hard cap on response length. llama.cpp will stop there even mid-sentence. 1024 is usually enough; increase if you find it cutting off long explanations.\nParallel slots (--parallel 1): How many concurrent inference requests the server handles.
On CPU there\u0026rsquo;s no benefit to more than 1 — each slot competes for the same cores. Leave it at 1.\nMemory limits: Set the container limit to roughly 2× the model\u0026rsquo;s file size. A 2.4 GB GGUF typically uses 3–4 GB at runtime with context loaded.\nresources: requests: cpu: 500m memory: 1Gi limits: memory: 6Gi No CPU limit. llama-server will use however many cores are available during inference — that\u0026rsquo;s what makes it usable. A CPU limit would throttle inference to unusable speeds.\nDeployment as a GitOps push The whole stack lives in one YAML values file, deployed through the extra-objects chart that I use for raw manifests across the cluster. Argo CD watches the repo and reconciles automatically.\nNothing was kubectl apply-ed. The deployment happened by pushing to Git.\nWhat that means in practice: when I bumped the Open WebUI image version, I changed one line, pushed, and Argo CD rolled the pod. No manual steps, no SSH, no kubectl. The same process I use for any other service in the cluster.\nThe namespace, network policies, service account, and RBAC all generate from a single entry in applications.yml — same as every other app. The AI inference stack isn\u0026rsquo;t special from an operations perspective.\n# applications.yml excerpt - namespace: web-ai-engine applications: - applicationCode: web-ai-engine path: helm-charts/extra-objects autoSync: true Access and auth The service is exposed at ai.hippotion.com through the same dual-path ingress setup I use everywhere: Cloudflare Tunnel for external access, direct-to-server via Pi-hole DNS for local access, Traefik handling both with a wildcard Let\u0026rsquo;s Encrypt cert. See that post for the full explanation.\nAuth is handled by Traefik\u0026rsquo;s ForwardAuth middleware pointing at an oauth2-proxy backed by GitLab. Open WebUI\u0026rsquo;s own auth is disabled (WEBUI_AUTH: false) — the OAuth layer upstream handles it. 
One login covers every service in the cluster.\nThe WEBUI_SECRET_KEY (used to sign Open WebUI sessions) comes from Vault via External Secrets Operator. Nothing sensitive in Git.\nWhat the day-to-day is actually like Slow is the obvious caveat. Phi-3.5-mini at 3–6 tok/s means a paragraph-length response takes 20–30 seconds. For coding help where you\u0026rsquo;re reading what came before while it generates, that\u0026rsquo;s fine. For quick factual lookups, it\u0026rsquo;s a little tedious.\nThe useful cases for a local model, for me:\nRephrasing or editing text — paste something, ask it to tighten it. No data leaves the house. Config explanation — paste a Kubernetes manifest or a Traefik config block, ask what it does. Again, stays local. Quick summaries — short documents, log snippets, error messages. Experimentation — trying prompting techniques, testing system prompts, benchmarking quantisation levels without API costs. For longer reasoning tasks I use a cloud model. The local stack is for the cases where I want the answer to stay on-premises, or where I\u0026rsquo;m iterating and don\u0026rsquo;t want to pay per token.\nThe starting point if you want to try it The manifests are on GitHub: homelab-ai-inference-starter\nIt includes the llama-server and Open WebUI deployments, resource configuration, and ingress options for Traefik and nginx. The README walks through downloading a model, applying the manifests, and the configuration knobs worth knowing.\nNo GPU required. The ThinkCentre in the corner of my desk does the job.\n","permalink":"https://blog.hippotion.com/posts/local-llm-k8s-no-gpu/","summary":"A CPU-only self-hosted LLM stack running on k3s: llama.cpp as the inference server, Open WebUI as the chat interface, deployed as a single Git push.","title":"🤖 Local LLM Inference on Kubernetes, No GPU Required"},{"content":"The reflex Something\u0026rsquo;s wrong. A GitLab runner stops picking up jobs. An event processor starts dropping messages. 
A pod restarts in a loop. The node looks healthy — CPU fine, memory fine — but something is clearly off.\nThe reflex: restart the node, see if it clears.\nSometimes it does clear, and you move on. But you didn\u0026rsquo;t fix anything. You reset the state and crossed your fingers. If it happens again in two weeks, you\u0026rsquo;ll do the same thing. After enough iterations you have a \u0026ldquo;flaky node\u0026rdquo; that everyone reboots periodically and nobody understands.\nThere\u0026rsquo;s a better sequence. It takes twenty minutes instead of two, and you come out with either a real fix or actual knowledge of what happened.\nStep one: quarantine, don\u0026rsquo;t kill Before you touch anything, take the node out of rotation without destroying its current state.\nkubectl cordon \u0026lt;node\u0026gt; Cordon marks the node as unschedulable. No new pods land on it. Existing pods keep running. If you need the workloads somewhere else immediately:\nkubectl drain \u0026lt;node\u0026gt; --ignore-daemonsets --delete-emptydir-data Now you\u0026rsquo;ve removed the node from production traffic without rebooting. The node is still alive, and its node-level evidence is still there: the kernel ring buffer, the system journal, logs on disk. If you only cordoned, the running processes and their memory state are still there too; a drain evicts the pods, so their processes and emptyDir data go with them.\nThis is the difference. A reboot wipes that. A cordon preserves it.\nStep two: look at what\u0026rsquo;s actually there SSH in. Don\u0026rsquo;t grep for anything specific yet — do a pass for anything unusual.\nKernel messages first. The kernel will often tell you exactly what went wrong before any application did.\ndmesg -T --level=err,warn | tail -50 OOM kills show up here. Disk errors show up here. CPU soft lockups show up here. If you\u0026rsquo;ve got any of those, you have your answer before you\u0026rsquo;ve even looked at application logs.\nCheck for filesystem problems.\ndf -h # is anything full?
dmesg | grep -i \u0026#34;ext4\\|xfs\\|btrfs\\|i/o error\\|ata\u0026#34; A filesystem at 100% is silent until it isn\u0026rsquo;t. A flaky drive starts dropping I/O errors into dmesg long before SMART reports anything. Application developers rarely think about this case — their app just starts writing logs that say \u0026ldquo;failed to write\u0026rdquo; without specifying that the disk is full or dying.\nSystem resource pressure.\nvmstat 1 5 # is there swap activity? iostat -x 1 5 # is a disk saturated? cat /proc/pressure/io # kernel PSI — pressure stall info PSI is underused. It tells you whether processes were actually stalled waiting for I/O, not just whether throughput was high. A disk at 80% utilisation might be fine; an I/O PSI reading of some avg10=40 (at least one task stalled on I/O for 40% of the last ten seconds) means I/O is actively hurting performance.\nWhat were the pods doing right before things went sideways?\nkubectl describe node \u0026lt;node\u0026gt; # events section at the bottom kubectl get events -A --field-selector involvedObject.kind=Pod --sort-by=\u0026#39;.lastTimestamp\u0026#39; Look for OOMKilled exits, failed liveness probes, and throttling events. Kubernetes events expire after an hour by default — another reason not to reboot immediately; those events are still there if you look now.\nA real example: the GitLab runner A GitLab runner pod stops picking up jobs. It looks alive — the process is running, no crashes in the pod logs. Jobs sit in the queue.\nRestart reflex: delete the pod, let it reschedule, it picks up jobs again.\nBut why did it stop?\njournalctl -u gitlab-runner --since \u0026#34;1 hour ago\u0026#34; # or, if it\u0026#39;s a container: kubectl logs \u0026lt;runner-pod\u0026gt; # add --previous only if the container has restarted In one instance: the runner\u0026rsquo;s working directory was on a tmpfs that hit its size limit. The runner silently failed to create job workspaces and stopped accepting new jobs. The error was one line in the pod logs: mkdir /builds: no space left on device.
The pod was healthy by every other metric.\nFix: bump the tmpfs size limit in the runner config. A restart would have cleared the tmpfs temporarily, and the runner would have failed again the next time a large job filled it up.\nThe debug took five minutes. The permanent fix took two minutes. Without quarantining the node first, the evidence would have been gone.\nAnother one: the event consumer An event processor starts falling behind. Messages queue up. The pod shows no errors. Memory looks fine.\nThis one was subtler: the processor was connected to a downstream dependency over a persistent TCP connection. The remote end had already closed its side of that connection, but the processor never noticed and kept treating it as alive. New messages were being sent into a dead socket and silently discarded.\nss -tnp | grep \u0026lt;pid\u0026gt; # look at the socket state\nThe socket sat in CLOSE_WAIT on a connection that should have been ESTABLISHED. The application wasn\u0026rsquo;t checking whether the connection was actually working before using it, just whether it existed.\nA restart would have cleared the socket state, fixed the symptom, and left the bug in the code.\nWhat to look for — a short checklist When a node is misbehaving, in order:\n1. dmesg -T --level=err,warn — kernel errors, OOM kills, disk errors\n2. df -h \u0026amp;\u0026amp; df -i — full filesystems (space and inodes separately)\n3. kubectl describe node \u0026lt;node\u0026gt; — pressure conditions, recent events\n4. kubectl logs \u0026lt;pod\u0026gt; (plus --previous after a restart) — what the pod logged before it died or got stuck\n5. ss -tnp — socket states for network-adjacent issues\n6. vmstat 1 5 + iostat -x 1 5 — resource pressure\n7. journalctl -p err -b — system journal errors since last boot\nMost problems show up in the first three.\nAfter you\u0026rsquo;ve found something (or not found something) If you found the cause: fix it, test it, uncordon the node.\nkubectl uncordon \u0026lt;node\u0026gt; Document what you found — a comment in the relevant config, a commit message, a note.
\u0026ldquo;Fixed runner tmpfs limit\u0026rdquo; in the commit history is more useful than \u0026ldquo;flaky runner, restarted.\u0026rdquo;\nIf you genuinely found nothing: that\u0026rsquo;s information too. Cordon, reboot, uncordon, and note that the node rebooted clean with no identified cause. If it happens again, you have a pattern. Check whether anything changed in the workloads around that time. Check whether the reboot timing correlates with anything — cron jobs, backups, maintenance windows.\nA reboot you can explain is a fix. A reboot you can\u0026rsquo;t explain is a time bomb.\nWhy this matters on a single-node cluster In a multi-node setup you can afford to be lazier — cordon, drain, reboot, let the scheduler handle it, look at it later. On a single node there\u0026rsquo;s no \u0026ldquo;later.\u0026rdquo; The node coming back is all you\u0026rsquo;ve got.\nBut the habit is worth building regardless of node count. The engineers who understand their systems are the ones who looked before they rebooted.\nThe actual rule Quarantine first. Debug second. Restart third (if you still need to).\nA restart takes two minutes. The evidence it destroys might take two hours to reconstruct — or might be gone for good. The cordon costs you nothing.\n","permalink":"https://blog.hippotion.com/posts/dont-restart-quarantine-first/","summary":"Rebooting a misbehaving node feels productive. It isn\u0026rsquo;t. You\u0026rsquo;re erasing your evidence and skipping the lesson.","title":"🚨 Don't Restart the Node. Quarantine It First."},{"content":"The problem I built Dice \u0026amp; Shrines with five asymmetric guardian characters. Each one has a different passive and active ability that changes how reinforcements distribute, which territories you can attack, and what happens when you take damage.\nThe question I couldn\u0026rsquo;t answer from just playing was: are they actually balanced?\nNot \u0026ldquo;do they feel different\u0026rdquo; — they obviously do. 
But is Fox\u0026rsquo;s stored critical actually overpowered? Is Turtle\u0026rsquo;s loss-recovery passive strong enough to matter, or is it just flavour? Is there a first-mover advantage baked into the map structure?\nYou can\u0026rsquo;t answer questions like these from vibes. You need data. So I built a stats service.\nWhat gets recorded Every game produces five event types, posted as fire-and-forget HTTP calls from the game client to game-stats.hippotion.com/event:\nmap_generated — logged when the map generator accepts a map. Records territory count, average territory size, minimum size, and how many generation attempts it took. This tells me how often the generator discards its own work and whether the acceptance criteria are too strict.\ngame_start — fired when a game begins. Captures the number of players, the guardian assigned to each slot, and which slot is human. Returns a gameId that travels with the game for the rest of its life.\nattack — fired on every single dice roll. Attacker, defender, from-territory, to-territory, how many dice each side had, what they rolled, who won. This is the raw material for the probability analysis.\nelimination — fired when a player is knocked out. Records which guardian they were and how many players remained, so I can tell who exits first and who makes the final stand.\ngame_end — fired on win or abandon. Records the winner\u0026rsquo;s guardian, how many turns the game took, and whether it was abandoned.\nThe service is a FastAPI app backed by PostgreSQL, running in the homelab on the same k3s cluster as the game. About 150 lines of Python plus a schema.sql that the app runs on startup.\nThe dashboard The stats dashboard is a single-page HTML response from / — self-contained, no external framework, chart.js for the visualisations. 
It polls /api/stats every 30 seconds and updates in place.\nWhat it shows:\nOverview cards: total games, games today, games this week, human win rate, average turns per game, overall attack win rate, abandoned game count.\nActivity charts: games per day (last 7 days), game duration distribution in 10-turn buckets.\nDeath spiral analysis: when players abandon (broken into phases: instant, early, mid-early, mid, late), and first-mover advantage — win percentage by player slot 0 through 5.\nAttack behaviour: the dice margin chart is the most interesting one. It shows attack volume and win rate for every possible attacker-dice-minus-defender-dice value, from strongly negative (attacker is outmatched) to strongly positive. Overlaid: a win rate line. You can see the actual probability curve emerging from real games and compare it to what the math predicts.\nGuardian intelligence: win rate, pick count, average attacks per game, survival rate to turn 50+, and average turns per winning game — per guardian, human players only.\nElimination intelligence: when the first player gets knocked out per game, and a guardian fate table showing average elimination order and first-out percentage. Earliest-exiting guardian is surfaced explicitly.\nMap influence: territory count versus average game length. Also an attack efficiency heatmap — win rate for every attacker-dice × defender-dice combination, 1 through 8, rendered as a colour grid.\nRecent games: last 15 games with the human player\u0026rsquo;s guardian, result, and IP address so I can tell if it\u0026rsquo;s me testing or an actual player who wandered in.\nWhat the data showed The attack win rate across all games sits just under 60%. That\u0026rsquo;s higher than a naive analysis suggests it should be — if both sides roll fairly, equal dice should be near-even. The explanation is selection bias: players only attack when they have a dice advantage. Nobody sends 2 dice at 8 dice repeatedly. 
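The curve the margin chart is compared against can be computed exactly by convolving the distributions of the two dice totals. A minimal Python sketch, assuming six-sided dice and that a tie goes to the defender (the post only says the higher total wins, so the tie rule is an assumption); sum_distribution and attack_win_probability are hypothetical names, not the game's code:

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def sum_distribution(n_dice):
    # Map each possible total of n six-sided dice to the number of ways to roll it.
    dist = {0: 1}
    for _ in range(n_dice):
        nxt = {}
        for total, ways in dist.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        dist = nxt
    return dist


def attack_win_probability(attacker_dice, defender_dice):
    # Assumption: a tie goes to the defender, so the attacker needs a strictly higher total.
    att = sum_distribution(attacker_dice)
    dfn = sum_distribution(defender_dice)
    wins = sum(
        a_ways * d_ways
        for a_total, a_ways in att.items()
        for d_total, d_ways in dfn.items()
        if a_total > d_total
    )
    # Every combined roll is one of 6 ** (attacker_dice + defender_dice) equally likely outcomes.
    return Fraction(wins, 6 ** (attacker_dice + defender_dice))


if __name__ == '__main__':
    # Predicted attacker win rate against 2 defending dice, for 1 to 8 attacking dice.
    for attacker in range(1, 9):
        print(attacker, 2, round(float(attack_win_probability(attacker, 2)), 3))
```

Under that tie rule, equal dice is slightly worse than even for the attacker: 2 against 2 comes out at 575/1296, roughly 44%.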
The average attack has a positive margin, so the average win rate is above 50%.\nThe margin chart made this explicit. The plurality of attacks have a margin of +2 or more. The sub-zero margin attacks — technically losing plays — are a real but small fraction, usually late-game desperation or deliberate tempo plays.\nHuman vs AI attack quality turned out to be the sharpest comparison. Humans and AI have different average margins. The AI is greedy but disciplined about attack selection; humans sometimes take gambles the AI wouldn\u0026rsquo;t. You can see it in the numbers.\nFirst-mover advantage is measurable but not massive. Player slot 0 (goes first) has a slightly higher win rate than the average. Slots at the higher end of turn order are somewhat depressed. Not broken, but real — and a useful thing to watch if I ever add a competitive mode.\nGuardian balance: the win rate gap between the best and worst guardian tells me whether the balance is within acceptable range or a concern. The dashboard calls it out explicitly: if the gap exceeds 15 percentage points, it flags it as a balance issue. That threshold is arbitrary, but it forces a decision rather than letting drift accumulate unnoticed.\nAbandonment phases: most abandonments are instant — the player clicked \u0026ldquo;new game\u0026rdquo; before actually playing. The interesting number is mid-game abandonment, which is a proxy for death spirals: you see your income drop, you know you\u0026rsquo;re losing, you close the tab. That\u0026rsquo;s a design signal, not just a metric.\nDesigning for measurement The useful insight from building this is that it changes how you design the game. Once you know every attack is being logged, you start thinking about what the attack data will tell you. Shrines give territories a guaranteed die — does that show up in attack margins near shrine territories? I didn\u0026rsquo;t add territory-topology tracking, but I could. 
The schema is just a few columns away.\nThe same goes for guardian abilities. Fox\u0026rsquo;s stored critical fires at turn boundaries — I log turn number on every attack, so I can look for Fox spikes in attack win rate on certain turns. I haven\u0026rsquo;t run that query yet, but the data is there if the balance question becomes sharp enough to need it.\nThat\u0026rsquo;s the thing about adding observability to something you built yourself: you stop guessing about whether it\u0026rsquo;s working and start reading the evidence. The game got more interesting to design once I could see what was actually happening inside it.\nThe stack FastAPI — event intake and stats API, ~150 lines PostgreSQL — five tables: maps, games, game_guardians, attacks, eliminations chart.js — dashboard visualisations, loaded from CDN k3s + Argo CD — deployed as a Kubernetes pod, Dockerised, managed GitOps alongside everything else on the homelab Source at dice-n-shrines-stats.\n","permalink":"https://blog.hippotion.com/posts/dice-and-shrines-stats/","summary":"Building a telemetry backend for Dice \u0026amp; Shrines — every attack logged, every guardian tracked, every die rolled accounted for. What the data revealed about balance, luck, and how people actually play.","title":"📊 I Added a Stats Service to My Game to Answer One Question. It Multiplied."},{"content":"It started as a sandbox I wanted to get a feel for AI-assisted coding tools — Claude Code, Codex — in a low-stakes environment where breaking things was fine. A browser game seemed like the right vehicle: self-contained, no prod database, no users to disappoint.\nI picked a premise I knew was fun: Dice Wars. The Flash-era classic. Roll dice to attack adjacent territories, biggest army snowballs. Simple enough that I could focus on the tooling rather than the design. 
Or so I thought.\nSix weeks later I had five asymmetric character classes, a procedural hex map generator with acceptance criteria, a FastAPI telemetry service recording every dice roll, and a stats dashboard I check more than I probably should. The tooling became a background concern. The game took over.\nHow the game works The rules are genuinely minimal.\nYou start with a random slice of a procedurally generated map — a patchwork of irregular coloured territories, each one a cluster of hex tiles that reads as a solid blob. You and up to seven opponents begin scattered across it. Objective: own everything.\nAttacking is one click. Select your territory, click an adjacent enemy territory. Both sides roll all their dice and sum. Higher total wins. Attacker wins: you capture the territory, and all but one of your dice move into it, leaving a single die behind. Defender wins: your attack is repelled, you\u0026rsquo;re reduced to a single die on the attacking territory.\nReinforcements are the mechanic that makes this a strategy game. At the end of your turn, you receive bonus dice equal to the size of your largest contiguous group of territories. Not total territories — the biggest connected blob. Fragmented territory generates almost nothing. A solid connected bloc snowballs.\nThat one rule creates the entire strategic texture. Grab fast but stay connected. Chokepoints are worth defending at a loss. Cutting an opponent in half collapses their income immediately. The late game turns into tense standoffs until one roll cracks something open.\nThe shrines Early on I added shrines — special territories marked with a ★.
They behave differently from normal territories in a few ways:\nHigher dice cap: normal territories max out at 8 dice, shrines at 10 Minimum floor: a shrine never drops below 2 dice after attacking, win or loss — it can\u0026rsquo;t be stripped bare Guaranteed reinforcement: the shrine gets a die first at end of turn, before random distribution Aura: each of your own territories adjacent to the shrine gets a +1 guaranteed die (shown with a dim ◆ indicator) The shrine mechanic does something interesting to the risk calculus. Holding a shrine isn\u0026rsquo;t just a territory — it\u0026rsquo;s a node that warps the value of everything adjacent to it. You\u0026rsquo;ll defend an aura territory harder than a territory of equivalent size elsewhere on the map, because losing it means losing the aura bonus. It also means attacking into a shrine aura is expensive: the shrine can\u0026rsquo;t be worn down easily, and the neighboring territories keep refilling.\nShrines turned out to be the moment the math got interesting.\nFive guardians The game has a character select screen. Each player — human or AI — picks a guardian before the map generates. Five options, each with a passive and an active ability that meaningfully change how you play:\nHippo gets to manually place one reinforcement die each turn before the rest distribute randomly. One die placement doesn\u0026rsquo;t sound like much until you realise you always have a frontline territory that needs it more than anywhere else. High floor, consistent.\nHedgehog fills weakest territories first during reinforcement. Defensively solid — you never bleed out a border territory to zero while inland territories sit at cap. It doesn\u0026rsquo;t generate more dice, but it wastes fewer.\nFox banks a stored die every other turn and can spend it as a critical multiplier on an attack. The stored dice accumulate (up to a cap), so the power compounds if you resist using it. 
Two-hop flanking passive: Fox can attack across two territory hops from border territories, making it very hard to feel safely tucked away from.\nOwl has a passive two-hop attack range like Fox but for all attacks — Owl sees further. The active ability is a Dice Transfer: move dice from one of your territories to an adjacent friendly one. Lets you concentrate force without committing to an attack.\nTurtle gets 2 dice back to a neighboring territory any time it loses a defense. It\u0026rsquo;s the only guardian that turns taking damage into a resource. Hard to snowball, but very hard to finish off.\nThe AI cycles through the five guardians deterministically, so in a full 8-player game you\u0026rsquo;ll face at least one of each.\nThe map The map generator produces irregular coloured territories from a hex grid. Hexes are the visual scaffolding — what matters in gameplay is the blob they form. Internal edges within a territory are hidden; only the outer border of each cluster is drawn. The result reads like a contested piece of ground rather than a grid.\nThe generator has acceptance criteria. A proposed map is rejected if territory sizes are too uneven — a tiny starting territory with 2 hexes versus a sprawling 15-hex territory produces a wildly unfair game. The stats service actually records generation attempts and acceptance rates per map, so I can see how often the map generator throws away its own work.\nThe mapId is stamped on every game record, so I can eventually correlate map topology with game outcomes. 
I haven\u0026rsquo;t done that analysis yet, but the data is there.\nWhat the tools actually felt like Claude Code handles the kind of work that benefits from holding a lot of context simultaneously: \u0026ldquo;this change to resolveAttack() in game.js needs corresponding updates to the AI logic in ai.js and the render pass in render.js.\u0026rdquo; That\u0026rsquo;s tedious to track manually and exactly the kind of thing where the tool earns its keep.\nCodex was more useful for the boilerplate-heavy parts — filling in schema.sql, wiring up FastAPI endpoints, the chart.js setup in the dashboard. It was directed generation: code where I already knew exactly how it should look.\nNeither tool replaces thinking. The guardian ability design, the shrine balance, the question of whether Fox\u0026rsquo;s stored critical is too swingy in epic mode — that\u0026rsquo;s all still just sitting down and working it out. The tools speed up the translation from \u0026ldquo;I know what I want\u0026rdquo; to \u0026ldquo;here is working code.\u0026rdquo; The game design itself doesn\u0026rsquo;t compress.\nIt\u0026rsquo;s live The game runs at dice.hippotion.com. Single HTML file served from a Kubernetes ConfigMap on my homelab. No accounts, no install. It runs fast — AI turns are instant when you toggle off animations, and a full game can be over in five minutes or stretch to twenty depending on how the map falls.\nStats dashboard at game-stats.hippotion.com. More on that in the next post.\n","permalink":"https://blog.hippotion.com/posts/dice-and-shrines/","summary":"What started as a Claude Code / Codex sandbox became a territory conquest game with five asymmetric guardians, procedurally generated hex maps, and a stats service to balance them. Here\u0026rsquo;s what happened.","title":"🎲 I Built a Browser Game to Learn AI Coding Tools.
It Turned Into Something Else."},{"content":"The question \u0026ldquo;How do you achieve zero-downtime deployments in Kubernetes?\u0026rdquo;\nThe expected answer: rolling updates. That\u0026rsquo;s correct but incomplete. Rolling updates are the mechanism. They don\u0026rsquo;t give you zero downtime automatically — they give you a framework in which zero downtime is achievable, if you configure everything correctly.\nMost clusters cause brief downtime on every deployment. Usually 5–30 seconds. Usually blamed on \u0026ldquo;the load balancer\u0026rdquo; or \u0026ldquo;DNS\u0026rdquo;. Almost always caused by one of four missing pieces.\nThe rolling update baseline Kubernetes replaces pods in waves. You control the pace:\nspec: strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 # how many extra pods can exist during update maxUnavailable: 0 # how many pods can be unavailable during update maxUnavailable: 0 means Kubernetes never terminates a pod until a replacement is ready. This prevents the obvious failure mode where you have zero running pods mid-deployment.\nmaxSurge: 1 means one extra pod beyond the desired count runs during the update. For a deployment with 3 replicas, you\u0026rsquo;ll briefly have 4 pods running.\nThis alone doesn\u0026rsquo;t prevent downtime.\nPiece 1: The readiness probe (the most common missing piece) Kubernetes considers a pod \u0026ldquo;ready\u0026rdquo; when all its containers pass their readiness probes. If you don\u0026rsquo;t define a readiness probe, Kubernetes considers the pod ready as soon as the container starts. 
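The probe is only as good as the endpoint behind it. A minimal sketch using only the Python standard library, assuming the app serves /healthz on port 8080 as in the probe below; Readiness, HealthHandler, startup and serve are hypothetical names, and startup stands in for real work such as loading config or connecting to the database:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Readiness:
    # Starts not-ready; flipped only once startup work has actually finished.
    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def status_code(self):
        # The kubelet treats 200-399 as success; 503 keeps the pod out of Service endpoints.
        return 200 if self._ready.is_set() else 503


READINESS = Readiness()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = READINESS.status_code() if self.path == '/healthz' else 404
        self.send_response(code)
        self.end_headers()


def startup():
    # Hypothetical startup steps: load config, open the DB pool, warm caches.
    # Readiness is advertised only after they all succeed.
    READINESS.mark_ready()


def serve(port=8080):
    server = ThreadingHTTPServer(('0.0.0.0', port), HealthHandler)
    threading.Thread(target=startup, daemon=True).start()
    server.serve_forever()
```

Calling serve() starts listening immediately but answers 503 until startup() finishes, which is the gap a readiness probe exists to cover. If the same endpoint also backs the liveness probe, the liveness timings need enough slack to cover startup.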
Containers start before applications are ready to serve traffic.\n# Without this, traffic arrives before your app is listening readinessProbe: httpGet: path: /healthz port: 8080 initialDelaySeconds: 5 periodSeconds: 5 failureThreshold: 3 What happens without it: Kubernetes starts the new pod, marks it ready immediately, adds it to the Service endpoints, routes traffic to it — while your app is still initialising (loading config, connecting to the database, warming caches). The first few requests to the new pod fail or time out.\nThe fix: define a readiness probe that actually checks application readiness. An HTTP endpoint that returns 200 only after the app has finished starting is the minimum. A deeper check that verifies the database connection is better.\nCommon mistake: using the same endpoint for liveness and readiness with the same thresholds. They serve different purposes:\nReadiness: \u0026ldquo;am I ready to accept traffic?\u0026rdquo; — controls whether traffic is sent Liveness: \u0026ldquo;am I still alive?\u0026rdquo; — controls whether the pod is restarted A pod can fail its readiness probe (temporarily overloaded, warming up) without failing its liveness probe. If you make liveness too aggressive, Kubernetes restarts pods that would have recovered on their own.\nPiece 2: The termination grace period (the other common missing piece) When Kubernetes wants to terminate a pod, it sends SIGTERM. Your application has terminationGracePeriodSeconds (default: 30) to finish in-flight requests and shut down cleanly. After that, Kubernetes sends SIGKILL.\nThe problem: there\u0026rsquo;s a race condition. Kubernetes removes the pod from the Service endpoints and sends SIGTERM roughly simultaneously. The endpoint update has to propagate through the control plane, kube-proxy, and the load balancer. 
During that propagation window — typically 1–10 seconds — traffic can still arrive at a pod that has already started shutting down.\nThe fix is a preStop hook that adds a short sleep before the termination sequence:\nlifecycle: preStop: exec: command: [\u0026#34;sleep\u0026#34;, \u0026#34;5\u0026#34;] This gives the endpoint removal time to propagate before your app receives SIGTERM. (An exec hook like this needs a sleep binary in the image.) The total shutdown sequence is then:\n1. Kubernetes marks the pod as terminating and starts removing it from endpoints\n2. At the same time, the preStop hook runs (sleep 5s — enough for endpoint propagation)\n3. SIGTERM is sent\n4. App drains in-flight requests and shuts down\n5. If still running after terminationGracePeriodSeconds: SIGKILL\nSet terminationGracePeriodSeconds to cover the sleep plus your app\u0026rsquo;s actual shutdown time:\nspec: terminationGracePeriodSeconds: 60 # 5s preStop + up to 55s for app shutdown Without the sleep: requests fail during the propagation window. With it: the window is covered.\nPiece 3: PodDisruptionBudgets (for node maintenance) Rolling updates handle normal deployments. Node drains (kubectl drain, cloud provider maintenance windows, k3s upgrades) are a different code path that bypasses your rolling update strategy entirely.\nWhen a node is drained, Kubernetes evicts all pods on it as fast as it can. Without constraints, it will evict all replicas of your deployment simultaneously if they all happen to land on the same node.\nA PodDisruptionBudget sets a floor:\napiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: myapp-pdb namespace: myapp spec: minAvailable: 1 # at least 1 replica must stay up during disruption selector: matchLabels: app: myapp Now a drain has to respect the budget: any eviction that would leave fewer healthy replicas than minAvailable is refused, and kubectl drain keeps retrying until replacement pods are up elsewhere. If no replacement can be scheduled (e.g., you\u0026rsquo;re draining the only node), the drain will block rather than cause downtime.\nminAvailable: 1 is the minimum.
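The budget is enforced through simple arithmetic on healthy pods. A simplified Python model of the disruptionsAllowed value a PDB reports in its status, assuming integer minAvailable or maxUnavailable values (percentages and the unhealthy-pod eviction policy are ignored); disruptions_allowed is a hypothetical helper, not a Kubernetes API:

```python
def disruptions_allowed(current_healthy, expected_pods, min_available=None, max_unavailable=None):
    # Simplified model of PodDisruptionBudget status.disruptionsAllowed (integer values only).
    # A PDB sets exactly one of minAvailable or maxUnavailable.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError('set exactly one of min_available or max_unavailable')
    if min_available is not None:
        desired_healthy = min_available
    else:
        # maxUnavailable is measured against the number of pods the controller expects.
        desired_healthy = max(expected_pods - max_unavailable, 0)
    # An eviction is allowed only while healthy pods stay at or above the floor.
    return max(current_healthy - desired_healthy, 0)


print(disruptions_allowed(3, 3, min_available=2))
print(disruptions_allowed(1, 1, min_available=1))
```

With one replica and minAvailable: 1 the result is zero, which is why a drain of the only node blocks instead of evicting.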
For production with 3+ replicas, minAvailable: 2 or maxUnavailable: 1 is more appropriate.\nPiece 4: minReadySeconds (the one everyone forgets) Even with a correct readiness probe, there\u0026rsquo;s a subtle risk: a pod that passes its readiness probe briefly and then fails due to a transient startup issue (flapping). Kubernetes would add it to the endpoint pool, route traffic to it, watch it fail the readiness probe, remove it — and during that window, some requests fail.\nminReadySeconds says: a pod must pass its readiness probe continuously for this many seconds before Kubernetes considers it \u0026ldquo;available\u0026rdquo; and allows the next pod in the rolling update to be terminated:\nspec: minReadySeconds: 10 This slows deployments slightly, but a pod that flaps inside that window never counts as available, so the rollout does not retire an old, stable pod on its behalf. It does not keep traffic off the new pod: once a pod passes its readiness probe it joins the Service endpoints, minReadySeconds or not.\nThe complete deployment snippet Putting it together:\napiVersion: apps/v1 kind: Deployment metadata: name: myapp namespace: myapp spec: replicas: 3 minReadySeconds: 10 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 maxUnavailable: 0 selector: matchLabels: app: myapp template: metadata: labels: app: myapp spec: terminationGracePeriodSeconds: 60 containers: - name: myapp image: myapp:latest lifecycle: preStop: exec: command: [\u0026#34;sleep\u0026#34;, \u0026#34;5\u0026#34;] readinessProbe: httpGet: path: /healthz port: 8080 initialDelaySeconds: 5 periodSeconds: 5 failureThreshold: 3 livenessProbe: httpGet: path: /healthz port: 8080 initialDelaySeconds: 15 periodSeconds: 10 failureThreshold: 5 And the PDB alongside it:\napiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: myapp-pdb namespace: myapp spec: minAvailable: 2 selector: matchLabels: app: myapp What interviewers are actually testing The follow-up is usually: \u0026ldquo;What if your new version has a bug that isn\u0026rsquo;t caught immediately — how do you roll back?\u0026rdquo;\nkubectl rollout undo deployment/myapp reverts to the previous ReplicaSet.
Kubernetes stores the last few ReplicaSets by default (revisionHistoryLimit, default 10). The rollback uses the same rolling update mechanism, so it\u0026rsquo;s also zero-downtime.\nThe harder follow-up: \u0026ldquo;What if the bug only shows up after 10 minutes of load?\u0026rdquo; That\u0026rsquo;s where you need a canary deployment — send a small percentage of traffic to the new version, observe, then shift the rest. Argo Rollouts handles this natively. Without it, you\u0026rsquo;re doing it manually: two Deployments behind one Service, with the traffic split set by their replica ratio, since a plain Service has no traffic weights.\nThis is part of a series on Kubernetes interview questions. Previously: secrets in a GitOps repo. Next: network isolation between services.\n","permalink":"https://blog.hippotion.com/posts/k8s-zero-downtime/","summary":"Kubernetes rolling updates don\u0026rsquo;t give you zero-downtime for free. There are four separate things you have to get right, and most clusters get at least one wrong.","title":"⚡ Your Deployment Causes 30 Seconds of Downtime. What Went Wrong?"},{"content":"The question \u0026ldquo;How do you prevent configuration drift in a Kubernetes cluster?\u0026rdquo;\nConfiguration drift: the cluster\u0026rsquo;s actual state diverges from what\u0026rsquo;s declared in your source of truth. Someone runs kubectl edit deployment myapp to bump a memory limit during an incident. Someone adds a debug sidecar directly. Someone applies a YAML file from their laptop that was never committed to Git. The fix works. It goes undocumented. Six months later, a new deployment overwrites it. The incident recurs.\nThere are two distinct problems here that require different solutions:\nDetection and remediation: how do you notice drift and revert it? Prevention: how do you stop non-compliant resources from being created in the first place?
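Both problems start from the same comparison: what Git declares versus what the cluster holds. A toy Python sketch of that comparison over plain nested dictionaries, with hypothetical manifest fragments; real tools such as Argo CD also normalise defaulted and server-managed fields before diffing, which this skips:

```python
def find_drift(desired, live, path=''):
    # Return (field_path, desired_value, live_value) for every leaf that differs.
    # A field missing on one side shows up as None on that side; lists compare as whole values.
    drift = []
    for key in sorted(set(desired) | set(live)):
        where = f'{path}.{key}' if path else key
        want, have = desired.get(key), live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, where))
        elif want != have:
            drift.append((where, want, have))
    return drift


# Hypothetical Deployment fragments: Git says 3 replicas, someone ran kubectl edit.
declared = {'spec': {'replicas': 3, 'template': {'metadata': {'labels': {'app': 'myapp'}}}}}
observed = {'spec': {'replicas': 5, 'template': {'metadata': {'labels': {'app': 'myapp'}}}}}

print(find_drift(declared, observed))  # [('spec.replicas', 3, 5)]
```

An empty list means the cluster matches Git; anything else is drift, either to revert after the fact or to block before it lands.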
Detection and remediation: Argo CD selfHeal If you\u0026rsquo;re using GitOps with Argo CD, detection and remediation are handled for you:\nsyncPolicy: automated: prune: true selfHeal: true selfHeal: true means Argo CD continuously compares the live state of the resources it manages with the Git repo and reverts any divergence. Someone runs kubectl edit deployment myapp and changes the replica count? Argo CD watches its managed resources, so it notices the change almost immediately and reverts it after a short self-heal delay. The 3-minute default reconciliation interval is how often it polls Git for new commits, not how long live drift survives.\nprune: true means that when a resource is removed from Git, Argo CD deletes it from the cluster instead of leaving it behind. It only covers resources Argo CD tracks. A debug pod someone kubectl apply\u0026rsquo;d directly was never part of the Application, so prune leaves it alone. Enabling orphaned-resource monitoring on the AppProject makes such objects show up as warnings, but it still doesn\u0026rsquo;t delete them.\nThis is the audit trail story too. Every legitimate change is a Git commit with an author, a timestamp, and a commit message. An edit to a managed resource that isn\u0026rsquo;t in Git doesn\u0026rsquo;t survive the next self-heal. If you want to know what changed and when, git log is the answer.\nThe gap selfHeal doesn\u0026rsquo;t close selfHeal reverts drift after the fact. There\u0026rsquo;s a window, usually seconds, where a drifted resource is serving traffic. For most changes, that\u0026rsquo;s fine. For a bad resource (wrong RBAC, missing network policy, container running as root), even a short window is enough to be a problem. A resource Argo CD doesn\u0026rsquo;t manage is never reverted at all.\nThe other gap: selfHeal doesn\u0026rsquo;t tell you who made the change or generate an alert. It just silently fixes it. To know that drift happened, you need API server audit logging or an alerting rule on Argo CD\u0026rsquo;s sync and health events. Audit logging means kube-apiserver --audit-policy-file together with --audit-log-path; without a policy file, nothing is recorded.\nPrevention: Kyverno Kyverno is a policy engine that runs as a Kubernetes admission webhook. Every create or update request it is registered for goes through it before the object is persisted. If the resource violates a policy, Kyverno can reject it outright (enforce mode) or allow it and record the violation in a policy report (audit mode).\nThe policies are Kubernetes resources themselves — they live in Git, they\u0026rsquo;re applied via GitOps, they\u0026rsquo;re versioned. 
No separate policy language to learn.\nA policy that requires readiness probes on all Deployments:\napiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: require-readiness-probe spec: validationFailureAction: Enforce rules: - name: check-readiness-probe match: any: - resources: kinds: - Deployment validate: message: \u0026#34;Deployments must define a readiness probe.\u0026#34; pattern: spec: template: spec: containers: - (name): \u0026#34;*\u0026#34; readinessProbe: periodSeconds: \u0026#34;\u0026gt;0\u0026#34; With this policy active: kubectl apply -f deployment-without-probe.yaml is rejected at the API server. The error message is the one you defined in message. The deployment never reaches etcd. The pattern checks periodSeconds because the API server defaults it to 10 whenever a readiness probe of any type (httpGet, tcpSocket, exec) is defined. A container without a probe therefore fails the match.\nA policy that blocks containers running as root:\napiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: disallow-root-containers spec: validationFailureAction: Enforce rules: - name: check-runAsNonRoot match: any: - resources: kinds: [Deployment, StatefulSet, DaemonSet] validate: message: \u0026#34;Containers must not run as root.\u0026#34; pattern: spec: template: spec: containers: - (name): \u0026#34;*\u0026#34; securityContext: runAsNonRoot: true A policy that enforces resource limits (common in multi-tenant clusters):\napiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: require-resource-limits spec: validationFailureAction: Enforce rules: - name: check-limits match: any: - resources: kinds: [Deployment] validate: message: \u0026#34;CPU and memory limits are required.\u0026#34; pattern: spec: template: spec: containers: - resources: limits: memory: \u0026#34;?*\u0026#34; cpu: \u0026#34;?*\u0026#34; Kyverno can also mutate and generate Policies aren\u0026rsquo;t only for validation. 
Kyverno can mutate incoming resources (add default labels, inject sidecars, set default resource requests) and generate new resources in response to events (create a NetworkPolicy whenever a new namespace is created).\nAuto-add a standard label to every Deployment:\napiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: add-labels spec: rules: - name: add-managed-by-label match: any: - resources: kinds: [Deployment] mutate: patchStrategicMerge: metadata: labels: managed-by: kyverno Auto-create a default NetworkPolicy when a namespace is created:\napiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: add-default-networkpolicy spec: rules: - name: default-deny match: any: - resources: kinds: [Namespace] generate: apiVersion: networking.k8s.io/v1 kind: NetworkPolicy name: default-deny-all namespace: \u0026#34;{{request.object.metadata.name}}\u0026#34; data: spec: podSelector: {} policyTypes: - Ingress - Egress The complete drift prevention picture Developer runs: kubectl apply -f bad-deployment.yaml → API server receives request → Kyverno admission webhook intercepts → Policy check: no readiness probe → Rejected → API server returns 403 with Kyverno\u0026#39;s message → Resource never reaches etcd Developer runs: kubectl edit deployment myapp (valid change, just not via Git) → Edit succeeds (no policy violation) → Argo CD sees the change through its resource watch (usually within seconds) → Diff detected: cluster state ≠ Git state → selfHeal: revert to Git state → If audit logging enabled: event recorded with username and timestamp Git is the audit trail for what should be there. kube-apiserver audit logs are the trail for what was attempted. Kyverno is the enforcer at admission time. Argo CD is the continuous reconciler. Four layers, each with a different job.\nWhat interviewers are actually testing The follow-up is usually: \u0026ldquo;What\u0026rsquo;s the difference between Kyverno and OPA Gatekeeper?\u0026rdquo;\nBoth are admission webhook policy engines. 
The practical differences:\nKyverno: policies are k8s-native YAML, no separate language to learn. Generate and mutate policies built in. Easier to get started with. OPA Gatekeeper: policies are written in Rego, a purpose-built policy language that\u0026rsquo;s more expressive but has a steeper learning curve. Better if you\u0026rsquo;re already using OPA elsewhere (Terraform, microservice authorization). For a Kubernetes-only environment, Kyverno is the pragmatic choice. For a platform team that uses OPA across the stack, Gatekeeper gives you policy consistency.\nThe deeper follow-up: \u0026ldquo;How do you test policies before enforcing them?\u0026rdquo; Use Audit mode first (validationFailureAction: Audit). Violations are logged as PolicyReport objects but requests aren\u0026rsquo;t rejected. Review the reports, fix the existing violations, then switch to Enforce. Never flip directly to Enforce in production. Running workloads aren\u0026rsquo;t evicted, but the next update to any of them that still violates the policy is rejected. That could be an image bump or a rollout restart, usually in the middle of a deploy.\nThis is part of a series on Kubernetes interview questions. Previously: network isolation between services.\n","permalink":"https://blog.hippotion.com/posts/k8s-config-drift/","summary":"Manual kubectl in production is the Kubernetes equivalent of SSH\u0026rsquo;ing into a server and editing files. It works until it doesn\u0026rsquo;t, and when it doesn\u0026rsquo;t, nobody knows why.","title":"🔄 Someone kubectl apply'd a Hotfix Directly. How Do You Detect and Prevent It?"},{"content":"The question \u0026ldquo;How do you enforce network isolation between services in a Kubernetes cluster?\u0026rdquo;\nThe default Kubernetes network model is flat. Every pod can reach every other pod, in any namespace, on any port. There are no firewalls, no ACLs, no segmentation. 
A compromised frontend pod can connect directly to your PostgreSQL port, your Redis port, your internal admin API, and every other service in the cluster.\nThis is intentional — Kubernetes doesn\u0026rsquo;t assume you want isolation, because not everyone does. But if you do want it, you need to add it.\nNetworkPolicy: the primitive A NetworkPolicy is a Kubernetes resource that selects a set of pods and defines what traffic is allowed to reach them (ingress) and what traffic they\u0026rsquo;re allowed to send (egress). Once any policy selects a pod for a direction (ingress or egress), traffic in that direction that no policy explicitly allows is dropped. Pods that no policy selects stay fully open.\nThe catch: NetworkPolicy resources have no effect unless your CNI plugin supports them. Flannel on its own does not. Calico, Cilium, and Canal do. If your CNI doesn\u0026rsquo;t enforce policies and you apply a NetworkPolicy, it will be silently ignored — no error, no warning. k3s is the exception worth knowing. It bundles an embedded network policy controller (based on kube-router) alongside Flannel, so policies are enforced there unless the server was started with --disable-network-policy.\nThe default-deny pattern The correct starting point is a default-deny policy that blocks everything, applied to the namespace. You then add explicit allow policies for the traffic you actually need.\n# Block all ingress and egress in this namespace by default apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: default-deny-all namespace: myapp spec: podSelector: {} # matches all pods in the namespace policyTypes: - Ingress - Egress With this in place, your pods can\u0026rsquo;t receive traffic and can\u0026rsquo;t send traffic. 
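Two evaluation rules explain most surprises with default-deny. First, policies only ever add permissions: a pod is allowed whatever any policy selecting it allows. Second, a connection between two pods needs egress allowed at the source and ingress allowed at the destination. Below is a toy evaluator for those two rules. It assumes each pod is identified by a single app label and ignores ports, namespaces and IP blocks. The Policy class and helper names are hypothetical, and a real CNI enforces this very differently.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    selects: set   # app labels this policy applies to; empty set means all pods (podSelector: {})
    types: set     # policyTypes, a subset of {'Ingress', 'Egress'}
    allow_from: set = field(default_factory=set)  # app labels allowed in
    allow_to: set = field(default_factory=set)    # app labels allowed out


def applies_to(policy, app):
    return not policy.selects or app in policy.selects


def egress_allowed(policies, src, dst):
    relevant = [p for p in policies if applies_to(p, src) and 'Egress' in p.types]
    # A pod that no policy isolates for egress may send anywhere.
    return not relevant or any(dst in p.allow_to for p in relevant)


def ingress_allowed(policies, src, dst):
    relevant = [p for p in policies if applies_to(p, dst) and 'Ingress' in p.types]
    return not relevant or any(src in p.allow_from for p in relevant)


def connection_allowed(policies, src, dst):
    # Both ends must agree: egress at the source AND ingress at the destination.
    return egress_allowed(policies, src, dst) and ingress_allowed(policies, src, dst)


deny_all = Policy('default-deny-all', set(), {'Ingress', 'Egress'})
backend_out = Policy('allow-egress-backend-to-postgres', {'backend'}, {'Egress'}, allow_to={'postgres'})
postgres_in = Policy('allow-ingress-postgres', {'postgres'}, {'Ingress'}, allow_from={'backend'})

print(connection_allowed([deny_all], 'backend', 'postgres'))                            # False
print(connection_allowed([deny_all, backend_out], 'backend', 'postgres'))               # False
print(connection_allowed([deny_all, backend_out, postgres_in], 'backend', 'postgres'))  # True
print(connection_allowed([deny_all, backend_out, postgres_in], 'frontend', 'postgres')) # False
```

The second line is the common trap. The backend is allowed out, but Postgres, sitting under the same default-deny, refuses the connection until it gets an ingress rule of its own.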
You then add back what you need.\nAllowing specific traffic Allow the web frontend to receive traffic from the ingress controller:\napiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-ingress-from-traefik namespace: myapp spec: podSelector: matchLabels: app: frontend policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: sys-traefik Allow the backend to talk to PostgreSQL:\napiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-egress-to-postgres namespace: myapp spec: podSelector: matchLabels: app: backend policyTypes: - Egress egress: - to: - podSelector: matchLabels: app: postgres ports: - port: 5432 protocol: TCP After these two policies, the frontend receives traffic from Traefik, and the backend is allowed to open connections to Postgres. The Postgres pods sit under the same default-deny, though, so that connection only succeeds once Postgres also has an ingress policy admitting the backend. Traffic between two pods needs egress allowed at the source and ingress allowed at the destination. The frontend cannot reach Postgres. The backend cannot receive traffic from the ingress controller. Neither can call anything else.\nThe DNS gotcha Once you add a default-deny egress policy, DNS stops working. Your pods can no longer resolve service names because they can\u0026rsquo;t reach kube-dns in the kube-system namespace.\nYou need to explicitly allow it:\napiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-egress-dns namespace: myapp spec: podSelector: {} policyTypes: - Egress egress: - to: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: kube-system ports: - port: 53 protocol: UDP - port: 53 protocol: TCP Missing this is the most common reason \u0026ldquo;everything broke after I added NetworkPolicies\u0026rdquo;. Add it to every namespace that has a default-deny policy.\nCilium: the same model with more power Cilium implements the standard NetworkPolicy API and adds its own CiliumNetworkPolicy CRD with L7 capabilities.\nStandard NetworkPolicy works at L3/L4 — IP addresses and ports. 
Cilium\u0026rsquo;s CRD adds:\nL7 HTTP filtering: allow specific HTTP methods and paths, not just port 8080.\napiVersion: cilium.io/v2 kind: CiliumNetworkPolicy metadata: name: allow-api-reads namespace: myapp spec: endpointSelector: matchLabels: app: api ingress: - fromEndpoints: - matchLabels: app: frontend toPorts: - ports: - port: \u0026#34;8080\u0026#34; protocol: TCP rules: http: - method: \u0026#34;GET\u0026#34; path: \u0026#34;/api/v1/.*\u0026#34; DNS-based egress: allow egress to github.com by hostname rather than IP address. This matters for external services with dynamic IPs. It only works if the pod\u0026rsquo;s DNS lookups also go through Cilium\u0026rsquo;s DNS proxy (an egress rule to kube-dns whose toPorts carry a rules: dns: matchPattern entry). That is how Cilium learns which IPs github.com currently resolves to.\negress: - toFQDNs: - matchName: \u0026#34;github.com\u0026#34; toPorts: - ports: - port: \u0026#34;443\u0026#34; protocol: TCP Identity-based policies: Cilium assigns a numeric security identity to each pod based on its labels. Policies are enforced by identity, not IP address. Pod restarts (which change IPs) don\u0026rsquo;t break policy enforcement.\nWhat a real namespace policy set looks like For a typical web app with frontend, backend, and database:\nNamespace: myapp ├── default-deny-all (ingress + egress, all pods) ├── allow-egress-dns (egress, all pods, port 53) ├── allow-ingress-frontend (ingress frontend, from sys-traefik namespace) ├── allow-egress-frontend-to-backend (egress frontend, to backend:8080) ├── allow-ingress-backend (ingress backend, from frontend) ├── allow-egress-backend-to-postgres (egress backend, to postgres:5432) └── allow-ingress-postgres (ingress postgres, from backend) Seven policies. The database has exactly one inbound path: from the backend. The frontend has no path to the database at all. A compromised frontend pod cannot scan the internal network — egress to arbitrary destinations is blocked.\nWhat interviewers are actually testing The follow-up is usually: \u0026ldquo;How do you manage this at scale? Writing NetworkPolicies for every namespace by hand doesn\u0026rsquo;t scale.\u0026rdquo;\nThe answer: you don\u0026rsquo;t write them by hand. 
You template them. In a GitOps setup, your namespace configuration declares what network access the service needs in a structured form, and a Helm chart or operator generates the actual NetworkPolicy resources from those declarations.\nFor example, an applications.yml entry might look like:\nnetworkPolicies: denyAll: true allowIngressFromIngress: true allowEgressToNamespaces: [\u0026#34;sys-postgres\u0026#34;] And a Helm chart translates that into four concrete NetworkPolicy objects. The developer declares intent; the platform enforces it. No one writes raw YAML for each namespace.\nThe second follow-up: \u0026ldquo;What about east-west traffic between services in the same namespace?\u0026rdquo; Add allowIntraNamespace: true as a flag that generates a policy allowing all pod-to-pod traffic within the namespace, while still blocking cross-namespace traffic.\nThis is part of a series on Kubernetes interview questions. Previously: zero-downtime deployments. Next: preventing configuration drift.\n","permalink":"https://blog.hippotion.com/posts/k8s-network-isolation/","summary":"Default Kubernetes is a flat network. Every pod can reach every other pod. In a cluster with ten services, that\u0026rsquo;s ten potential blast radiuses instead of one.","title":"🛡️ How Do You Prevent a Compromised Pod From Calling Your Database?"},{"content":"The question \u0026ldquo;How would you design a CI/CD pipeline that deploys to Kubernetes without storing any cluster credentials anywhere?\u0026rdquo;\nThe expected wrong answer: export your kubeconfig, base64-encode it, paste it into a CI secret named KUBE_CONFIG, and call it a day. This works. It also means anyone who can read that one CI variable holds long-lived access to the cluster, with whatever rights the kubeconfig carries.\nThere are two correct answers in 2026, and which one you reach for depends on what you\u0026rsquo;re actually deploying.\nAnswer 1: GitOps (the one your interviewer probably wants) In a GitOps setup, your CI pipeline never touches the cluster. 
It can\u0026rsquo;t leak credentials it doesn\u0026rsquo;t have.\nThe flow:\nDeveloper pushes code → CI builds and tests → CI updates the image tag in the Git repo (a commit, not a kubectl command) → Argo CD detects the change → Argo CD applies it to the cluster The cluster reaches out to Git. CI never reaches into the cluster. The only thing with cluster credentials is Argo CD itself — running inside the cluster, with no credentials to leak externally.\nFor self-hosted setups on Hetzner or Vultr, this is particularly clean because there\u0026rsquo;s no cloud IAM to configure. You point Argo CD at your GitLab repo, tell it which branch to watch, and you\u0026rsquo;re done.\n# The Argo CD Application CRD — the only thing you need apiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: myapp namespace: argocd spec: project: default source: repoURL: https://gitlab.example.com/myorg/myapp targetRevision: main path: helm-charts/myapp destination: server: https://kubernetes.default.svc namespace: myapp syncPolicy: automated: prune: true selfHeal: true selfHeal: true means that if someone manually kubectl applys a change to a resource this Application manages, Argo CD reverts it. The Git repo is the only source of truth.\nThe CI image-tag update step looks like this:\n# .gitlab-ci.yml deploy: stage: deploy script: - | # Update the image tag in values/myapp.yml and push sed -i \u0026#34;s/tag: .*/tag: ${CI_COMMIT_SHORT_SHA}/\u0026#34; values/myapp.yml git config user.email \u0026#34;ci@example.com\u0026#34; git config user.name \u0026#34;CI\u0026#34; git add values/myapp.yml git commit -m \u0026#34;chore: bump myapp to ${CI_COMMIT_SHORT_SHA} [skip ci]\u0026#34; git push origin HEAD:${CI_COMMIT_BRANCH} The [skip ci] marker keeps the bump commit from triggering another pipeline (and another bump). HEAD:${CI_COMMIT_BRANCH} is needed because GitLab jobs run on a detached HEAD. CI needs write access to the Git repo — but that\u0026rsquo;s a deploy key, not a cluster credential. If it leaks, someone can push code. You\u0026rsquo;d rotate the deploy key and audit the commits. 
If a cluster credential leaks, someone owns your cluster.\nAnswer 2: OIDC federation (for when you genuinely need push-based) Some operations don\u0026rsquo;t fit the GitOps model. Infrastructure provisioning (terraform apply), one-off database migrations, or initial cluster bootstrapping — these need direct cluster access. The correct pattern here is OIDC federation.\nThe idea: your CI platform (GitLab, GitHub Actions) can already issue a signed JWT to every job. These JWTs are signed by the CI platform and contain claims like which repo, which branch, which pipeline triggered the job. You configure your Kubernetes API server to trust those JWTs, and the CI job authenticates directly using the token it already has.\nNo stored credentials. Every job gets a fresh token. The token expires when the job ends.\nFor a self-hosted GitLab, configure your k8s API server to trust GitLab as an OIDC issuer:\n# /etc/rancher/k3s/config.yaml (or kube-apiserver flags) kube-apiserver-arg: - \u0026#34;oidc-issuer-url=https://gitlab.example.com\u0026#34; - \u0026#34;oidc-client-id=your_client_id\u0026#34; - \u0026#34;oidc-username-claim=sub\u0026#34; - \u0026#34;oidc-username-prefix=-\u0026#34; - \u0026#34;oidc-groups-claim=groups_direct\u0026#34; The oidc-username-prefix=- line matters. Without it, the API server prepends the issuer URL and a # to any username claim other than email, and the binding below would never match.\nThen create a ClusterRoleBinding that maps a specific GitLab identity to a Kubernetes role:\napiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: gitlab-ci-deployer subjects: - kind: User name: \u0026#34;project_path:myorg/myapp:ref_type:branch:ref:main\u0026#34; apiGroup: rbac.authorization.k8s.io roleRef: kind: ClusterRole name: deploy-role apiGroup: rbac.authorization.k8s.io The subject name is the sub claim from the GitLab JWT — it encodes the repo path and branch. Only jobs running on main in myorg/myapp get this binding. 
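Whether a job matches the binding comes down to an exact string comparison between the binding subject and the username the API server derives from the token. Below is a sketch of that derivation. It assumes the sub format GitLab documents for CI/CD ID tokens and the documented --oidc-username-prefix behaviour: when the flag is unset, the issuer URL plus # is prepended to any claim other than email, and the value - disables prefixing. The helper names are hypothetical.

```python
ISSUER = 'https://gitlab.example.com'
BINDING_SUBJECT = 'project_path:myorg/myapp:ref_type:branch:ref:main'


def gitlab_sub(project_path, ref_type, ref):
    # sub claim format of a GitLab CI/CD ID token
    return f'project_path:{project_path}:ref_type:{ref_type}:ref:{ref}'


def apiserver_username(claim_value, issuer, username_prefix=None):
    # Mirrors --oidc-username-prefix for a non-email username claim such as sub.
    if username_prefix is None:      # flag not set: issuer URL plus # is prepended
        return f'{issuer}#{claim_value}'
    if username_prefix == '-':       # a dash disables prefixing entirely
        return claim_value
    return username_prefix + claim_value


def job_is_bound(project_path, ref, username_prefix):
    sub = gitlab_sub(project_path, 'branch', ref)
    return apiserver_username(sub, ISSUER, username_prefix) == BINDING_SUBJECT


print(job_is_bound('myorg/myapp', 'main', '-'))        # True
print(job_is_bound('myorg/myapp', 'feature-x', '-'))   # False: different ref
print(job_is_bound('myorg/myapp', 'main', None))       # False: username carries the issuer prefix
```

If the prefix is left unset, the binding subject would instead have to spell out the issuer URL, a #, and then the sub value.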
A job on a feature branch gets nothing.\nIn the CI job:\ndeploy: stage: deploy id_tokens: K8S_TOKEN: aud: your_client_id script: - | kubectl config set-credentials gitlab-ci \\ --token=\u0026#34;${K8S_TOKEN}\u0026#34; kubectl config set-context deploy \\ --cluster=mycluster \\ --user=gitlab-ci kubectl config use-context deploy kubectl rollout restart deployment/myapp -n myapp The token in K8S_TOKEN is injected by GitLab. It expires with the job. The mycluster entry must already exist in the job\u0026rsquo;s kubeconfig, for example via kubectl config set-cluster. It holds only the API server URL and CA certificate, neither of them secret. On every request the API server checks the token\u0026rsquo;s signature against the keys it fetches and caches from GitLab\u0026rsquo;s JWKS endpoint, along with its issuer, audience and expiry.\nWhich one to use GitOps OIDC federation CI needs cluster access No Yes (short-lived token) Audit trail Git history kube-apiserver audit log Revocability Revert the commit Token expires with the job Self-hosted setup effort Low Moderate (OIDC config) Works for infra provisioning Not really Yes For application deployments: GitOps. The cluster reconciles continuously, drift on managed resources is reverted automatically, and CI is completely decoupled from cluster state.\nFor infrastructure provisioning or one-off operations: OIDC federation. Short-lived credentials, branch-scoped permissions, nothing to rotate.\nWhat you should never do: store a kubeconfig or a long-lived ServiceAccount token in CI secrets. Not because it\u0026rsquo;s hard to make work — it\u0026rsquo;s easy — but because the blast radius of a leak is unbounded, there\u0026rsquo;s no audit trail, and there\u0026rsquo;s no expiry. Everything that goes wrong with static secrets goes wrong eventually.\nThis is part of a series on Kubernetes interview questions. Next: how to handle secrets in a GitOps repository.\n","permalink":"https://blog.hippotion.com/posts/k8s-cicd-no-credentials/","summary":"A common interview question in 2026. 
If your answer is \u0026lsquo;kubeconfig in a CI secret\u0026rsquo;, you\u0026rsquo;re not wrong — but you\u0026rsquo;re also not getting the job.","title":"🔑 Deploy to Kubernetes Without Storing Any Cluster Credentials in CI"},{"content":"The question \u0026ldquo;You\u0026rsquo;re using GitOps — everything goes through Git. How do you handle secrets?\u0026rdquo;\nThe wrong answer: base64-encode them and commit them as Kubernetes Secret objects. Base64 is not encryption. Anyone with read access to the repo has your secrets. If the repo is public, everyone does.\nThe slightly better wrong answer: use a private repo and just not think about it. This works until a deploy key leaks, someone joins and then leaves the company, or you need to rotate one secret and have to find every place it\u0026rsquo;s referenced.\nThere are three real answers. They make different tradeoffs.\nThe constraint The constraint is actually tighter than \u0026ldquo;don\u0026rsquo;t commit secrets\u0026rdquo;. It\u0026rsquo;s: your Git repo should be safe to make public at any point, and secrets must be rotatable without touching Git.\nIf rotating a password requires a new commit, someone has to be awake to merge and deploy it. That\u0026rsquo;s not how you want to handle a 3am incident.\nOption 1: External Secrets Operator + Vault This is the most robust pattern and the one worth knowing for interviews.\nThe idea: secrets live in a dedicated secret store (HashiCorp Vault, or a cloud equivalent). A Kubernetes operator called ESO watches ExternalSecret CRD objects in the cluster and syncs the referenced secret into a real Kubernetes Secret. 
The CRD is safe to commit — it says where the secret lives, not what it is.\n# This lives in Git — safe to commit apiVersion: external-secrets.io/v1beta1 kind: ExternalSecret metadata: name: myapp-db-credentials namespace: myapp spec: refreshInterval: 1h secretStoreRef: name: vault kind: ClusterSecretStore target: name: myapp-db-credentials # the k8s Secret it creates data: - secretKey: DB_PASSWORD remoteRef: key: secret/myapp property: db-password Rotation: you update the secret in Vault. ESO syncs it to the cluster within refreshInterval. No Git commit, no deployment. When the pods see the new value depends on how they consume it. A Secret mounted as a volume is refreshed in place by the kubelet after a short delay (except for subPath mounts), so an app that re-reads the file picks it up. A Secret consumed as environment variables is only read at container start, so those pods need a restart.\nAudit trail: with an audit device enabled, Vault logs every read and write. With ESO in the middle, the reads come from ESO\u0026rsquo;s Vault role. The log therefore tells you when each secret was synced into the cluster rather than which app pod used it.\nThe cost: you\u0026rsquo;re running Vault. For a homelab or small team, that\u0026rsquo;s an extra thing to operate. For production, it\u0026rsquo;s worth it.\nSelf-hosted setup:\n# ClusterSecretStore — connects ESO to your Vault instance apiVersion: external-secrets.io/v1beta1 kind: ClusterSecretStore metadata: name: vault spec: provider: vault: server: \u0026#34;http://sys-vault.sys-vault.svc.cluster.local:8200\u0026#34; path: \u0026#34;secret\u0026#34; version: \u0026#34;v2\u0026#34; auth: kubernetes: mountPath: \u0026#34;kubernetes\u0026#34; role: \u0026#34;external-secrets\u0026#34; ESO authenticates to Vault using its own Kubernetes ServiceAccount token. Vault validates it against the cluster\u0026rsquo;s token review endpoint. No static credentials anywhere.\nOption 2: Sealed Secrets Sealed Secrets uses asymmetric encryption. The cluster holds a private key. You use the kubeseal CLI to encrypt a secret with the cluster\u0026rsquo;s public key. 
The resulting SealedSecret object is safe to commit — only the cluster can decrypt it.\n# Encrypt a secret for committing to Git kubectl create secret generic myapp-db --namespace myapp \\ --from-literal=DB_PASSWORD=hunter2 \\ --dry-run=client -o yaml \\ | kubeseal --format yaml \\ \u0026gt; sealed-secrets/myapp-db.yaml The namespace matters. By default a SealedSecret can only be decrypted under the name and namespace it was sealed for, and kubeseal writes JSON unless you ask for YAML. The resulting YAML looks like:\napiVersion: bitnami.com/v1alpha1 kind: SealedSecret metadata: name: myapp-db namespace: myapp spec: encryptedData: DB_PASSWORD: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq... This gets committed. The Sealed Secrets controller in the cluster decrypts it and creates the real Secret automatically.\nThe tradeoff: rotation means re-sealing. You need the cluster\u0026rsquo;s public key (which is public) and access to the plaintext secret. You commit a new SealedSecret. That\u0026rsquo;s a Git commit, which means a review, a merge, and a deploy. For a 3am incident, that\u0026rsquo;s a lot of friction.\nAlso: if the cluster\u0026rsquo;s private key is lost, you can\u0026rsquo;t decrypt any of your sealed secrets. Back up the private key.\nGood fit for: small teams, homelab, situations where secrets change rarely and the GitOps review process is actually desirable.\nOption 3: SOPS SOPS (Secrets OPerationS) encrypts files at rest using age keys or cloud KMS. You commit encrypted files. CI decrypts them during deployment using a key it holds in memory (not stored in Git).\n# Encrypt a file for Git sops --encrypt --age age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q \\ secrets/myapp.yaml \u0026gt; secrets/myapp.enc.yaml # In CI: decrypt to stdout and pipe straight into kubectl (no plaintext file on disk) sops --decrypt secrets/myapp.enc.yaml | kubectl apply -f - The difference from Sealed Secrets: SOPS encrypts the values inside ordinary YAML, JSON, .env or INI files (the keys stay readable) rather than working on Kubernetes objects. You can use it outside of Kubernetes (application configs, Terraform variables). 
The key can live in the CI environment, a cloud KMS, or a personal age key.\nThe tradeoff: CI needs the decryption key, which puts you back in \u0026ldquo;secret in CI\u0026rdquo; territory — just for the encryption key rather than the actual secrets. If you use a cloud KMS, OIDC federation handles that (no stored key). If you use an age key, it lives in CI secrets.\nGood fit for: teams already using Helm and Helm Secrets, polyglot environments where not everything is Kubernetes, small teams where Vault feels like overengineering.\nComparison ESO + Vault Sealed Secrets SOPS Rotation without Git commit Yes No Depends Audit trail Full (Vault) None Depends on KMS Complexity High Low Medium Works outside k8s With effort No Yes Recovery if key lost Vault backup Lose all secrets Key backup CI needs secret material No No Yes (decrypt key) What interviewers are actually testing The interesting follow-up question is: \u0026ldquo;How do you rotate a secret without downtime?\u0026rdquo;\nThe answer requires you to understand how pods consume Secrets. Environment variables are read once, at container start. Volume-mounted Secrets are refreshed in place by the kubelet after a short delay (not for subPath mounts), but the app only sees the change if it re-reads the file. Updating the Secret in Kubernetes never restarts the pod by itself. Your options are:\nMount the secret as a volume and have the app watch for file changes (good) Restart the deployment after rotation (kubectl rollout restart, automatable) Use the Vault Agent Injector, whose agent sidecar re-renders the secret file in a volume shared with the app when the value changes (complex; no restart, but the app still has to re-read the file) The correct answer depends on the app. An API key that can be rotated gradually is different from a database password where the old one is invalidated immediately.\nThis is part of a series on Kubernetes interview questions. Previously: deploying without cluster credentials. Next: zero-downtime deployments.\n","permalink":"https://blog.hippotion.com/posts/k8s-gitops-secrets/","summary":"GitOps says Git is the source of truth. Secrets say don\u0026rsquo;t put them in Git. These two things appear to be in direct conflict. 
They\u0026rsquo;re not.","title":"🤫 How Do You Handle Secrets in a GitOps Repository?"},{"content":"The three options that didn\u0026rsquo;t work When I started the homelab I looked at the standard ways to make self-hosted services accessible and found the usual compromises:\nOpen ports on the router. Point a DNS record at your public IP, forward 443 to your server. Simple. Also: your home IP is now publicly associated with your services, your ISP can see your traffic, and a misconfigured app means an open door. Hard pass.\nVPN for everything. WireGuard on the router, every device gets a tunnel. Secure, but every phone and laptop needs to be configured, split tunnelling adds complexity, and \u0026ldquo;let me pull up the recipe\u0026rdquo; becomes a two-tap operation that my family won\u0026rsquo;t do. And I still want public access for some services.\nCloudflare Tunnel, but broken HTTPS locally. This is the one I started with. The cloudflared pod dials out to Cloudflare, no open ports, external access works great. But locally, *.hippotion.com resolves to Cloudflare\u0026rsquo;s anycast IP, traffic leaves the house, Cloudflare terminates TLS, traffic comes back in through the tunnel. Every local request makes a round trip to a Cloudflare edge node. Worse: browsers cache HSTS for hippotion.com, so http:// URLs on the local network silently upgrade to https://, which fails because there\u0026rsquo;s no local certificate. Intermittent, confusing, and hard to explain to anyone else on the network.\nWhat I wanted: the tunnel for external access, direct-to-server for local access, real TLS in both cases, and one configuration per application.\nThe insight: Pi-hole already controls local DNS My network already runs Pi-hole for ad blocking. Pi-hole uses dnsmasq under the hood and can resolve any hostname to any IP you want. 
One config line in Pi-hole\u0026rsquo;s values:\ndnsmasq: customDnsEntries: - address=/hippotion.com/192.168.0.109 The address= directive is a wildcard. Every device on the LAN that uses Pi-hole for DNS — which is all of them, because the router hands out Pi-hole\u0026rsquo;s IP via DHCP — will now resolve anything.hippotion.com to 192.168.0.109, the server\u0026rsquo;s LAN address. External traffic still goes to Cloudflare\u0026rsquo;s IP because it uses the public authoritative DNS. The split is automatic; no per-device configuration.\nLocal browser → Pi-hole → server\u0026rsquo;s LAN IP directly.\nThat solves routing. Now TLS.\nWhy HTTP-01 won\u0026rsquo;t work here, and why DNS-01 will The standard way to get a Let\u0026rsquo;s Encrypt certificate is the HTTP-01 challenge: Let\u0026rsquo;s Encrypt sends a request to http://yourdomain.com/.well-known/acme-challenge/\u0026lt;token\u0026gt; and your server responds. This requires Let\u0026rsquo;s Encrypt\u0026rsquo;s servers to reach your server over the public internet.\nThat doesn\u0026rsquo;t work here. There are no open ports. Let\u0026rsquo;s Encrypt can\u0026rsquo;t reach the server. HTTP-01 is out.\nDNS-01 is different. Instead of proving you control a server, you prove you control the DNS zone by creating a temporary TXT record at _acme-challenge.yourdomain.com. Let\u0026rsquo;s Encrypt checks DNS, finds the record, issues the cert. No inbound connection required — just API access to your DNS provider.\nhippotion.com is on Cloudflare. cert-manager has a Cloudflare DNS solver that calls the Cloudflare API to create and delete the TXT record automatically. 
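The TXT record does not hold the ACME token itself. Under RFC 8555 it holds the base64url-encoded SHA-256 of the key authorization: the token, a dot, and the JWK thumbprint of the ACME account key. A wildcard name is validated at the _acme-challenge label of its base domain. Below is a sketch of that computation using only the standard library. The account key and tokens are made-up placeholders, the function names are hypothetical, and cert-manager does all of this for you.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    # base64url without padding, as ACME (RFC 8555) requires
    return base64.urlsafe_b64encode(data).decode('ascii').rstrip('=')


def jwk_thumbprint(jwk: dict) -> str:
    # RFC 7638: hash only the required members, keys sorted, no whitespace
    required = {'EC': ('crv', 'kty', 'x', 'y'), 'RSA': ('e', 'kty', 'n')}[jwk['kty']]
    canonical = json.dumps({k: jwk[k] for k in required}, sort_keys=True, separators=(',', ':'))
    return b64url(hashlib.sha256(canonical.encode('utf-8')).digest())


def dns01_record(domain: str, token: str, account_jwk: dict) -> tuple:
    # RFC 8555 section 8.4: TXT value = base64url(SHA-256(token + '.' + thumbprint))
    key_authorization = token + '.' + jwk_thumbprint(account_jwk)
    value = b64url(hashlib.sha256(key_authorization.encode('ascii')).digest())
    # A wildcard is validated at the base domain's _acme-challenge name
    name = '_acme-challenge.' + domain.removeprefix('*.')
    return name, value


# Placeholder account key and tokens, for illustration only
jwk = {'kty': 'EC', 'crv': 'P-256', 'x': 'placeholder-x', 'y': 'placeholder-y'}
print(dns01_record('hippotion.com', 'token-for-apex', jwk))
print(dns01_record('*.hippotion.com', 'token-for-wildcard', jwk))
```

Both names in the wildcard Certificate map to the same record name. Each authorization has its own token, so two different TXT values sit at _acme-challenge.hippotion.com while the order is validated.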
The certificate request flow:\ncert-manager creates an ACME Order with Let\u0026rsquo;s Encrypt cert-manager calls the Cloudflare API: add _acme-challenge.hippotion.com TXT \u0026lt;digest of the challenge token and account key\u0026gt; Let\u0026rsquo;s Encrypt queries DNS, finds the record, issues the cert cert-manager deletes the TXT record, writes the cert to a Kubernetes Secret cert-manager renews automatically ~30 days before expiry The Cloudflare API token needs Zone:DNS:Edit permission for hippotion.com, plus Zone:Zone:Read so cert-manager can look up the zone. It lives in Vault and syncs to the sys-cert-manager namespace via External Secrets Operator — same pattern as every other secret in the cluster, nothing in Git.\napiVersion: cert-manager.io/v1 kind: Certificate metadata: name: hippotion-wildcard namespace: sys-traefik spec: secretName: hippotion-wildcard-tls issuerRef: name: letsencrypt-cloudflare kind: ClusterIssuer dnsNames: - \u0026#34;hippotion.com\u0026#34; - \u0026#34;*.hippotion.com\u0026#34; One certificate for the apex and every single-level subdomain. cert-manager stores it as hippotion-wildcard-tls in the sys-traefik namespace, where Traefik can read it.\nTwo Traefik entrypoints Traefik has three entrypoints configured, two of which serve applications:\nEntrypoint Port Purpose web 80 Redirects all traffic to websecure websecure 443 Local HTTPS, serves the wildcard cert cloudflare 7080 Receives plain HTTP from the cloudflared pod The cloudflare entrypoint is the key piece. Cloudflare Tunnel terminates TLS at Cloudflare\u0026rsquo;s edge and forwards plain HTTP to the cluster. If that plain HTTP landed on web (port 80), Traefik would answer with a redirect to https://. The browser is already on https:// (to Cloudflare), so it would request the same URL again. The tunnel would deliver it to port 80 again, and the result is an endless redirect loop. A separate entrypoint on a separate port handles tunnel traffic without redirection.\nTraefik is configured to use the wildcard cert as its default:\ntlsStore: default: defaultCertificate: secretName: hippotion-wildcard-tls Any websecure request that doesn\u0026rsquo;t match a more specific TLS configuration gets the wildcard cert. 
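The single wildcard certificate covers every app because of how TLS clients match names: a leading *. stands for exactly one DNS label. Below is a simplified sketch of that rule, modelled on RFC 6125. It ignores partial-label wildcards such as f*.example.com, and the function names are hypothetical.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    # Exact match, or a leading '*.' standing for exactly one label (case-insensitive).
    pattern = pattern.lower().rstrip('.')
    hostname = hostname.lower().rstrip('.')
    if not pattern.startswith('*.'):
        return pattern == hostname
    first_label, dot, rest = hostname.partition('.')
    return bool(first_label) and dot == '.' and rest == pattern[2:]


def cert_covers(dns_names, hostname):
    return any(name_matches(name, hostname) for name in dns_names)


DNS_NAMES = ['hippotion.com', '*.hippotion.com']  # the dnsNames of the Certificate

for host in ['myapp.hippotion.com', 'hippotion.com', 'a.b.hippotion.com']:
    print(host, cert_covers(DNS_NAMES, host))
```

The apex needs its own dnsNames entry because the wildcard does not cover it. A second-level name like a.b.hippotion.com would need a *.b.hippotion.com entry of its own.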
No per-app certificate configuration.\nOne IngressRoute handles both paths Every application gets a single IngressRoute with both entrypoints:\napiVersion: traefik.io/v1alpha1 kind: IngressRoute metadata: name: myapp namespace: myapp spec: entryPoints: - cloudflare # plain HTTP from cloudflared - websecure # local HTTPS with wildcard cert routes: - match: Host(`myapp.hippotion.com`) kind: Rule middlewares: - name: oauth-auth namespace: sys-oauth2-gitlab services: - name: myapp port: 8080 That\u0026rsquo;s it. The same hostname, the same routing rule, the same middleware — served correctly on both paths. No conditional logic, no separate ingress for local vs external.\nThe OAuth middleware (oauth-auth) works on both paths too. Local browsers get redirected to GitLab for authentication the same way external browsers do. The SSO cookie is scoped to the hippotion.com domain, so the browser sends it to every subdomain regardless of which path the traffic came through.\nWhat the two traffic paths look like end to end External browser (anywhere): Browser → Cloudflare DNS (hippotion.com → Cloudflare anycast IP) → Cloudflare edge (TLS terminated, certificate managed by Cloudflare) → cloudflared pod in cluster (plain HTTP) → Traefik :7080 (cloudflare entrypoint) → app pod Local browser (home WiFi): Browser → Pi-hole DNS (*.hippotion.com → 192.168.0.109) → Traefik :443 (websecure entrypoint) → Traefik serves hippotion-wildcard-tls (Let\u0026#39;s Encrypt cert, trusted by browser) → app pod Both paths hit the same Traefik IngressRoute rule. The app sees an HTTP request either way. TLS is handled at the edge — Cloudflare for external traffic, Traefik for local traffic.\nThe HSTS detail If HSTS is enabled for the domain in Cloudflare (it\u0026rsquo;s an opt-in setting there), browsers cache it: once they see an HSTS header for hippotion.com, they\u0026rsquo;ll refuse to load any http:// URL on that host (and on its subdomains, if the header includes includeSubDomains) for the duration of the max-age. 
They silently upgrade to https:// and fail if there\u0026rsquo;s no cert.\nThis is why the original setup — tunnel only, no local cert — felt unreliable locally. The browser was doing the right thing (enforcing HTTPS) but the cert didn\u0026rsquo;t exist. The wildcard cert fixes this because HTTPS now actually works locally. The HSTS enforcement is fine once TLS is real.\nWhat it\u0026rsquo;s like to operate Adding a new service means writing one IngressRoute with two entrypoints and pushing to Git. No DNS records to create (cloudflared picks up hostnames from a config list), no certificates to request (the wildcard covers everything), no VPN profiles to distribute. The platform handles it.\nLocal access works when the internet is down. The Pi-hole DNS and the wildcard cert are entirely on-premises — as long as the server is up, the services are reachable, Cloudflare outage or not. I noticed this during a brief Cloudflare incident a few months ago: external access went down, everything inside the house kept working without interruption.\nI\u0026rsquo;m not a networking expert. I just followed the constraint — no open ports, no VPN — and the DNS-01 + split DNS solution fell out naturally. It turned out to be simpler to configure than the alternatives, and cleaner to operate.\n","permalink":"https://blog.hippotion.com/posts/homelab-dual-path-tls/","summary":"No open ports. Real TLS at home. One IngressRoute per app. This is the networking setup I landed on after ruling out everything that required a compromise.","title":"🔐 Same Hostname, Two Traffic Paths: Local HTTPS Without a VPN"},{"content":"Why this exists I\u0026rsquo;ve been working in DevOps and platform engineering long enough to know what I don\u0026rsquo;t know. The patterns that separate robust infrastructure from \u0026ldquo;it works on my machine\u0026rdquo; infrastructure — GitOps, admission policies, network segmentation, secrets management — are easy to read about. 
They\u0026rsquo;re harder to actually internalise without running them yourself.\nSo I built a homelab. An old ThinkCentre I had sitting around, k3s, and a rule I set for myself before writing a single line of configuration: GitLab is the only source of truth. No manual kubectl after bootstrap. All changes go through git push.\nThat rule turned out to be more consequential than I expected.\nThe stack The cluster runs about thirty services across two categories: infrastructure that makes the platform work, and applications that actually do things.\nInfrastructure:\nk3s — lightweight Kubernetes, single-node\nCilium — CNI with NetworkPolicy support (Flannel, k3s\u0026rsquo;s default CNI, doesn\u0026rsquo;t enforce NetworkPolicies on its own; stock k3s adds a separate embedded controller for that)\nArgo CD — GitOps reconciler, watches the repo, applies changes\nTraefik — ingress controller, two entrypoints\nCloudflare tunnel — external access without open ports\ncert-manager — wildcard TLS cert via Let\u0026rsquo;s Encrypt DNS-01\noauth2-proxy — GitLab SSO protecting everything by default\nVault + External Secrets Operator — secrets management\nPi-hole — local DNS for *.hippotion.com\nApplications: a media server (Jellyfin, *arr stack), Immich for photos, Vaultwarden for passwords, Home Assistant, n8n for automation, a Hugo blog, Obsidian via browser-based KasmVNC, and a few custom-built things I\u0026rsquo;ll get to below.\nTraffic reaches the cluster in two ways External traffic (from anywhere on the internet) goes through a Cloudflare tunnel. The cloudflared pod dials out to Cloudflare — no open ports on the server, no inbound firewall rules, no exposed IP. Cloudflare terminates TLS at its edge and handles the certificate for external visitors; cloudflared then forwards the request as plain HTTP to Traefik on port 7080.\nLocal traffic (home WiFi) goes through Pi-hole, which resolves *.hippotion.com to the server\u0026rsquo;s LAN IP. Traefik receives HTTPS on port 443, served with a wildcard certificate that cert-manager issues from Let\u0026rsquo;s Encrypt via the DNS-01 challenge. 
Port 80 redirects to 443; the cloudflare entrypoint on 7080 does not, because its traffic is already plain HTTP from cloudflared and a redirect would only loop back through the tunnel.\nThe result: the same IngressRoute handles both paths.\nspec: entryPoints: - cloudflare # plain HTTP from the cloudflared pod - websecure # local HTTPS with wildcard cert routes: - match: Host(`myapp.hippotion.com`) kind: Rule middlewares: - name: oauth-auth namespace: sys-oauth2-gitlab services: - name: myapp port: 8080 Every IngressRoute has both entrypoints. If you forget one, the service is unreachable on one of the two access paths. Learned that the first time I added an app and couldn\u0026rsquo;t reach it from the phone.\nOne file generates everything The centrepiece of the setup is applications.yml — a single file that is the complete list of everything running in the cluster. Every entry generates a Namespace, an Argo CD AppProject, an Application, NetworkPolicies, and RBAC. Nothing is created anywhere else.\nAn entry looks like this:\n- namespace: web-vaultwarden networkPolicies: profile: web-app applications: - applicationCode: web-vaultwarden path: helm-charts/extra-objects autoSync: true Seven lines. That deploys a namespace, an Argo CD app that watches helm-charts/extra-objects/values-web-vaultwarden.yml, a full set of Cilium NetworkPolicies based on the web-app profile (deny-all with ingress from Traefik and egress to external), and a ServiceAccount. Adding a new service to the cluster means adding an entry to this file plus a values file with the actual Kubernetes manifests.\nThe profile: web-app notation deserves a word. Raw NetworkPolicy YAML is repetitive and error-prone — every namespace needs a deny-all base plus specific allows. I template it. A Helm chart maps profile names to concrete policy sets. web-app means: deny all ingress except from the ingress namespace, deny all egress except DNS and external HTTPS. 
web-app-internal means the same but no external egress — suitable for services that only talk to other in-cluster services. media-server adds port 6881 for BitTorrent. The policies are generated; no one writes them by hand.\nSecrets without storing them in Git Kubernetes Secret objects are not secrets. Their values are only base64-encoded and, unless encryption at rest is configured, sit unencrypted in the cluster datastore (etcd, or SQLite on a default single-node k3s). Base64 is not encryption. Committing them to a Git repo — even a private one — is the wrong answer.\nThe setup here uses HashiCorp Vault as the actual secret store, with External Secrets Operator syncing Vault paths to Kubernetes Secrets. What lives in Git is an ExternalSecret resource:\napiVersion: external-secrets.io/v1beta1 kind: ExternalSecret metadata: name: myapp-credentials namespace: myapp spec: secretStoreRef: name: vault kind: ClusterSecretStore target: name: myapp-credentials data: - secretKey: DB_PASSWORD remoteRef: key: secret/myapp property: db-password This is safe to commit. It says where the secret lives, not what it is. Vault contains the actual value. ESO syncs it to the cluster and refreshes it every hour (ESO\u0026rsquo;s default refreshInterval). Rotation means updating the value in Vault — no Git commit, no deployment, although pods that read the value as an environment variable still need a restart to pick it up.\nVault runs in-cluster with a sidecar that auto-unseals on restart. Not production-grade (the unseal key is on the same PVC as Vault itself), but pragmatic for a homelab where availability matters more than a sophisticated key management ceremony.\nThree things I built that were worth building Local AI inference The cluster runs a local LLM. The web-ai-engine namespace has Open WebUI fronting a llama-server serving Phi-3.5 Mini in GGUF format. The model file lives on the node\u0026rsquo;s filesystem, mounted as a hostPath volume.\nweb-openclaw is a personal AI assistant UI that can route requests to either external providers (via NVIDIA\u0026rsquo;s API) or the local llama-server, depending on the task. The local model handles things that don\u0026rsquo;t need to leave the house; the external API handles things that do. 
For local inference, web-openclaw\u0026rsquo;s network policy allows egress to web-ai-engine and nowhere else in the cluster.\nRunning a 3.8B-parameter model on homelab hardware is genuinely useful and costs nothing per query. It\u0026rsquo;s not GPT-4, but for summarisation, first drafts, and things you\u0026rsquo;d rather not send to a third-party API, it\u0026rsquo;s more than good enough.\nBrew Buddy I make kombucha. I was tracking fermentation batches in a notes app and getting annoyed at not being able to see history across batches. So I built a tracker.\nBrew Buddy is a React frontend and a Go API backed by PostgreSQL, all running in the web-brew-buddy namespace. The images are built locally and imported into the cluster\u0026rsquo;s container runtime with k3s ctr images import. It\u0026rsquo;s deployed like any other app — a values file, an entry in applications.yml, a Vault secret for the database password.\nThe point isn\u0026rsquo;t the app. The point is that the platform handles a custom hobby project with the same operational properties as Vaultwarden or Immich. Same GitOps workflow, same secret management, same network isolation, same TLS termination. Adding an app to this cluster takes an afternoon of writing manifests and a few seconds of git push. The platform work was done once.\nQR device login This one has its own post because it took three days and four complete rewrites of the code that forges oauth2-proxy\u0026rsquo;s session cookie to get right.\nThe short version: the Homer dashboard on the living room TV needed a way to log in without typing credentials on a TV keyboard. I built a device-flow OAuth service — phone scans QR, phone authenticates with GitLab, TV session is created. 
Ending the session from the phone kills the TV\u0026rsquo;s session immediately by deleting the oauth2-proxy Redis ticket.\nIt\u0026rsquo;s the most overengineered solution to a problem I have, and I don\u0026rsquo;t regret a minute of it.\nWhat operating this way actually changes The practical effect of the no-manual-kubectl rule is larger than it sounds.\nThe audit trail is automatic. Every change to the cluster is a git commit with an author, a timestamp, and a diff. There\u0026rsquo;s no \u0026ldquo;what did I change last Tuesday?\u0026rdquo; — I know exactly what changed last Tuesday, and I can revert it with git revert. The Argo CD UI shows the diff between what\u0026rsquo;s in Git and what\u0026rsquo;s running. If there\u0026rsquo;s a diff, something went wrong.\nNew services are cheap to add. The platform does the repetitive work — namespace, RBAC, network policies, TLS termination, OAuth protection. Adding a new app is writing the manifests and updating applications.yml. The infrastructure concerns are handled.\nRecovery is straightforward. If I rebuild the node (which I\u0026rsquo;ve done), I run two bootstrap scripts, apply one Argo CD manifest, and the cluster reconciles itself from Git over the next few minutes. The only things that require manual work are the secrets that can\u0026rsquo;t live in Git — two OAuth credentials and the Cloudflare tunnel token, all recreated by scripts/create-secrets.sh.\nExperimentation is safe. Apps I\u0026rsquo;m not sure I\u0026rsquo;ll keep are marked toggleable: true. Turning one off means removing its entry from applications.yml and pushing; turning it back on means adding the entry back.\nWhat it doesn\u0026rsquo;t solve Bootstrap is manual. The first kubectl apply -f argocd/root-app.yaml happens outside of GitOps by definition. The three bootstrap secrets can\u0026rsquo;t be in Git. 
This is unavoidable — you need to trust something before GitOps can take over, and that something is a short manual procedure.\nSome things fight the model. k3s\u0026rsquo;s built-in addon controller rewrites the metrics-server Deployment on every k3s restart, removing a patch needed for Cilium compatibility. The fix is a pod that watches for the revert and reapplies the patch. It works, but it\u0026rsquo;s a workaround for a component I don\u0026rsquo;t control.\nSingle-node means single point of failure. For a homelab, that\u0026rsquo;s acceptable. For anything important, it\u0026rsquo;s not.\nThe honest summary I set out to learn production-grade Kubernetes patterns, and I did. The GitOps constraint turned out to be the best engineering decision in the project — not because it made things easier in the short term (it didn\u0026rsquo;t), but because it forced every change through a path that is auditable, reversible, and consistent.\nThe cluster is a single ThinkCentre running about thirty services, secured by Cilium network policies, authenticated via GitLab SSO, with secrets managed by Vault and all configuration in a Git repo that I could hand to someone tomorrow and they\u0026rsquo;d understand what\u0026rsquo;s running and why.\nThat\u0026rsquo;s the goal. For a homelab, I\u0026rsquo;ll call it achieved.\n","permalink":"https://blog.hippotion.com/posts/homelab-gitops/","summary":"I wanted to learn production-grade Kubernetes patterns without breaking production. One node, a full GitOps stack, and a hard rule: no manual kubectl after bootstrap.","title":"🏗️ My Homelab Runs on GitOps. Here's What That Actually Means."},{"content":"The problem My homelab runs a single-node k3s cluster with a full GitOps stack — Argo CD, Traefik, oauth2-proxy for GitLab SSO, the usual over-engineered personal project. 
One thing that always bothered me: when I want to show the Homer dashboard on the living room TV, I have to type my credentials on a keyboard that wasn\u0026rsquo;t designed for the living room.\nThe obvious fix is a QR code. Phone scans it, phone authenticates, TV unlocks. Conceptually simple. In practice, a two-day debugging adventure that took me deep into oauth2-proxy\u0026rsquo;s source code.\nThe design The flow I wanted:\nTV opens qr.hippotion.com, shows a QR code and polls for completion\nPhone scans, opens the device URL, taps \u0026ldquo;Continue with GitLab\u0026rdquo;\nPhone completes GitLab OAuth\nServer marks the session as ready\nTV\u0026rsquo;s poll fires, gets redirected to Homer\nLater: phone taps \u0026ldquo;End Session\u0026rdquo;, TV locks immediately\nThis is the OAuth 2.0 Device Authorization Grant pattern adapted for a single trusted user. I wrote it in Go with Redis for session storage. The service generates a device token, stores it with a 5-minute TTL, and uses it as the OAuth state parameter. The phone completes GitLab OAuth and the callback handler links the resulting session to the device token. The TV\u0026rsquo;s poll loop picks it up and redirects.\nThat part was straightforward. The hard part was making the TV\u0026rsquo;s session work for all protected apps on the domain, not just the QR page.\nThe oauth2-proxy problem My homelab uses oauth2-proxy as a ForwardAuth backend for Traefik. For every protected app (home.hippotion.com, argo.hippotion.com, grafana.hippotion.com, etc.), Traefik sends each request through oauth2-proxy, which redirects to GitLab if no valid _oauth2_proxy session cookie is present.\nThe QR auth service creates its own session cookie (qr_session), but oauth2-proxy knows nothing about it. After QR login, clicking any link from Homer would immediately ask for GitLab credentials again.\nThe obvious solution: after the phone authenticates, set a valid _oauth2_proxy cookie on the TV\u0026rsquo;s browser. 
If I can forge a cookie that oauth2-proxy accepts, all apps work instantly.\nHow hard can it be?\nAttempt 1: AES-GCM + JSON I looked at the oauth2-proxy source and found what looked like the session format: a JSON struct with short field names (\u0026quot;e\u0026quot; for email, \u0026quot;ca\u0026quot; for created-at, etc.), encrypted with AES-GCM, base64url-encoded.\ntype oauthSession struct { CreatedAt *time.Time `json:\u0026#34;ca\u0026#34;` ExpiresOn *time.Time `json:\u0026#34;ea\u0026#34;` Email string `json:\u0026#34;e\u0026#34;` User string `json:\u0026#34;u\u0026#34;` } SHA256-hash the cookie secret → 32-byte AES key → GCM encrypt → base64url encode. Set as _oauth2_proxy cookie. Clean, simple, wrong.\noauth2-proxy returned 302 every time. I added debug logging to print the cookie value, copied it, and tested it directly against the ForwardAuth endpoint with curl. The logs revealed everything:\nError loading cookied session: cookie signature not valid, removing session Cookie signature not valid. Not \u0026ldquo;decryption failed\u0026rdquo;, not \u0026ldquo;session expired\u0026rdquo;. A signature check.\nFinding the real format The error came from pkg/middleware/stored_session.go:94. I fetched the source:\nval, _, ok := encryption.Validate(c, secret, s.Cookie.Expire) if !ok { return nil, errors.New(\u0026#34;cookie signature not valid\u0026#34;) } encryption.Validate splits the cookie value on | and expects three parts. Looking at utils.go:\nfunc Validate(cookie *http.Cookie, seed string, expiration time.Duration) (value []byte, t time.Time, ok bool) { parts := strings.Split(cookie.Value, \u0026#34;|\u0026#34;) if len(parts) != 3 { return } if checkSignature(parts[2], seed, cookie.Name, parts[0], parts[1]) { // ... } } The cookie format is encryptedValue|timestamp|hmac. My cookie was just encryptedValue. Three-part, not one. First problem found.\nFor the HMAC, I needed to verify against a real cookie to get the key format right. 
oauth2-proxy sets _oauth2_proxy_csrf cookies during the login flow — I captured one from a 302 response and reverse-engineered it in Python:\nkey = secret_raw.encode() # raw string, not decoded data = (cookie_name + enc_val + ts).encode() # concatenated, NO separators sig = base64.urlsafe_b64encode(hmac.new(key, data, hashlib.sha256).digest()) Two surprises: the HMAC key is the raw cookie secret string (not base64-decoded), and the input is a bare concatenation with no | separators between fields.\nI ran the test. The CSRF cookie\u0026rsquo;s signature matched. I had the format.\nBut oauth2-proxy still rejected the session.\nThe wrong cipher I wrapped the AES-GCM payload in the correct three-part, HMAC-signed format and tried again. Still 302. cookie signature not valid again.\nWait — was it even getting to the signature check? If decryption failed first, it wouldn\u0026rsquo;t reach that error. I added more debug logging to print the full cookie value and tested it with Python\u0026rsquo;s cryptography library:\ncandidates = { \u0026#39;24-byte std-b64 decode\u0026#39;: base64.b64decode(secret_str), \u0026#39;32-byte raw string\u0026#39;: secret_str.encode(), \u0026#39;32-byte sha256 of b64\u0026#39;: hashlib.sha256(base64.b64decode(secret_str)).digest(), ... } for label, key in candidates.items(): try: pt = AESGCM(key).decrypt(nonce, ct_tag, None) print(f\u0026#39;SUCCESS [{label}]: {pt.decode()}\u0026#39;) except Exception as e: print(f\u0026#39;FAIL [{label}]: {e}\u0026#39;) The 24-byte base64-decoded key decrypted it successfully, so my cookie was at least internally consistent. But oauth2-proxy still rejected it, which meant the problem wasn\u0026rsquo;t in my own encrypt-decrypt round-trip: oauth2-proxy had to be expecting something different.\nI went back to the source. session_store.go → NewCookieSessionStore:\ncipher, err := encryption.NewCFBCipher(encryption.SecretBytes(secret)) AES-CFB. Not GCM. The cookie session store uses CFB. 
GCM exists in the codebase for a different purpose (the Redis ticket store, which I hadn\u0026rsquo;t discovered yet). I had been encrypting with the wrong cipher the entire time.\nAnd SecretBytes — a function I\u0026rsquo;d been reading but not understanding:\nfunc SecretBytes(secret string) []byte { b, err := base64.RawURLEncoding.DecodeString(strings.TrimRight(secret, \u0026#34;=\u0026#34;)) if err == nil { for _, i := range []int{16, 24, 32} { if len(b) == i { return b } } } return []byte(secret) // fallback: raw string } The cookie secret q7OF9sK2/Pnt9QKNoBBmxWRL3GAbWzvj contains /. That\u0026rsquo;s valid standard base64 but not URL-safe base64, so RawURLEncoding fails and SecretBytes falls back to the raw string: 32 bytes, a valid AES-256 key. My Python test had used standard base64 decoding, which did succeed, but it produced a 24-byte key instead of the 32-byte raw string oauth2-proxy actually uses. My Go implementation had done the same. Both were deriving the wrong key.\nI rewrote the cipher to AES-CFB with the raw-string key. New test. Same error. Still rejecting.\nMessagePack and LZ4 Back to the source. EncodeSessionState:\nfunc (s *SessionState) EncodeSessionState(c encryption.Cipher, compress bool) ([]byte, error) { packed, err := msgpack.Marshal(s) // ... compressed, err := lz4Compress(packed) // ... return c.Encrypt(compressed) } MessagePack. LZ4 compression. Then AES-CFB.\nI had been encrypting raw JSON. 
The whole time.\nThe struct tags confirmed it:\ntype SessionState struct { CreatedAt *time.Time `msgpack:\u0026#34;ca,omitempty\u0026#34;` ExpiresOn *time.Time `msgpack:\u0026#34;eo,omitempty\u0026#34;` // \u0026#34;eo\u0026#34;, not \u0026#34;ea\u0026#34; as I\u0026#39;d assumed AccessToken string `msgpack:\u0026#34;at,omitempty\u0026#34;` Email string `msgpack:\u0026#34;e,omitempty\u0026#34;` User string `msgpack:\u0026#34;u,omitempty\u0026#34;` } Even the ExpiresOn field name was different from what I\u0026rsquo;d guessed (\u0026quot;eo\u0026quot; not \u0026quot;ea\u0026quot;).\nI added the vmihailenco/msgpack and pierrec/lz4 dependencies and rewrote the encoding pipeline: msgpack → lz4 → AES-CFB(raw-string key) → base64url(encrypted) → sign with HMAC.\nRan the curl test. HTTP 200.\nAfter three days and four complete rewrites of the encoding logic, oauth2-proxy accepted the forged session.\nThe access token problem Celebrating was premature. The curl test passed, but real ForwardAuth requests from the browser kept failing intermittently. Looking at the logs:\nError loading cookied session: session is invalid This came from validateSession in the storedSessionLoader — after successfully loading the session, it was calling the provider\u0026rsquo;s ValidateSession method and getting false back. I checked the GitLab provider:\nfunc (p *GitLabProvider) ValidateSession(ctx context.Context, s *sessions.SessionState) bool { return validateToken(ctx, p, s.AccessToken, makeOIDCHeader(s.IDToken)) } oauth2-proxy calls GitLab\u0026rsquo;s /oauth/token/info endpoint with the access token to verify the session is still active. My forged session had an empty AccessToken field. Empty access token → validateToken returns false immediately → session rejected.\nThe fix: during the phone\u0026rsquo;s GitLab OAuth flow, exchangeCode was already calling GitLab\u0026rsquo;s token endpoint and receiving an access token, but I\u0026rsquo;d been discarding it. 
I changed the function signature to return it, stored it in the session, and included it in the forged session\u0026rsquo;s at field.\nThe token was issued for my qr-auth GitLab app, not oauth2-proxy\u0026rsquo;s app. But GitLab\u0026rsquo;s /oauth/token/info endpoint doesn\u0026rsquo;t check the issuing application — it just validates that the token is active and returns 200. oauth2-proxy only checks for a 200 response. The token worked.\nEverything worked.\nThe End Session problem — three attempts Attempt 1: Delete qr_session, lock the QR page The first End Session implementation deleted the qr_session key from Redis. To make this actually lock the screen, I restored the Homer proxy at qr.hippotion.com — the TV would show Homer via an ExternalName Kubernetes Service pointing at Homer\u0026rsquo;s in-cluster Service, guarded by a Traefik ForwardAuth middleware that checked the qr_session cookie. Homer makes status API calls every ~30 seconds, which re-triggered ForwardAuth, and deleting qr_session meant the screen would lock within 30 seconds automatically.\nThis worked for qr.hippotion.com, but the _oauth2_proxy cookie was stateless — a signed, self-contained encrypted blob in the browser. There was no server-side record to delete. Other apps (argo.hippotion.com, grafana.hippotion.com, etc.) kept working until the 8-hour cookie expiry.\nThe TV screen was locked. The session wasn\u0026rsquo;t.\nAttempt 2: Shorter cookie TTL The tempting quick fix: reduce the forged cookie\u0026rsquo;s TTL from 8 hours to something shorter, like 30 minutes. End Session would lock the TV immediately. Other apps would expire within 30 minutes on their own.\nRejected. 30 minutes of residual access on a shared TV is too long, and the TTL is arbitrary — it doesn\u0026rsquo;t match what End Session is supposed to mean.\nAttempt 3: Redis-backed oauth2-proxy sessions The correct fix is what oauth2-proxy calls persistence tickets. 
Instead of encoding the entire session into the cookie, oauth2-proxy stores the session in Redis and puts only a ticket reference in the cookie. When the ticket is deleted from Redis, the session is gone on the next request.\nThe ticket format, from pkg/sessions/persistence/ticket.go:\n// ticketID format: \u0026#34;_oauth2_proxy-\u0026lt;hex(16 random bytes)\u0026gt;\u0026#34; ticketID := fmt.Sprintf(\u0026#34;%s-%s\u0026#34;, cookieOpts.Name, hex.EncodeToString(rawID)) // ticket string in the cookie: \u0026#34;v2.\u0026lt;base64url(ticketID)\u0026gt;.\u0026lt;base64url(ticketSecret)\u0026gt;\u0026#34; func (t *ticket) encodeTicket() string { return fmt.Sprintf(\u0026#34;v2.%s.%s\u0026#34;, base64.RawURLEncoding.EncodeToString([]byte(t.id)), base64.RawURLEncoding.EncodeToString(t.secret)) } // session stored in Redis, encrypted with the *ticket* secret (not the cookie secret) func (t *ticket) saveSession(s *sessions.SessionState, saver saveFunc) error { c, err := encryption.NewGCMCipher(t.secret) // GCM, not CFB // ... 
ciphertext, err := s.EncodeSessionState(c, false) // msgpack, NO lz4 return saver(t.id, ciphertext, t.options.Expire) } This is a completely different format from the cookie session:\nCookie session vs. Redis session (ticket):\nCipher: AES-CFB vs. AES-128-GCM\nKey: cookie secret (raw string) vs. per-session ticket secret\nSerialization: msgpack (same for both)\nCompression: lz4 vs. none\nStorage: in the cookie vs. Redis, keyed by ticket ID\nRevocable: no vs. yes\nI rewrote the session creation to generate a random ticket ID and secret, encrypt the msgpack session with AES-GCM using the ticket secret, store it in Redis, and set the signed ticket reference as the _oauth2_proxy cookie.\nI stored the ticket ID alongside the qr_session in Redis:\n{ \u0026#34;email\u0026#34;: \u0026#34;user@example.com\u0026#34;, \u0026#34;username\u0026#34;: \u0026#34;username\u0026#34;, \u0026#34;access_token\u0026#34;: \u0026#34;...\u0026#34;, \u0026#34;oauth2_ticket_id\u0026#34;: \u0026#34;_oauth2_proxy-eeeb18501625dee77f344c0a6193d0bc\u0026#34; } End Session now does two Redis deletes:\nfunc handleLogout(w http.ResponseWriter, r *http.Request) { sessionID := r.FormValue(\u0026#34;session_id\u0026#34;) ctx := r.Context() if raw, err := rdb.Get(ctx, \u0026#34;session:\u0026#34;+sessionID).Result(); err == nil { var sd sessionData if json.Unmarshal([]byte(raw), \u0026amp;sd) == nil \u0026amp;\u0026amp; sd.OAuth2TicketID != \u0026#34;\u0026#34; { rdb.Del(ctx, sd.OAuth2TicketID) // kills oauth2-proxy session } } rdb.Del(ctx, \u0026#34;session:\u0026#34;+sessionID) // kills qr session } I configured oauth2-proxy to use Redis session storage pointing at the same Redis instance, added a Cilium network policy rule allowing ingress to Redis from the oauth2-proxy namespace, and removed the Homer proxy from qr.hippotion.com — it was no longer needed.\nOne final gotcha: session_store_type = \u0026quot;redis\u0026quot; in oauth2-proxy\u0026rsquo;s legacy config file does nothing. There\u0026rsquo;s no error, no warning. It silently ignores the option. 
The flag only works when passed as an actual CLI argument via extraArgs in the Helm chart values:\nextraArgs: session-store-type: redis redis-connection-url: \u0026#34;redis://qr-auth-redis:6379\u0026#34; After that, End Session worked correctly. Phone taps the button, ticket is deleted from Redis, the next ForwardAuth request for any app on the domain immediately redirects to the QR lock screen.\nWhat the final architecture looks like Phone: scan QR → /device?token=xxx → intermediate page (\u0026#34;Continue with GitLab\u0026#34;) → GitLab OAuth on phone (already logged in → direct callback) → /callback: exchange code → get email + access token → create Redis ticket: AES-128-GCM(msgpack(session), ticketSecret) → store ticket in Redis at \u0026#34;_oauth2_proxy-\u0026lt;hex\u0026gt;\u0026#34; → mark device token as authed, store ticketID in qr session TV: poll fires → read qr session from Redis (has email, accessToken, ticketID) → set _oauth2_proxy cookie: signed ticket reference → set qr_session cookie → redirect to home.hippotion.com Any protected app (home, argo, grafana, ...): → Traefik ForwardAuth → oauth2-proxy → oauth2-proxy reads _oauth2_proxy cookie → decodes ticket → looks up \u0026#34;_oauth2_proxy-\u0026lt;hex\u0026gt;\u0026#34; in Redis → decrypts session → validates email, access token → 200 OK Phone: \u0026#34;End Session\u0026#34; → POST /logout with session_id → delete \u0026#34;session:\u0026lt;id\u0026gt;\u0026#34; from Redis (qr session gone) → delete \u0026#34;_oauth2_proxy-\u0026lt;hex\u0026gt;\u0026#34; from Redis (oauth2 ticket gone) → next ForwardAuth on TV: Redis lookup fails → redirect to login The intermediate page on the phone (\u0026ldquo;Continue with GitLab\u0026rdquo; button instead of auto-redirect) was an unexpected requirement. Mobile browsers opened by the camera app often don\u0026rsquo;t share sessions with the browser where GitLab is logged in. 
When you auto-redirect to GitLab in a browser with no existing session, GitLab redirects to the sign-in page. The OAuth state is stored in a session cookie that GitLab sets during the initial authorize request. On mobile, the sign-in form submission can lose this cookie due to SameSite restrictions — after sign-in, GitLab can\u0026rsquo;t resume the OAuth flow and falls back to /users/sign_in with no further redirect. The intermediate page gives the user a visible moment to confirm they\u0026rsquo;re in a browser with an active GitLab session before initiating the OAuth redirect.\nLessons Read the source, not the docs. The docs say \u0026ldquo;AES encryption\u0026rdquo; without specifying the mode or how the key is derived. The source has the answer in twenty lines.\nTest at the boundary. The curl test against the ForwardAuth endpoint was the most valuable debugging step. It isolated exactly which layer was failing and gave me the real error message instead of a browser redirect loop. Without it, I\u0026rsquo;d still be guessing.\nFormat assumptions are fragile. I assumed JSON because JSON is the default for everything. oauth2-proxy uses MessagePack because it produces smaller cookies. LZ4 because it decompresses fast. AES-CFB because that\u0026rsquo;s what was chosen when the code was written. None of this is unreasonable, but none of it is obvious from the outside.\nTwo formats, same codebase. Cookie sessions and Redis ticket sessions use different ciphers, different compression, different key derivation. The GCM cipher I found first is correct — but for Redis sessions, not cookie sessions. The CFB cipher is for cookie sessions. I had the right code in the wrong place.\nConfig files can silently ignore options. session_store_type = \u0026quot;redis\u0026quot; in oauth2-proxy\u0026rsquo;s legacy config file does nothing. --session-store-type=redis on the command line works. 
No error, no warning, no indication that the option was parsed but not applied.\nRevocability requires server-side state. A self-contained encrypted cookie cannot be revoked without adding a denylist (which has its own scaling problems). If you need End Session to mean something, you need a server-side session store. oauth2-proxy\u0026rsquo;s Redis session store provides exactly that: the ticket design is clean and the revocation path is a single Redis delete.\nThe code is at github.com/janos-gyorgy/qr-device-login.\n","permalink":"https://blog.hippotion.com/posts/qr-device-login/","summary":"My homelab uses oauth2-proxy for GitLab SSO. I wanted a QR code login for the TV dashboard. Two days and four complete rewrites later, I knew more about oauth2-proxy\u0026rsquo;s session format than I ever planned to.","title":"📱 Building a QR Code Login for a Homelab (And Learning oauth2-proxy's Session Format the Hard Way)"},{"content":"When I took over DevOps, the handover was a person, not a document. That person was leaving. Everything I\u0026rsquo;d need to keep thirty-odd services and a fleet of customer servers alive lived in his head, in scattered runbooks, and in the muscle memory of having done it before. The classic shape: the system worked, and exactly one human knew why.\nSo the first real project wasn\u0026rsquo;t a migration or a dashboard. It was writing down the system before the only other copy walked out the door.\nThe obvious move is to write the docs — one big knowledge base, ordered however the system happens to be wired. I tried that for about a day. It doesn\u0026rsquo;t work, and the reason it doesn\u0026rsquo;t work is the whole point of this post.\nThe two questions a new hire is actually asking\nWatch someone learn an unfamiliar platform and you\u0026rsquo;ll notice they\u0026rsquo;re never confused about just one thing. 
They\u0026rsquo;re confused about two, and they\u0026rsquo;re different kinds of confused.\nThe first is \u0026ldquo;what is this technology?\u0026rdquo; — what\u0026rsquo;s a Pod, what does ArgoCD actually do, why would anyone want a secret manager with leases. This confusion is generic. It has nothing to do with us. The answer is the same whether you\u0026rsquo;re here or anywhere else.\nThe second is \u0026ldquo;how do we use it?\u0026rdquo; — where our ArgoCD lives, how our customer tokens are minted, which Grafana panel goes red first when a backup stalls. This confusion is entirely local. No textbook will ever answer it, because the answer is our repo and our decisions.\nA single linear document forces these two into one sequence, and they fight. Explain Kubernetes from scratch and the engineer who already knows it skims and misses the system-specific bit buried in paragraph six. Skip the basics and the engineer who doesn\u0026rsquo;t know it is lost before they reach anything useful. You can\u0026rsquo;t order one list to serve both readers. So I stopped trying.\nTrack 1 is the textbook. Track 2 is the house. The fix was to split the knowledge base along that exact seam.\nTrack 1 — Technical Foundation. Ten pages of generic DevOps: Linux, containers, Kubernetes concepts, Helm, GitOps \u0026amp; ArgoCD, GitLab CI/CD, Vault, Argo Events, observability, Terraform. Every page is something you could, in principle, read on any platform team on earth. Assumed background is stated up front — comfortable with Linux and shell, no Kubernetes required — so nobody has to guess whether a page is for them.\nTrack 2 — Our System. A dozen-plus pages of nothing but us: the cluster and its app-of-apps, the deploy pipelines, the customer model, the monitoring and backup agent, our Vault layout and token expiry monitoring, SSO, the approval portal, the full new-customer install. 
Every page assumes you already understand the underlying tech — and if you don\u0026rsquo;t, it links straight back to its Track 1 counterpart.\nThat\u0026rsquo;s the rule that keeps the split honest: each Track 1 page ends with an \u0026ldquo;in our system\u0026rdquo; link down to its implementation, and each Track 2 page names its Track 1 prerequisite at the top. Concept and implementation are separate documents, permanently wired to each other.\nThe win is that both tracks stand alone. A senior who\u0026rsquo;s done Kubernetes for years skips Track 1 entirely and reads Track 2 like a system design doc. A strong sysadmin with zero cloud-native experience leans hard on Track 1 first. Same knowledge base, two honest reading paths, neither one padded for the other reader.\nThe interleave is the whole trick\nTwo tracks on their own would just be two piles. The thing that makes them a roadmap is the order you walk them in — and the order is a zipper, not two straight lines.\nTrack 1: Technical Foundation   Track 2: Our System\n─────────────────────────────   ───────────────────\nK8s concepts         → then →   K8s in our cluster\nArgoCD concepts      → then →   our ArgoCD + GitOps flow\nVault concepts       → then →   Vault here, customer tokens\nObservability theory → then →   our Grafana dashboards, alert types\nLearn the concept cold, then immediately see it wearing our clothes. 
The generic mental model gets nailed down by a concrete, real, in-production example before it has time to evaporate — which is the difference between \u0026ldquo;I read about ArgoCD once\u0026rdquo; and \u0026ldquo;I know where our ArgoCD is and what drift looks like on it.\u0026rdquo; Read-then-do, not read-then-read.\nFour phases, because \u0026ldquo;learn DevOps\u0026rdquo; isn\u0026rsquo;t a task\nA pile of pages still isn\u0026rsquo;t a plan, so the roadmap sits on top of both tracks and spends them over twenty weeks, in four phases, each with one blunt milestone:\nPhase        Weeks   Milestone\nFoundations  1–3     Can describe every component and monitor alerts\nOperations   4–8     Can deploy a customer stack and restore a backup solo\nOwnership    9–14    Can install a new customer from scratch\nMastery      15–20   Can train someone else\nThe milestones are deliberately verbs, not reading counts. Nobody is \u0026ldquo;done with Phase 2\u0026rdquo; because they finished the pages. They\u0026rsquo;re done when they\u0026rsquo;ve restored a backup without me in the room. The last milestone is the one that matters most to me personally — can train someone else — because that\u0026rsquo;s the only state in which I\u0026rsquo;m allowed to be hit by a bus.\nThe readiness tracker, or: vibes don\u0026rsquo;t scale\nHere\u0026rsquo;s the part I\u0026rsquo;m most attached to, because it\u0026rsquo;s the part that fixes the original problem. \u0026ldquo;Are you ready to own this?\u0026rdquo; answered by gut feel is exactly the tribal-knowledge trap I was trying to escape, just relocated into the new hire\u0026rsquo;s head.\nSo full ownership is broken into eight weighted domains, and at the end of every phase you score yourself against them — honestly — and then study your lowest numbers, not your favorites. It turns \u0026ldquo;do I know enough yet?\u0026rdquo; from a vibe into a number with a gap next to it. 
The same instinct I\u0026rsquo;d apply to a service I\u0026rsquo;m monitoring, pointed at a person\u0026rsquo;s readiness instead. You don\u0026rsquo;t get to feel ready. You get to be measurably less unready at the end of every phase.\nWhat I\u0026rsquo;d tell the next me\nThe mistake I almost made was treating onboarding docs as a description of the system. They\u0026rsquo;re not. A description is ordered by how the machine is built. Onboarding has to be ordered by how a human learns — and a human learning a platform is running two processes at once, the general and the specific, and you have to feed both without starving either.\nSplitting the knowledge base in two felt like more work and more surface to maintain. It was the opposite. Now when the tech changes, I edit Track 1. When we change, I edit Track 2. The seam that makes it easy to read is the same seam that makes it easy to keep alive.\nThe handover I got was a person. The handover I\u0026rsquo;m leaving is a map — and it\u0026rsquo;s drawn so the next person can read it without me standing behind them. That was the entire goal. The fact that I can now point a brand-new hire at a URL instead of at my calendar is just the proof it worked.\n","permalink":"https://blog.hippotion.com/posts/inherited-a-system-no-map/","summary":"How I turned a tribal-knowledge handover into a two-track learning roadmap — one track for the technology, one for our system, designed to interleave.","title":"I Inherited a System With No Map. So I Drew Two."}]