<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Kubernetes on hippotion</title><link>https://blog.hippotion.com/tags/kubernetes/</link><description>Recent content in Kubernetes on hippotion</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 21 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.hippotion.com/tags/kubernetes/index.xml" rel="self" type="application/rss+xml"/><item><title>📝 Dev Notes</title><link>https://blog.hippotion.com/posts/dev-notes/</link><pubDate>Sun, 21 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/dev-notes/</guid><description>Running notes on things I&amp;rsquo;ve hit, fixed, or found worth remembering.</description><content:encoded><![CDATA[<h2 id="kubernetes-init-container-crash-loop-leaves-dirty-emptydir">Kubernetes: init container crash loop leaves dirty emptyDir</h2>
<p>When a pod&rsquo;s init container crashes, Kubernetes restarts <strong>only the init container</strong> — not the whole pod. The <code>emptyDir</code> volume survives between retries. If your init container does a <code>git clone</code> into a fixed path, the second attempt fails with &ldquo;destination path already exists.&rdquo;</p>
<p>Fix: <code>rm -rf</code> the target dir before cloning.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl">rm -rf /git/repo
</span></span><span class="line"><span class="cl">git clone --depth<span class="o">=</span><span class="m">10</span> --branch<span class="o">=</span>main https://... /git/repo
</span></span></code></pre></div><p>After many restarts, no manual cleanup needed. Events expire in ~1h, old pods are replaced automatically by the Deployment controller. Check recovery with:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl get events -n &lt;namespace&gt; --sort-by<span class="o">=</span><span class="s1">&#39;.lastTimestamp&#39;</span> <span class="p">|</span> tail -10
</span></span></code></pre></div><h2 id="a-cpu-spike-that-was-actually-memory-thrashing-adding-ga4-to-hugo">A &ldquo;CPU spike&rdquo; that was actually memory thrashing (adding GA4 to Hugo)</h2>
<p>Wanted Google Analytics on this blog. PaperMod already calls a <code>google_analytics.html</code> partial in <code>head.html</code>, but it&rsquo;s gated behind <code>hugo.IsProduction | or (eq site.Params.env &quot;production&quot;)</code>. My blog pod runs <code>hugo server</code>, which <strong>always</strong> reports the environment as <em>development</em> — so the partial never fires. I &ldquo;fixed&rdquo; that by setting <code>env = &quot;production&quot;</code>.</p>
<p>That was the wrong lever. <code>env = production</code> flips on Hugo&rsquo;s whole production path — minification, OpenGraph, Twitter cards, schema JSON across every page. The next full rebuild blew past the pod&rsquo;s <strong>128Mi</strong> memory limit and got <strong>OOMKilled</strong> (exit 137). Server load jumped.</p>
<p>The right way to add GA without touching the build mode: drop the tag in <code>layouts/_partials/extend_head.html</code>. PaperMod includes that partial <em>unconditionally</em>, above the production guard — so it loads under <code>hugo server</code> too.</p>
<p>But here&rsquo;s the part that fooled me. After reverting <code>env</code>, load was <em>still</em> climbing — to ~14 on a single node — and <code>ps</code> showed hugo at &ldquo;500% CPU&rdquo;. Looked like a runaway compute loop. It wasn&rsquo;t:</p>
<pre tabindex="0"><code>%Cpu(s): 2.1 us, 41.0 sy, 6.9 id, 50.0 wa     &lt;- 50% iowait, 2% userspace
PID ... S  %CPU  COMMAND
... D  333  hugo    &lt;- state D, RES pinned at 127MiB (the 128Mi cgroup limit)
</code></pre><p>Two lessons:</p>
<ol>
<li><strong><code>ps %CPU</code> is a lifetime average</strong>, not instantaneous. A process that ran hot for 1s then blocked still shows a big number for a while. Use <code>top</code> for what&rsquo;s happening <em>now</em>.</li>
<li><strong>High load + high <code>%wa</code> + a <code>D</code>-state process sitting at its cgroup memory limit = memory thrashing, not CPU.</strong> Hugo wasn&rsquo;t computing — it was wedged against the 128Mi ceiling, and every allocation triggered kernel reclaim/swap. A sub-second build dragged out for minutes in uninterruptible I/O sleep, and all those blocked tasks are what inflate load average (Linux counts <code>D</code>-state in load).</li>
</ol>
<p>The actual fix was boring: 128Mi was always marginal for <code>hugo-extended</code> + PaperMod. Bumped the limit to 512Mi and the thrash vanished.</p>
<p>Takeaway: when load spikes, read <code>%wa</code> and process state before blaming the CPU. And don&rsquo;t flip <code>env=production</code> on a long-lived <code>hugo server</code> just to ungate one partial — use <code>extend_head.html</code>.</p>
<h2 id="self-hosting-supabase-lean-on-k3s-the-gotcha-checklist">Self-hosting Supabase (lean) on k3s: the gotcha checklist</h2>
<p>Ran the community <code>supabase/supabase</code> chart on a 16Gi single node — enabled db, rest, auth, meta, studio, kong + the log pipeline (analytics/Logflare + vector); left realtime, storage, imgproxy, edge-functions off. The deploy is easy; these are the things that actually bit:</p>
<ul>
<li><strong>Studio shows &ldquo;no tables&rdquo;.</strong> Supabase is single-database by design — Studio, PostgREST and auth all use the database named <code>postgres</code>. App tables in a <em>separate</em> database are invisible to all of it. Put your schema in <code>postgres</code>&rsquo;s <code>public</code> schema.</li>
<li><strong>Studio won&rsquo;t schedule with edge-functions disabled.</strong> Its Deployment mounts the functions PVC unconditionally. Either run functions, or create the PVC yourself and leave functions off.</li>
<li><strong>edge-functions crashloops</strong> if you keep it: it boots by fetching a Deno module from the internet, which a deny-all egress policy blocks. You usually only want the PVC it leaves behind anyway.</li>
<li><strong>vector (log collector) stays silent</strong> under a deny-all policy. It discovers pods via the Kubernetes API, so it needs <strong>API egress</strong>, not just app ports (<code>allowEgressToKubeApi</code>). A log shipper that can&rsquo;t reach the API collects nothing and doesn&rsquo;t say why.</li>
<li><strong><code>secretRef</code> must contain <em>every</em> key the chart maps</strong> — including non-secret ones like <code>database</code> and <code>openAiApiKey</code>. Miss one and pods sit in <code>CreateContainerConfigError</code>.</li>
<li><strong>ESO <code>ExternalSecret</code> shows perpetual <code>OutOfSync</code> in Argo CD</strong> unless you spell out the remoteRef defaults (<code>conversionStrategy: Default</code>, <code>decodingStrategy: None</code>, <code>metadataPolicy: None</code>) — ESO writes them back, and the compact form drifts.</li>
<li><strong><code>postgres</code> is not a superuser.</strong> <code>CREATE DATABASE … OWNER app</code> fails with <code>must be member of role</code>. Supabase keeps the real superuser (<code>supabase_admin</code>) to itself; <code>GRANT app TO postgres</code> first.</li>
<li><strong>Logflare needs no BigQuery.</strong> It runs on the self-hosted Postgres backend (the <code>_supabase</code> database, <code>_analytics</code> schema) — logs land in <code>_analytics.log_events_*</code>.</li>
</ul>
<p>None of this is in the README. It&rsquo;s the gap between &ldquo;I deployed Supabase&rdquo; and &ldquo;I run it.&rdquo;</p>
]]></content:encoded></item><item><title>Every Robot in My House Can Text Me Now</title><link>https://blog.hippotion.com/posts/every-robot-texts-me/</link><pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/every-robot-texts-me/</guid><description>My house is full of automation that never told me anything — until I gave it one push bus. The first thing I taught it to do was warn me before Claude Code cuts out mid-task.</description><content:encoded><![CDATA[<h2 id="the-silence">The silence</h2>
<p>My house runs on quiet little robots. A tracker watches my kombucha ferment. A
job narrates kids&rsquo; books in Hungarian. A media stack pulls and files things. Home
Assistant minds the sensors. A dozen services, all doing their jobs, all
completely mute. When a batch finished or an import failed, I found out the same
way every time: by going to look.</p>
<p>Then the silence got expensive. Claude Code stopped dead in the middle of a task
because I&rsquo;d burned through my plan&rsquo;s usage window — no warning, no countdown,
just a wall. The information <em>existed</em>; a dashboard in my own cluster was already
polling it. It just had no way to reach my pocket.</p>
<p>So I built one thing: a push bus. One place anything in the cluster can POST to,
that actually buzzes my phone. And the first job I gave it was to warn me before
my AI assistant goes dark.</p>
<hr>
<h2 id="the-boring-part-said-honestly">The boring part (said honestly)</h2>
<p>The bus is <a href="https://ntfy.sh">ntfy</a> — a self-hosted pub/sub notifier. Picking it
took about five minutes, because self-hosting ntfy for a homelab is a thoroughly
solved problem. There are at least three off-the-shelf bridges from Prometheus
Alertmanager to ntfy. I&rsquo;m not going to pretend the bus is the clever bit.</p>
<p>What I <em>did</em> do deliberately:</p>
<ul>
<li>📦 Deployed it <strong>GitOps-native</strong> — one entry in my app-of-apps, reconciled by
Argo CD, no <code>docker run</code> anywhere.</li>
<li>🔒 Locked it to <strong>deny-all auth</strong> with bearer tokens. Security alerts ride this
bus; a world-readable topic on a public URL was a non-starter. (Which also means
it sits <em>outside</em> my usual OAuth gate — the phone app can&rsquo;t do an interactive
login flow, so ntfy does its own token auth.)</li>
<li>🏷️ Topics by severity: <code>hl-crit</code>, <code>hl-warn</code>, <code>hl-info</code>, <code>hl-event</code>. Subscribe
and mute by how much I care.</li>
</ul>
<p>Then the interesting parts showed up at the edges, where they always do.</p>
<hr>
<h2 id="edge-one-my-own-firewall-403d-me">Edge one: my own firewall 403&rsquo;d me</h2>
<p>First test, the usage producer POSTing to <code>https://ntfy.hippotion.com</code>:</p>
<pre tabindex="0"><code>HTTP 403 Forbidden
error code: 1010
</code></pre><p>That <code>1010</code> looks like ntfy rejecting my token. It isn&rsquo;t. <strong>It&rsquo;s Cloudflare.</strong>
Error 1010 means &ldquo;your browser signature is banned&rdquo; — Cloudflare&rsquo;s bot protection
took one look at a Python script&rsquo;s <code>urllib</code> User-Agent and slammed the door.</p>
<p>My own producer couldn&rsquo;t reach my own bus, because the request left the cluster,
went all the way out to my own edge, and got flagged as a bot on the way back in.</p>
<p>The fix is the architecture I should&rsquo;ve had from the start: in-cluster producers
POST to the <strong>internal</strong> service address and never touch the public internet at
all.</p>
<pre tabindex="0"><code># wrong: out to Cloudflare and back, gets bot-blocked
https://ntfy.hippotion.com/hl-warn

# right: stays inside the cluster
http://ntfy.web-ntfy.svc.cluster.local/hl-warn
</code></pre><p>The phone still uses the public URL happily — the real ntfy app carries a
signature Cloudflare trusts. Only scripts trip 1010. <strong>Lesson: your own edge is
not your friend when you&rsquo;re a script. Keep cluster traffic in the cluster.</strong></p>
<hr>
<h2 id="edge-two-the-obvious-data-source-was-lying">Edge two: the obvious data source was lying</h2>
<p>To warn me about Claude usage, the naïve move is to parse Claude Code&rsquo;s local
logs — they sit right there in <code>~/.claude/projects/.../*.jsonl</code>, token counts and
all.</p>
<p>Don&rsquo;t. Those counts are <strong>unreliable for accounting</strong> — known to undercount,
wildly, in some cases by ~100x. Every tool that parses that JSONL inherits the
bug.</p>
<p>The number that&rsquo;s actually true lives in the claude.ai usage API — the same
<code>five_hour</code> and <code>seven_day</code> windows your plan enforces against. And I already had
a service polling exactly that. So the producer is just a tiny sidecar on that
existing pod, reading its <code>/api/usage</code> over <strong>localhost</strong> (same pod — no network
policy to negotiate, no second credential, nothing else hammering claude.ai):</p>
<ul>
<li>📈 ≥80% of a window → <code>hl-warn</code> (high).</li>
<li>🚨 ≥95% → <code>hl-crit</code> (urgent).</li>
<li>🔁 One ping per window per reset cycle, escalating warn→crit, keyed on the
reset timestamp so it never spams.</li>
</ul>
<p>The first time it mattered, my phone buzzed at 80% with hours of runway left
instead of a brick wall mid-task.</p>
<hr>
<h2 id="what-id-tell-past-me">What I&rsquo;d tell past me</h2>
<p>Three things, none of them about ntfy:</p>
<ol>
<li><strong>Reuse the signal you already have.</strong> I didn&rsquo;t build a usage poller — I bolted
a sidecar onto the one already running. The smallest producer is one that reads
localhost.</li>
<li><strong>Your own edge can betray you.</strong> A firewall that protects you from bots will
happily block your own automation. In-cluster talks in-cluster.</li>
<li><strong>Check whether your data source is telling the truth</strong> before you build an
alert on it. An alert you don&rsquo;t trust is worse than no alert — you&rsquo;ll learn to
ignore it, and then it&rsquo;ll be right once.</li>
</ol>
<p>Next, the high-leverage move: point Prometheus Alertmanager at the same bus, and
every infra alert I have — plus every one I&rsquo;ll ever add — lands on the phone
through one bridge. The kombucha ping can wait. The disk-full one can&rsquo;t.</p>
<p>The house is still full of quiet robots. The difference is now they know my
number.</p>
]]></content:encoded></item><item><title>Is Anyone Knocking? A Security Pass on My Homelab</title><link>https://blog.hippotion.com/posts/is-anyone-knocking/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/is-anyone-knocking/</guid><description>I set out to answer a simple worry — is someone trying to get into my server? — and found the scarier question underneath it: if they did, would I even know? My front door was solid. The inside had an alarm with the wires cut, a web terminal sitting on the open internet, and no floor under the blast radius. Here&amp;rsquo;s the audit, and the three things I fixed.</description><content:encoded><![CDATA[<h2 id="the-question-i-actually-had">The question I actually had</h2>
<p>It started as a nervous-Sunday kind of question: <em>is a third party trying to
get into my server — over SSH, or some other way?</em> I run a single-node
Kubernetes homelab that hosts a couple dozen little apps, some of them public.
You read about credential-stuffing bots and you start to wonder who&rsquo;s been
rattling the handle while you slept.</p>
<p>So I did the audit. The good news came first, and it&rsquo;s worth saying plainly
because it&rsquo;s the part most homelabs get wrong: <strong>the front door is solid.</strong>
Nothing is reachable from the internet except through a Cloudflare Tunnel —
an outbound-only connection, zero open inbound ports on my router. Almost
every service sits behind OAuth. The cluster has 140 network policies doing
real east-west segmentation. And the login history? Eleven straight weeks
where every single shell login came from one IP — my own workstation on the
LAN. No strangers. No 3 a.m. logins from a VPS in another hemisphere.</p>
<p>I could have stopped there feeling good. That would have been a mistake.</p>
<h2 id="the-scary-finding-wasnt-an-attacker">The scary finding wasn&rsquo;t an attacker</h2>
<p>The useful question turned out not to be <em>&ldquo;is someone knocking?&rdquo;</em> but
<em>&ldquo;if someone got in, would anything tell me?&rdquo;</em> And when I traced that wire,
it ended in the dark.</p>
<p>I have a full monitoring stack — Prometheus, Grafana, Alertmanager, the works.
Alertmanager was running. It was also configured to notify exactly <strong>no one</strong>:
no receivers, and upstream, <strong>no alert rules at all</strong>. It was a smoke detector
with the battery taken out and, for good measure, no smoke sensor either. If an
attacker had walked in, the alarm would have stayed perfectly, silently green.</p>
<p>That reframed the whole job. Three gaps, in priority order.</p>
<h2 id="gap-1--an-alarm-with-no-one-to-call">Gap 1 — an alarm with no one to call</h2>
<p>I built the missing chain end to end. A small exporter on the host parses the
SSH journal and <code>fail2ban</code> state and writes metrics into node_exporter&rsquo;s
textfile collector — so it rides the monitoring I already had instead of adding
a new moving part. On top sit the alert rules that were never there. The one
that matters most is blunt:</p>
<blockquote>
<p><strong>A shell login succeeded from a non-LAN IP.</strong></p>
</blockquote>
<p>That should be impossible in normal life, so if it ever fires, I want it
shouting. It now emails me the instant it happens, alongside quieter alerts for
brute-force spikes, distributed scans, <code>fail2ban</code> going down, and — the
meta-alert I&rsquo;m fondest of — <em>the watchdog itself going stale</em>, because a
security monitor that silently dies is worse than none. And <code>fail2ban</code> now
actually bans the bots, with escalating ban times and my LAN permanently on the
allow-list.</p>
<p>The honest lesson: I&rsquo;d been treating &ldquo;I have Prometheus&rdquo; as if it meant &ldquo;I have
monitoring.&rdquo; Dashboards you have to remember to look at are not monitoring.
<strong>Monitoring is the thing that interrupts you.</strong> Until an alert can reach your
phone, you don&rsquo;t have a security alarm — you have a security <em>museum</em>.</p>
<h2 id="gap-2--there-was-a-web-terminal-on-the-open-internet">Gap 2 — there was a web terminal on the open internet</h2>
<p>This is the one that made me wince. Among my public hostnames was <code>ttyd</code> — a
browser-based shell. A full terminal on my server, reachable from anywhere,
sitting behind a single OAuth proxy. One misconfiguration, one OAuth bypass,
and that&rsquo;s not &ldquo;an app is compromised,&rdquo; that&rsquo;s <em>root on the box from a browser
tab.</em></p>
<p>The fix here isn&rsquo;t more locks. It&rsquo;s the realization that <strong>the strongest
control is not exposing the thing at all.</strong> I deleted the web terminal
entirely — app, manifests, dashboard tile, all of it. Then I went down the
public hostname list and pulled everything with no business being public off
the tunnel: the secrets UI, the ingress dashboard, Prometheus, Alertmanager,
the network-observability console, the DNS admin. They still work — on my LAN,
over the same wildcard cert — they&rsquo;re just not the internet&rsquo;s business anymore.
A service that isn&rsquo;t exposed has no attack surface to harden.</p>
<h2 id="gap-3--no-floor-under-the-blast-radius">Gap 3 — no floor under the blast radius</h2>
<p>The network policies limit how far a compromised pod can talk sideways. But
nothing stopped a workload from running as root, mounting the host filesystem,
or grabbing the host network in the first place. So I turned on Kubernetes'
built-in Pod Security Admission: every namespace now at least <em>reports</em>
baseline violations, and the clean app namespaces <em>enforce</em> baseline —
meaning a compromised app there simply cannot request privileged mode or a
hostPath mount. It&rsquo;s a floor. Floors are underrated.</p>
<h2 id="what-the-audit-was-really-about">What the audit was really about</h2>
<p>I went looking for an intruder and didn&rsquo;t find one — the logs were clean, the
front door held. What I found instead was that I&rsquo;d built something secure at
the perimeter and then never asked the uncomfortable follow-up: <em>what happens
after the perimeter?</em> The answer had been &ldquo;nothing happens, and no one is
told,&rdquo; and I just hadn&rsquo;t looked.</p>
<p>Three principles I&rsquo;m taking with me:</p>
<ul>
<li><strong>An alarm that can&rsquo;t reach you is decoration.</strong> Wire the notification first;
the rules are easy once something is listening.</li>
<li><strong>Don&rsquo;t expose it beats add more auth.</strong> Every hostname you take off the
public internet is a class of attack you no longer have to be clever about.</li>
<li><strong>Give the blast radius a floor.</strong> Assume one thing gets popped, and decide
in advance how far it gets.</li>
</ul>
<p>The best part: all of it is GitOps. The intrusion alerts, the un-exposing, the
pod-security floor — every change is a commit, reviewable and revertible, and
my cluster reconciles itself to match. The audit didn&rsquo;t just make the homelab
safer. It wrote down <em>why</em> it&rsquo;s safer, in a form the next version of me can
read.</p>
<p>Now if someone knocks, I&rsquo;ll know. And the web terminal isn&rsquo;t answering the
door anymore — because it&rsquo;s gone.</p>
]]></content:encoded></item><item><title>🎙️ Cloning My Own Voice for My Kid's Audiobooks</title><link>https://blog.hippotion.com/posts/clone-your-voice-hungarian-audiobooks/</link><pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/clone-your-voice-hungarian-audiobooks/</guid><description>Zero-shot voice cloning with XTTS-v2 on a CPU-only k3s node: 26 seconds of phone audio in, a cloned-voice audiobook out — and an honest verdict from the bedtime jury. Every manual step, including the ones that went wrong.</description><content:encoded><![CDATA[<h2 id="the-problem-nobody-sells-a-fix-for">The problem nobody sells a fix for</h2>
<p>My kid loves audiobooks. The commercial platforms barely carry Hungarian
children&rsquo;s books, and none of them carry the one narrator my kid actually
prefers: me. I can&rsquo;t read aloud every evening — but my homelab doesn&rsquo;t have
that excuse.</p>
<p>The platform half (ebook → M4B → Audiobookshelf on k3s) is a story for
another post. This one is about the voice: how to go from a phone recording
to an audiobook narrated in your own voice, step by step, on hardware with
no GPU.</p>
<p>The short version: <strong>XTTS-v2 does zero-shot voice cloning from a ~20-second
sample.</strong> No training, no fine-tuning, no dataset. One clean recording and a
flag.</p>
<hr>
<h2 id="why-xtts-v2-in-2026">Why XTTS-v2, in 2026?</h2>
<p>It&rsquo;s not the best open TTS model anymore. Chatterbox beats ElevenLabs in
blind tests; F5-TTS sounds cleaner. But model selection for a small language
is constraint-first, not leaderboard-first: Chatterbox has <strong>no Hungarian</strong>,
NVIDIA&rsquo;s TTS NIMs have <strong>no Hungarian</strong>, Kokoro — no Hungarian. XTTS-v2
speaks Hungarian <em>and</em> clones voices <em>and</em> runs on CPU. That intersection
has exactly one resident.</p>
<p>I run it via <a href="https://github.com/DrewThomasson/ebook2audiobook">ebook2audiobook</a>,
which wraps XTTS with Calibre ingestion and M4B chaptering.</p>
<hr>
<h2 id="step-1--record-25-seconds-of-yourself">Step 1 — Record ~25 seconds of yourself</h2>
<p>Phone voice-memo app, quiet room, ~20 cm from your mouth. Mine came out as
28 seconds of stereo 48 kHz AAC. Two rules that matter more than gear:</p>
<ul>
<li><strong>Read the way you want the books narrated.</strong> The clone copies prosody —
energy, pacing, warmth — not just timbre. A flat recital clones into a
flat narrator. I read a children&rsquo;s tale the way I&rsquo;d read it at bedtime.</li>
<li><strong>Don&rsquo;t peak the mic.</strong> My sample hit −0.1 dB max volume — right at the
clipping ceiling. It worked, but quieter is safer. Check yours:</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ffmpeg -i janos.m4a -af volumedetect -f null - 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> grep volume
</span></span><span class="line"><span class="cl"><span class="c1"># mean_volume: -21.4 dB   ← fine</span>
</span></span><span class="line"><span class="cl"><span class="c1"># max_volume:  -0.1 dB    ← living dangerously</span>
</span></span></code></pre></div><hr>
<h2 id="step-2--normalize-to-what-xtts-wants">Step 2 — Normalize to what XTTS wants</h2>
<p>XTTS expects a mono WAV; 24 kHz matches its internal rate. Trim the silence
off both ends while you&rsquo;re at it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ffmpeg -i janos.m4a <span class="se">\
</span></span></span><span class="line"><span class="cl">  -af <span class="s2">&#34;silenceremove=start_periods=1:start_threshold=-45dB:start_silence=0.2,\
</span></span></span><span class="line"><span class="cl"><span class="s2">areverse,silenceremove=start_periods=1:start_threshold=-45dB:start_silence=0.2,\
</span></span></span><span class="line"><span class="cl"><span class="s2">areverse&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl">  -ar <span class="m">24000</span> -ac <span class="m">1</span> janos.wav
</span></span></code></pre></div><p>(The double-<code>areverse</code> is the classic trick: <code>silenceremove</code> only trims the
front, so you flip the audio, trim the front again, flip it back.)</p>
<p>Drop the result where your TTS stack looks for voices. In ebook2audiobook
that&rsquo;s the <code>voices/</code> tree, organised by language:</p>
<pre tabindex="0"><code>voices/hun/adult/male/janos.wav
</code></pre><hr>
<h2 id="step-3--synthesize">Step 3 — Synthesize</h2>
<p>One flag does the cloning. Headless run on the k3s pod:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl <span class="nb">exec</span> -n web-audiobooks deploy/ebook2audiobook -- sh -c <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="s1">&#39;cd /app &amp;&amp; python app.py --headless \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --ebook &#34;/app/ebooks/tale.txt&#34; \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --language hun \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --tts_engine xtts \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --device cpu \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --voice /app/voices/hun/adult/male/janos.wav \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --output_format m4b \
</span></span></span><span class="line"><span class="cl"><span class="s1">     --output_dir /app/audiobooks&#39;</span>
</span></span></code></pre></div><p>On my 12-core CPU node this runs at roughly 3× real-time — a 2-minute tale
takes ~8 minutes, a full children&rsquo;s book is an overnight job. The first run
computes speaker latents from your WAV; after that it&rsquo;s ordinary synthesis
with your voice as the reference.</p>
<hr>
<h2 id="step-4--ab-before-you-batch">Step 4 — A/B before you batch</h2>
<p>Render one <em>short</em> book twice — stock narrator and cloned voice — and put
both in front of the household jury. Cloning quality is personal in the most
literal sense: MOS scores won&rsquo;t tell you whether it sounds like <em>you</em>. My
benchmark has strong opinions and goes to bed at eight.</p>
<p>Only after the clone passes do you re-render the library with <code>--voice</code>.</p>
<p><img alt="Audiobookshelf library with the same tale twice: stock narrator and the &ldquo;apa hangján&rdquo; clone, side by side for the jury" loading="lazy" src="/posts/clone-your-voice-hungarian-audiobooks/abs-ab.png"></p>
<hr>
<h2 id="the-manual-steps-that-earn-the-word-manual">The manual steps that earn the word &ldquo;manual&rdquo;</h2>
<p>Things the tutorials skip, learned the slow way:</p>
<ul>
<li><strong>Long conversions die with the browser tab.</strong> Gradio-style web UIs tie
the job to the open page; close the laptop and you get &ldquo;Conversion
cancelled&rdquo; half a book in. Anything longer than ~15 minutes of audio runs
headless under <code>nohup</code>.</li>
<li><strong>CPU synthesis leaks memory over hours.</strong> My pod has a hard 6 Gi limit on
a 16 Gi node, and a 6-hour run will hit it. Keep the cap (it protects the
other 30 namespaces), and rely on the tool&rsquo;s <code>--session &lt;id&gt;</code> resume — it
picks up at the exact sentence. One catch: headless resume still asks an
interactive <code>Resume? [y]es</code> — pipe <code>echo y |</code> into it.</li>
<li><strong>The per-chapter FLACs survive a crash.</strong> If the final M4B muxing step
OOMs, don&rsquo;t re-synthesize: the chapters are sitting in the session&rsquo;s tmp
directory, and <code>ffmpeg</code> will assemble them into a chaptered M4B with a
hand-written FFMETADATA file in about two minutes, at near-zero memory.</li>
</ul>
<p>None of this is hard. It&rsquo;s just undocumented — which is the gap between
&ldquo;there&rsquo;s a model for that&rdquo; and your kid pressing play.</p>
<hr>
<h2 id="postscript-the-jury-came-back">Postscript: the jury came back</h2>
<p>The clone failed. Recognizably my timbre, nowhere near natural — I wouldn&rsquo;t
play it to my kid, which is the only metric that exists for this project.</p>
<p>Worth being precise about <em>what</em> failed: the stock XTTS-v2 narrator passed
the ear test and the library keeps growing with it. Zero-shot <strong>cloning</strong> is
the part that fell short — a 2023 model conditioning on 26 seconds of a
voice it has never seen, in a language that was never its strong suit. The
pipeline above is still the right pipeline; the model isn&rsquo;t there yet on
CPU-class options.</p>
<p>The next experiment is already picked: <a href="https://huggingface.co/Maxdorger29/f5-tts-hungarian">F5-TTS Hungarian</a>,
a 2026 fine-tune on 280 hours of actual Hungarian speech, built precisely
for short-sample cloning. It needs CUDA, which my node doesn&rsquo;t have — but a
rented spot GPU tests it for the price of an espresso. If it passes the
bedtime jury, that&rsquo;ll be its own post.</p>
<p>Negative results are results. The jury reconvenes when the GPU shows up.</p>
]]></content:encoded></item><item><title>🫙 I Built a Tracker for My Kombucha. The Data Model Was the Hard Part.</title><link>https://blog.hippotion.com/posts/kombucha-tracker/</link><pubDate>Fri, 02 Jan 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/kombucha-tracker/</guid><description>Brewing kombucha looks simple until you try to model it: one batch splits into many flavored bottles, every jar generates a stream of pH and taste readings, and a SCOBY has a lineage. Here&amp;rsquo;s the little app I built to keep track — and why the schema, not the code, was the real work.</description><content:encoded><![CDATA[<h2 id="i-brew-kombucha">I brew kombucha</h2>
<p>If you haven&rsquo;t fallen down this hole: kombucha is sweet tea fermented by a SCOBY (a rubbery pancake of yeast and bacteria) into something tart and fizzy. It&rsquo;s a <em>living</em> hobby — the culture is alive, every batch is a little different, and the only way to get good is to pay attention and remember what you did.</p>
<p>I was not remembering what I did. Brew dates lived in my head, taste notes lived nowhere, and &ldquo;which jar was the ginger one again?&rdquo; was a genuine question I asked myself out loud, to a fridge.</p>
<p>So I built a tracker. It&rsquo;s called <strong>HipPotion</strong> — same family as everything else I run here. The brewing turned out to be the easy part. Modeling it was where it got interesting.</p>
<h2 id="why-a-simple-list-doesnt-fit">Why a simple list doesn&rsquo;t fit</h2>
<p>My first instinct was &ldquo;a batch is a row, log some notes.&rdquo; That falls apart fast, because kombucha isn&rsquo;t linear. It has two stages:</p>
<ul>
<li><strong>F1 (first ferment):</strong> the big jar of sweet tea + SCOBY, fermenting sour over a week or two. One vessel, one culture.</li>
<li><strong>F2 (second ferment):</strong> you split that sour base into bottles and flavor each one differently — ginger in this one, blackberry in that one, hibiscus in the next — then seal them to build carbonation.</li>
</ul>
<p>So <strong>one batch becomes many bottles, each with its own flavor, its own carbonation, its own outcome.</strong> A flat &ldquo;batch = row&rdquo; model can&rsquo;t express that. And on top of the branching, every jar and bottle produces a <em>stream</em> of observations over time: pH today, Brix tomorrow, &ldquo;tastes too sweet still&rdquo; the day after.</p>
<p>That&rsquo;s three different shapes at once — a lifecycle, a one-to-many split, and a time series — for what looks from the outside like &ldquo;I made some tea.&rdquo;</p>
<h2 id="the-model-i-landed-on">The model I landed on</h2>
<p>Six tables, each earning its place:</p>
<ul>
<li><strong><code>recipes</code></strong> — the templates. Tea blend, sugar ratio, target numbers. A batch points at one.</li>
<li><strong><code>batches</code></strong> — an actual F1 brew, with a lifecycle (<code>planned → active → conditioning → finished</code>) and a reference to its recipe.</li>
<li><strong><code>fermentation_log_entries</code></strong> — the time series. One row per observation per batch: pH, Brix, temperature, taste/smell notes, what I did. This is where the &ldquo;pay attention and remember&rdquo; lives.</li>
<li><strong><code>f2_variant_batches</code></strong> — the branch. Each is a flavored bottle split off a parent batch, tracked on its own.</li>
<li><strong><code>starter_log</code></strong> — SCOBY lineage. Cultures have parents; you grow new ones from old ones, and a sick culture ruins a batch, so the lineage matters.</li>
<li><strong><code>botanical_infusions</code></strong> — the flavoring ingredients, managed per recipe.</li>
</ul>
<p>The shape that took the longest to get right was the <strong>F1 → F2 split</strong>: a variant has to belong to its parent batch but live its own life. Once that relationship was clean, the whole thing clicked — the app finally matched how brewing <em>actually works</em> instead of how it&rsquo;s easy to store.</p>
<h2 id="the-stack-and-where-it-runs">The stack (and where it runs)</h2>
<p>Nothing exotic: React + Vite + TypeScript on the front (TanStack Query, shadcn/ui, Tailwind), a <a href="https://hono.dev">Hono</a> + Drizzle ORM API on the back, PostgreSQL underneath. Built with AI coding tools — I leaned on them hard for the React/shadcn front-end, less so for the schema, which I argued out by hand because it&rsquo;s the part that had to be <em>right</em>.</p>
<p>It runs on my k3s homelab like everything else: a Helm chart deploys the nginx frontend, the Hono API, and a Postgres StatefulSet, all reconciled by Argo CD from Git. Default-deny networking, secrets out of Git — the <a href="/posts/homelab-gitops/">usual platform defaults</a>. It&rsquo;s a hobby app, but it gets treated like a real one, because the platform doesn&rsquo;t know the difference and I don&rsquo;t want it to.</p>
<h2 id="it-became-an-api-for-something-else">It became an API for something else</h2>
<p>The unexpected payoff: because the data model was clean and the API was just a set of plain REST endpoints, it made a perfect target for an experiment. I later <a href="/posts/n8n-agent-cloud-vs-local/">pointed an AI agent at it from n8n</a> — &ldquo;what&rsquo;s fermenting right now?&rdquo;, &ldquo;log that this batch tastes tart&rdquo; — and the agent just called the same endpoints the UI does. A good schema is reusable in ways you don&rsquo;t plan for. The kombucha tracker quietly became a little knowledge base I can talk to.</p>
<h2 id="honest-notes">Honest notes</h2>
<p>This is a personal hobby app for an audience of one (me). It&rsquo;s AI-assisted, it has no tests, and the UI has rough edges. I&rsquo;m not pretending it&rsquo;s a product.</p>
<p>But the thing I keep coming back to: the hard, valuable part wasn&rsquo;t the framework or the deployment — it was sitting with a messy real-world process long enough to find the <em>shape</em> of it. The branching ferment, the time series, the lineage. Get the model honest and the rest is just typing. Get it wrong and no amount of nice UI saves you.</p>
<p>Also, the kombucha&rsquo;s been better since I started writing things down. Turns out the fridge wasn&rsquo;t a great database.</p>
]]></content:encoded></item><item><title>🧱 How Do You Isolate Two n8n Tenants on Kubernetes — and Prove Each Wall Holds?</title><link>https://blog.hippotion.com/posts/n8n-multitenant/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/n8n-multitenant/</guid><description>Multi-tenant isolation is easy to assert and hard to verify. Three walls — network, secret, resource — and the actual 403s, timeouts, and admission rejections that prove each one holds.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;You&rsquo;re running n8n for multiple customers on the same Kubernetes cluster. What stops Customer A from reading Customer B&rsquo;s API keys, calling Customer B&rsquo;s services, or starving Customer B&rsquo;s workflows by burning the whole node?&rdquo;</em></p>
<p>Three different walls, three different mechanisms. Most articles I&rsquo;ve read on K8s multi-tenancy list the primitives — namespaces, NetworkPolicies, ResourceQuotas, RBAC — without showing what each one actually catches when you try to cross it. This post does the second part. The receipts are the point.</p>
<p>The setup: two namespaces, <code>web-tenant-acme</code> and <code>web-tenant-globex</code>, each running their own n8n instance on the same node. The only thing keeping them apart is the walls we build around each namespace.</p>
<hr>
<h2 id="the-mental-model-subtractive-isolation">The mental model: subtractive isolation</h2>
<p>Kubernetes is a flat network with shared everything by default. You don&rsquo;t <em>add</em> isolation by writing allow rules. You <em>subtract</em> trust by adding default-deny rules, and then carefully allow back only the connections each tenant actually needs.</p>
<p>A tenant doesn&rsquo;t have access to another tenant because there is <em>no rule allowing it</em>. The absence of an allow rule is the wall.</p>
<p>Three of these absences make up the picture:</p>
<table>
	<thead>
			<tr>
					<th>Wall</th>
					<th>Primitive</th>
					<th>Failure mode when crossed</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Network</td>
					<td>Cilium NetworkPolicy, default-deny egress</td>
					<td>Connection times out (silent drop)</td>
			</tr>
			<tr>
					<td>Secret</td>
					<td>Vault Kubernetes-auth, per-tenant policy</td>
					<td><code>403 permission denied</code> from Vault itself</td>
			</tr>
			<tr>
					<td>Resource</td>
					<td>ResourceQuota + LimitRange</td>
					<td>Pod rejected at admission time</td>
			</tr>
	</tbody>
</table>
<p>Different layers, different error messages. That&rsquo;s how you can tell what stopped you.</p>
<hr>
<h2 id="wall-1--network-cilium-networkpolicy">Wall 1 — Network: Cilium NetworkPolicy</h2>
<p>n8n in <code>web-tenant-acme</code> can reach <code>whoami.web-tenant-acme.svc.cluster.local</code> (its own service in its own namespace) but not <code>whoami.web-tenant-globex.svc.cluster.local</code>. The same DNS shape, the same cluster, the same node. One succeeds, the other hangs.</p>
<p>The primitive is a default-deny egress policy applied to every pod in the namespace, with two narrow exceptions: intra-namespace traffic (so n8n can still reach its own service) and DNS to <code>kube-system</code> (otherwise nothing resolves anything).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Effective policy on every pod in web-tenant-acme:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Egress, Ingress]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">                                     </span><span class="c"># intra-namespace traffic OK</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">                                     </span><span class="c"># DNS to kube-dns OK</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">namespaceSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kubernetes.io/metadata.name</span><span class="p">:</span><span class="w"> </span><span class="l">kube-system</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>{<span class="nt">port: 53, protocol</span><span class="p">:</span><span class="w"> </span><span class="l">UDP}]</span><span class="w">
</span></span></span></code></pre></div><p>There is no rule for <code>web-tenant-globex</code>. Cilium&rsquo;s eBPF datapath drops the SYN packet on the way out.</p>
<p><strong>The receipt</strong> — an n8n HTTP node configured to GET <code>http://whoami.web-tenant-globex.svc.cluster.local/</code>. It hangs for the full timeout, then errors with <code>AxiosError: timeout of 5000ms exceeded</code> / <code>code: ECONNABORTED</code>.</p>
<p>The interesting bit: <strong>DNS still works.</strong> kube-dns is allowed, so the cross-namespace Service still resolves. The TCP handshake is what gets dropped. That&rsquo;s a useful signal in real incident response — &ldquo;DNS resolves but the connection hangs&rdquo; almost always means a NetworkPolicy is the cause.</p>
<hr>
<h2 id="wall-2--secret-vault-kubernetes-auth--eso">Wall 2 — Secret: Vault Kubernetes-auth + ESO</h2>
<p>Now imagine Acme&rsquo;s n8n misbehaves: somebody pushes a workflow that tries to read Globex&rsquo;s API keys via an <code>ExternalSecret</code>. The network isn&rsquo;t the issue — both tenants need to reach Vault, so they both have an egress rule for <code>sys-vault</code>. The wall has to be at the identity layer.</p>
<p>Each tenant gets three things:</p>
<ol>
<li>A dedicated <code>ServiceAccount</code> (<code>n8n-acme</code>, <code>n8n-globex</code>).</li>
<li>A Vault Kubernetes-auth <code>role</code> bound to that SA in that namespace, mapped to a Vault <code>policy</code> that grants <code>read</code> on <em>only its own</em> KV path.</li>
<li>A namespaced External Secrets <code>SecretStore</code> that authenticates as the SA via the Kubernetes TokenRequest API.</li>
</ol>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="cl"><span class="c1"># Vault policy: tenant-acme can read its own secrets, nothing else.
</span></span></span><span class="line"><span class="cl"><span class="n">path &#34;secret/data/web-tenant-acme&#34;     { capabilities</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;read&#34;</span><span class="p">]</span> }
</span></span><span class="line"><span class="cl"><span class="n">path &#34;secret/metadata/web-tenant-acme&#34; { capabilities</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;read&#34;</span><span class="p">]</span> }
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">vault write auth/kubernetes/role/tenant-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">bound_service_account_names</span><span class="o">=</span>n8n-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">bound_service_account_namespaces</span><span class="o">=</span>web-tenant-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">policies</span><span class="o">=</span>tenant-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">ttl</span><span class="o">=</span>1h
</span></span></code></pre></div><p>When Acme&rsquo;s n8n tries an <code>ExternalSecret</code> pointing at <code>secret/web-tenant-globex/...</code>, ESO authenticates fine (the SA is valid), Vault recognises the caller, looks up the <code>tenant-acme</code> policy, and answers with the most satisfying line in this whole demo:</p>
<pre tabindex="0"><code>URL: GET http://sys-vault.sys-vault.svc.cluster.local:8200/v1/secret/data/web-tenant-globex
Code: 403. Errors:
* permission denied
</code></pre><p>This is the bit that separates &ldquo;namespace isolation&rdquo; from real multi-tenant secret isolation. Plain Kubernetes Secrets + RBAC stop a tenant from <em>listing</em> another tenant&rsquo;s Secret objects, but the moment you go upstream — to Vault, to a cloud KMS, to an SSM Parameter Store — the secret store needs to enforce identity itself. The network said yes; the secret store still says no.</p>
<hr>
<h2 id="wall-3--resource-resourcequota--limitrange">Wall 3 — Resource: ResourceQuota + LimitRange</h2>
<p>The third concern is the noisy neighbour: Acme&rsquo;s runaway workflow allocating a 4Gi pod and OOM-killing everything else on the node. The network policy doesn&rsquo;t catch this (no network call), and Vault doesn&rsquo;t catch this (no secret request). The kernel will, <em>eventually</em> — but you don&rsquo;t want eventually. You want admission-time rejection.</p>
<p>Two primitives:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ResourceQuota</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">name: tenant-quota, namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-tenant-acme }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">hard</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">requests.cpu</span><span class="p">:</span><span class="w">    </span><span class="s2">&#34;1&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">requests.memory</span><span class="p">:</span><span class="w"> </span><span class="l">1Gi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">limits.cpu</span><span class="p">:</span><span class="w">      </span><span class="s2">&#34;2&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">limits.memory</span><span class="p">:</span><span class="w">   </span><span class="l">2Gi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">pods</span><span class="p">:</span><span class="w">            </span><span class="s2">&#34;10&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nn">---</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">LimitRange</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">name: tenant-limits, namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-tenant-acme }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">Container</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">default</span><span class="p">:</span><span class="w">        </span>{<span class="w"> </span><span class="nt">cpu: 500m, memory</span><span class="p">:</span><span class="w"> </span><span class="l">512Mi }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">defaultRequest</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">cpu: 50m,  memory</span><span class="p">:</span><span class="w"> </span><span class="l">128Mi }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">max</span><span class="p">:</span><span class="w">            </span>{<span class="w"> </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;2&#34;</span><span class="nt">,  memory</span><span class="p">:</span><span class="w"> </span><span class="l">1Gi }</span><span class="w">
</span></span></span></code></pre></div><p><code>ResourceQuota</code> caps the namespace total. <code>LimitRange</code> bounds any <em>individual</em> container and supplies defaults so pods that don&rsquo;t declare requests/limits still get reasonable ones — important because a missing limit on a single container can blow past the quota in one allocation.</p>
<p><strong>The receipt</strong> — a server-side dry-run of a single 4Gi pod, which never gets created:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ kubectl apply -n web-tenant-acme --dry-run<span class="o">=</span>server -f noisy-neighbor.yaml
</span></span><span class="line"><span class="cl">Error from server <span class="o">(</span>Forbidden<span class="o">)</span>: error when creating <span class="s2">&#34;STDIN&#34;</span>:
</span></span><span class="line"><span class="cl">pods <span class="s2">&#34;noisy-neighbor&#34;</span> is forbidden:
</span></span><span class="line"><span class="cl">  maximum memory usage per Container is 1Gi, but limit is 4Gi
</span></span></code></pre></div><p>Not a kernel OOMKill. Not a pod stuck in <code>Pending</code>. A flat refusal from the API server before the scheduler even sees the request.</p>
<hr>
<h2 id="what-this-does-not-prove">What this does <em>not</em> prove</h2>
<p>A homelab demo on one node with two synthetic tenants is not n8n Cloud. The honest gaps:</p>
<ul>
<li><strong>Execution sandboxing.</strong> A workflow can still run arbitrary code via the <code>Code</code> node or shell-outs. These walls stop <em>infrastructure</em> leakage; they don&rsquo;t sandbox what n8n itself executes. Real n8n Cloud needs more than namespace walls for that — gVisor / Firecracker / per-tenant worker pools are the usual answers, and n8n&rsquo;s <a href="https://docs.n8n.io/hosting/scaling/queue-mode/">queue mode</a> lends itself to the last.</li>
<li><strong>Pooled worker queues.</strong> Queue mode runs main/webhook/worker as separate deployments backed by Redis + Postgres. Two tenants sharing a worker pool need additional checks at the job-routing layer to keep workflows from accessing the wrong tenant&rsquo;s binary data. Out of scope for the homelab demo.</li>
<li><strong>Control plane.</strong> Both tenants reach the same API server. A cluster-admin-equivalent compromise breaks everything. This is the assumption every shared K8s setup makes.</li>
<li><strong>Node-level.</strong> Same kernel. Container escape, CPU side channels, the usual list — all apply. For paranoid tenants the answer is dedicated nodes via taints/tolerations or separate clusters entirely.</li>
</ul>
<p>The demo proves the <em>namespace-shaped</em> walls hold. It does not prove the whole stack is safe against a determined attacker already running code inside a tenant. That&rsquo;s a different post.</p>
<hr>
<p><em>Part of a Kubernetes-on-the-homelab series — previously: <a href="/posts/k8s-network-isolation/">preventing a compromised pod from calling your database</a>, <a href="/posts/k8s-gitops-secrets/">GitOps secrets</a>.</em></p>
]]></content:encoded></item><item><title>🍵 I A/B-Tested Cloud vs Local LLMs in One n8n Agent. The Local One Faked It.</title><link>https://blog.hippotion.com/posts/n8n-agent-cloud-vs-local/</link><pubDate>Fri, 07 Nov 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/n8n-agent-cloud-vs-local/</guid><description>I built an AI agent in self-hosted n8n over my kombucha-tracking app, then gave it two brains — NVIDIA&amp;rsquo;s 70B and a local Phi-3.5 — sharing the same tools. The cloud model called the tools and answered from real data. The local one couldn&amp;rsquo;t, so it made things up.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p>I run <a href="https://n8n.io">n8n</a> on my k3s homelab. Not docker-compose on a NUC — the full treatment: GitOps-reconciled, Vault-backed secrets, default-deny networking. The same boring platform everything else here runs on.</p>
<p>But &ldquo;I have n8n running&rdquo; proves nothing. I wanted to know if I actually understood it as an <em>agent platform</em>, and to answer a question I kept dodging: <strong>for agent work, do I need a cloud model, or is my local one good enough?</strong></p>
<p>So I built a real agent and gave it two brains.</p>
<h2 id="what-i-built">What I built</h2>
<p>A chat assistant over brew-buddy, my homemade kombucha-tracking app (React + a small API + Postgres). You ask it things in plain language; it calls the app&rsquo;s API and answers. The twist: the same question runs through <strong>two agents in parallel</strong> — one backed by NVIDIA&rsquo;s hosted <strong>Llama-3.3-70B</strong>, one by a local <strong>Phi-3.5-mini</strong> on CPU — and the workflow prints both answers side by side.</p>
<pre tabindex="0"><code>Chat ──▶ Agent (cloud: NVIDIA 70B) ──┐   tools (shared):
     └─▶ Agent (local: Phi-3.5)   ──┤     • get_all_batches
                                    │     • get_batch_detail
                                    │     • brewing_statistics
            (Merge) ──▶ both replies, labeled     • add_batch_log   ⟵ write
                                                  • create_batch    ⟵ write
</code></pre><p>Both agents share the same read tools. The two <em>write</em> tools are wired to the cloud agent only — more on that below.</p>
<p><img alt="The kombucha agent in n8n: a chat trigger fans out to two AI Agent nodes (cloud and local), both wired to the same brew-buddy tools, then merged so the two answers print side by side." loading="lazy" src="/posts/n8n-agent-cloud-vs-local/n8n.png"></p>
<p>The nice part: I didn&rsquo;t write a line of glue. n8n&rsquo;s stock <strong>OpenAI Chat Model</strong> node talks to anything OpenAI-compatible if you override the credential&rsquo;s Base URL — so one node points at <code>https://integrate.api.nvidia.com/v1</code>, the other at <code>http://llama-server.&lt;ns&gt;.svc:8080/v1</code> for the local server. Same node, two endpoints.</p>
<h2 id="the-infra-that-keeps-it-honest">The infra that keeps it honest</h2>
<p>I won&rsquo;t re-explain the platform here — it&rsquo;s in earlier posts: <a href="/posts/homelab-gitops/">GitOps</a>, <a href="/posts/k8s-gitops-secrets/">Vault-backed secrets</a>, <a href="/posts/k8s-network-isolation/">default-deny networking</a>, <a href="/posts/homelab-dual-path-tls/">dual-path TLS ingress</a>. But building the agent made one of them <em>tangible</em>.</p>
<p>n8n is, by design, a thing that makes arbitrary HTTP calls on a schedule. That&rsquo;s exactly what you want behind a default-deny network policy. n8n couldn&rsquo;t reach the brew-buddy API at all until I declared it — one line:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># n8n&#39;s namespace</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">allowEgressToNamespaces</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">web-ai-engine, web-brew-buddy]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c">#                                          ^ added this for the agent</span><span class="w">
</span></span></span></code></pre></div><p>(plus a matching ingress-allow on brew-buddy&rsquo;s side). That&rsquo;s the posture working as intended: the blast radius of a workflow tool is whatever I&rsquo;ve explicitly granted, and not one namespace more. Adding a capability is a reviewable one-liner in Git; Argo reconciles it. No <code>kubectl</code>, no guessing what n8n can reach.</p>
<h2 id="the-ab-same-agent-same-tools-two-brains">The A/B: same agent, same tools, two brains</h2>
<p><strong>Plain &ldquo;hi&rdquo;.</strong> Cloud answers in ~0.5s. Local takes noticeably longer — because even for &ldquo;hi&rdquo;, the agent feeds the model the full system prompt <em>plus the JSON schemas for every tool</em>, and Phi-3.5 has to chew through all of it on CPU before it can say a word. So far, the boring expected result: local is slower.</p>
<p>Then I asked a real question, and the result flipped in a way I didn&rsquo;t expect.</p>
<p><strong>&ldquo;What batches do I have?&rdquo;</strong></p>
<p>Cloud (70B) called <code>get_all_batches</code>, got the real rows, and answered:</p>
<blockquote>
<p>You have two batches: 2026-04-09-A (cold-crash, 3L) and 2026-04-09-W (cold-crash, 3L).</p>
</blockquote>
<p>Local (Phi-3.5) <strong>never called the tool.</strong> It didn&rsquo;t seem to realise it <em>had</em> tools. Instead it confidently explained how <em>I</em> could go find the data myself:</p>
<blockquote>
<p>To list all batches: 1. Access the brew-buddy app. 2. Look for a button labeled &ldquo;List Batches&rdquo;… <code>def get_all_batches(): …</code> … Remember, I&rsquo;m unable to directly interact with apps or databases.</p>
</blockquote>
<p>Fake instructions. Fake code. A polite apology. Everything except the actual answer it was sitting on top of.</p>
<p><strong>Writing data.</strong> I asked both to <em>log</em> an observation. Cloud called <code>add_batch_log</code> and wrote a real row to Postgres (&ldquo;I have recorded the observation…&rdquo;). Local bluffed again — &ldquo;here&rsquo;s how <em>you</em> can log it yourself.&rdquo;</p>
<h2 id="why-it-matters-capability-not-latency">Why it matters: capability, not latency</h2>
<p>The interesting finding isn&rsquo;t &ldquo;the big model is better.&rdquo; It&rsquo;s <em>how</em> the small one fails.</p>
<p>With a ~3.8B model on CPU, the bottleneck for agent work isn&rsquo;t speed — it&rsquo;s <strong>capability</strong>. Phi-3.5 couldn&rsquo;t reliably emit tool calls, so n8n&rsquo;s tools never fired, and the model degraded into a chatbot that <strong>hallucinates a plausible answer instead of fetching the real one.</strong> That failure mode is worse than an error: an error you catch, a confident wrong answer you ship.</p>
<p>A couple of measurements that sharpened it:</p>
<ul>
<li>NVIDIA 70B, <strong>plain chat</strong>: ~0.5s.</li>
<li>NVIDIA 70B, <strong>function-calling</strong> (with tool schemas): ~8.6s per round-trip — and an agent makes several round-trips per answer. That&rsquo;s real latency you have to budget a timeout for. (It&rsquo;s also why the cloud side initially <em>timed out</em> in n8n until I raised the model node&rsquo;s timeout — the model was fine, n8n was cutting it off.)</li>
</ul>
<p>So the snappy-vs-slow comparison <strong>flips depending on whether the question triggers tools</strong>. Plain chat: cloud wins on speed. Tool use: the local model is &ldquo;fast&rdquo; only because it skips the tools and makes something up. Speed was never the real axis.</p>
<p>The honest caveat: this is <em>this</em> small general model in a multi-tool agent loop. Purpose-built small models with tool-calling fine-tunes do better at narrow tasks — I run a 1.7B one elsewhere that emits a single structured tool call just fine. But for &ldquo;pick the right tool from several and chain them,&rdquo; 70B was in a different league.</p>
<h2 id="the-trust-boundary">The trust boundary</h2>
<p>I gave the write tools (<code>add_batch_log</code>, <code>create_batch</code>) to the cloud agent <strong>only</strong>. The local agent is read-only — not by instruction, by wiring. Even if Phi-3.5 <em>did</em> decide to call a write tool, the connection isn&rsquo;t there. The reliable model is the only one allowed to mutate real data, and that&rsquo;s enforced structurally, not by trusting a prompt.</p>
<h2 id="whats-toy-and-whats-real">What&rsquo;s toy and what&rsquo;s real</h2>
<p>Worth being straight: this is a <strong>single-node homelab</strong>. The agent and both model paths share one box. Running n8n on Kubernetes and swapping models isn&rsquo;t novel — <a href="https://docs.n8n.io/hosting/scaling/queue-mode/">n8n&rsquo;s own docs</a> cover queue mode, where a main instance fans work out to a pool of worker pods you scale horizontally, with external Postgres for state. That&rsquo;s the real production shape. Mine is one replica with an emptyDir&rsquo;s worth of ambition.</p>
<p>What I think <em>is</em> worth sharing is the finding (the capability cliff, and that its failure mode is confident fabrication) and the boring thing underneath it: because the platform is default-deny and GitOps-reconciled, running this experiment cost me one reviewable egress line and zero risk to anything else.</p>
<h2 id="the-boring-part-is-the-point">The boring part is the point</h2>
<p>The AI was the fun bit. But the reason I could bolt an agent onto a live cluster, point it at a real app, give it write access to one model and not the other, and tear it all down again — without worrying what it might touch — is that the infrastructure was already boring. Default-deny. Secrets out of Git. <code>git push</code>, Argo reconciles.</p>
<p>The model picks the tools. The platform decides what the tools can reach. Keep those two honest about each other and self-hosting an agent stops being scary and starts being just another app.</p>
]]></content:encoded></item><item><title>📦 Five Ways to Manage Kubernetes Manifests (and Why They're Not All Equal)</title><link>https://blog.hippotion.com/posts/gitops-manifest-approaches/</link><pubDate>Fri, 10 Oct 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/gitops-manifest-approaches/</guid><description>Raw YAML, Kustomize, Helm, Jsonnet — there&amp;rsquo;s more than one way to describe what you want running in a cluster. Here&amp;rsquo;s what each actually looks like in practice and where each one breaks.</description><content:encoded><![CDATA[<h2 id="the-problem-everyone-hits">The problem everyone hits</h2>
<p>You&rsquo;ve got a Kubernetes cluster. Now you need to describe what should run in it. You write some YAML, apply it, it works.</p>
<p>Then you need a second environment. Or a second service. Or someone else joins the project and asks &ldquo;how do I add an app to this?&rdquo; and you don&rsquo;t have a good answer.</p>
<p>This is the manifest management problem, and there are five common solutions — ranging from &ldquo;this works until it doesn&rsquo;t&rdquo; to &ldquo;this is what production platforms actually look like.&rdquo;</p>
<hr>
<h2 id="approach-1-raw-manifests">Approach 1: Raw manifests</h2>
<p>The starting point for almost everyone. Write a YAML file, <code>kubectl apply -f</code>, done.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:v1.2.3</span><span class="w">
</span></span></span></code></pre></div><p><strong>Where it works:</strong> one service, one environment, learning Kubernetes. The feedback loop is immediate — write YAML, see what happens.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>No templating.</strong> Want to change the image tag across ten services? Ten files, ten edits, ten chances to get it wrong.</li>
<li><strong>Live state leaks in.</strong> If you export existing resources with <code>kubectl get -o yaml</code>, you get <code>resourceVersion</code>, <code>generation</code>, <code>creationTimestamp</code>, and <code>managedFields</code> in the output. Commit that to Git and you&rsquo;ve created a permanent source of conflicts — ArgoCD compares what&rsquo;s in Git against what&rsquo;s in the cluster, sees stale version counters, and the diff never clears.</li>
<li><strong>Copy-paste hell.</strong> A Deployment, a Service, an IngressRoute, a ServiceAccount, a NetworkPolicy — five files per app. Add a new app, copy five files, change the names, forget to update one. This is how environments drift apart silently.</li>
</ul>
<p>The fix for the live-state problem is: only commit desired state. Strip every field that Kubernetes manages internally back to its clean spec. It&rsquo;s tedious and easy to forget, which is exactly why people move on from raw manifests.</p>
<hr>
<h2 id="approach-2-kustomize">Approach 2: Kustomize</h2>
<p>Kustomize is built into <code>kubectl</code> (<code>kubectl apply -k</code>) and natively supported by ArgoCD. The idea: you have a <code>base/</code> with your raw manifests, and overlays that patch on top of them for different environments.</p>
<pre tabindex="0"><code>app/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── staging/
    │   ├── kustomization.yaml    # patches replicas to 1, image to :staging
    └── production/
        └── kustomization.yaml    # patches replicas to 3, image to :v1.2.3
</code></pre><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># overlays/production/kustomization.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="l">../../base</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">patches</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">patch</span><span class="p">:</span><span class="w"> </span><span class="p">|-</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">      - op: replace
</span></span></span><span class="line"><span class="cl"><span class="sd">        path: /spec/replicas
</span></span></span><span class="line"><span class="cl"><span class="sd">        value: 3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span></code></pre></div><p><strong>Where it works:</strong> multi-environment setups where the difference between environments is mostly configuration values, not structure. Kustomize is good at this — you write the base once and patch only what differs.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>No real parameterization.</strong> Kustomize patches are surgical edits, not templates. If your base structure needs to vary (different resource shapes per environment, conditional blocks), you&rsquo;re fighting the tool.</li>
<li><strong>Patching deep structures is ugly.</strong> JSON patches on nested YAML are verbose and hard to read. You end up writing more patch YAML than it would take to just copy the file.</li>
<li><strong>Still repetitive across apps.</strong> Each app still gets its own base directory. You&rsquo;re not abstracting the shared patterns across apps, only the differences between environments of the same app.</li>
</ul>
<p>Kustomize is a significant step up from raw manifests for multi-environment setups. For complex templating or platform-level abstractions, it runs out of power quickly.</p>
<hr>
<h2 id="approach-3-helm">Approach 3: Helm</h2>
<p>Helm adds real templating. Charts are parameterized bundles — templates with variables, conditionals, and loops — and values files supply the parameters.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># templates/deployment.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.name }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Release.Namespace }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.replicas | default 1 }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.name }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.image.repository }}:{{ .Values.image.tag }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>{{- <span class="l">if .Values.resources }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">resources</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.resources | toYaml | nindent 12 }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>{{- <span class="l">end }}</span><span class="w">
</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># values-production.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">image</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">repository</span><span class="p">:</span><span class="w"> </span><span class="l">myorg/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">tag</span><span class="p">:</span><span class="w"> </span><span class="l">v1.2.3</span><span class="w">
</span></span></span></code></pre></div><p>Helm renders the templates at deploy time. What lands in the cluster is clean rendered YAML — no internal state, no conflicts.</p>
<p><strong>Where it works:</strong> almost everywhere. The Helm Hub has charts for most common software already. For custom apps, writing a chart once and parameterizing per-environment is straightforwardly better than copying YAML.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>Chart authoring is verbose.</strong> Writing a Helm chart from scratch involves a lot of Go templating boilerplate. For a simple app, it can feel like more scaffolding than application.</li>
<li><strong>Debugging rendered output is annoying.</strong> <code>helm template</code> is your friend, but errors in templates produce unhelpful messages. The indentation rules (<code>nindent</code>, <code>indent</code>, <code>toYaml</code>) have sharp edges.</li>
<li><strong>Values files still pile up.</strong> If every app has its own values file and there&rsquo;s no shared structure between them, you&rsquo;re back to copy-paste but now in YAML-that-configures-YAML.</li>
</ul>
<p>Helm is the right tool for most Kubernetes deployments. The ecosystem support alone (upstream charts for Postgres, Redis, Vault, every CNCF project) makes it the pragmatic default.</p>
<hr>
<h2 id="approach-4-jsonnet--cue">Approach 4: Jsonnet / CUE</h2>
<p>For teams that need programmatic config generation — actual code, not templates — Jsonnet and CUE are the serious alternatives.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-jsonnet" data-lang="jsonnet"><span class="line"><span class="cl"><span class="c1">// deployment.jsonnet
</span></span></span><span class="line"><span class="cl"><span class="k">local</span><span class="w"> </span><span class="nv">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">import</span><span class="w"> </span><span class="s">&#34;k.libsonnet&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">local</span><span class="w"> </span><span class="nf">deployment</span><span class="p">(</span><span class="nv">name</span><span class="p">,</span><span class="w"> </span><span class="nv">image</span><span class="p">,</span><span class="w"> </span><span class="nv">replicas</span><span class="o">=</span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nv">k</span><span class="p">.</span><span class="nv">apps</span><span class="p">.</span><span class="nv">v1</span><span class="p">.</span><span class="nv">deployment</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nv">name</span><span class="p">,</span><span class="w"> </span><span class="nv">replicas</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nv">k</span><span class="p">.</span><span class="nv">core</span><span class="p">.</span><span class="nv">v1</span><span class="p">.</span><span class="nv">container</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nv">name</span><span class="p">,</span><span class="w"> </span><span class="nv">image</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">]);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nv">&#34;deployment.yaml&#34;</span><span class="p">:</span><span class="w"> </span><span class="nf">deployment</span><span class="p">(</span><span class="s">&#34;myapp&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;myorg/myapp:v1.2.3&#34;</span><span class="p">,</span><span class="w"> </span><span class="nv">replicas</span><span class="o">=</span><span class="mf">3</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p><strong>Where it works:</strong> large platforms where configuration is genuinely complex — many environments, many apps, deep interdependencies. Jsonnet lets you write real functions, share libraries, compose abstractions properly.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>Steep learning curve.</strong> Jsonnet is a full language. CUE even more so — it has types, schemas, and a constraint system that takes time to internalise.</li>
<li><strong>Small community.</strong> Excellent tooling, but you&rsquo;re solving problems that have fewer Stack Overflow answers.</li>
<li><strong>Overkill for most setups.</strong> If you&rsquo;re not managing hundreds of services across multiple clusters, Helm is simpler and has everything you need.</li>
</ul>
<p>Jsonnet is used seriously at Google-scale infrastructure teams and in some CNCF projects. For a homelab or a small-to-medium platform, it&rsquo;s the right answer to a question you probably aren&rsquo;t asking yet.</p>
<hr>
<h2 id="approach-5-app-of-apps-with-generated-application-crds">Approach 5: App-of-apps with generated Application CRDs</h2>
<p>This is the ArgoCD-native meta-layer. Instead of managing manifests, you manage <code>Application</code> resources — and potentially use a chart or tool to generate those too.</p>
<p>A naive version: commit a folder of <code>Application</code> YAML files to Git, one per service. ArgoCD watches the folder and deploys each app.</p>
<p>A more sophisticated version: one &ldquo;root app&rdquo; that points to a chart, which generates all the other <code>Application</code> resources dynamically from a single config file.</p>
<p><strong>Where it works:</strong> at the platform level, not the individual app level. App-of-apps is how you manage what ArgoCD manages, not how you write the service manifests themselves. Combined with Helm, it gives you centralized control over the entire cluster&rsquo;s structure.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>Manual <code>Application</code> CRDs are painful.</strong> If you&rsquo;re maintaining a folder of hand-written <code>Application</code> YAML files — one per service — you&rsquo;ve traded manifest copy-paste for Application copy-paste. Each app needs its own CRD with its repo URL, path, sync policy, project reference.</li>
<li><strong>Sync ordering matters.</strong> The root app must exist before children can sync. Get the wave ordering wrong and apps try to deploy before their namespaces exist.</li>
</ul>
<hr>
<h2 id="how-this-homelab-compares">How this homelab compares</h2>
<p>My setup sits at the far end of approach 5, using Helm throughout.</p>
<p>There&rsquo;s a single <code>applications.yml</code> file that describes every service in the cluster. A root Helm chart reads it and generates all the ArgoCD <code>Application</code> and <code>AppProject</code> CRDs automatically. Adding a service means adding an entry to that file — not touching five different places across five different files.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># applications.yml — this is the entire service catalog</span><span class="w">
</span></span></span><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">profile</span><span class="p">:</span><span class="w"> </span><span class="l">web-app</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">applications</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">applicationCode</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/extra-objects</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">autoSync</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>That one entry generates: a Namespace, an ArgoCD AppProject, an ArgoCD Application, a set of Cilium NetworkPolicies (deny-all with ingress from Traefik and DNS/HTTPS egress), and a ServiceAccount. Nothing is written by hand.</p>
<p>The actual service manifests live in an <code>extra-objects</code> chart — a thin wrapper that renders raw YAML from values files. No templating in the service manifests themselves (they&rsquo;re simple enough not to need it), but the infrastructure scaffolding around each app is entirely generated.</p>
<p>The result: every service gets the same operational properties. Same GitOps workflow, same secret management, same network isolation, same TLS termination. The platform work was done once. Adding a new app is writing manifests for the app&rsquo;s specific behavior, not recreating the scaffolding.</p>
<hr>
<h2 id="the-honest-spectrum">The honest spectrum</h2>
<table>
	<thead>
			<tr>
					<th>Approach</th>
					<th>Templating</th>
					<th>Abstraction</th>
					<th>Ecosystem</th>
					<th>Complexity</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Raw manifests</td>
					<td>None</td>
					<td>None</td>
					<td>None</td>
					<td>Low</td>
			</tr>
			<tr>
					<td>Kustomize</td>
					<td>Patches only</td>
					<td>Overlays</td>
					<td>Medium</td>
					<td>Low-medium</td>
			</tr>
			<tr>
					<td>Helm</td>
					<td>Full</td>
					<td>Per-chart</td>
					<td>Large</td>
					<td>Medium</td>
			</tr>
			<tr>
					<td>Jsonnet/CUE</td>
					<td>Full + typed</td>
					<td>Libraries</td>
					<td>Small</td>
					<td>High</td>
			</tr>
			<tr>
					<td>App-of-apps</td>
					<td>Depends</td>
					<td>Platform-level</td>
					<td>ArgoCD-native</td>
					<td>High</td>
			</tr>
	</tbody>
</table>
<p>Most setups should start at Helm. Kustomize if you&rsquo;re multi-environment and comfortable with patching. App-of-apps when you&rsquo;re managing the platform layer, not individual services. Jsonnet/CUE when you know you&rsquo;ve outgrown Helm — which is a specific and relatively rare problem to have.</p>
<p>Raw manifests are fine for learning. They&rsquo;re the wrong answer for anything you intend to maintain.</p>
<hr>
<p><em>More on how the homelab is structured: <a href="/posts/homelab-gitops/">My Homelab Runs on GitOps</a>.</em></p>
]]></content:encoded></item><item><title>🔒 Building a PII Guardrail Proxy for Cloud LLM Calls</title><link>https://blog.hippotion.com/posts/ai-pii-guardrail-proxy/</link><pubDate>Fri, 26 Sep 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/ai-pii-guardrail-proxy/</guid><description>A local model classifies every prompt before it leaves the cluster. If it&amp;rsquo;s sensitive, it&amp;rsquo;s blocked. If it&amp;rsquo;s clean, it goes to NVIDIA NIM. 150 lines of FastAPI, deployed on k3s.</description><content:encoded><![CDATA[<h2 id="the-problem-with-cloud-llm-access">The problem with cloud LLM access</h2>
<p>Running a local model is great for privacy. But local models hit a ceiling — for the heavy lifting, you want a cloud API like NVIDIA NIM with Llama 3.3 70B.</p>
<p>The moment you open that channel, you have a new risk: what if someone (or some automation) accidentally pastes a password, a private key, or someone&rsquo;s personal data into the chat? It leaves the cluster. It&rsquo;s logged somewhere you don&rsquo;t control.</p>
<p>The standard answer is &ldquo;train your users.&rdquo; I&rsquo;d rather have a technical control.</p>
<h2 id="the-architecture">The architecture</h2>
<pre tabindex="0"><code>Open WebUI → ai-guard proxy
                 │
        ┌────────┴────────┐
        │                 │
  llama-server       if SAFE:
  (classify)         forward to NVIDIA NIM
        │
   if SENSITIVE:
   block + explain
</code></pre><p>Every request to NVIDIA NIM goes through ai-guard first. ai-guard pulls the user message, sends it to the local llama.cpp server with a classification prompt, and makes a binary decision:</p>
<ul>
<li><code>SAFE</code> → forward to NVIDIA NIM with the real API key (which ai-guard holds, not the client)</li>
<li><code>SENSITIVE: &lt;reason&gt;</code> → return HTTP 400, log the block, nothing leaves the cluster</li>
</ul>
<p>The local model is already running for inference — this reuses it as a privacy gatekeeper at zero extra infrastructure cost.</p>
<h2 id="the-implementation">The implementation</h2>
<p>The proxy is ~150 lines of FastAPI. The classifier call:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">CLASSIFIER_PROMPT</span> <span class="o">=</span> <span class="s2">&#34;&#34;&#34;You are a data security classifier. Check if the text below contains sensitive information:
</span></span></span><span class="line"><span class="cl"><span class="s2">passwords, API keys, tokens, credentials, personal identifiable information (names, emails, phone numbers, SSNs, addresses), financial data (card numbers, bank accounts), or private keys.
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2">Reply with ONLY one of:
</span></span></span><span class="line"><span class="cl"><span class="s2">SAFE
</span></span></span><span class="line"><span class="cl"><span class="s2">SENSITIVE: &lt;one-line reason&gt;
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2">Text to check:
</span></span></span><span class="line"><span class="cl"><span class="s2">&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">classify</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">bool</span><span class="p">,</span> <span class="nb">str</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">    <span class="k">async</span> <span class="k">with</span> <span class="n">httpx</span><span class="o">.</span><span class="n">AsyncClient</span><span class="p">(</span><span class="n">timeout</span><span class="o">=</span><span class="mi">60</span><span class="p">)</span> <span class="k">as</span> <span class="n">client</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">LLAMA_BASE</span><span class="si">}</span><span class="s2">/chat/completions&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">json</span><span class="o">=</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="s2">&#34;phi-3.5-mini&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">CLASSIFIER_PROMPT</span> <span class="o">+</span> <span class="n">text</span><span class="p">[:</span><span class="mi">3000</span><span class="p">]}],</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;max_tokens&#34;</span><span class="p">:</span> <span class="mi">30</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;temperature&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;stream&#34;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="cl">            <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;Authorization&#34;</span><span class="p">:</span> <span class="s2">&#34;Bearer sk-no-key&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">answer</span> <span class="o">=</span> <span class="n">resp</span><span class="o">.</span><span class="n">json</span><span class="p">()[</span><span class="s2">&#34;choices&#34;</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s2">&#34;message&#34;</span><span class="p">][</span><span class="s2">&#34;content&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">answer</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s2">&#34;SENSITIVE&#34;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">reason</span> <span class="o">=</span> <span class="n">answer</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&#34;:&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">if</span> <span class="s2">&#34;:&#34;</span> <span class="ow">in</span> <span class="n">answer</span> <span class="k">else</span> <span class="s2">&#34;sensitive content detected&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">True</span><span class="p">,</span> <span class="n">reason</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">False</span><span class="p">,</span> <span class="s2">&#34;&#34;</span>
</span></span></code></pre></div><p><code>temperature=0</code> and <code>max_tokens=30</code> keep the response deterministic and fast. The model only needs to output one word or one line.</p>
<p>The main handler:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@app.post</span><span class="p">(</span><span class="s2">&#34;/v1/chat/completions&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">proxy_chat</span><span class="p">(</span><span class="n">request</span><span class="p">:</span> <span class="n">Request</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">body</span> <span class="o">=</span> <span class="k">await</span> <span class="n">request</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">user_text</span> <span class="o">=</span> <span class="n">extract_user_text</span><span class="p">(</span><span class="n">body</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;messages&#34;</span><span class="p">,</span> <span class="p">[]))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">user_text</span><span class="o">.</span><span class="n">strip</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">        <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">is_sensitive</span><span class="p">,</span> <span class="n">reason</span> <span class="o">=</span> <span class="k">await</span> <span class="n">classify</span><span class="p">(</span><span class="n">user_text</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">exc</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">log</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s2">&#34;classifier error: </span><span class="si">%s</span><span class="s2"> — allowing request through&#34;</span><span class="p">,</span> <span class="n">exc</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">is_sensitive</span> <span class="o">=</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">is_sensitive</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="n">JSONResponse</span><span class="p">(</span><span class="n">status_code</span><span class="o">=</span><span class="mi">400</span><span class="p">,</span> <span class="n">content</span><span class="o">=</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;error&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="s2">&#34;message&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;Request blocked by ai-guard: </span><span class="si">{</span><span class="n">reason</span><span class="si">}</span><span class="s2">. Remove sensitive content before sending to external models.&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;content_policy_violation&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">})</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Safe — forward to upstream with streaming support</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>Fail-open: if the classifier itself errors (llama-server down, timeout), the request goes through and the error is logged. Fail-closed would be safer for high-stakes environments, but this is a homelab and I&rsquo;d rather not block all cloud LLM access because the local model is warming up.</p>
<h2 id="kubernetes-deployment">Kubernetes deployment</h2>
<p>ai-guard runs in the same namespace as llama-server and Open WebUI (<code>web-ai-engine</code>). Intra-namespace traffic is always allowed in Cilium, so no new network policy needed.</p>
<p>Open WebUI uses semicolon-separated lists for multiple API backends:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">OPENAI_API_BASE_URLS</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">value</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;http://llama-server.web-ai-engine.svc:8080/v1;http://ai-guard.web-ai-engine.svc:8080/v1&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl">- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">OPENAI_API_KEYS</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">value</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;sk-no-key;sk-no-key&#34;</span><span class="w">
</span></span></span></code></pre></div><p>The second entry is ai-guard. Open WebUI passes <code>sk-no-key</code> as the API key — ai-guard ignores it and uses its own <code>UPSTREAM_API_KEY</code> from a Kubernetes Secret (pulled from Vault via External Secrets Operator). The real NVIDIA API key never touches the client.</p>
<h2 id="the-latency-tradeoff">The latency tradeoff</h2>
<p>The classification step adds 5–15 seconds on CPU inference. That&rsquo;s the cost of keeping the check fully private — the classifier never sends data anywhere.</p>
<p>For a personal homelab assistant, this is fine. For a high-throughput production setup, you&rsquo;d want the classifier on a GPU or a dedicated smaller model purpose-built for classification.</p>
<h2 id="what-it-catches">What it catches</h2>
<p>The classifier prompt targets:</p>
<ul>
<li>Passwords, API keys, tokens, credentials</li>
<li>PII: names, emails, phone numbers, SSNs, addresses</li>
<li>Financial data: card numbers, bank accounts</li>
<li>Private keys</li>
</ul>
<p>False negatives are possible — no classifier is perfect. This is a first line of defense, not a compliance control. The value is catching the obvious, accidental leaks.</p>
<h2 id="source">Source</h2>
<p><a href="https://github.com/janos-gyorgy/ai-guard">github.com/janos-gyorgy/ai-guard</a> — MIT licensed, Kubernetes manifests included.</p>
]]></content:encoded></item><item><title>🕵️ Privacy-Preserving LLM Pipelines: Anonymize Before You Send</title><link>https://blog.hippotion.com/posts/llm-anonymizer-privacy-pipeline/</link><pubDate>Fri, 12 Sep 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/llm-anonymizer-privacy-pipeline/</guid><description>Replace PII with semantically realistic fakes before sending to a cloud LLM, then restore the originals from the response. Started with a general model and prompt engineering — then upgraded to a purpose-built 1.7B fine-tune via Ollama.</description><content:encoded><![CDATA[<h2 id="the-problem-with-blocking">The problem with blocking</h2>
<p>The <a href="/posts/ai-pii-guardrail-proxy/">PII guardrail proxy I built last week</a> works by classifying prompts and blocking the sensitive ones. That&rsquo;s fine for a chat interface where a human can rephrase. It doesn&rsquo;t work for automated pipelines.</p>
<p>If a Jira ticket contains someone&rsquo;s name and an internal hostname, you don&rsquo;t want the agent to fail — you want it to process the ticket without exposing that data. Blocking is the wrong primitive for pipelines. Anonymization is the right one.</p>
<h2 id="the-pattern">The pattern</h2>
<pre tabindex="0"><code>Input text
  → anonymizer: extract PII, replace with semantic fakes
  → &#34;Nathan Chen from DataSoft LLC needs ProjectX fixed on dev.internal.net&#34;
  + mapping: {&#34;Nathan Chen&#34; → &#34;John Smith&#34;, &#34;DataSoft LLC&#34; → &#34;ACME&#34;, ...}
  → cloud LLM: processes coherent text, never sees real values
  → &#34;Nathan Chen should check the ProjectX docs with the DataSoft LLC team&#34;
  → string substitution with reverse mapping
  → &#34;John Smith should check the OAuth docs with the ACME team&#34;
</code></pre><p>Two things that make this work:</p>
<p><strong>Deanonymization needs no LLM.</strong> Once you have the mapping, restoring is pure string substitution. The model call only happens on the way in.</p>
<p><strong>Semantic fakes beat placeholder tokens.</strong> An earlier version of this used <code>[PERSON_1]</code>, <code>[ORG_1]</code> tokens. The problem: cloud models see bracketed text and subtly change behaviour — shorter responses, hedging, dropped context. When the cloud model sees <code>Nathan Chen from DataSoft LLC</code>, it treats it as real text and responds naturally. Quality is noticeably better.</p>
<h2 id="prior-art--what-already-exists">Prior art — what already exists</h2>
<p>This is a well-established pattern. Worth knowing what&rsquo;s out there:</p>
<p><strong><a href="https://llm-guard.com/output_scanners/deanonymize/">LLM Guard</a></strong> (Protect AI) — the most complete open-source implementation. Anonymize + Deanonymize scanner pair with a Vault for the mapping. Production-grade, actively maintained. Start here if you&rsquo;re building this for anything serious.</p>
<p><strong><a href="https://techcommunity.microsoft.com/blog/azuredevcommunityblog/introducing-pii-shield-a-privacy-proxy-for-every-llm-call/4514726">Microsoft PII Shield</a></strong> — session-based proxy. Returns a session ID with the anonymized text, uses it to deanonymize the response.</p>
<p><strong><a href="https://github.com/fsndzomga/anonLLM">anonLLM</a></strong> — uses GLiNER (a proper NER model) + Faker for realistic replacements. Better accuracy than a general chat model.</p>
<p><strong><a href="https://ieeexplore.ieee.org/document/11140717/">REDACT</a></strong> — IEEE paper describing a system using Ollama for PII redaction in documents.</p>
<p><strong><a href="https://huggingface.co/blog/pratyushrt/anonymizerslm">HuggingFace Anonymizer SLM series</a></strong> — purpose-built models (0.6B/1.7B/4B) fine-tuned specifically for anonymization. 9.20/10 quality score for 1.7B, close to GPT-4.1&rsquo;s 9.77.</p>
<p>That last one is what this implementation actually uses.</p>
<h2 id="the-model-anonymizer-17b">The model: Anonymizer-1.7B</h2>
<p><a href="https://huggingface.co/eternisai/Anonymizer-1.7B">eternisai/Anonymizer-1.7B</a> is a Qwen3-1.7B fine-tune trained on ~30k anonymization samples using GRPO with GPT-4.1 as judge. It outputs structured tool calls instead of free text:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;replace_entities&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;arguments&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;replacements&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">      <span class="p">{</span><span class="nt">&#34;original&#34;</span><span class="p">:</span> <span class="s2">&#34;John Smith&#34;</span><span class="p">,</span> <span class="nt">&#34;replacement&#34;</span><span class="p">:</span> <span class="s2">&#34;Nathan Chen&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">      <span class="p">{</span><span class="nt">&#34;original&#34;</span><span class="p">:</span> <span class="s2">&#34;ACME Corp&#34;</span><span class="p">,</span> <span class="nt">&#34;replacement&#34;</span><span class="p">:</span> <span class="s2">&#34;DataSoft LLC&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">      <span class="p">{</span><span class="nt">&#34;original&#34;</span><span class="p">:</span> <span class="s2">&#34;auth.acme.internal&#34;</span><span class="p">,</span> <span class="nt">&#34;replacement&#34;</span><span class="p">:</span> <span class="s2">&#34;dev.internal.net&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>No prompt engineering needed. The model knows exactly what it&rsquo;s doing and outputs a structured contract. Compare that to the first version of this service, which sent a long JSON-format prompt to Phi-3.5-mini and hoped the output parsed correctly.</p>
<p>The model runs via Ollama (which handles the Qwen3 chat template and tool calling natively), pointed at the GGUF version from HuggingFace: <code>hf.co/gabriellarson/Anonymizer-1.7B-GGUF</code>.</p>
<h2 id="the-implementation">The implementation</h2>
<p><code>llm-anonymizer</code> is a FastAPI service with two endpoints.</p>
<p><strong><code>POST /anonymize</code></strong> — calls Ollama with the tool definition, parses the response:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">TOOLS</span> <span class="o">=</span> <span class="p">[{</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;function&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;function&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;replace_entities&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Replace PII entities with anonymized versions&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;parameters&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="s2">&#34;properties&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;replacements&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;array&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="s2">&#34;items&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                        <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="s2">&#34;properties&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                            <span class="s2">&#34;original&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">                            <span class="s2">&#34;replacement&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">                        <span class="p">},</span>
</span></span><span class="line"><span class="cl">                        <span class="s2">&#34;required&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;original&#34;</span><span class="p">,</span> <span class="s2">&#34;replacement&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">                    <span class="p">},</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="cl">            <span class="s2">&#34;required&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;replacements&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="cl"><span class="p">}]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">OLLAMA_BASE</span><span class="si">}</span><span class="s2">/api/chat&#34;</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="n">MODEL</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">SYSTEM_PROMPT</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">text</span> <span class="o">+</span> <span class="s2">&#34;</span><span class="se">\n</span><span class="s2">/no_think&#34;</span><span class="p">},</span>  <span class="c1"># skip Qwen3 thinking mode</span>
</span></span><span class="line"><span class="cl">    <span class="p">],</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;tools&#34;</span><span class="p">:</span> <span class="n">TOOLS</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;stream&#34;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">})</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">tool_calls</span> <span class="o">=</span> <span class="n">resp</span><span class="o">.</span><span class="n">json</span><span class="p">()[</span><span class="s2">&#34;message&#34;</span><span class="p">][</span><span class="s2">&#34;tool_calls&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="n">replacements</span> <span class="o">=</span> <span class="n">tool_calls</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s2">&#34;function&#34;</span><span class="p">][</span><span class="s2">&#34;arguments&#34;</span><span class="p">][</span><span class="s2">&#34;replacements&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Build reverse mapping: replacement → original (for deanonymization)</span>
</span></span><span class="line"><span class="cl"><span class="n">anonymized</span> <span class="o">=</span> <span class="n">text</span>
</span></span><span class="line"><span class="cl"><span class="n">mapping</span> <span class="o">=</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">replacements</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">anonymized</span> <span class="o">=</span> <span class="n">anonymized</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">pair</span><span class="p">[</span><span class="s2">&#34;original&#34;</span><span class="p">],</span> <span class="n">pair</span><span class="p">[</span><span class="s2">&#34;replacement&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">    <span class="n">mapping</span><span class="p">[</span><span class="n">pair</span><span class="p">[</span><span class="s2">&#34;replacement&#34;</span><span class="p">]]</span> <span class="o">=</span> <span class="n">pair</span><span class="p">[</span><span class="s2">&#34;original&#34;</span><span class="p">]</span>
</span></span></code></pre></div><p>The <code>/no_think</code> suffix tells the model to skip its chain-of-thought — faster response, same accuracy for this task.</p>
<p><strong><code>POST /deanonymize</code></strong> — no model call, just substitution:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">for</span> <span class="n">replacement</span><span class="p">,</span> <span class="n">original</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">mapping</span><span class="o">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">text</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">replacement</span><span class="p">,</span> <span class="n">original</span><span class="p">)</span>
</span></span></code></pre></div><p>Sorted by length descending so longer tokens don&rsquo;t get partially overwritten by shorter ones.</p>
<h2 id="the-kubernetes-stack">The Kubernetes stack</h2>
<p>Ollama runs as a separate deployment in the same namespace as everything else (<code>web-ai-engine</code>). Intra-namespace traffic is always allowed — no new network policies.</p>
<pre tabindex="0"><code>llm-anonymizer (FastAPI) → Ollama (port 11434) → Anonymizer-1.7B GGUF
</code></pre><p>One-time model pull after first deploy:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl <span class="nb">exec</span> -n web-ai-engine deploy/ollama -- <span class="se">\
</span></span></span><span class="line"><span class="cl">  ollama pull hf.co/gabriellarson/Anonymizer-1.7B-GGUF
</span></span></code></pre></div><p>Ollama caches it on a 10Gi PVC, so pod restarts don&rsquo;t re-download.</p>
<h2 id="the-n8n-pipeline">The n8n pipeline</h2>
<p>Five-node chain triggered by webhook:</p>
<pre tabindex="0"><code>Webhook → /anonymize → NVIDIA NIM → /deanonymize → Respond
</code></pre><p>The NVIDIA NIM call includes a system prompt instructing it to treat the text as normal input. No mention of tokens, no special handling — because the text looks like real text.</p>
<p>Wire any upstream source to the webhook: Jira event, Slack slash command, a scheduled job that processes internal docs. The pipeline is source-agnostic.</p>
<h2 id="the-caveats">The caveats</h2>
<p><strong>1.7B isn&rsquo;t GPT-4.1.</strong> The model scores 9.20/10 on the benchmark — which means roughly 1 in 10 cases has a missed or incorrect entity. Test with real examples from your domain before depending on it.</p>
<p><strong>Deanonymization breaks on heavy rephrasing.</strong> If the cloud model restructures a sentence enough that the fake value no longer appears verbatim, the substitution silently misses it. The prompt helps but doesn&rsquo;t eliminate the risk.</p>
<p><strong>Ollama adds a deployment.</strong> It&rsquo;s ~500MB image + the model weights (~1GB Q4). On a constrained single-node cluster that&rsquo;s real overhead. llama-server already covers general chat; Ollama is purely for this model&rsquo;s tool-calling support.</p>
<h2 id="source">Source</h2>
<p><a href="https://github.com/janos-gyorgy/llm-anonymizer">github.com/janos-gyorgy/llm-anonymizer</a> — MIT licensed, Kubernetes manifests and n8n workflow included.</p>
]]></content:encoded></item><item><title>📈 Observing Local LLM Inference: llama.cpp's Built-in Prometheus Metrics</title><link>https://blog.hippotion.com/posts/llm-observability-llamacpp-prometheus/</link><pubDate>Fri, 29 Aug 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/llm-observability-llamacpp-prometheus/</guid><description>llama.cpp&amp;rsquo;s inference server ships a /metrics endpoint. One flag, Prometheus scraping, a Grafana dashboard loaded via ConfigMap sidecar — AI observability without a proxy layer.</description><content:encoded><![CDATA[<h2 id="what-operating-an-llm-actually-means">What &ldquo;operating an LLM&rdquo; actually means</h2>
<p>Running a local model is easy. Understanding what it&rsquo;s doing is less so.</p>
<p>After deploying llama.cpp + Open WebUI on k3s (<a href="/posts/local-llm-k8s-no-gpu/">previous post</a>), I had a chat interface backed by a local model. What I didn&rsquo;t have: any visibility into how the model was behaving — whether requests were queuing, how fast tokens were being generated, how much of the context window was in use.</p>
<p>The instinct for this kind of problem is usually &ldquo;add a proxy layer.&rdquo; There are several tools in this space — LiteLLM being the most popular — that sit between the client and the inference server and record token counts, latency, and spend. I tried this first. LiteLLM OOMed at startup on a node already at 76% memory. Heavy Python import tree, not a lot of headroom.</p>
<p>The thing I&rsquo;d missed: llama.cpp ships a Prometheus metrics endpoint. No proxy required.</p>
<hr>
<h2 id="--metrics"><code>--metrics</code></h2>
<p>One additional argument to the inference server:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">args</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- -<span class="l">m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="l">/models/Phi-3.5-mini-instruct-Q4_K_M.gguf</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="l">host</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;0.0.0.0&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="l">port</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;8080&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="l">ctx-size</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;4096&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="kc">n</span>-<span class="l">predict</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;1024&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="l">parallel</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;1&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="l">metrics       </span><span class="w"> </span><span class="c"># ← this</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- --<span class="l">log-disable</span><span class="w">
</span></span></span></code></pre></div><p>After restart, <code>GET /metrics</code> on port 8080 returns valid Prometheus exposition format:</p>
<pre tabindex="0"><code># HELP llamacpp:tokens_predicted_total Number of generation tokens processed.
# TYPE llamacpp:tokens_predicted_total counter
llamacpp:tokens_predicted_total 0

# HELP llamacpp:predicted_tokens_seconds Average generation throughput in tokens/s.
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 0

# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
</code></pre><p>The full set of metrics:</p>
<table>
	<thead>
			<tr>
					<th>Metric</th>
					<th>Type</th>
					<th>What it measures</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td><code>llamacpp:prompt_tokens_total</code></td>
					<td>counter</td>
					<td>Input tokens processed (cumulative)</td>
			</tr>
			<tr>
					<td><code>llamacpp:tokens_predicted_total</code></td>
					<td>counter</td>
					<td>Output tokens generated (cumulative)</td>
			</tr>
			<tr>
					<td><code>llamacpp:prompt_tokens_seconds</code></td>
					<td>gauge</td>
					<td>Current prompt throughput (tok/s)</td>
			</tr>
			<tr>
					<td><code>llamacpp:predicted_tokens_seconds</code></td>
					<td>gauge</td>
					<td>Current generation throughput (tok/s)</td>
			</tr>
			<tr>
					<td><code>llamacpp:tokens_predicted_seconds_total</code></td>
					<td>counter</td>
					<td>Total time spent generating</td>
			</tr>
			<tr>
					<td><code>llamacpp:prompt_seconds_total</code></td>
					<td>counter</td>
					<td>Total time spent on prompts</td>
			</tr>
			<tr>
					<td><code>llamacpp:requests_processing</code></td>
					<td>gauge</td>
					<td>Requests currently being processed</td>
			</tr>
			<tr>
					<td><code>llamacpp:requests_deferred</code></td>
					<td>gauge</td>
					<td>Requests queued, waiting for a slot</td>
			</tr>
			<tr>
					<td><code>llamacpp:n_decode_total</code></td>
					<td>counter</td>
					<td>Total llama_decode() calls</td>
			</tr>
			<tr>
					<td><code>llamacpp:n_busy_slots_per_decode</code></td>
					<td>counter</td>
					<td>Slots active per decode call</td>
			</tr>
	</tbody>
</table>
<p>These cover the metrics that matter for a personal inference server: throughput, latency (derivable from total time / total tokens), and queue depth.</p>
<hr>
<h2 id="prometheus-scrape-config">Prometheus scrape config</h2>
<p>Adding a static scrape target in the existing Prometheus configuration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">extraScrapeConfigs</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">  - job_name: llama-server
</span></span></span><span class="line"><span class="cl"><span class="sd">    static_configs:
</span></span></span><span class="line"><span class="cl"><span class="sd">      - targets:
</span></span></span><span class="line"><span class="cl"><span class="sd">          - llama-server.web-ai-engine.svc:8080
</span></span></span><span class="line"><span class="cl"><span class="sd">    metrics_path: /metrics</span><span class="w">
</span></span></span></code></pre></div><p>The only non-obvious thing here is the network policy: Prometheus lives in <code>dashboard-homelab</code>, and llama-server lives in <code>web-ai-engine</code>. With Cilium network policies enforcing namespace isolation, the dashboard namespace needs to be allowed to make inbound connections to the AI engine namespace. In <code>applications.yml</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-ai-engine</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">allowIngressFromNamespaces</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">dashboard-homelab]</span><span class="w">
</span></span></span></code></pre></div><p>Without this, Prometheus scrape attempts fail silently with a timeout.</p>
<hr>
<h2 id="grafana-dashboard-via-configmap">Grafana dashboard via ConfigMap</h2>
<p>Rather than importing a dashboard JSON manually through the Grafana UI, the Grafana sidecar handles it automatically. Any ConfigMap with the label <code>grafana_dashboard: &quot;1&quot;</code> is picked up, loaded, and available in Grafana — across all namespaces by default.</p>
<p>The dashboard ConfigMap lives in <code>web-ai-engine</code>, not <code>dashboard-homelab</code>. The sidecar finds it regardless:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ConfigMap</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">grafana-dashboard-llm</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-ai-engine</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">grafana_dashboard</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;1&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">llm-metrics.json</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">    {
</span></span></span><span class="line"><span class="cl"><span class="sd">      &#34;title&#34;: &#34;LLM Metrics&#34;,
</span></span></span><span class="line"><span class="cl"><span class="sd">      &#34;uid&#34;: &#34;llm-metrics&#34;,
</span></span></span><span class="line"><span class="cl"><span class="sd">      ...
</span></span></span><span class="line"><span class="cl"><span class="sd">    }</span><span class="w">
</span></span></span></code></pre></div><p>Argo CD reconciles the ConfigMap. The Grafana sidecar picks it up. The dashboard appears. No manual steps, no Grafana UI interaction, no state outside Git.</p>
<p>This means the dashboard is version-controlled, reproducible on cluster rebuild, and consistent across environments. The same YAML that describes the app&rsquo;s Kubernetes resources also describes what the monitoring looks like.</p>
<hr>
<h2 id="what-the-dashboard-shows">What the dashboard shows</h2>
<p>After sending a few messages through Open WebUI:</p>
<p><strong>Generation throughput</strong> — the <code>llamacpp:predicted_tokens_seconds</code> gauge drops to 0 between requests and spikes during generation. On this hardware (Intel N100, CPU-only inference, Phi-3.5-mini Q4_K_M), it reads 3–5 tok/s during active generation. This is the number to watch if you&rsquo;re comparing models or quantisation levels.</p>
<p><strong>Cumulative tokens</strong> — <code>llamacpp:prompt_tokens_total</code> and <code>llamacpp:tokens_predicted_total</code> both increase monotonically. The ratio between them is roughly the input/output ratio of your usage pattern. For conversational use it&rsquo;s typically 3:1 prompt to generation; for summarisation tasks it flips.</p>
<p><strong>Queue depth</strong> — <code>llamacpp:requests_deferred</code> is 0 almost always, which is expected with <code>--parallel 1</code>. If it&rsquo;s consistently above 0, you have more concurrent users than the server can handle with the current slot configuration.</p>
<p><strong>ms/token</strong> — derived from <code>rate(llamacpp:tokens_predicted_seconds_total[5m]) / rate(llamacpp:tokens_predicted_total[5m]) * 1000</code>. This is the per-token latency, which is the number that governs whether the response feels fast or slow. 200–300ms/token feels instant; above 400ms you start noticing.</p>
<hr>
<h2 id="whats-missing-compared-to-a-proxy-layer">What&rsquo;s missing compared to a proxy layer</h2>
<p>LiteLLM and similar proxies give you things this setup doesn&rsquo;t:</p>
<ul>
<li><strong>Per-model routing</strong> — if you&rsquo;re running multiple models, a proxy can route requests to the right one. With a single model, irrelevant.</li>
<li><strong>Virtual API keys</strong> — per-user or per-application key scoping. Not needed when the whole thing is behind SSO.</li>
<li><strong>Spend tracking</strong> — meaningful when you&rsquo;re paying per token. For a local model, the cost is electricity, which Prometheus already covers through the power monitoring dashboard.</li>
</ul>
<p>For a single-model homelab, the native metrics are sufficient. If I add more models later or need per-user attribution, a proxy layer becomes worth the RAM.</p>
<hr>
<h2 id="the-pattern">The pattern</h2>
<p>The broader point is that the observable unit here isn&rsquo;t the proxy — it&rsquo;s the inference server itself. Scraping llama.cpp directly means the metrics survive proxy changes, backend swaps, or routing redesigns. The inference server is the thing doing the work; it&rsquo;s the right place to measure.</p>
<p>Starter manifests with the metrics configuration included: <a href="https://github.com/janos-gyorgy/homelab-ai-inference-starter">homelab-ai-inference-starter</a></p>
]]></content:encoded></item><item><title>🤖 Local LLM Inference on Kubernetes, No GPU Required</title><link>https://blog.hippotion.com/posts/local-llm-k8s-no-gpu/</link><pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/local-llm-k8s-no-gpu/</guid><description>A CPU-only self-hosted LLM stack running on k3s: llama.cpp as the inference server, Open WebUI as the chat interface, deployed as a single Git push.</description><content:encoded><![CDATA[<h2 id="the-gpu-assumption">The GPU assumption</h2>
<p>Most write-ups about self-hosting LLMs start with a GPU. A 3090, an A100, at minimum something with CUDA. The implication is that without one you&rsquo;re wasting your time — inference will be too slow to be useful.</p>
<p>That&rsquo;s not been my experience.</p>
<p>I&rsquo;ve been running a local LLM stack on a ThinkCentre mini PC (Intel N100, 16 GB RAM, no discrete GPU) for a few months. The model is Phi-3.5-mini-instruct, 3.8 billion parameters, 4-bit quantised. Response time is 3–6 tokens per second on CPU — slow enough that you notice it, fast enough that you use it. For the things I actually reach for a local model to do — rephrase something, summarise a document, explain a config option without sending it to an external API — the latency is fine.</p>
<p>The point isn&rsquo;t that CPU inference beats GPU inference. It&rsquo;s that &ldquo;good enough for personal use&rdquo; is a much lower bar than &ldquo;production LLM serving&rdquo;, and the hardware you already have probably clears it.</p>
<hr>
<h2 id="the-stack">The stack</h2>
<p>Two components:</p>
<p><strong>llama.cpp</strong> (<code>ghcr.io/ggml-org/llama.cpp:server</code>) — inference server that loads a GGUF model file and exposes an OpenAI-compatible REST API. No Python, no framework overhead, minimal memory footprint beyond the model itself.</p>
<p><strong>Open WebUI</strong> (<code>ghcr.io/open-webui/open-webui</code>) — a polished chat interface that speaks OpenAI API format. It points at the llama-server endpoint as its backend, handles conversation history, and supports RAG file uploads out of the box.</p>
<p>The architecture is simple on purpose:</p>
<pre tabindex="0"><code>Browser → Open WebUI (:80)
              │
              │  OpenAI-compatible API
              ▼
         llama-server (:8080)
              │
              │  reads GGUF model file
              ▼
         hostPath /srv/ai-models
</code></pre><p>Open WebUI doesn&rsquo;t know or care that the backend is llama.cpp running on CPU. It sees an OpenAI-compatible API. This matters: if I swap llama-server for Ollama, vLLM, or a cloud endpoint, the frontend doesn&rsquo;t change. The interface is the standard.</p>
<hr>
<h2 id="model-choice">Model choice</h2>
<p>GGUF models on Hugging Face are available at multiple quantisation levels. The trade-off is quality vs. RAM:</p>
<table>
	<thead>
			<tr>
					<th>Model</th>
					<th>Quant</th>
					<th>Size</th>
					<th>RAM at runtime</th>
					<th>Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Llama-3.2-3B</td>
					<td>Q4_K_M</td>
					<td>~2 GB</td>
					<td>~3 GB</td>
					<td>Fastest, lowest quality</td>
			</tr>
			<tr>
					<td>Phi-3.5-mini</td>
					<td>Q4_K_M</td>
					<td>~2.4 GB</td>
					<td>~3–4 GB</td>
					<td>Good balance — what I use</td>
			</tr>
			<tr>
					<td>Mistral-7B-Instruct</td>
					<td>Q4_K_M</td>
					<td>~4.1 GB</td>
					<td>~5–6 GB</td>
					<td>Noticeably better, needs more RAM</td>
			</tr>
			<tr>
					<td>Llama-3.1-8B</td>
					<td>Q4_K_M</td>
					<td>~4.7 GB</td>
					<td>~6–8 GB</td>
					<td>High quality, stretches 16 GB with other workloads</td>
			</tr>
	</tbody>
</table>
<p>On 16 GB RAM with a full k3s stack running alongside (Argo CD, Traefik, Vault, Prometheus, etc.), Phi-3.5-mini leaves enough headroom that the cluster stays stable. Mistral-7B works too, but it&rsquo;s tighter.</p>
<p>Models live in <code>/srv/ai-models</code> on the node, mounted into the pod as a <code>hostPath</code> volume. Single-node homelab, so there&rsquo;s no scheduling concern. Download once with <code>wget</code>, done.</p>
<hr>
<h2 id="key-configuration-choices">Key configuration choices</h2>
<p><strong>Context size (<code>--ctx-size 4096</code>):</strong> How many tokens the model holds in its attention window. Larger context = more RAM + slower inference. 4096 is fine for conversational use. If you&rsquo;re summarising long documents, bump to 8192 and watch your RAM usage.</p>
<p><strong>Max output tokens (<code>--n-predict 1024</code>):</strong> Hard cap on response length. llama.cpp will stop there even mid-sentence. 1024 is usually enough; increase if you find it cutting off long explanations.</p>
<p><strong>Parallel slots (<code>--parallel 1</code>):</strong> How many concurrent inference requests the server handles. On CPU there&rsquo;s no benefit to more than 1 — each slot competes for the same cores. Leave it at 1.</p>
<p><strong>Memory limits:</strong> Set the container limit to roughly 2× the model&rsquo;s file size. A 2.4 GB GGUF typically uses 3–4 GB at runtime with context loaded.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">requests</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="l">500m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">1Gi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">6Gi</span><span class="w">
</span></span></span></code></pre></div><p>No CPU limit. llama-server will use however many cores are available during inference — that&rsquo;s what makes it usable. A CPU limit would throttle inference to unusable speeds.</p>
<hr>
<h2 id="deployment-as-a-gitops-push">Deployment as a GitOps push</h2>
<p>The whole stack lives in one YAML values file, deployed through the <a href="https://github.com/janos-gyorgy/gitops-extra-objects-chart">extra-objects chart</a> that I use for raw manifests across the cluster. Argo CD watches the repo and reconciles automatically.</p>
<p>Nothing was <code>kubectl apply</code>-ed. The deployment happened by pushing to Git.</p>
<p>What that means in practice: when I bumped the Open WebUI image version, I changed one line, pushed, and Argo CD rolled the pod. No manual steps, no SSH, no <code>kubectl</code>. The same process I use for any other service in the cluster.</p>
<p>The namespace, network policies, service account, and RBAC all generate from a single entry in <code>applications.yml</code> — same as every other app. The AI inference stack isn&rsquo;t special from an operations perspective.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># applications.yml excerpt</span><span class="w">
</span></span></span><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-ai-engine</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">applications</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">applicationCode</span><span class="p">:</span><span class="w"> </span><span class="l">web-ai-engine</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/extra-objects</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">autoSync</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="access-and-auth">Access and auth</h2>
<p>The service is exposed at <code>ai.hippotion.com</code> through the same dual-path ingress setup I use everywhere: Cloudflare Tunnel for external access, direct-to-server via Pi-hole DNS for local access, Traefik handling both with a wildcard Let&rsquo;s Encrypt cert. See <a href="/posts/homelab-dual-path-tls/">that post</a> for the full explanation.</p>
<p>Auth is handled by Traefik&rsquo;s ForwardAuth middleware pointing at an oauth2-proxy backed by GitLab. Open WebUI&rsquo;s own auth is disabled (<code>WEBUI_AUTH: false</code>) — the OAuth layer upstream handles it. One login covers every service in the cluster.</p>
<p>The <code>WEBUI_SECRET_KEY</code> (used to sign Open WebUI sessions) comes from Vault via External Secrets Operator. Nothing sensitive in Git.</p>
<hr>
<h2 id="what-the-day-to-day-is-actually-like">What the day-to-day is actually like</h2>
<p>Slow is the obvious caveat. Phi-3.5-mini at 3–6 tok/s means a paragraph-length response takes 20–30 seconds. For coding help where you&rsquo;re reading what came before while it generates, that&rsquo;s fine. For quick factual lookups, it&rsquo;s a little tedious.</p>
<p>The useful cases for a local model, for me:</p>
<ul>
<li><strong>Rephrasing or editing text</strong> — paste something, ask it to tighten it. No data leaves the house.</li>
<li><strong>Config explanation</strong> — paste a Kubernetes manifest or a Traefik config block, ask what it does. Again, stays local.</li>
<li><strong>Quick summaries</strong> — short documents, log snippets, error messages.</li>
<li><strong>Experimentation</strong> — trying prompting techniques, testing system prompts, benchmarking quantisation levels without API costs.</li>
</ul>
<p>For longer reasoning tasks I use a cloud model. The local stack is for the cases where I want the answer to stay on-premises, or where I&rsquo;m iterating and don&rsquo;t want to pay per token.</p>
<hr>
<h2 id="the-starting-point-if-you-want-to-try-it">The starting point if you want to try it</h2>
<p>The manifests are on GitHub: <a href="https://github.com/janos-gyorgy/homelab-ai-inference-starter">homelab-ai-inference-starter</a></p>
<p>It includes the llama-server and Open WebUI deployments, resource configuration, and ingress options for Traefik and nginx. The README walks through downloading a model, applying the manifests, and the configuration knobs worth knowing.</p>
<p>No GPU required. The ThinkCentre in the corner of my desk does the job.</p>
]]></content:encoded></item><item><title>🚨 Don't Restart the Node. Quarantine It First.</title><link>https://blog.hippotion.com/posts/dont-restart-quarantine-first/</link><pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/dont-restart-quarantine-first/</guid><description>Rebooting a misbehaving node feels productive. It isn&amp;rsquo;t. You&amp;rsquo;re erasing your evidence and skipping the lesson.</description><content:encoded><![CDATA[<h2 id="the-reflex">The reflex</h2>
<p>Something&rsquo;s wrong. A GitLab runner stops picking up jobs. An event processor starts dropping messages. A pod restarts in a loop. The node looks healthy — CPU fine, memory fine — but something is clearly off.</p>
<p>The reflex: restart the node, see if it clears.</p>
<p>Sometimes it does clear, and you move on. But you didn&rsquo;t fix anything. You reset the state and crossed your fingers. If it happens again in two weeks, you&rsquo;ll do the same thing. After enough iterations you have a &ldquo;flaky node&rdquo; that everyone reboots periodically and nobody understands.</p>
<p>There&rsquo;s a better sequence. It takes twenty minutes instead of two, and you come out with either a real fix or actual knowledge of what happened.</p>
<hr>
<h2 id="step-one-quarantine-dont-kill">Step one: quarantine, don&rsquo;t kill</h2>
<p>Before you touch anything, take the node out of rotation without destroying its current state.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl cordon &lt;node&gt;
</span></span></code></pre></div><p>Cordon marks the node as unschedulable. No new pods land on it. Existing pods keep running. If you need the workloads somewhere else immediately:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl drain &lt;node&gt; --ignore-daemonsets --delete-emptydir-data
</span></span></code></pre></div><p>Now you&rsquo;ve removed the node from production traffic without rebooting. The node is still alive. Everything that happened on it is still there: logs, open files, kernel ring buffer, running processes, memory state.</p>
<p>This is the difference. A reboot wipes that. A cordon preserves it.</p>
<hr>
<h2 id="step-two-look-at-whats-actually-there">Step two: look at what&rsquo;s actually there</h2>
<p>SSH in. Don&rsquo;t grep for anything specific yet — do a pass for anything unusual.</p>
<p><strong>Kernel messages first.</strong> The kernel will often tell you exactly what went wrong before any application did.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">dmesg -T --level<span class="o">=</span>err,warn <span class="p">|</span> tail -50
</span></span></code></pre></div><p>OOM kills show up here. Disk errors show up here. CPU soft lockups show up here. If you&rsquo;ve got any of those, you have your answer before you&rsquo;ve even looked at application logs.</p>
<p><strong>Check for filesystem problems.</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">df -h          <span class="c1"># is anything full?</span>
</span></span><span class="line"><span class="cl">dmesg <span class="p">|</span> grep -i <span class="s2">&#34;ext4\|xfs\|btrfs\|i/o error\|ata&#34;</span>
</span></span></code></pre></div><p>A filesystem at 100% is silent until it isn&rsquo;t. A flaky drive starts dropping I/O errors into dmesg long before SMART reports anything. Application developers rarely think about this case — their app just starts writing logs that say &ldquo;failed to write&rdquo; without specifying that the disk is full or dying.</p>
<p><strong>System resource pressure.</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">vmstat <span class="m">1</span> <span class="m">5</span>          <span class="c1"># is there swap activity?</span>
</span></span><span class="line"><span class="cl">iostat -x <span class="m">1</span> <span class="m">5</span>       <span class="c1"># is a disk saturated?</span>
</span></span><span class="line"><span class="cl">cat /proc/pressure/io   <span class="c1"># kernel PSI — pressure stall info</span>
</span></span></code></pre></div><p>PSI is underused. It tells you whether processes were actually stalled waiting for I/O, not just whether throughput was high. A disk at 80% utilisation might be fine; a disk with 40% I/O PSI pressure is actively hurting performance.</p>
<p><strong>What were the pods doing right before things went sideways?</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl describe node &lt;node&gt;    <span class="c1"># events section at the bottom</span>
</span></span><span class="line"><span class="cl">kubectl get events --field-selector involvedObject.kind<span class="o">=</span>Pod -A <span class="p">|</span> sort -k1
</span></span></code></pre></div><p>Look for OOMKilled exits, failed liveness probes, and throttling events. Kubernetes events expire after an hour by default — another reason not to reboot immediately; those events are still there if you look now.</p>
<hr>
<h2 id="a-real-example-the-gitlab-runner">A real example: the GitLab runner</h2>
<p>A GitLab runner pod stops picking up jobs. It looks alive — the process is running, no crashes in the pod logs. Jobs sit in the queue.</p>
<p>Restart reflex: delete the pod, let it reschedule, it picks up jobs again.</p>
<p>But why did it stop?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">journalctl -u gitlab-runner --since <span class="s2">&#34;1 hour ago&#34;</span>
</span></span><span class="line"><span class="cl"><span class="c1"># or, if it&#39;s a container:</span>
</span></span><span class="line"><span class="cl">kubectl logs &lt;runner-pod&gt; --previous
</span></span></code></pre></div><p>In one instance: the runner&rsquo;s working directory was on a tmpfs that hit its size limit. The runner silently failed to create job workspaces and stopped accepting new jobs. The error was one line in the pod logs: <code>mkdir /builds: no space left on device</code>. The pod was healthy by every other metric.</p>
<p>Fix: bump the tmpfs size limit in the runner config. The restart would have cleared tmpfs temporarily, and the runner would have failed again the next time a large job filled it up.</p>
<p>The debug took five minutes. The permanent fix took two minutes. Without quarantining the node first, the evidence was gone.</p>
<hr>
<h2 id="another-one-the-event-consumer">Another one: the event consumer</h2>
<p>An event processor starts falling behind. Messages queue up. The pod shows no errors. Memory looks fine.</p>
<p>This one was subtler: the processor was connected to a downstream dependency over a persistent TCP connection. The connection had gone into a half-open state — the processor thought it was alive, the remote end had already dropped it. New messages were being sent into a dead socket and silently discarded.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ss -tnp <span class="p">|</span> grep &lt;pid&gt;    <span class="c1"># look at the socket state</span>
</span></span></code></pre></div><p><code>CLOSE_WAIT</code> on a connection that should be <code>ESTABLISHED</code>. The application wasn&rsquo;t checking whether the connection was actually working before using it, just whether it existed.</p>
<p>Restart would have cleared the socket state, fixed the symptom, and left the bug in the code.</p>
<hr>
<h2 id="what-to-look-for--a-short-checklist">What to look for — a short checklist</h2>
<p>When a node is misbehaving, in order:</p>
<ol>
<li><code>dmesg -T --level=err,warn</code> — kernel errors, OOM kills, disk errors</li>
<li><code>df -h &amp;&amp; df -i</code> — full filesystems (space and inodes separately)</li>
<li><code>kubectl describe node &lt;node&gt;</code> — pressure conditions, recent events</li>
<li><code>kubectl logs &lt;pod&gt; --previous</code> — what the pod logged before it died or got stuck</li>
<li><code>ss -tnp</code> — socket states for network-adjacent issues</li>
<li><code>vmstat 1 5</code> + <code>iostat -x 1 5</code> — resource pressure</li>
<li><code>journalctl -p err -b</code> — system journal errors since last boot</li>
</ol>
<p>Most problems show up in the first three.</p>
<hr>
<h2 id="after-youve-found-something-or-not-found-something">After you&rsquo;ve found something (or not found something)</h2>
<p><strong>If you found the cause:</strong> fix it, test it, uncordon the node.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl uncordon &lt;node&gt;
</span></span></code></pre></div><p>Document what you found — a comment in the relevant config, a commit message, a note. &ldquo;Fixed runner tmpfs limit&rdquo; in the commit history is more useful than &ldquo;flaky runner, restarted.&rdquo;</p>
<p><strong>If you genuinely found nothing:</strong> that&rsquo;s information too. Cordon, reboot, uncordon, and note that the node rebooted clean with no identified cause. If it happens again, you have a pattern. Check whether anything changed in the workloads around that time. Check whether the reboot timing correlates with anything — cron jobs, backups, maintenance windows.</p>
<p>A reboot you can explain is a fix. A reboot you can&rsquo;t explain is a time bomb.</p>
<hr>
<h2 id="why-this-matters-on-a-single-node-cluster">Why this matters on a single-node cluster</h2>
<p>In a multi-node setup you can afford to be lazier — cordon, drain, reboot, let the scheduler handle it, look at it later. On a single node there&rsquo;s no &ldquo;later.&rdquo; The node coming back is all you&rsquo;ve got.</p>
<p>But the habit is worth building regardless of node count. The engineers who understand their systems are the ones who looked before they rebooted.</p>
<hr>
<h2 id="the-actual-rule">The actual rule</h2>
<p><strong>Quarantine first. Debug second. Restart third (if you still need to).</strong></p>
<p>A restart takes two minutes. The evidence it destroys might take two hours to reconstruct — or might be gone for good. The cordon costs you nothing.</p>
]]></content:encoded></item><item><title>⚡ Your Deployment Causes 30 Seconds of Downtime. What Went Wrong?</title><link>https://blog.hippotion.com/posts/k8s-zero-downtime/</link><pubDate>Fri, 20 Jun 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-zero-downtime/</guid><description>Kubernetes rolling updates don&amp;rsquo;t give you zero-downtime for free. There are four separate things you have to get right, and most clusters get at least one wrong.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you achieve zero-downtime deployments in Kubernetes?&rdquo;</em></p>
<p>The expected answer: rolling updates. That&rsquo;s correct but incomplete. Rolling updates are the mechanism. They don&rsquo;t give you zero downtime automatically — they give you a framework in which zero downtime is achievable, if you configure everything correctly.</p>
<p>Most clusters cause brief downtime on every deployment. Usually 5–30 seconds. Usually blamed on &ldquo;the load balancer&rdquo; or &ldquo;DNS&rdquo;. Almost always caused by one of four missing pieces.</p>
<hr>
<h2 id="the-rolling-update-baseline">The rolling update baseline</h2>
<p>Kubernetes replaces pods in waves. You control the pace:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">strategy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">RollingUpdate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">rollingUpdate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxSurge</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">        </span><span class="c"># how many extra pods can exist during update</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxUnavailable</span><span class="p">:</span><span class="w"> </span><span class="m">0</span><span class="w">  </span><span class="c"># how many pods can be unavailable during update</span><span class="w">
</span></span></span></code></pre></div><p><code>maxUnavailable: 0</code> means Kubernetes never terminates a pod until a replacement is ready. This prevents the obvious failure mode where you have zero running pods mid-deployment.</p>
<p><code>maxSurge: 1</code> means one extra pod beyond the desired count runs during the update. For a deployment with 3 replicas, you&rsquo;ll briefly have 4 pods running.</p>
<p>This alone doesn&rsquo;t prevent downtime.</p>
<hr>
<h2 id="piece-1-the-readiness-probe-the-most-common-missing-piece">Piece 1: The readiness probe (the most common missing piece)</h2>
<p>Kubernetes considers a pod &ldquo;ready&rdquo; when all its containers pass their readiness probes. If you don&rsquo;t define a readiness probe, Kubernetes considers the pod ready as soon as the container starts. Containers start before applications are ready to serve traffic.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Without this, traffic arrives before your app is listening</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">initialDelaySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span></code></pre></div><p>What happens without it: Kubernetes starts the new pod, marks it ready immediately, adds it to the Service endpoints, routes traffic to it — while your app is still initialising (loading config, connecting to the database, warming caches). The first few requests to the new pod fail or time out.</p>
<p>The fix: define a readiness probe that actually checks application readiness. An HTTP endpoint that returns 200 only after the app has finished starting is the minimum. A deeper check that verifies the database connection is better.</p>
<p>Common mistake: using the same endpoint for liveness and readiness with the same thresholds. They serve different purposes:</p>
<ul>
<li><strong>Readiness</strong>: &ldquo;am I ready to accept traffic?&rdquo; — controls whether traffic is sent</li>
<li><strong>Liveness</strong>: &ldquo;am I still alive?&rdquo; — controls whether the pod is restarted</li>
</ul>
<p>A pod can fail its readiness probe (temporarily overloaded, warming up) without failing its liveness probe. If you make liveness too aggressive, Kubernetes restarts pods that would have recovered on their own.</p>
<hr>
<h2 id="piece-2-the-termination-grace-period-the-other-common-missing-piece">Piece 2: The termination grace period (the other common missing piece)</h2>
<p>When Kubernetes wants to terminate a pod, it sends <code>SIGTERM</code>. Your application has <code>terminationGracePeriodSeconds</code> (default: 30) to finish in-flight requests and shut down cleanly. After that, Kubernetes sends <code>SIGKILL</code>.</p>
<p>The problem: there&rsquo;s a race condition. Kubernetes removes the pod from the Service endpoints and sends <code>SIGTERM</code> roughly simultaneously. The endpoint update has to propagate through the control plane, kube-proxy, and the load balancer. During that propagation window — typically 1–10 seconds — traffic can still arrive at a pod that has already started shutting down.</p>
<p>The fix is a <code>preStop</code> hook that adds a short sleep before the termination sequence:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">lifecycle</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">preStop</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">exec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;sleep&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;5&#34;</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div><p>This gives the endpoint removal time to propagate before your app receives <code>SIGTERM</code>. The total shutdown sequence is then:</p>
<ol>
<li>Kubernetes removes pod from endpoints</li>
<li><code>preStop</code> hook runs (sleep 5s — enough for endpoint propagation)</li>
<li><code>SIGTERM</code> is sent</li>
<li>App drains in-flight requests and shuts down</li>
<li>If still running after <code>terminationGracePeriodSeconds</code>: <code>SIGKILL</code></li>
</ol>
<p>Set <code>terminationGracePeriodSeconds</code> to cover the sleep plus your app&rsquo;s actual shutdown time:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">terminationGracePeriodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">60</span><span class="w">  </span><span class="c"># 5s preStop + up to 55s for app shutdown</span><span class="w">
</span></span></span></code></pre></div><p>Without the sleep: requests fail during the propagation window. With it: the window is covered.</p>
<hr>
<h2 id="piece-3-poddisruptionbudgets-for-node-maintenance">Piece 3: PodDisruptionBudgets (for node maintenance)</h2>
<p>Rolling updates handle normal deployments. Node drains (<code>kubectl drain</code>, cloud provider maintenance windows, k3s upgrades) are a different code path that bypasses your rolling update strategy entirely.</p>
<p>When a node is drained, Kubernetes evicts all pods on it as fast as it can. Without constraints, it will evict all replicas of your deployment simultaneously if they all happen to land on the same node.</p>
<p>A <code>PodDisruptionBudget</code> sets a floor:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">policy/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">PodDisruptionBudget</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-pdb</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minAvailable</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">   </span><span class="c"># at least 1 replica must stay up during disruption</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span></code></pre></div><p>Now node drain will evict pods one at a time, waiting for replacement pods to come up before evicting the next one. If no replacement can be scheduled (e.g., you&rsquo;re draining the only node), the drain will block rather than cause downtime.</p>
<p><code>minAvailable: 1</code> is the minimum. For production with 3+ replicas, <code>minAvailable: 2</code> or <code>maxUnavailable: 1</code> is more appropriate.</p>
<hr>
<h2 id="piece-4-minreadyseconds-the-one-everyone-forgets">Piece 4: minReadySeconds (the one everyone forgets)</h2>
<p>Even with a correct readiness probe, there&rsquo;s a subtle risk: a pod that passes its readiness probe briefly and then fails due to a transient startup issue (flapping). Kubernetes would add it to the endpoint pool, route traffic to it, watch it fail the readiness probe, remove it — and during that window, some requests fail.</p>
<p><code>minReadySeconds</code> says: a pod must pass its readiness probe continuously for this many seconds before Kubernetes considers it &ldquo;available&rdquo; and allows the next pod in the rolling update to be terminated:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minReadySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span></code></pre></div><p>This slows deployments slightly but catches flapping probes before they cause production traffic to hit an unstable pod.</p>
<hr>
<h2 id="the-complete-deployment-snippet">The complete deployment snippet</h2>
<p>Putting it together:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minReadySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">strategy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">RollingUpdate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">rollingUpdate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxSurge</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxUnavailable</span><span class="p">:</span><span class="w"> </span><span class="m">0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">terminationGracePeriodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">60</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">lifecycle</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">preStop</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">exec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;sleep&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;5&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">initialDelaySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">livenessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">initialDelaySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">15</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span></code></pre></div><p>And the PDB alongside it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">policy/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">PodDisruptionBudget</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-pdb</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minAvailable</span><span class="p">:</span><span class="w"> </span><span class="m">2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;What if your new version has a bug that isn&rsquo;t caught immediately — how do you roll back?&rdquo;</em></p>
<p><code>kubectl rollout undo deployment/myapp</code> reverts to the previous ReplicaSet. Kubernetes stores the last few ReplicaSets by default (<code>revisionHistoryLimit</code>, default 10). The rollback uses the same rolling update mechanism, so it&rsquo;s also zero-downtime.</p>
<p>The harder follow-up: <em>&ldquo;What if the bug only shows up after 10 minutes of load?&rdquo;</em> That&rsquo;s where you need a canary deployment — send a small percentage of traffic to the new version, observe, then shift the rest. Argo Rollouts handles this natively. Without it, you&rsquo;re doing it manually with two Deployments and weighted Services.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-gitops-secrets/">secrets in a GitOps repo</a>. Next: <a href="/posts/k8s-network-isolation/">network isolation between services</a>.</em></p>
]]></content:encoded></item><item><title>🔄 Someone kubectl apply'd a Hotfix Directly. How Do You Detect and Prevent It?</title><link>https://blog.hippotion.com/posts/k8s-config-drift/</link><pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-config-drift/</guid><description>Manual kubectl in production is the Kubernetes equivalent of SSH&amp;rsquo;ing into a server and editing files. It works until it doesn&amp;rsquo;t, and when it doesn&amp;rsquo;t, nobody knows why.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you prevent configuration drift in a Kubernetes cluster?&rdquo;</em></p>
<p>Configuration drift: the cluster&rsquo;s actual state diverges from what&rsquo;s declared in your source of truth. Someone runs <code>kubectl edit deployment myapp</code> to bump a memory limit during an incident. Someone adds a debug sidecar directly. Someone applies a YAML file from their laptop that was never committed to Git. The fix works. It goes undocumented. Six months later, a new deployment overwrites it. The incident recurs.</p>
<p>There are two distinct problems here that require different solutions:</p>
<ol>
<li><strong>Detection and remediation</strong>: how do you notice drift and revert it?</li>
<li><strong>Prevention</strong>: how do you stop non-compliant resources from being created in the first place?</li>
</ol>
<hr>
<h2 id="detection-and-remediation-argo-cd-selfheal">Detection and remediation: Argo CD selfHeal</h2>
<p>If you&rsquo;re using GitOps with Argo CD, detection and remediation are handled for you:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">syncPolicy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">automated</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">prune</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">selfHeal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p><code>selfHeal: true</code> means Argo CD continuously compares the cluster state to the Git repo and reverts any divergence. Someone runs <code>kubectl edit deployment myapp</code> and changes the replica count? Argo CD detects the diff on its next reconciliation cycle (default: every 3 minutes) and reverts it.</p>
<p><code>prune: true</code> means resources that exist in the cluster but not in Git are deleted. Someone <code>kubectl apply</code>&rsquo;d a debug pod directly? Gone on the next sync.</p>
<p>This is the audit trail story too. Every legitimate change is a Git commit with an author, a timestamp, and a commit message. Everything that isn&rsquo;t in Git doesn&rsquo;t survive past the next reconciliation. If you want to know what changed and when, <code>git log</code> is the answer.</p>
<hr>
<h2 id="the-gap-selfheal-doesnt-close">The gap selfHeal doesn&rsquo;t close</h2>
<p><code>selfHeal</code> reverts drift after the fact. There&rsquo;s a window — up to 3 minutes — where a drifted resource is serving traffic. For most changes, that&rsquo;s fine. For a bad resource (wrong RBAC, missing network policy, container running as root), 3 minutes is enough to be a problem.</p>
<p>The other gap: <code>selfHeal</code> doesn&rsquo;t tell you <em>who</em> made the change or generate an alert. It just silently fixes it. You need audit logging (<code>kube-apiserver --audit-log-path</code>) or an alerting rule on Argo CD&rsquo;s health events to know that drift happened.</p>
<hr>
<h2 id="prevention-kyverno">Prevention: Kyverno</h2>
<p>Kyverno is a policy engine that runs as a Kubernetes admission webhook. Every resource creation or modification goes through it before being persisted. If the resource violates a policy, Kyverno can reject it outright (enforce mode) or allow it with a warning (audit mode).</p>
<p>The policies are Kubernetes resources themselves — they live in Git, they&rsquo;re applied via GitOps, they&rsquo;re versioned. No separate policy language to learn.</p>
<p>A policy that requires readiness probes on all Deployments:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">require-readiness-probe</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-readiness-probe</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span>- <span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Deployments must define a readiness probe.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">(name)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                    </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">(httpGet | tcpSocket | exec)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span></code></pre></div><p>With this policy active: <code>kubectl apply -f deployment-without-probe.yaml</code> is rejected at the API server. The error message is the one you defined in <code>message</code>. The deployment never reaches etcd.</p>
<p>A policy that blocks containers running as root:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">disallow-root-containers</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-runAsNonRoot</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment, StatefulSet, DaemonSet]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Containers must not run as root.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">(name)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                    </span><span class="nt">securityContext</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">runAsNonRoot</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>A policy that enforces resource limits (common in multi-tenant clusters):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">require-resource-limits</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-limits</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;CPU and memory limits are required.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                        </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;?*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                        </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;?*&#34;</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="kyverno-can-also-mutate-and-generate">Kyverno can also mutate and generate</h2>
<p>Policies aren&rsquo;t only for validation. Kyverno can mutate incoming resources (add default labels, inject sidecars, set default resource requests) and generate new resources in response to events (create a NetworkPolicy whenever a new namespace is created).</p>
<p>Auto-add a standard label to every Deployment:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-labels</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-team-label</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">mutate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">patchStrategicMerge</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">managed-by</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno</span><span class="w">
</span></span></span></code></pre></div><p>Auto-create a default NetworkPolicy when a namespace is created:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-default-networkpolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Namespace]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">generate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny-all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{request.object.metadata.name}}&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="l">Egress</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="the-complete-drift-prevention-picture">The complete drift prevention picture</h2>
<pre tabindex="0"><code>Developer runs: kubectl apply -f bad-deployment.yaml
  → API server receives request
  → Kyverno admission webhook intercepts
  → Policy check: no readiness probe → Rejected
  → API server returns 403 with Kyverno&#39;s message
  → Resource never reaches etcd

Developer runs: kubectl edit deployment myapp (valid change, just not via Git)
  → Edit succeeds (no policy violation)
  → Argo CD reconciliation fires (within 3 minutes)
  → Diff detected: cluster state ≠ Git state
  → selfHeal: revert to Git state
  → If audit logging enabled: event recorded with username and timestamp
</code></pre><p>Git is the audit trail for what <em>should</em> be there. kube-apiserver audit logs are the trail for what <em>was attempted</em>. Kyverno is the enforcer at admission time. Argo CD is the continuous reconciler. Four layers, each with a different job.</p>
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;What&rsquo;s the difference between Kyverno and OPA Gatekeeper?&rdquo;</em></p>
<p>Both are admission webhook policy engines. The practical differences:</p>
<ul>
<li><strong>Kyverno</strong>: policies are k8s-native YAML, no separate language to learn. Generate and mutate policies built in. Easier to get started with.</li>
<li><strong>OPA Gatekeeper</strong>: policies are written in Rego, a purpose-built policy language that&rsquo;s more expressive but has a steeper learning curve. Better if you&rsquo;re already using OPA elsewhere (Terraform, microservice authorization).</li>
</ul>
<p>For a Kubernetes-only environment, Kyverno is the pragmatic choice. For a platform team that uses OPA across the stack, Gatekeeper gives you policy consistency.</p>
<p>The deeper follow-up: <em>&ldquo;How do you test policies before enforcing them?&rdquo;</em> Use <code>Audit</code> mode first (<code>validationFailureAction: Audit</code>). Violations are logged as PolicyReport objects but requests aren&rsquo;t rejected. Review the reports, fix the existing violations, then switch to <code>Enforce</code>. Never flip directly to Enforce in production — you&rsquo;ll break things that were already running.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-network-isolation/">network isolation between services</a>.</em></p>
]]></content:encoded></item><item><title>🛡️ How Do You Prevent a Compromised Pod From Calling Your Database?</title><link>https://blog.hippotion.com/posts/k8s-network-isolation/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-network-isolation/</guid><description>Default Kubernetes is a flat network. Every pod can reach every other pod. In a cluster with ten services, that&amp;rsquo;s ten potential blast radiuses instead of one.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you enforce network isolation between services in a Kubernetes cluster?&rdquo;</em></p>
<p>The default Kubernetes network model is flat. Every pod can reach every other pod, in any namespace, on any port. There are no firewalls, no ACLs, no segmentation. A compromised frontend pod can connect directly to your PostgreSQL port, your Redis port, your internal admin API, and every other service in the cluster.</p>
<p>This is intentional — Kubernetes doesn&rsquo;t assume you want isolation, because not everyone does. But if you do want it, you need to add it.</p>
<hr>
<h2 id="networkpolicy-the-primitive">NetworkPolicy: the primitive</h2>
<p>A <code>NetworkPolicy</code> is a Kubernetes resource that selects a set of pods and defines what traffic is allowed to reach them (ingress) and what traffic they&rsquo;re allowed to send (egress). Traffic that isn&rsquo;t explicitly allowed is dropped.</p>
<p>The catch: <code>NetworkPolicy</code> resources have no effect unless your CNI plugin supports them. The default k3s CNI (Flannel) does not. Calico, Cilium, and Canal do. If you&rsquo;re running Flannel and you apply a NetworkPolicy, it will be silently ignored — no error, no warning.</p>
<hr>
<h2 id="the-default-deny-pattern">The default-deny pattern</h2>
<p>The correct starting point is a default-deny policy that blocks everything, applied to the namespace. You then add explicit allow policies for the traffic you actually need.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Block all ingress and egress in this namespace by default</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny-all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">        </span><span class="c"># matches all pods in the namespace</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span></code></pre></div><p>With this in place, your pods can&rsquo;t receive traffic and can&rsquo;t send traffic. You then add back what you need.</p>
<hr>
<h2 id="allowing-specific-traffic">Allowing specific traffic</h2>
<p>Allow the web frontend to receive traffic from the ingress controller:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-ingress-from-traefik</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">frontend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ingress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">from</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">namespaceSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kubernetes.io/metadata.name</span><span class="p">:</span><span class="w"> </span><span class="l">sys-traefik</span><span class="w">
</span></span></span></code></pre></div><p>Allow the backend to talk to PostgreSQL:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-egress-to-postgres</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">backend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">5432</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>After these two policies: the frontend receives traffic from Traefik, and the backend can reach Postgres. The frontend cannot reach Postgres. The backend cannot receive traffic from the ingress controller. Neither can call anything else.</p>
<hr>
<h2 id="the-dns-gotcha">The DNS gotcha</h2>
<p>Once you add a default-deny egress policy, DNS stops working. Your pods can no longer resolve service names because they can&rsquo;t reach <code>kube-dns</code> in the <code>kube-system</code> namespace.</p>
<p>You need to explicitly allow it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-egress-dns</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">namespaceSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kubernetes.io/metadata.name</span><span class="p">:</span><span class="w"> </span><span class="l">kube-system</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">53</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">UDP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">53</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>Missing this is the most common reason &ldquo;everything broke after I added NetworkPolicies&rdquo;. Add it to every namespace that has a default-deny policy.</p>
<hr>
<h2 id="cilium-the-same-model-with-more-power">Cilium: the same model with more power</h2>
<p>Cilium implements the standard <code>NetworkPolicy</code> API and adds its own <code>CiliumNetworkPolicy</code> CRD with L7 capabilities.</p>
<p>Standard NetworkPolicy works at L3/L4 — IP addresses and ports. Cilium&rsquo;s CRD adds:</p>
<p><strong>L7 HTTP filtering</strong>: allow specific HTTP methods and paths, not just port 8080.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">cilium.io/v2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">CiliumNetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-api-reads</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">endpointSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">api</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ingress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">fromEndpoints</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">frontend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">toPorts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;8080&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">http</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="nt">method</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;GET&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;/api/v1/.*&#34;</span><span class="w">
</span></span></span></code></pre></div><p><strong>DNS-based egress</strong>: allow egress to <code>github.com</code> by hostname rather than IP address. This matters for external services with dynamic IPs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">toFQDNs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">matchName</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;github.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">toPorts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;443&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p><strong>Identity-based policies</strong>: Cilium assigns a cryptographic identity to each pod based on its labels. Policies are enforced by identity, not IP address. Pod restarts (which change IPs) don&rsquo;t break policy enforcement.</p>
<hr>
<h2 id="what-a-real-namespace-policy-set-looks-like">What a real namespace policy set looks like</h2>
<p>For a typical web app with frontend, backend, and database:</p>
<pre tabindex="0"><code>Namespace: myapp
├── default-deny-all (ingress + egress, all pods)
├── allow-egress-dns (egress, all pods, port 53)
├── allow-ingress-frontend (ingress frontend, from sys-traefik namespace)
├── allow-egress-frontend-to-backend (egress frontend, to backend:8080)
├── allow-ingress-backend (ingress backend, from frontend)
├── allow-egress-backend-to-postgres (egress backend, to postgres:5432)
└── allow-ingress-postgres (ingress postgres, from backend)
</code></pre><p>Eight policies. The database has exactly one inbound path: from the backend. The frontend has no path to the database at all. A compromised frontend pod cannot scan the internal network — egress to arbitrary destinations is blocked.</p>
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;How do you manage this at scale? Writing NetworkPolicies for every namespace by hand doesn&rsquo;t scale.&rdquo;</em></p>
<p>The answer: you don&rsquo;t write them by hand. You template them. In a GitOps setup, your namespace configuration declares what network access the service needs in a structured form, and a Helm chart or operator generates the actual NetworkPolicy resources from those declarations.</p>
<p>For example, an <code>applications.yml</code> entry might look like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">denyAll</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">allowIngressFromIngress</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">allowEgressToNamespaces</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;sys-postgres&#34;</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div><p>And a Helm chart translates that into four concrete NetworkPolicy objects. The developer declares intent; the platform enforces it. No one writes raw YAML for each namespace.</p>
<p>The second follow-up: <em>&ldquo;What about east-west traffic between services in the same namespace?&rdquo;</em> Add <code>allowIntraNamespace: true</code> as a flag that generates a policy allowing all pod-to-pod traffic within the namespace, while still blocking cross-namespace traffic.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-zero-downtime/">zero-downtime deployments</a>. Next: <a href="/posts/k8s-config-drift/">preventing configuration drift</a>.</em></p>
]]></content:encoded></item><item><title>🔑 Deploy to Kubernetes Without Storing Any Cluster Credentials in CI</title><link>https://blog.hippotion.com/posts/k8s-cicd-no-credentials/</link><pubDate>Fri, 09 May 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-cicd-no-credentials/</guid><description>A common interview question in 2026. If your answer is &amp;lsquo;kubeconfig in a CI secret&amp;rsquo;, you&amp;rsquo;re not wrong — but you&amp;rsquo;re also not getting the job.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How would you design a CI/CD pipeline that deploys to Kubernetes without storing any cluster credentials anywhere?&rdquo;</em></p>
<p>The expected wrong answer: export your kubeconfig, base64-encode it, paste it into a CI secret named <code>KUBE_CONFIG</code>, and call it a day. This works. Most clusters that got hacked had this setup.</p>
<p>There are two correct answers in 2026, and which one you reach for depends on what you&rsquo;re actually deploying.</p>
<hr>
<h2 id="answer-1-gitops-the-one-your-interviewer-probably-wants">Answer 1: GitOps (the one your interviewer probably wants)</h2>
<p>In a GitOps setup, your CI pipeline never touches the cluster. It can&rsquo;t leak credentials it doesn&rsquo;t have.</p>
<p>The flow:</p>
<pre tabindex="0"><code>Developer pushes code
  → CI builds and tests
  → CI updates the image tag in the Git repo (a commit, not a kubectl command)
  → Argo CD detects the change
  → Argo CD applies it to the cluster
</code></pre><p>The cluster reaches out to Git. CI never reaches into the cluster. The only thing with cluster credentials is Argo CD itself — running inside the cluster, with no credentials to leak externally.</p>
<p>For self-hosted setups on Hetzner or Vultr, this is particularly clean because there&rsquo;s no cloud IAM to configure. You point Argo CD at your GitLab repo, tell it which branch to watch, and you&rsquo;re done.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># The Argo CD Application CRD — the only thing you need</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">argoproj.io/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Application</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">argocd</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">source</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">repoURL</span><span class="p">:</span><span class="w"> </span><span class="l">https://gitlab.example.com/myorg/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">targetRevision</span><span class="p">:</span><span class="w"> </span><span class="l">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">destination</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="l">https://kubernetes.default.svc</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">syncPolicy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">automated</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">prune</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">selfHeal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p><code>selfHeal: true</code> means if someone manually <code>kubectl apply</code>s something, Argo CD reverts it. The Git repo is the only source of truth.</p>
<p>The CI image-tag update step looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># .gitlab-ci.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l">deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">      # Update the image tag in values.yaml and push
</span></span></span><span class="line"><span class="cl"><span class="sd">      sed -i &#34;s/tag: .*/tag: ${CI_COMMIT_SHORT_SHA}/&#34; values/myapp.yml
</span></span></span><span class="line"><span class="cl"><span class="sd">      git config user.email &#34;ci@example.com&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git config user.name &#34;CI&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git add values/myapp.yml
</span></span></span><span class="line"><span class="cl"><span class="sd">      git commit -m &#34;chore: bump myapp to ${CI_COMMIT_SHORT_SHA}&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git push</span><span class="w">
</span></span></span></code></pre></div><p>CI needs write access to the Git repo — but that&rsquo;s a deploy key, not a cluster credential. If it leaks, someone can push code. You&rsquo;d rotate the deploy key and audit the commits. If a cluster credential leaks, someone owns your cluster.</p>
<hr>
<h2 id="answer-2-oidc-federation-for-when-you-genuinely-need-push-based">Answer 2: OIDC federation (for when you genuinely need push-based)</h2>
<p>Some operations don&rsquo;t fit the GitOps model. Infrastructure provisioning (<code>terraform apply</code>), one-off database migrations, or initial cluster bootstrapping — these need direct cluster access. The correct pattern here is OIDC federation.</p>
<p>The idea: your CI platform (GitLab, GitHub Actions) already issues JWT tokens to every job. These JWTs are signed by the CI platform and contain claims like which repo, which branch, which pipeline triggered the job. You configure your Kubernetes API server to trust those JWTs, and the CI job authenticates directly using the token it already has.</p>
<p>No stored credentials. Every job gets a fresh token. The token expires when the job ends.</p>
<p>For a self-hosted GitLab, configure your k8s API server to trust GitLab as an OIDC issuer:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># /etc/rancher/k3s/config.yaml (or kube-apiserver flags)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kube-apiserver-arg</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-issuer-url=https://gitlab.example.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-client-id=your_client_id&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-username-claim=sub&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-groups-claim=groups_direct&#34;</span><span class="w">
</span></span></span></code></pre></div><p>Then create a <code>ClusterRoleBinding</code> that maps a specific GitLab identity to a Kubernetes role:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRoleBinding</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">gitlab-ci-deployer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">subjects</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">User</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;project_path:myorg/myapp:ref_type:branch:ref:main&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">apiGroup</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">roleRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRole</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">deploy-role</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">apiGroup</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io</span><span class="w">
</span></span></span></code></pre></div><p>The subject name is the <code>sub</code> claim from the GitLab JWT — it encodes the repo path and branch. Only jobs running on <code>main</code> in <code>myorg/myapp</code> get this binding. A job on a feature branch gets nothing.</p>
<p>In the CI job:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l">deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">id_tokens</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">K8S_TOKEN</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">aud</span><span class="p">:</span><span class="w"> </span><span class="l">your_client_id</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-credentials gitlab-ci \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --token=&#34;${K8S_TOKEN}&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-context deploy \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --cluster=mycluster \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --user=gitlab-ci
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config use-context deploy
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl rollout restart deployment/myapp -n myapp</span><span class="w">
</span></span></span></code></pre></div><p>The token in <code>K8S_TOKEN</code> is injected by GitLab. It expires with the job. The API server validates the signature against GitLab&rsquo;s JWKS endpoint on every request.</p>
<hr>
<h2 id="which-one-to-use">Which one to use</h2>
<table>
	<thead>
			<tr>
					<th></th>
					<th>GitOps</th>
					<th>OIDC federation</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>CI needs cluster access</td>
					<td>No</td>
					<td>Yes (short-lived token)</td>
			</tr>
			<tr>
					<td>Audit trail</td>
					<td>Git history</td>
					<td>kube-apiserver audit log</td>
			</tr>
			<tr>
					<td>Revocability</td>
					<td>Revert the commit</td>
					<td>Token expires with the job</td>
			</tr>
			<tr>
					<td>Self-hosted setup effort</td>
					<td>Low</td>
					<td>Moderate (OIDC config)</td>
			</tr>
			<tr>
					<td>Works for infra provisioning</td>
					<td>Not really</td>
					<td>Yes</td>
			</tr>
	</tbody>
</table>
<p>For application deployments: GitOps. The cluster reconciles continuously, drift is impossible, and CI is completely decoupled from cluster state.</p>
<p>For infrastructure provisioning or one-off operations: OIDC federation. Short-lived credentials, branch-scoped permissions, nothing to rotate.</p>
<p>What you should never do: store a kubeconfig or a long-lived ServiceAccount token in CI secrets. Not because it&rsquo;s hard to make work — it&rsquo;s easy — but because the blast radius of a leak is unbounded, there&rsquo;s no audit trail, and there&rsquo;s no expiry. Everything that goes wrong with static secrets goes wrong eventually.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Next: <a href="/posts/k8s-gitops-secrets/">how to handle secrets in a GitOps repository</a>.</em></p>
]]></content:encoded></item><item><title>🤫 How Do You Handle Secrets in a GitOps Repository?</title><link>https://blog.hippotion.com/posts/k8s-gitops-secrets/</link><pubDate>Fri, 25 Apr 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-gitops-secrets/</guid><description>GitOps says Git is the source of truth. Secrets say don&amp;rsquo;t put them in Git. These two things appear to be in direct conflict. They&amp;rsquo;re not.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;You&rsquo;re using GitOps — everything goes through Git. How do you handle secrets?&rdquo;</em></p>
<p>The wrong answer: base64-encode them and commit them as Kubernetes <code>Secret</code> objects. Base64 is not encryption. Anyone with read access to the repo has your secrets. If the repo is public, everyone does.</p>
<p>The slightly better wrong answer: use a private repo and just not think about it. This works until a deploy key leaks, someone joins and then leaves the company, or you need to rotate one secret and have to find every place it&rsquo;s referenced.</p>
<p>There are three real answers. They make different tradeoffs.</p>
<hr>
<h2 id="the-constraint">The constraint</h2>
<p>The constraint is actually tighter than &ldquo;don&rsquo;t commit secrets&rdquo;. It&rsquo;s: <strong>your Git repo should be safe to make public at any point</strong>, and <strong>secrets must be rotatable without touching Git</strong>.</p>
<p>If rotating a password requires a new commit, someone has to be awake to merge and deploy it. That&rsquo;s not how you want to handle a 3am incident.</p>
<hr>
<h2 id="option-1-external-secrets-operator--vault">Option 1: External Secrets Operator + Vault</h2>
<p>This is the most robust pattern and the one worth knowing for interviews.</p>
<p>The idea: secrets live in a dedicated secret store (HashiCorp Vault, or a cloud equivalent). A Kubernetes operator called ESO watches <code>ExternalSecret</code> CRD objects in the cluster and syncs the referenced secret into a real Kubernetes <code>Secret</code>. The CRD is safe to commit — it says where the secret lives, not what it is.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># This lives in Git — safe to commit</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ExternalSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">refreshInterval</span><span class="p">:</span><span class="w"> </span><span class="l">1h</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">secretStoreRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db-credentials  </span><span class="w"> </span><span class="c"># the k8s Secret it creates</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">secretKey</span><span class="p">:</span><span class="w"> </span><span class="l">DB_PASSWORD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">remoteRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l">secret/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">property</span><span class="p">:</span><span class="w"> </span><span class="l">db-password</span><span class="w">
</span></span></span></code></pre></div><p>Rotation: you update the secret in Vault. ESO syncs it to the cluster within <code>refreshInterval</code>. No Git commit, no deployment. The pod reads the updated <code>Secret</code> on the next restart (or immediately if you mount it as an env var and the app handles <code>SIGHUP</code>).</p>
<p>Audit trail: Vault logs every read and write. You know exactly which service account read which secret at what time.</p>
<p>The cost: you&rsquo;re running Vault. For a homelab or small team, that&rsquo;s an extra thing to operate. For production, it&rsquo;s worth it.</p>
<p>Self-hosted setup:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># ClusterSecretStore — connects ESO to your Vault instance</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">provider</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">vault</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;http://sys-vault.sys-vault.svc.cluster.local:8200&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;secret&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;v2&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">auth</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">kubernetes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">mountPath</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;kubernetes&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;external-secrets&#34;</span><span class="w">
</span></span></span></code></pre></div><p>ESO authenticates to Vault using the pod&rsquo;s Kubernetes ServiceAccount token. Vault validates it against the cluster&rsquo;s token review endpoint. No static credentials anywhere.</p>
<hr>
<h2 id="option-2-sealed-secrets">Option 2: Sealed Secrets</h2>
<p>Sealed Secrets uses asymmetric encryption. The cluster holds a private key. You use the <code>kubeseal</code> CLI to encrypt a secret with the cluster&rsquo;s public key. The resulting <code>SealedSecret</code> object is safe to commit — only the cluster can decrypt it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Encrypt a secret for committing to Git</span>
</span></span><span class="line"><span class="cl">kubectl create secret generic myapp-db <span class="se">\
</span></span></span><span class="line"><span class="cl">  --from-literal<span class="o">=</span><span class="nv">DB_PASSWORD</span><span class="o">=</span>hunter2 <span class="se">\
</span></span></span><span class="line"><span class="cl">  --dry-run<span class="o">=</span>client -o yaml <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="p">|</span> kubeseal <span class="se">\
</span></span></span><span class="line"><span class="cl">  &gt; sealed-secrets/myapp-db.yaml
</span></span></code></pre></div><p>The resulting YAML looks like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">bitnami.com/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">SealedSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">encryptedData</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...</span><span class="w">
</span></span></span></code></pre></div><p>This gets committed. The Sealed Secrets controller in the cluster decrypts it and creates the real <code>Secret</code> automatically.</p>
<p>The tradeoff: rotation means re-sealing. You need the cluster&rsquo;s public key (which is public) and access to the plaintext secret. You commit a new <code>SealedSecret</code>. That&rsquo;s a Git commit, which means a review, a merge, and a deploy. For a 3am incident, that&rsquo;s a lot of friction.</p>
<p>Also: if the cluster&rsquo;s private key is lost, you can&rsquo;t decrypt any of your sealed secrets. Back up the private key.</p>
<p>Good fit for: small teams, homelab, situations where secrets change rarely and the GitOps review process is actually desirable.</p>
<hr>
<h2 id="option-3-sops">Option 3: SOPS</h2>
<p>SOPS (Secrets OPerationS) encrypts files at rest using age keys or cloud KMS. You commit encrypted files. CI decrypts them during deployment using a key it holds in memory (not stored in Git).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Encrypt a file for Git</span>
</span></span><span class="line"><span class="cl">sops --encrypt --age age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q <span class="se">\
</span></span></span><span class="line"><span class="cl">  secrets/myapp.yaml &gt; secrets/myapp.enc.yaml
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># In CI: decrypt to temp file, apply, delete</span>
</span></span><span class="line"><span class="cl">sops --decrypt secrets/myapp.enc.yaml <span class="p">|</span> kubectl apply -f -
</span></span></code></pre></div><p>The difference from Sealed Secrets: SOPS encrypts at the file level, not the k8s object level. You can use it outside of Kubernetes (application configs, Terraform variables). The key can live in the CI environment, a cloud KMS, or a personal age key.</p>
<p>The tradeoff: CI needs the decryption key, which puts you back in &ldquo;secret in CI&rdquo; territory — just for the encryption key rather than the actual secrets. If you use a cloud KMS, OIDC federation handles that (no stored key). If you use an age key, it lives in CI secrets.</p>
<p>Good fit for: teams already using Helm and Helm Secrets, polyglot environments where not everything is Kubernetes, small teams where Vault feels like overengineering.</p>
<hr>
<h2 id="comparison">Comparison</h2>
<table>
	<thead>
			<tr>
					<th></th>
					<th>ESO + Vault</th>
					<th>Sealed Secrets</th>
					<th>SOPS</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Rotation without Git commit</td>
					<td>Yes</td>
					<td>No</td>
					<td>Depends</td>
			</tr>
			<tr>
					<td>Audit trail</td>
					<td>Full (Vault)</td>
					<td>None</td>
					<td>Depends on KMS</td>
			</tr>
			<tr>
					<td>Complexity</td>
					<td>High</td>
					<td>Low</td>
					<td>Medium</td>
			</tr>
			<tr>
					<td>Works outside k8s</td>
					<td>With effort</td>
					<td>No</td>
					<td>Yes</td>
			</tr>
			<tr>
					<td>Recovery if key lost</td>
					<td>Vault backup</td>
					<td>Lose all secrets</td>
					<td>Key backup</td>
			</tr>
			<tr>
					<td>CI needs secret material</td>
					<td>No</td>
					<td>No</td>
					<td>Yes (decrypt key)</td>
			</tr>
	</tbody>
</table>
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The interesting follow-up question is: <em>&ldquo;How do you rotate a secret without downtime?&rdquo;</em></p>
<p>The answer requires you to understand that pods mount <code>Secret</code> objects at startup. Updating the <code>Secret</code> in Kubernetes doesn&rsquo;t automatically restart the pod. Your options are:</p>
<ol>
<li>Mount the secret as a volume and have the app watch for file changes (good)</li>
<li>Restart the deployment after rotation (<code>kubectl rollout restart</code>, automatable)</li>
<li>Use a sidecar like Vault Agent Injector that handles refresh in-process (complex but zero-restart)</li>
</ol>
<p>The correct answer depends on the app. An API key that can be rotated gradually is different from a database password where the old one is invalidated immediately.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-cicd-no-credentials/">deploying without cluster credentials</a>. Next: <a href="/posts/k8s-zero-downtime/">zero-downtime deployments</a>.</em></p>
]]></content:encoded></item><item><title>🔐 Same Hostname, Two Traffic Paths: Local HTTPS Without a VPN</title><link>https://blog.hippotion.com/posts/homelab-dual-path-tls/</link><pubDate>Fri, 11 Apr 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/homelab-dual-path-tls/</guid><description>No open ports. Real TLS at home. One IngressRoute per app. This is the networking setup I landed on after ruling out everything that required a compromise.</description><content:encoded><![CDATA[<h2 id="the-three-options-that-didnt-work">The three options that didn&rsquo;t work</h2>
<p>When I started the homelab I looked at the standard ways to make self-hosted services accessible and found the usual compromises:</p>
<p><strong>Open ports on the router.</strong> Point a DNS record at your public IP, forward 443 to your server. Simple. Also: your home IP is now publicly associated with your services, your ISP can see your traffic, and a misconfigured app means an open door. Hard pass.</p>
<p><strong>VPN for everything.</strong> WireGuard on the router, every device gets a tunnel. Secure, but every phone and laptop needs to be configured, split tunnelling adds complexity, and &ldquo;let me pull up the recipe&rdquo; becomes a two-tap operation that my family won&rsquo;t do. And I still want public access for some services.</p>
<p><strong>Cloudflare Tunnel, but broken HTTPS locally.</strong> This is the one I started with. The cloudflared pod dials out to Cloudflare, no open ports, external access works great. But locally, <code>*.hippotion.com</code> resolves to Cloudflare&rsquo;s anycast IP, traffic leaves the house, Cloudflare terminates TLS, traffic comes back in through the tunnel. Every local request makes a round trip to a Cloudflare edge node. Worse: browsers cache HSTS for <code>hippotion.com</code>, so <code>http://</code> URLs on the local network silently upgrade to <code>https://</code>, which fails because there&rsquo;s no local certificate. Intermittent, confusing, and hard to explain to anyone else on the network.</p>
<p>What I wanted: the tunnel for external access, direct-to-server for local access, real TLS in both cases, and one configuration per application.</p>
<hr>
<h2 id="the-insight-pi-hole-already-controls-local-dns">The insight: Pi-hole already controls local DNS</h2>
<p>My network already runs Pi-hole for ad blocking. Pi-hole uses dnsmasq under the hood and can resolve any hostname to any IP you want. One config line in Pi-hole&rsquo;s values:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">dnsmasq</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">customDnsEntries</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">address=/hippotion.com/192.168.0.109</span><span class="w">
</span></span></span></code></pre></div><p>The <code>address=</code> directive is a wildcard. Every device on the LAN that uses Pi-hole for DNS — which is all of them, because the router hands out Pi-hole&rsquo;s IP via DHCP — will now resolve <code>anything.hippotion.com</code> to <code>192.168.0.109</code>, the server&rsquo;s LAN address. External traffic still goes to Cloudflare&rsquo;s IP because it uses the public authoritative DNS. The split is automatic; no per-device configuration.</p>
<p>Local browser → Pi-hole → server&rsquo;s LAN IP directly.</p>
<p>That solves routing. Now TLS.</p>
<hr>
<h2 id="why-http-01-wont-work-here-and-why-dns-01-will">Why HTTP-01 won&rsquo;t work here, and why DNS-01 will</h2>
<p>The standard way to get a Let&rsquo;s Encrypt certificate is the HTTP-01 challenge: Let&rsquo;s Encrypt sends a request to <code>http://yourdomain.com/.well-known/acme-challenge/&lt;token&gt;</code> and your server responds. This requires Let&rsquo;s Encrypt&rsquo;s servers to reach your server over the public internet.</p>
<p>That doesn&rsquo;t work here. There are no open ports. Let&rsquo;s Encrypt can&rsquo;t reach the server. HTTP-01 is out.</p>
<p>DNS-01 is different. Instead of proving you control a server, you prove you control the DNS zone by creating a temporary TXT record at <code>_acme-challenge.yourdomain.com</code>. Let&rsquo;s Encrypt checks DNS, finds the record, issues the cert. No inbound connection required — just API access to your DNS provider.</p>
<p><code>hippotion.com</code> is on Cloudflare. cert-manager has a Cloudflare DNS solver that calls the Cloudflare API to create and delete the TXT record automatically. The certificate request flow:</p>
<ol>
<li>cert-manager creates an ACME Order with Let&rsquo;s Encrypt</li>
<li>cert-manager calls the Cloudflare API: add <code>_acme-challenge.hippotion.com TXT &lt;token&gt;</code></li>
<li>Let&rsquo;s Encrypt queries DNS, finds the record, issues the cert</li>
<li>cert-manager deletes the TXT record, writes the cert to a Kubernetes Secret</li>
<li>cert-manager renews automatically ~30 days before expiry</li>
</ol>
<p>The Cloudflare API token needs <code>Zone:DNS:Edit</code> permission for <code>hippotion.com</code>. It lives in Vault and syncs to the <code>sys-cert-manager</code> namespace via External Secrets Operator — same pattern as every other secret in the cluster, nothing in Git.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">cert-manager.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Certificate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">hippotion-wildcard</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">sys-traefik</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">secretName</span><span class="p">:</span><span class="w"> </span><span class="l">hippotion-wildcard-tls</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">issuerRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">letsencrypt-cloudflare</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterIssuer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">dnsNames</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="s2">&#34;hippotion.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="s2">&#34;*.hippotion.com&#34;</span><span class="w">
</span></span></span></code></pre></div><p>One certificate. Every subdomain. cert-manager stores it as <code>hippotion-wildcard-tls</code> in the <code>sys-traefik</code> namespace, where Traefik can read it.</p>
<hr>
<h2 id="two-traefik-entrypoints">Two Traefik entrypoints</h2>
<p>Traefik has three entrypoints configured:</p>
<table>
	<thead>
			<tr>
					<th>Entrypoint</th>
					<th>Port</th>
					<th>Purpose</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td><code>web</code></td>
					<td>80</td>
					<td>Redirects all traffic to <code>websecure</code></td>
			</tr>
			<tr>
					<td><code>websecure</code></td>
					<td>443</td>
					<td>Local HTTPS, serves the wildcard cert</td>
			</tr>
			<tr>
					<td><code>cloudflare</code></td>
					<td>7080</td>
					<td>Receives plain HTTP from the cloudflared pod</td>
			</tr>
	</tbody>
</table>
<p>The <code>cloudflare</code> entrypoint is the key piece. Cloudflare Tunnel terminates TLS at Cloudflare&rsquo;s edge and forwards plain HTTP to the cluster. If that plain HTTP landed on <code>web</code> (port 80), it would get redirected to <code>websecure</code> (port 443), which would fail because cloudflared isn&rsquo;t sending HTTPS. A separate entrypoint on a separate port handles tunnel traffic without redirection.</p>
<p>Traefik is configured to use the wildcard cert as its default:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">tlsStore</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">default</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">defaultCertificate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">secretName</span><span class="p">:</span><span class="w"> </span><span class="l">hippotion-wildcard-tls</span><span class="w">
</span></span></span></code></pre></div><p>Any <code>websecure</code> request that doesn&rsquo;t match a more specific TLS configuration gets the wildcard cert. No per-app certificate configuration.</p>
<hr>
<h2 id="one-ingressroute-handles-both-paths">One IngressRoute handles both paths</h2>
<p>Every application gets a single IngressRoute with both entrypoints:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">traefik.io/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">IngressRoute</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">entryPoints</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">cloudflare  </span><span class="w"> </span><span class="c"># plain HTTP from cloudflared</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">websecure   </span><span class="w"> </span><span class="c"># local HTTPS with wildcard cert</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">routes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">match</span><span class="p">:</span><span class="w"> </span><span class="l">Host(`myapp.hippotion.com`)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Rule</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">middlewares</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">oauth-auth</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">sys-oauth2-gitlab</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span></code></pre></div><p>That&rsquo;s it. The same hostname, the same routing rule, the same middleware — served correctly on both paths. No conditional logic, no separate ingress for local vs external.</p>
<p>The OAuth middleware (<code>oauth-auth</code>) works on both paths too. Local browsers get redirected to GitLab for authentication the same way external browsers do. The SSO cookie is set on <code>hippotion.com</code>, so it works across all subdomains regardless of which path the traffic came through.</p>
<hr>
<h2 id="what-the-two-traffic-paths-look-like-end-to-end">What the two traffic paths look like end to end</h2>
<pre tabindex="0"><code>External browser (anywhere):
  Browser
    → Cloudflare DNS (hippotion.com → Cloudflare anycast IP)
    → Cloudflare edge (TLS terminated, certificate managed by Cloudflare)
    → cloudflared pod in cluster (plain HTTP)
    → Traefik :7080 (cloudflare entrypoint)
    → app pod

Local browser (home WiFi):
  Browser
    → Pi-hole DNS (*.hippotion.com → 192.168.0.109)
    → Traefik :443 (websecure entrypoint)
    → Traefik serves hippotion-wildcard-tls (Let&#39;s Encrypt cert, trusted by browser)
    → app pod
</code></pre><p>Both paths hit the same Traefik IngressRoute rule. The app sees an HTTP request either way. TLS is handled at the edge — Cloudflare for external traffic, Traefik for local traffic.</p>
<hr>
<h2 id="the-hsts-detail">The HSTS detail</h2>
<p>Cloudflare likely has HSTS enabled for your domain. Browsers cache this: once they see an HSTS header for <code>hippotion.com</code>, they&rsquo;ll refuse to load any <code>http://</code> URL under that domain for the duration of the max-age. They silently upgrade to <code>https://</code> and fail if there&rsquo;s no cert.</p>
<p>This is why the original setup — tunnel only, no local cert — felt unreliable locally. The browser was doing the right thing (enforcing HTTPS) but the cert didn&rsquo;t exist. The wildcard cert fixes this because HTTPS now actually works locally. The HSTS enforcement is fine once TLS is real.</p>
<hr>
<h2 id="what-its-like-to-operate">What it&rsquo;s like to operate</h2>
<p>Adding a new service means writing one IngressRoute with two entrypoints and pushing to Git. No DNS records to create (cloudflared picks up hostnames from a config list), no certificates to request (the wildcard covers everything), no VPN profiles to distribute. The platform handles it.</p>
<p>Local access works when the internet is down. The Pi-hole DNS and the wildcard cert are entirely on-premises — as long as the server is up, the services are reachable, Cloudflare outage or not. I noticed this during a brief Cloudflare incident a few months ago: external access went down, everything inside the house kept working without interruption.</p>
<p>I&rsquo;m not a networking expert. I just followed the constraint — no open ports, no VPN — and the DNS-01 + split DNS solution fell out naturally. It turned out to be simpler to configure than the alternatives, and cleaner to operate.</p>
]]></content:encoded></item><item><title>🏗️ My Homelab Runs on GitOps. Here's What That Actually Means.</title><link>https://blog.hippotion.com/posts/homelab-gitops/</link><pubDate>Fri, 28 Mar 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/homelab-gitops/</guid><description>I wanted to learn production-grade Kubernetes patterns without breaking production. One node, a full GitOps stack, and a hard rule: no manual kubectl after bootstrap.</description><content:encoded><![CDATA[<h2 id="why-this-exists">Why this exists</h2>
<p>I&rsquo;ve been working in DevOps and platform engineering long enough to know what I don&rsquo;t know. The patterns that separate robust infrastructure from &ldquo;it works on my machine&rdquo; infrastructure — GitOps, admission policies, network segmentation, secrets management — are easy to read about. They&rsquo;re harder to actually internalise without running them yourself.</p>
<p>So I built a homelab. An old ThinkCentre I had sitting around, k3s, and a rule I set for myself before writing a single line of configuration: <strong>GitLab is the only source of truth. No manual <code>kubectl</code> after bootstrap. All changes go through <code>git push</code>.</strong></p>
<p>That rule turned out to be more consequential than I expected.</p>
<hr>
<h2 id="the-stack">The stack</h2>
<p>The cluster runs about thirty services across two categories: infrastructure that makes the platform work, and applications that actually do things.</p>
<p>Infrastructure:</p>
<ul>
<li><strong>k3s</strong> — lightweight Kubernetes, single-node</li>
<li><strong>Cilium</strong> — CNI with NetworkPolicy support (Flannel, k3s&rsquo;s default, silently ignores NetworkPolicies)</li>
<li><strong>Argo CD</strong> — GitOps reconciler, watches the repo, applies changes</li>
<li><strong>Traefik</strong> — ingress controller, two entrypoints</li>
<li><strong>Cloudflare tunnel</strong> — external access without open ports</li>
<li><strong>cert-manager</strong> — wildcard TLS cert via Let&rsquo;s Encrypt DNS-01</li>
<li><strong>oauth2-proxy</strong> — GitLab SSO protecting everything by default</li>
<li><strong>Vault + External Secrets Operator</strong> — secrets management</li>
<li><strong>Pi-hole</strong> — local DNS for <code>*.hippotion.com</code></li>
</ul>
<p>Applications: a media server (Jellyfin, *arr stack), Immich for photos, Vaultwarden for passwords, Home Assistant, n8n for automation, a Hugo blog, Obsidian via browser-based KasmVNC, and a few custom-built things I&rsquo;ll get to below.</p>
<hr>
<h2 id="traffic-reaches-the-cluster-in-two-ways">Traffic reaches the cluster in two ways</h2>
<p>External traffic (from anywhere on the internet) goes through a Cloudflare tunnel. The cloudflared pod dials out to Cloudflare — no open ports on the server, no firewall rules, no exposed IP. Cloudflare terminates TLS and forwards plain HTTP to Traefik on port 7080. Cloudflare handles the certificate for external visitors.</p>
<p>Local traffic (home WiFi) goes through Pi-hole, which resolves <code>*.hippotion.com</code> to the server&rsquo;s LAN IP. Traefik receives HTTPS on port 443, served with a wildcard certificate that cert-manager issues from Let&rsquo;s Encrypt via DNS-01 challenge. Port 80 redirects to 443; the <code>cloudflare</code> entrypoint on 7080 does not redirect, because it&rsquo;s already receiving plain HTTP from cloudflared.</p>
<p>The result: the same IngressRoute handles both paths.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">entryPoints</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">cloudflare  </span><span class="w"> </span><span class="c"># plain HTTP from the cloudflared pod</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">websecure   </span><span class="w"> </span><span class="c"># local HTTPS with wildcard cert</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">routes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">match</span><span class="p">:</span><span class="w"> </span><span class="l">Host(`myapp.hippotion.com`)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Rule</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">middlewares</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">oauth-auth</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">sys-oauth2-gitlab</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span></code></pre></div><p>Every IngressRoute has both entrypoints. If you forget one, the service is unreachable from half your access paths. Learned that the first time I added an app and couldn&rsquo;t reach it from the phone.</p>
<hr>
<h2 id="one-file-generates-everything">One file generates everything</h2>
<p>The centrepiece of the setup is <code>applications.yml</code> — a single file that is the complete list of everything running in the cluster. Every entry generates a Namespace, an Argo CD AppProject, an Application, NetworkPolicies, and RBAC. Nothing is created anywhere else.</p>
<p>An entry looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">profile</span><span class="p">:</span><span class="w"> </span><span class="l">web-app</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">applications</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">applicationCode</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/extra-objects</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">autoSync</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>Six lines. That deploys a namespace, an Argo CD app that watches <code>helm-charts/extra-objects/values-web-vaultwarden.yml</code>, a full set of Cilium NetworkPolicies based on the <code>web-app</code> profile (deny-all with ingress from Traefik and egress to external), and a ServiceAccount. Adding a new service to the cluster is this file plus a values file with the actual Kubernetes manifests.</p>
<p>The <code>profile: web-app</code> notation deserves a word. Raw NetworkPolicy YAML is repetitive and error-prone — every namespace needs a deny-all base plus specific allows. I template it. A Helm chart maps profile names to concrete policy sets. <code>web-app</code> means: deny all ingress except from the ingress namespace, deny all egress except DNS and external HTTPS. <code>web-app-internal</code> means the same but no external egress — suitable for services that only talk to other in-cluster services. <code>media-server</code> adds port 6881 for BitTorrent. The policies are generated; no one writes them by hand.</p>
<hr>
<h2 id="secrets-without-storing-them-in-git">Secrets without storing them in Git</h2>
<p>Kubernetes <code>Secret</code> objects are not secrets. They&rsquo;re base64-encoded blobs in etcd, and base64 is not encryption. Committing them to a Git repo — even a private one — is the wrong answer.</p>
<p>The setup here uses HashiCorp Vault as the actual secret store, with External Secrets Operator syncing Vault paths to Kubernetes Secrets. What lives in Git is an <code>ExternalSecret</code> CRD:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ExternalSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">secretStoreRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">secretKey</span><span class="p">:</span><span class="w"> </span><span class="l">DB_PASSWORD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">remoteRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l">secret/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">property</span><span class="p">:</span><span class="w"> </span><span class="l">db-password</span><span class="w">
</span></span></span></code></pre></div><p>This is safe to commit. It says where the secret lives, not what it is. Vault contains the actual value. ESO syncs it to the cluster and refreshes every hour. Rotation means updating the value in Vault — no Git commit, no deployment.</p>
<p>Vault runs in-cluster with a sidecar that auto-unseals on restart. Not production-grade (the unseal key is on the same PVC as Vault itself), but pragmatic for a homelab where availability matters more than a sophisticated key management ceremony.</p>
<hr>
<h2 id="three-things-i-built-that-were-worth-building">Three things I built that were worth building</h2>
<h3 id="local-ai-inference">Local AI inference</h3>
<p>The cluster runs a local LLM. The <code>web-ai-engine</code> namespace has Open WebUI fronting a llama-server serving Phi-3.5 Mini in GGUF format. The model file lives on the node&rsquo;s filesystem, mounted as a hostPath volume.</p>
<p><code>web-openclaw</code> is a personal AI assistant UI that can route requests to either external providers (via NVIDIA&rsquo;s API) or the local llama-server, depending on the task. The local model handles things that don&rsquo;t need to leave the house; the external API handles things that do. The network policy for <code>web-openclaw</code> explicitly allows egress to <code>web-ai-engine</code> and nowhere else for local inference.</p>
<p>Running a 3.8B parameter model on homelab hardware is genuinely useful and costs nothing per query. It&rsquo;s not GPT-4, but for summarisation, first drafts, and things you don&rsquo;t want sending to a third-party API, it&rsquo;s more than good enough.</p>
<h3 id="brew-buddy">Brew Buddy</h3>
<p>I make kombucha. I was tracking fermentation batches in a notes app and getting annoyed at not being able to see history across batches. So I built a tracker.</p>
<p>Brew Buddy is a React frontend and a Go API backed by PostgreSQL, all running in the <code>web-brew-buddy</code> namespace. The images are built locally and imported into the cluster&rsquo;s container runtime with <code>k3s ctr images import</code>. It&rsquo;s deployed like any other app — a values file, an entry in <code>applications.yml</code>, a Vault secret for the database password.</p>
<p>The point isn&rsquo;t the app. The point is that the platform handles a custom hobby project with the same operational properties as Vaultwarden or Immich. Same GitOps workflow, same secret management, same network isolation, same TLS termination. Adding an app to this cluster takes an afternoon of writing manifests and a few seconds of git push. The platform work was done once.</p>
<h3 id="qr-device-login">QR device login</h3>
<p>This one has <a href="/posts/qr-device-login/">its own post</a> because it took three days and four complete rewrites of oauth2-proxy&rsquo;s session format to get right.</p>
<p>The short version: the Homer dashboard on the living room TV needed a way to log in without typing credentials on a TV keyboard. I built a device-flow OAuth service — phone scans QR, phone authenticates with GitLab, TV session is created. End session from the phone kills the TV&rsquo;s session immediately by deleting the oauth2-proxy Redis ticket.</p>
<p>It&rsquo;s the most overengineered solution to a problem I have, and I don&rsquo;t regret a minute of it.</p>
<hr>
<h2 id="what-operating-this-way-actually-changes">What operating this way actually changes</h2>
<p>The practical difference of the no-manual-kubectl rule is larger than it sounds.</p>
<p><strong>The audit trail is automatic.</strong> Every change to the cluster is a git commit with an author, a timestamp, and a diff. There&rsquo;s no &ldquo;what did I change last Tuesday?&rdquo; — I know exactly what changed last Tuesday, and I can revert it with <code>git revert</code>. The Argo CD UI shows the diff between what&rsquo;s in Git and what&rsquo;s running. If there&rsquo;s a diff, something went wrong.</p>
<p><strong>New services are cheap to add.</strong> The platform does the repetitive work — namespace, RBAC, network policies, TLS termination, OAuth protection. Adding a new app is writing the manifests and updating <code>applications.yml</code>. The infrastructure concerns are handled.</p>
<p><strong>Recovery is straightforward.</strong> If I rebuild the node (which I&rsquo;ve done), I run two bootstrap scripts, apply one Argo CD manifest, and the cluster reconciles itself from Git over the next few minutes. The only things that require manual work are the secrets that can&rsquo;t live in Git — two OAuth credentials and the Cloudflare tunnel token, all recreated by <code>scripts/create-secrets.sh</code>.</p>
<p><strong>Experimentation is safe.</strong> I run things on <code>toggleable: true</code> apps that I&rsquo;m not sure I&rsquo;ll keep. Turning them off is removing the entry from <code>applications.yml</code> and pushing. Turning them back on is adding it back.</p>
<hr>
<h2 id="what-it-doesnt-solve">What it doesn&rsquo;t solve</h2>
<p>Bootstrap is manual. The first <code>kubectl apply -f argocd/root-app.yaml</code> happens outside of GitOps by definition. The three bootstrap secrets can&rsquo;t be in Git. This is unavoidable — you need to trust something before GitOps can take over, and that something is a short manual procedure.</p>
<p>Some things fight the model. k3s&rsquo;s built-in addon controller rewrites the metrics-server Deployment on every k3s restart, removing a patch needed for Cilium compatibility. The fix is a pod that watches for the revert and reapplies the patch. It works, but it&rsquo;s a workaround for a component I don&rsquo;t control.</p>
<p>Single-node means single point of failure. For a homelab, that&rsquo;s acceptable. For anything important, it&rsquo;s not.</p>
<hr>
<h2 id="the-honest-summary">The honest summary</h2>
<p>I set out to learn production-grade Kubernetes patterns, and I did. The GitOps constraint turned out to be the best engineering decision in the project — not because it made things easier in the short term (it didn&rsquo;t), but because it forced every change through a path that is auditable, reversible, and consistent.</p>
<p>The cluster is a single ThinkCentre running about thirty services, secured by Cilium network policies, authenticated via GitLab SSO, with secrets managed by Vault and all configuration in a Git repo that I could hand to someone tomorrow and they&rsquo;d understand what&rsquo;s running and why.</p>
<p>That&rsquo;s the goal. For a homelab, I&rsquo;ll call it achieved.</p>
]]></content:encoded></item><item><title>📱 Building a QR Code Login for a Homelab (And Learning oauth2-proxy's Session Format the Hard Way)</title><link>https://blog.hippotion.com/posts/qr-device-login/</link><pubDate>Fri, 14 Mar 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/qr-device-login/</guid><description>My homelab uses oauth2-proxy for GitLab SSO. I wanted a QR code login for the TV dashboard. Two days and four complete rewrites later, I knew more about oauth2-proxy&amp;rsquo;s session format than I ever planned to.</description><content:encoded><![CDATA[<h2 id="the-problem">The problem</h2>
<p>My homelab runs a single-node k3s cluster with a full GitOps stack — Argo CD, Traefik, oauth2-proxy for GitLab SSO, the usual over-engineered personal project. One thing that always bothered me: when I want to show the Homer dashboard on the living room TV, I have to type my credentials on a keyboard that wasn&rsquo;t designed for the living room.</p>
<p>The obvious fix is a QR code. Phone scans it, phone authenticates, TV unlocks. Conceptually simple. In practice, a two-day debugging adventure that took me deep into oauth2-proxy&rsquo;s source code.</p>
<hr>
<h2 id="the-design">The design</h2>
<p>The flow I wanted:</p>
<ol>
<li>TV opens <code>qr.hippotion.com</code>, shows a QR code and polls for completion</li>
<li>Phone scans, opens the device URL, taps &ldquo;Continue with GitLab&rdquo;</li>
<li>Phone completes GitLab OAuth</li>
<li>Server marks the session as ready</li>
<li>TV&rsquo;s poll fires, gets redirected to Homer</li>
<li>Later: phone taps &ldquo;End Session&rdquo;, TV locks immediately</li>
</ol>
<p>This is the <a href="https://datatracker.ietf.org/doc/html/rfc8628">OAuth 2.0 Device Authorization Grant</a> pattern adapted for a single trusted user. I wrote it in Go with Redis for session storage. The service generates a device token, stores it with a 5-minute TTL, and uses it as the OAuth <code>state</code> parameter. The phone completes GitLab OAuth and the callback handler links the resulting session to the device token. The TV&rsquo;s poll loop picks it up and redirects.</p>
<p>That part was straightforward. The hard part was making the TV&rsquo;s session work for <em>all</em> protected apps on the domain, not just the QR page.</p>
<hr>
<h2 id="the-oauth2-proxy-problem">The oauth2-proxy problem</h2>
<p>My homelab uses oauth2-proxy as a ForwardAuth backend for Traefik. Every protected app (<code>home.hippotion.com</code>, <code>argo.hippotion.com</code>, <code>grafana.hippotion.com</code>, etc.) sends unauthenticated requests through oauth2-proxy, which redirects to GitLab if no valid <code>_oauth2_proxy</code> session cookie is present.</p>
<p>The QR auth service creates its own session cookie (<code>qr_session</code>), but oauth2-proxy knows nothing about it. After QR login, clicking any link from Homer would immediately ask for GitLab credentials again.</p>
<p>The obvious solution: after the phone authenticates, set a valid <code>_oauth2_proxy</code> cookie on the TV&rsquo;s browser. If I can forge a cookie that oauth2-proxy accepts, all apps work instantly.</p>
<p>How hard can it be?</p>
<hr>
<h2 id="attempt-1-aes-gcm--json">Attempt 1: AES-GCM + JSON</h2>
<p>I looked at the oauth2-proxy source and found what looked like the session format: a JSON struct with short field names (<code>&quot;e&quot;</code> for email, <code>&quot;ca&quot;</code> for created-at, etc.), encrypted with AES-GCM, base64url-encoded.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">oauthSession</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">CreatedAt</span><span class="w"> </span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="w"> </span><span class="s">`json:&#34;ca&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">ExpiresOn</span><span class="w"> </span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="w"> </span><span class="s">`json:&#34;ea&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">Email</span><span class="w">     </span><span class="kt">string</span><span class="w">     </span><span class="s">`json:&#34;e&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">User</span><span class="w">      </span><span class="kt">string</span><span class="w">     </span><span class="s">`json:&#34;u&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>SHA256-hash the cookie secret → 32-byte AES key → GCM encrypt → base64url encode. Set as <code>_oauth2_proxy</code> cookie. Clean, simple, wrong.</p>
<p>oauth2-proxy returned 302 every time. I added debug logging to print the cookie value, copied it, and tested it directly against the ForwardAuth endpoint with curl. The logs revealed everything:</p>
<pre tabindex="0"><code>Error loading cookied session: cookie signature not valid, removing session
</code></pre><p><em>Cookie signature not valid.</em> Not &ldquo;decryption failed&rdquo;, not &ldquo;session expired&rdquo;. A signature check.</p>
<hr>
<h2 id="finding-the-real-format">Finding the real format</h2>
<p>The error came from <code>pkg/middleware/stored_session.go:94</code>. I fetched the source:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">encryption</span><span class="p">.</span><span class="nf">Validate</span><span class="p">(</span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="nx">secret</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Cookie</span><span class="p">.</span><span class="nx">Expire</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="s">&#34;cookie signature not valid&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p><code>encryption.Validate</code> splits the cookie value on <code>|</code> and expects three parts. Looking at <code>utils.go</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">Validate</span><span class="p">(</span><span class="nx">cookie</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Cookie</span><span class="p">,</span><span class="w"> </span><span class="nx">seed</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">expiration</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Duration</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">parts</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="nx">cookie</span><span class="p">.</span><span class="nx">Value</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;|&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">parts</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="nf">checkSignature</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span><span class="w"> </span><span class="nx">seed</span><span class="p">,</span><span class="w"> </span><span class="nx">cookie</span><span class="p">.</span><span class="nx">Name</span><span class="p">,</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="c1">// ...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>The cookie format is <code>encryptedValue|timestamp|hmac</code>. My cookie was just <code>encryptedValue</code>. Three-part, not one. First problem found.</p>
<p>For the HMAC, I needed to verify against a real cookie to get the key format right. oauth2-proxy sets <code>_oauth2_proxy_csrf</code> cookies during the login flow — I captured one from a 302 response and reverse-engineered it in Python:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">key</span> <span class="o">=</span> <span class="n">secret_raw</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span>  <span class="c1"># raw string, not decoded</span>
</span></span><span class="line"><span class="cl"><span class="n">data</span> <span class="o">=</span> <span class="p">(</span><span class="n">cookie_name</span> <span class="o">+</span> <span class="n">enc_val</span> <span class="o">+</span> <span class="n">ts</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span>  <span class="c1"># concatenated, NO separators</span>
</span></span><span class="line"><span class="cl"><span class="n">sig</span> <span class="o">=</span> <span class="n">base64</span><span class="o">.</span><span class="n">urlsafe_b64encode</span><span class="p">(</span><span class="n">hmac</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha256</span><span class="p">)</span><span class="o">.</span><span class="n">digest</span><span class="p">())</span>
</span></span></code></pre></div><p>Two surprises: the HMAC key is the <strong>raw cookie secret string</strong> (not base64-decoded), and the input is a <strong>bare concatenation</strong> with no <code>|</code> separators between fields.</p>
<p>I ran the test. The CSRF cookie&rsquo;s signature matched. I had the format.</p>
<p>But oauth2-proxy still rejected the session.</p>
<hr>
<h2 id="the-wrong-cipher">The wrong cipher</h2>
<p>I switched from AES-GCM to the correct HMAC format and tried again. Still 302. <code>cookie signature not valid</code> again.</p>
<p>Wait — was it even getting to the signature check? If decryption failed first, it wouldn&rsquo;t reach that error. I added more debug logging to print the full cookie value and tested it with Python&rsquo;s <code>cryptography</code> library:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">candidates</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;24-byte std-b64 decode&#39;</span><span class="p">:</span>  <span class="n">base64</span><span class="o">.</span><span class="n">b64decode</span><span class="p">(</span><span class="n">secret_str</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;32-byte raw string&#39;</span><span class="p">:</span>      <span class="n">secret_str</span><span class="o">.</span><span class="n">encode</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">    <span class="s1">&#39;32-byte sha256 of b64&#39;</span><span class="p">:</span>   <span class="n">hashlib</span><span class="o">.</span><span class="n">sha256</span><span class="p">(</span><span class="n">base64</span><span class="o">.</span><span class="n">b64decode</span><span class="p">(</span><span class="n">secret_str</span><span class="p">))</span><span class="o">.</span><span class="n">digest</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> <span class="n">label</span><span class="p">,</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">candidates</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">pt</span> <span class="o">=</span> <span class="n">AESGCM</span><span class="p">(</span><span class="n">key</span><span class="p">)</span><span class="o">.</span><span class="n">decrypt</span><span class="p">(</span><span class="n">nonce</span><span class="p">,</span> <span class="n">ct_tag</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;SUCCESS [</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="s1">]: </span><span class="si">{</span><span class="n">pt</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;FAIL    [</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="s1">]: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</span></span></code></pre></div><p>The 24-byte base64-decoded key decrypted successfully. The cookie was correctly decrypted. But still rejected. Which meant the signature check was passing but <em>something else</em> was wrong upstream — it wasn&rsquo;t even getting to the signature.</p>
<p>I went back to the source. <code>session_store.go</code> → <code>NewCookieSessionStore</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="nx">cipher</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">encryption</span><span class="p">.</span><span class="nf">NewCFBCipher</span><span class="p">(</span><span class="nx">encryption</span><span class="p">.</span><span class="nf">SecretBytes</span><span class="p">(</span><span class="nx">secret</span><span class="p">))</span><span class="w">
</span></span></span></code></pre></div><p><strong>AES-CFB. Not GCM.</strong> The cookie session store uses CFB. GCM exists in the codebase for a different purpose (the Redis ticket store, which I hadn&rsquo;t discovered yet). I had been encrypting with the wrong cipher the entire time.</p>
<p>And <code>SecretBytes</code> — a function I&rsquo;d been reading but not understanding:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">SecretBytes</span><span class="p">(</span><span class="nx">secret</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">base64</span><span class="p">.</span><span class="nx">RawURLEncoding</span><span class="p">.</span><span class="nf">DecodeString</span><span class="p">(</span><span class="nx">strings</span><span class="p">.</span><span class="nf">TrimRight</span><span class="p">(</span><span class="nx">secret</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;=&#34;</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="p">{</span><span class="mi">16</span><span class="p">,</span><span class="w"> </span><span class="mi">24</span><span class="p">,</span><span class="w"> </span><span class="mi">32</span><span class="p">}</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">secret</span><span class="p">)</span><span class="w">  </span><span class="c1">// fallback: raw string</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>The cookie secret <code>q7OF9sK2/Pnt9QKNoBBmxWRL3GAbWzvj</code> contains <code>/</code>. That&rsquo;s valid standard base64 but not URL-safe base64 — <code>RawURLEncoding</code> fails. Fallback to raw string: 32 bytes, valid AES-256 key. My Python test had used standard base64 decoding, which <em>did</em> succeed (and produced a different 24-byte key). My Go implementation had done the same. Both were deriving the wrong key.</p>
<p>I rewrote the cipher to AES-CFB with the raw-string key. New test. Same error. Still rejecting.</p>
<hr>
<h2 id="messagepack-and-lz4">MessagePack and LZ4</h2>
<p>Back to the source. <code>EncodeSessionState</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">SessionState</span><span class="p">)</span><span class="w"> </span><span class="nf">EncodeSessionState</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="nx">encryption</span><span class="p">.</span><span class="nx">Cipher</span><span class="p">,</span><span class="w"> </span><span class="nx">compress</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">packed</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msgpack</span><span class="p">.</span><span class="nf">Marshal</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c1">// ...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">compressed</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">lz4Compress</span><span class="p">(</span><span class="nx">packed</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c1">// ...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nf">Encrypt</span><span class="p">(</span><span class="nx">compressed</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p><strong>MessagePack. LZ4 compression. Then AES-CFB.</strong></p>
<p>I had been encrypting raw JSON. The whole time.</p>
<p>The struct tags confirmed it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">type</span><span class="w"> </span><span class="nx">SessionState</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">CreatedAt</span><span class="w"> </span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="w"> </span><span class="s">`msgpack:&#34;ca,omitempty&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">ExpiresOn</span><span class="w"> </span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="w"> </span><span class="s">`msgpack:&#34;eo,omitempty&#34;`</span><span class="w">  </span><span class="c1">// &#34;eo&#34;, not &#34;ea&#34; as I&#39;d assumed</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">AccessToken</span><span class="w"> </span><span class="kt">string</span><span class="w">   </span><span class="s">`msgpack:&#34;at,omitempty&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">Email</span><span class="w">      </span><span class="kt">string</span><span class="w">    </span><span class="s">`msgpack:&#34;e,omitempty&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">User</span><span class="w">       </span><span class="kt">string</span><span class="w">    </span><span class="s">`msgpack:&#34;u,omitempty&#34;`</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Even the ExpiresOn field name was different from what I&rsquo;d guessed (<code>&quot;eo&quot;</code> not <code>&quot;ea&quot;</code>).</p>
<p>I added the <code>vmihailenco/msgpack</code> and <code>pierrec/lz4</code> dependencies, rewrote the encoding pipeline: msgpack → lz4 → AES-CFB(raw-string key) → base64url(encrypted) → sign with HMAC.</p>
<p>Ran the curl test. <strong>HTTP 200.</strong></p>
<p>After three days and four complete rewrites of the encoding logic, oauth2-proxy accepted the forged session.</p>
<hr>
<h2 id="the-access-token-problem">The access token problem</h2>
<p>Celebrating was premature. The browser test worked from curl, but real ForwardAuth requests kept failing intermittently. Looking at the logs:</p>
<pre tabindex="0"><code>Error loading cookied session: session is invalid
</code></pre><p>This came from <code>validateSession</code> in the storedSessionLoader — after successfully loading the session, it was calling the provider&rsquo;s <code>ValidateSession</code> method and getting false back. I checked the GitLab provider:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="o">*</span><span class="nx">GitLabProvider</span><span class="p">)</span><span class="w"> </span><span class="nf">ValidateSession</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">Context</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">sessions</span><span class="p">.</span><span class="nx">SessionState</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="nf">validateToken</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">AccessToken</span><span class="p">,</span><span class="w"> </span><span class="nf">makeOIDCHeader</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">IDToken</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>oauth2-proxy calls GitLab&rsquo;s <code>/oauth/token/info</code> endpoint with the access token to verify the session is still active. My forged session had an empty <code>AccessToken</code> field. Empty access token → <code>validateToken</code> returns false immediately → session rejected.</p>
<p>The fix: during the phone&rsquo;s GitLab OAuth flow, <code>exchangeCode</code> was already calling GitLab&rsquo;s token endpoint and receiving an access token, but I&rsquo;d been discarding it. I changed the function signature to return it, stored it in the session, included it in the forged session&rsquo;s <code>at</code> field.</p>
<p>The token was issued for my qr-auth GitLab app, not oauth2-proxy&rsquo;s app. But GitLab&rsquo;s <code>/oauth/token/info</code> endpoint doesn&rsquo;t check the issuing application — it just validates the token is active and returns 200. oauth2-proxy only checks for a 200 response. The token worked.</p>
<p>Everything worked.</p>
<hr>
<h2 id="the-end-session-problem--three-attempts">The End Session problem — three attempts</h2>
<h3 id="attempt-1-delete-qr_session-lock-the-qr-page">Attempt 1: Delete qr_session, lock the QR page</h3>
<p>The first End Session implementation deleted the <code>qr_session</code> key from Redis. To make this actually lock the screen, I restored the Homer proxy at <code>qr.hippotion.com</code> — the TV would show Homer via an ExternalName Kubernetes service pointing at the Homer pod, guarded by a Traefik ForwardAuth middleware that checked the <code>qr_session</code> cookie. Homer makes status API calls every ~30 seconds, which re-triggered ForwardAuth, and deleting <code>qr_session</code> meant the screen would lock within 30 seconds automatically.</p>
<p>This worked for <code>qr.hippotion.com</code>, but the <code>_oauth2_proxy</code> cookie was stateless — a signed, self-contained encrypted blob in the browser. There was no server-side record to delete. Other apps (<code>argo.hippotion.com</code>, <code>grafana.hippotion.com</code>, etc.) kept working until the 8-hour cookie expiry.</p>
<p>The TV screen was locked. The session wasn&rsquo;t.</p>
<h3 id="attempt-2-shorter-cookie-ttl">Attempt 2: Shorter cookie TTL</h3>
<p>The tempting quick fix: reduce the forged cookie&rsquo;s TTL from 8 hours to something shorter, like 30 minutes. End Session would lock the TV immediately. Other apps would expire within 30 minutes on their own.</p>
<p>Rejected. 30 minutes of residual access on a shared TV is too long, and the TTL is arbitrary — it doesn&rsquo;t match what End Session is supposed to mean.</p>
<h3 id="attempt-3-redis-backed-oauth2-proxy-sessions">Attempt 3: Redis-backed oauth2-proxy sessions</h3>
<p>The correct fix is what oauth2-proxy calls <em>persistence tickets</em>. Instead of encoding the entire session into the cookie, oauth2-proxy stores the session in Redis and puts only a ticket reference in the cookie. When the ticket is deleted from Redis, the session is gone on the next request.</p>
<p>The ticket format, from <code>pkg/sessions/persistence/ticket.go</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="c1">// ticketID format: &#34;_oauth2_proxy-&lt;hex(16 random bytes)&gt;&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nx">ticketID</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;%s-%s&#34;</span><span class="p">,</span><span class="w"> </span><span class="nx">cookieOpts</span><span class="p">.</span><span class="nx">Name</span><span class="p">,</span><span class="w"> </span><span class="nx">hex</span><span class="p">.</span><span class="nf">EncodeToString</span><span class="p">(</span><span class="nx">rawID</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// ticket string in the cookie: &#34;v2.&lt;base64url(ticketID)&gt;.&lt;base64url(ticketSecret)&gt;&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">ticket</span><span class="p">)</span><span class="w"> </span><span class="nf">encodeTicket</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nf">Sprintf</span><span class="p">(</span><span class="s">&#34;v2.%s.%s&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nx">base64</span><span class="p">.</span><span class="nx">RawURLEncoding</span><span class="p">.</span><span class="nf">EncodeToString</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">)),</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nx">base64</span><span class="p">.</span><span class="nx">RawURLEncoding</span><span class="p">.</span><span class="nf">EncodeToString</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">secret</span><span class="p">))</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">// session stored in Redis, encrypted with the *ticket* secret (not the cookie secret)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">ticket</span><span class="p">)</span><span class="w"> </span><span class="nf">saveSession</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">sessions</span><span class="p">.</span><span class="nx">SessionState</span><span class="p">,</span><span class="w"> </span><span class="nx">saver</span><span class="w"> </span><span class="nx">saveFunc</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">encryption</span><span class="p">.</span><span class="nf">NewGCMCipher</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">secret</span><span class="p">)</span><span class="w">  </span><span class="c1">// GCM, not CFB</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c1">// ...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">ciphertext</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nf">EncodeSessionState</span><span class="p">(</span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span><span class="w">  </span><span class="c1">// msgpack, NO lz4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="nf">saver</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">ciphertext</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">options</span><span class="p">.</span><span class="nx">Expire</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This is a completely different format from the cookie session:</p>
<table>
	<thead>
			<tr>
					<th></th>
					<th>Cookie session</th>
					<th>Redis session (ticket)</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Cipher</td>
					<td>AES-CFB</td>
					<td>AES-128-GCM</td>
			</tr>
			<tr>
					<td>Key</td>
					<td>cookie secret (raw string)</td>
					<td>per-session ticket secret</td>
			</tr>
			<tr>
					<td>Serialization</td>
					<td>msgpack</td>
					<td>msgpack</td>
			</tr>
			<tr>
					<td>Compression</td>
					<td>lz4</td>
					<td><strong>none</strong></td>
			</tr>
			<tr>
					<td>Storage</td>
					<td>in the cookie</td>
					<td>Redis, keyed by ticket ID</td>
			</tr>
			<tr>
					<td>Revocable</td>
					<td>no</td>
					<td>yes</td>
			</tr>
	</tbody>
</table>
<p>I rewrote the session creation to generate a random ticket ID and secret, encrypt the msgpack session with AES-GCM using the ticket secret, store it in Redis, and set the signed ticket reference as the <code>_oauth2_proxy</code> cookie.</p>
<p>I stored the ticket ID alongside the <code>qr_session</code> in Redis:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;email&#34;</span><span class="p">:</span> <span class="s2">&#34;user@example.com&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;username&#34;</span><span class="p">:</span> <span class="s2">&#34;username&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;access_token&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;oauth2_ticket_id&#34;</span><span class="p">:</span> <span class="s2">&#34;_oauth2_proxy-eeeb18501625dee77f344c0a6193d0bc&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>End Session now does two Redis deletes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">func</span><span class="w"> </span><span class="nf">handleLogout</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">sessionID</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nf">FormValue</span><span class="p">(</span><span class="s">&#34;session_id&#34;</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nf">Context</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="nx">raw</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rdb</span><span class="p">.</span><span class="nf">Get</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;session:&#34;</span><span class="o">+</span><span class="nx">sessionID</span><span class="p">).</span><span class="nf">Result</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="kd">var</span><span class="w"> </span><span class="nx">sd</span><span class="w"> </span><span class="nx">sessionData</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nf">Unmarshal</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">raw</span><span class="p">),</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">sd</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">sd</span><span class="p">.</span><span class="nx">OAuth2TicketID</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">&#34;&#34;</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nx">rdb</span><span class="p">.</span><span class="nf">Del</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">sd</span><span class="p">.</span><span class="nx">OAuth2TicketID</span><span class="p">)</span><span class="w">  </span><span class="c1">// kills oauth2-proxy session</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nx">rdb</span><span class="p">.</span><span class="nf">Del</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;session:&#34;</span><span class="o">+</span><span class="nx">sessionID</span><span class="p">)</span><span class="w">  </span><span class="c1">// kills qr session</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>I configured oauth2-proxy to use Redis session storage pointing at the same Redis instance, added the Cilium network policy to allow ingress from the oauth2-proxy namespace, and removed the Homer proxy from <code>qr.hippotion.com</code> — it was no longer needed.</p>
<p>One final gotcha: <code>session_store_type = &quot;redis&quot;</code> in oauth2-proxy&rsquo;s legacy config file does nothing. There&rsquo;s no error, no warning. It silently ignores the option. The flag only works when passed as an actual CLI argument via <code>extraArgs</code> in the Helm chart values:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">extraArgs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">session-store-type</span><span class="p">:</span><span class="w"> </span><span class="l">redis</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">redis-connection-url</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;redis://qr-auth-redis:6379&#34;</span><span class="w">
</span></span></span></code></pre></div><p>After that, End Session worked correctly. Phone taps the button, ticket is deleted from Redis, the next ForwardAuth request for any app on the domain immediately redirects to the QR lock screen.</p>
<hr>
<h2 id="what-the-final-architecture-looks-like">What the final architecture looks like</h2>
<pre tabindex="0"><code>Phone: scan QR
  → /device?token=xxx → intermediate page (&#34;Continue with GitLab&#34;)
  → GitLab OAuth on phone (already logged in → direct callback)
  → /callback: exchange code → get email + access token
  → create Redis ticket: AES-128-GCM(msgpack(session), ticketSecret)
  → store ticket in Redis at &#34;_oauth2_proxy-&lt;hex&gt;&#34;
  → mark device token as authed, store ticketID in qr session

TV: poll fires
  → read qr session from Redis (has email, accessToken, ticketID)
  → set _oauth2_proxy cookie: signed ticket reference
  → set qr_session cookie
  → redirect to home.hippotion.com

Any protected app (home, argo, grafana, ...):
  → Traefik ForwardAuth → oauth2-proxy
  → oauth2-proxy reads _oauth2_proxy cookie → decodes ticket
  → looks up &#34;_oauth2_proxy-&lt;hex&gt;&#34; in Redis → decrypts session
  → validates email, access token → 200 OK

Phone: &#34;End Session&#34;
  → POST /logout with session_id
  → delete &#34;session:&lt;id&gt;&#34; from Redis (qr session gone)
  → delete &#34;_oauth2_proxy-&lt;hex&gt;&#34; from Redis (oauth2 ticket gone)
  → next ForwardAuth on TV: Redis lookup fails → redirect to login
</code></pre><p>The intermediate page on the phone (&ldquo;Continue with GitLab&rdquo; button instead of auto-redirect) was an unexpected requirement. Mobile browsers opened by the camera app often don&rsquo;t share sessions with the browser where GitLab is logged in. When you auto-redirect to GitLab in a browser with no existing session, GitLab redirects to the sign-in page. The OAuth state is stored in a session cookie that GitLab sets during the initial authorize request. On mobile, the sign-in form submission can lose this cookie due to SameSite restrictions — after sign-in, GitLab can&rsquo;t resume the OAuth flow and falls back to <code>/users/sign_in</code> with no further redirect. The intermediate page gives the user a visible moment to confirm they&rsquo;re in a browser with an active GitLab session before initiating the OAuth redirect.</p>
<hr>
<h2 id="lessons">Lessons</h2>
<p><strong>Read the source, not the docs.</strong> The docs say &ldquo;AES encryption&rdquo; without specifying the mode or how the key is derived. The source has the answer in twenty lines.</p>
<p><strong>Test at the boundary.</strong> The curl test against the ForwardAuth endpoint was the most valuable debugging step. It isolated exactly which layer was failing and gave me the real error message instead of a browser redirect loop. Without it, I&rsquo;d still be guessing.</p>
<p><strong>Format assumptions are fragile.</strong> I assumed JSON because JSON is the default for everything. oauth2-proxy uses MessagePack because it produces smaller cookies. LZ4 because it decompresses fast. AES-CFB because that&rsquo;s what was chosen when the code was written. None of this is unreasonable, but none of it is obvious from the outside.</p>
<p><strong>Two formats, same codebase.</strong> Cookie sessions and Redis ticket sessions use different ciphers, different compression, different key derivation. The GCM cipher I found first is correct — but for Redis sessions, not cookie sessions. The CFB cipher is for cookie sessions. I had the right code in the wrong place.</p>
<p><strong>Config files can silently ignore options.</strong> <code>session_store_type = &quot;redis&quot;</code> in oauth2-proxy&rsquo;s legacy config file does nothing. <code>--session-store-type=redis</code> on the command line works. No error, no warning, no indication that the option was parsed but not applied.</p>
<p><strong>Revocability requires server-side state.</strong> A self-contained encrypted cookie cannot be revoked without adding a denylist (which has its own scaling problems). If you need End Session to mean something, you need a server-side session store. oauth2-proxy supports Redis sessions precisely for this reason — the ticket design is clean and the revocation path is a single Redis delete.</p>
<p>The code is at <a href="https://github.com/janos-gyorgy/qr-device-login">github.com/janos-gyorgy/qr-device-login</a>.</p>
]]></content:encoded></item></channel></rss>