<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Gitops on hippotion</title><link>https://blog.hippotion.com/tags/gitops/</link><description>Recent content in Gitops on hippotion</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 29 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.hippotion.com/tags/gitops/index.xml" rel="self" type="application/rss+xml"/><item><title>Every Robot in My House Can Text Me Now</title><link>https://blog.hippotion.com/posts/every-robot-texts-me/</link><pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/every-robot-texts-me/</guid><description>My house is full of automation that never told me anything — until I gave it one push bus. The first thing I taught it to do was warn me before Claude Code cuts out mid-task.</description><content:encoded><![CDATA[<h2 id="the-silence">The silence</h2>
<p>My house runs on quiet little robots. A tracker watches my kombucha ferment. A
job narrates kids&rsquo; books in Hungarian. A media stack pulls and files things. Home
Assistant minds the sensors. A dozen services, all doing their jobs, all
completely mute. When a batch finished or an import failed, I found out the same
way every time: by going to look.</p>
<p>Then the silence got expensive. Claude Code stopped dead in the middle of a task
because I&rsquo;d burned through my plan&rsquo;s usage window — no warning, no countdown,
just a wall. The information <em>existed</em>; a dashboard in my own cluster was already
polling it. It just had no way to reach my pocket.</p>
<p>So I built one thing: a push bus. One place anything in the cluster can POST to,
that actually buzzes my phone. And the first job I gave it was to warn me before
my AI assistant goes dark.</p>
<hr>
<h2 id="the-boring-part-said-honestly">The boring part (said honestly)</h2>
<p>The bus is <a href="https://ntfy.sh">ntfy</a> — a self-hosted pub/sub notifier. Picking it
took about five minutes, because self-hosting ntfy for a homelab is a thoroughly
solved problem. There are at least three off-the-shelf bridges from Prometheus
Alertmanager to ntfy. I&rsquo;m not going to pretend the bus is the clever bit.</p>
<p>What I <em>did</em> do deliberately:</p>
<ul>
<li>📦 Deployed it <strong>GitOps-native</strong> — one entry in my app-of-apps, reconciled by
Argo CD, no <code>docker run</code> anywhere.</li>
<li>🔒 Locked it to <strong>deny-all auth</strong> with bearer tokens. Security alerts ride this
bus; a world-readable topic on a public URL was a non-starter. (Which also means
it sits <em>outside</em> my usual OAuth gate — the phone app can&rsquo;t do an interactive
login flow, so ntfy does its own token auth.)</li>
<li>🏷️ Topics by severity: <code>hl-crit</code>, <code>hl-warn</code>, <code>hl-info</code>, <code>hl-event</code>. Subscribe
and mute by how much I care.</li>
</ul>
<p>Then the interesting parts showed up at the edges, where they always do.</p>
<hr>
<h2 id="edge-one-my-own-firewall-403d-me">Edge one: my own firewall 403&rsquo;d me</h2>
<p>First test, the usage producer POSTing to <code>https://ntfy.hippotion.com</code>:</p>
<pre tabindex="0"><code>HTTP 403 Forbidden
error code: 1010
</code></pre><p>That <code>1010</code> looks like ntfy rejecting my token. It isn&rsquo;t. <strong>It&rsquo;s Cloudflare.</strong>
Error 1010 means &ldquo;your browser signature is banned&rdquo; — Cloudflare&rsquo;s bot protection
took one look at a Python script&rsquo;s <code>urllib</code> User-Agent and slammed the door.</p>
<p>My own producer couldn&rsquo;t reach my own bus, because the request left the cluster,
went all the way out to my own edge, and got flagged as a bot on the way back in.</p>
<p>The fix is the architecture I should&rsquo;ve had from the start: in-cluster producers
POST to the <strong>internal</strong> service address and never touch the public internet at
all.</p>
<pre tabindex="0"><code># wrong: out to Cloudflare and back, gets bot-blocked
https://ntfy.hippotion.com/hl-warn

# right: stays inside the cluster
http://ntfy.web-ntfy.svc.cluster.local/hl-warn
</code></pre><p>The phone still uses the public URL happily — the real ntfy app carries a
signature Cloudflare trusts. Only scripts trip 1010. <strong>Lesson: your own edge is
not your friend when you&rsquo;re a script. Keep cluster traffic in the cluster.</strong></p>
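<p>A minimal Python sketch of an in-cluster producer, assuming the internal address above and ntfy&rsquo;s header-based publish API (<code>Authorization: Bearer</code>, <code>Title</code>, <code>Priority</code>). The token value and the helper name <code>build_publish_request</code> are placeholders, not the real producer:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import urllib.request

# In-cluster Service address from the post; requests to it never leave the cluster.
NTFY_BASE = "http://ntfy.web-ntfy.svc.cluster.local"


def build_publish_request(topic: str, message: str, token: str,
                          title: str = "", priority: str = "default") -> urllib.request.Request:
    """Build (but do not send) an ntfy publish request for one topic."""
    headers = {
        "Authorization": f"Bearer {token}",  # deny-all auth means every POST needs a token
        "Priority": priority,                # ntfy accepts min, low, default, high, urgent
    }
    if title:
        headers["Title"] = title
    return urllib.request.Request(
        url=f"{NTFY_BASE}/{topic}",
        data=message.encode("utf-8"),        # the request body is the notification text
        headers=headers,
        method="POST",
    )


# "tk_example" is a placeholder token, not a real credential.
req = build_publish_request("hl-warn", "five_hour window at 82%", "tk_example",
                            title="Claude usage", priority="high")
# Sending is a single call, left commented out so the sketch runs anywhere:
# urllib.request.urlopen(req, timeout=5)
</code></pre>
<p>Because the request targets the Service DNS name, it never crosses Cloudflare, so the User-Agent that <code>urllib</code> sends stops mattering.</p>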
<hr>
<h2 id="edge-two-the-obvious-data-source-was-lying">Edge two: the obvious data source was lying</h2>
<p>To warn me about Claude usage, the naïve move is to parse Claude Code&rsquo;s local
logs — they sit right there in <code>~/.claude/projects/.../*.jsonl</code>, token counts and
all.</p>
<p>Don&rsquo;t. Those counts are <strong>unreliable for accounting</strong> — known to undercount,
wildly, in some cases by ~100x. Every tool that parses that JSONL inherits the
bug.</p>
<p>The number that&rsquo;s actually true lives in the claude.ai usage API — the same
<code>five_hour</code> and <code>seven_day</code> windows your plan enforces against. And I already had
a service polling exactly that. So the producer is just a tiny sidecar on that
existing pod, reading its <code>/api/usage</code> over <strong>localhost</strong> (same pod — no network
policy to negotiate, no second credential, nothing else hammering claude.ai):</p>
<ul>
<li>📈 ≥80% of a window → <code>hl-warn</code> (high).</li>
<li>🚨 ≥95% → <code>hl-crit</code> (urgent).</li>
<li>🔁 One ping per window per reset cycle, escalating warn→crit, keyed on the
reset timestamp so it never spams.</li>
</ul>
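<p>The escalation rule above can be sketched as a pure function. The function names and the in-memory <code>sent</code> dict are illustrative assumptions; a real sidecar would have to persist that state somewhere to survive restarts:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">WARN_PCT = 80
CRIT_PCT = 95
LEVELS = {"hl-warn": 1, "hl-crit": 2}


def topic_for(pct: float):
    """Map a utilisation percentage to a topic, or None below the warn line."""
    if pct >= CRIT_PCT:
        return "hl-crit"
    if pct >= WARN_PCT:
        return "hl-warn"
    return None


def decide(window: str, pct: float, resets_at: str, sent: dict):
    """Return the topic to notify, or None if this reset cycle is already covered.

    `sent` maps (window, resets_at) to the highest level already pushed, so a
    new reset timestamp starts a fresh cycle and warn can still escalate to crit.
    """
    topic = topic_for(pct)
    if topic is None:
        return None
    key = (window, resets_at)
    if LEVELS[topic] > sent.get(key, 0):
        sent[key] = LEVELS[topic]
        return topic
    return None
</code></pre>
<p>Keying on the reset timestamp means there is nothing to clean up: when the window rolls over, the key changes and the old entry simply stops matching.</p>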
<p>The first time it mattered, my phone buzzed at 80% with hours of runway left
instead of a brick wall mid-task.</p>
<hr>
<h2 id="what-id-tell-past-me">What I&rsquo;d tell past me</h2>
<p>Three things, none of them about ntfy:</p>
<ol>
<li><strong>Reuse the signal you already have.</strong> I didn&rsquo;t build a usage poller — I bolted
a sidecar onto the one already running. The smallest producer is one that reads
localhost.</li>
<li><strong>Your own edge can betray you.</strong> A firewall that protects you from bots will
happily block your own automation. In-cluster talks in-cluster.</li>
<li><strong>Check whether your data source is telling the truth</strong> before you build an
alert on it. An alert you don&rsquo;t trust is worse than no alert — you&rsquo;ll learn to
ignore it, and then it&rsquo;ll be right once.</li>
</ol>
<p>Next, the high-leverage move: point Prometheus Alertmanager at the same bus, and
every infra alert I have — plus every one I&rsquo;ll ever add — lands on the phone
through one bridge. The kombucha ping can wait. The disk-full one can&rsquo;t.</p>
<p>The house is still full of quiet robots. The difference is now they know my
number.</p>
]]></content:encoded></item><item><title>Is Anyone Knocking? A Security Pass on My Homelab</title><link>https://blog.hippotion.com/posts/is-anyone-knocking/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/is-anyone-knocking/</guid><description>I set out to answer a simple worry — is someone trying to get into my server? — and found the scarier question underneath it: if they did, would I even know? My front door was solid. The inside had an alarm with the wires cut, a web terminal sitting on the open internet, and no floor under the blast radius. Here&amp;rsquo;s the audit, and the three things I fixed.</description><content:encoded><![CDATA[<h2 id="the-question-i-actually-had">The question I actually had</h2>
<p>It started as a nervous-Sunday kind of question: <em>is a third party trying to
get into my server — over SSH, or some other way?</em> I run a single-node
Kubernetes homelab that hosts a couple dozen little apps, some of them public.
You read about credential-stuffing bots and you start to wonder who&rsquo;s been
rattling the handle while you slept.</p>
<p>So I did the audit. The good news came first, and it&rsquo;s worth saying plainly
because it&rsquo;s the part most homelabs get wrong: <strong>the front door is solid.</strong>
Nothing is reachable from the internet except through a Cloudflare Tunnel —
an outbound-only connection, zero open inbound ports on my router. Almost
every service sits behind OAuth. The cluster has 140 network policies doing
real east-west segmentation. And the login history? Eleven straight weeks
where every single shell login came from one IP — my own workstation on the
LAN. No strangers. No 3 a.m. logins from a VPS in another hemisphere.</p>
<p>I could have stopped there feeling good. That would have been a mistake.</p>
<h2 id="the-scary-finding-wasnt-an-attacker">The scary finding wasn&rsquo;t an attacker</h2>
<p>The useful question turned out not to be <em>&ldquo;is someone knocking?&rdquo;</em> but
<em>&ldquo;if someone got in, would anything tell me?&rdquo;</em> And when I traced that wire,
it ended in the dark.</p>
<p>I have a full monitoring stack — Prometheus, Grafana, Alertmanager, the works.
Alertmanager was running. It was also configured to notify exactly <strong>no one</strong>:
no receivers, and upstream, <strong>no alert rules at all</strong>. It was a smoke detector
with the battery taken out and, for good measure, no smoke sensor either. If an
attacker had walked in, the alarm would have stayed perfectly, silently green.</p>
<p>That reframed the whole job. Three gaps, in priority order.</p>
<h2 id="gap-1--an-alarm-with-no-one-to-call">Gap 1 — an alarm with no one to call</h2>
<p>I built the missing chain end to end. A small exporter on the host parses the
SSH journal and <code>fail2ban</code> state and writes metrics into node_exporter&rsquo;s
textfile collector — so it rides the monitoring I already had instead of adding
a new moving part. On top sit the alert rules that were never there. The one
that matters most is blunt:</p>
<blockquote>
<p><strong>A shell login succeeded from a non-LAN IP.</strong></p>
</blockquote>
<p>That should be impossible in normal life, so if it ever fires, I want it
shouting. It now emails me the instant it happens, alongside quieter alerts for
brute-force spikes, distributed scans, <code>fail2ban</code> going down, and — the
meta-alert I&rsquo;m fondest of — <em>the watchdog itself going stale</em>, because a
security monitor that silently dies is worse than none. And <code>fail2ban</code> now
actually bans the bots, with escalating ban times and my LAN permanently on the
allow-list.</p>
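<p>One way such an exporter could look, as a hedged sketch: it assumes the standard OpenSSH <code>Accepted ... from IP port N</code> log line, a <code>192.168.0.0/24</code> LAN, and made-up metric names. The output is Prometheus text format, the kind node_exporter&rsquo;s textfile collector reads from a <code>.prom</code> file:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import ipaddress
import re

# Assumption: the LAN is 192.168.0.0/24; adjust to the real network.
LAN = ipaddress.ip_network("192.168.0.0/24")
ACCEPTED = re.compile(r"Accepted \S+ for \S+ from (\S+) port \d+")


def count_logins(lines):
    """Count accepted SSH logins, split into LAN and non-LAN sources."""
    counts = {"lan": 0, "non_lan": 0}
    for line in lines:
        match = ACCEPTED.search(line)
        if not match:
            continue
        try:
            ip = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # not an IP literal; skip rather than guess
        counts["lan" if ip in LAN else "non_lan"] += 1
    return counts


def render_metrics(counts, now):
    """Render the counts plus a last-run timestamp in Prometheus text format."""
    lines = [
        "# TYPE ssh_accepted_logins gauge",
        'ssh_accepted_logins{source="lan"} %d' % counts["lan"],
        'ssh_accepted_logins{source="non_lan"} %d' % counts["non_lan"],
        "# TYPE ssh_watchdog_last_run_timestamp_seconds gauge",
        "ssh_watchdog_last_run_timestamp_seconds %d" % now,
    ]
    return "\n".join(lines) + "\n"
</code></pre>
<p>The timestamp gauge is what would make a staleness alert possible: a rule can compare it with the current time and fire when the exporter stops updating it.</p>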
<p>The honest lesson: I&rsquo;d been treating &ldquo;I have Prometheus&rdquo; as if it meant &ldquo;I have
monitoring.&rdquo; Dashboards you have to remember to look at are not monitoring.
<strong>Monitoring is the thing that interrupts you.</strong> Until an alert can reach your
phone, you don&rsquo;t have a security alarm — you have a security <em>museum</em>.</p>
<h2 id="gap-2--there-was-a-web-terminal-on-the-open-internet">Gap 2 — there was a web terminal on the open internet</h2>
<p>This is the one that made me wince. Among my public hostnames was <code>ttyd</code> — a
browser-based shell. A full terminal on my server, reachable from anywhere,
sitting behind a single OAuth proxy. One misconfiguration, one OAuth bypass,
and that&rsquo;s not &ldquo;an app is compromised,&rdquo; that&rsquo;s <em>root on the box from a browser
tab.</em></p>
<p>The fix here isn&rsquo;t more locks. It&rsquo;s the realization that <strong>the strongest
control is not exposing the thing at all.</strong> I deleted the web terminal
entirely — app, manifests, dashboard tile, all of it. Then I went down the
public hostname list and pulled everything with no business being public off
the tunnel: the secrets UI, the ingress dashboard, Prometheus, Alertmanager,
the network-observability console, the DNS admin. They still work — on my LAN,
over the same wildcard cert — they&rsquo;re just not the internet&rsquo;s business anymore.
A service that isn&rsquo;t exposed has no attack surface to harden.</p>
<h2 id="gap-3--no-floor-under-the-blast-radius">Gap 3 — no floor under the blast radius</h2>
<p>The network policies limit how far a compromised pod can talk sideways. But
nothing stopped a workload from running as root, mounting the host filesystem,
or grabbing the host network in the first place. So I turned on Kubernetes&rsquo;
built-in Pod Security Admission: every namespace now at least <em>reports</em>
baseline violations, and the clean app namespaces <em>enforce</em> baseline —
meaning a compromised app there simply cannot request privileged mode or a
hostPath mount. It&rsquo;s a floor. Floors are underrated.</p>
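<p>Pod Security Admission is driven by namespace labels. As a rough sketch, and assuming the &ldquo;report&rdquo; tier maps to the <code>audit</code> and <code>warn</code> modes, the two tiers could be generated like this (the helper and the namespace name used with it are illustrative; in a GitOps repo the same labels would normally live in a YAML manifest):</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import json

PSA = "pod-security.kubernetes.io"


def namespace_manifest(name: str, enforce: bool) -> dict:
    """Namespace that always audits and warns on baseline, and optionally enforces it."""
    labels = {f"{PSA}/audit": "baseline", f"{PSA}/warn": "baseline"}
    if enforce:
        # With enforce set, pods that violate baseline are rejected at admission.
        labels[f"{PSA}/enforce"] = "baseline"
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": labels},
    }


# Hypothetical example of a "clean" app namespace.
print(json.dumps(namespace_manifest("web-ntfy", enforce=True), indent=2))
</code></pre>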
<h2 id="what-the-audit-was-really-about">What the audit was really about</h2>
<p>I went looking for an intruder and didn&rsquo;t find one — the logs were clean, the
front door held. What I found instead was that I&rsquo;d built something secure at
the perimeter and then never asked the uncomfortable follow-up: <em>what happens
after the perimeter?</em> The answer had been &ldquo;nothing happens, and no one is
told,&rdquo; and I just hadn&rsquo;t looked.</p>
<p>Three principles I&rsquo;m taking with me:</p>
<ul>
<li><strong>An alarm that can&rsquo;t reach you is decoration.</strong> Wire the notification first;
the rules are easy once something is listening.</li>
<li><strong>&ldquo;Don&rsquo;t expose it&rdquo; beats &ldquo;add more auth.&rdquo;</strong> Every hostname you take off the
public internet is a class of attack you no longer have to be clever about.</li>
<li><strong>Give the blast radius a floor.</strong> Assume one thing gets popped, and decide
in advance how far it gets.</li>
</ul>
<p>The best part: all of it is GitOps. The intrusion alerts, the un-exposing, the
pod-security floor — every change is a commit, reviewable and revertible, and
my cluster reconciles itself to match. The audit didn&rsquo;t just make the homelab
safer. It wrote down <em>why</em> it&rsquo;s safer, in a form the next version of me can
read.</p>
<p>Now if someone knocks, I&rsquo;ll know. And the web terminal isn&rsquo;t answering the
door anymore — because it&rsquo;s gone.</p>
]]></content:encoded></item><item><title>I Run GitOps for My Brain</title><link>https://blog.hippotion.com/posts/gitops-for-my-brain/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/gitops-for-my-brain/</guid><description>An AI agent on a scheduled idle walk through my notes pointed out that I&amp;rsquo;d built the same architecture three times — at work, in my homelab, and in my second brain — and that the third copy was missing the part that makes GitOps work. It was right. So we shipped the missing piece the same day.</description><content:encoded><![CDATA[<h2 id="the-pattern-i-didnt-know-i-had">The pattern I didn&rsquo;t know I had</h2>
<p>This week an AI agent told me something about my own systems that I&rsquo;d never
noticed, and it was correct: I have one favorite architecture, and I&rsquo;ve built
it three times.</p>
<ul>
<li><strong>At work</strong>: git holds Terraform code → Terraform derives the S3 buckets.
Nobody clicks around in the AWS console; the repo is the truth.</li>
<li><strong>In the homelab</strong>: git holds Kubernetes manifests → Argo CD derives the
cluster. Every app on my rack is a folder in a repo.</li>
<li><strong>In my second brain</strong>: a vault of markdown notes → an indexer derives the
search database (SQLite FTS + a link graph) that my AI tools query.</li>
</ul>
<p>Same shape everywhere: a plain-text source of truth in git, and a machine that
builds the real thing from it. Master copy, derived state. I never decided
this consciously — it&rsquo;s just how my hands build things now.</p>
<h2 id="gitops-isnt-the-git-part">GitOps isn&rsquo;t the git part</h2>
<p>Here&rsquo;s the thing that the third copy got wrong, and it took me embarrassingly
long to see because I <em>teach</em> this pattern at the infrastructure layer.</p>
<p>&ldquo;Configuration in git&rdquo; existed long before GitOps. What made GitOps an actual
shift was the <strong>reconciler</strong>: Argo CD doesn&rsquo;t apply your manifests once and
wish you luck. It watches, continuously. When the cluster drifts from the
repo, you get an <code>OutOfSync</code> badge, and with <code>selfHeal</code> enabled it puts
reality back where the repo says it should be. The loop is the product. Git
is just where the loop points.</p>
<p>My vault had no loop. If I edited a note and forgot to rebuild the index, the
search results my AI agents rely on were silently stale — no badge, no error,
nothing. The only protection was a rule in the repo&rsquo;s agent instructions:
<em>&ldquo;if files and index disagree, the files win — run the indexer.&rdquo;</em></p>
<p>A policy that agents must remember. In other words: I was running Kubernetes
with a sticky note on the monitor that says <em>please redeploy after editing
the YAML</em>. I would never accept that on my cluster. My brain ran on it for
months.</p>
<h2 id="the-fix-took-an-afternoon">The fix took an afternoon</h2>
<p>Two pieces, both boring on purpose.</p>
<p><strong><code>exo status</code></strong> — the OutOfSync badge. The indexer now stores a content hash
per note; <code>status</code> re-hashes the vault and diffs:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;status&#34;</span><span class="p">:</span> <span class="s2">&#34;OutOfSync&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;modified&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;vault/10-notes/interests-themes.md&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;new&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;deleted&#34;</span><span class="p">:</span> <span class="p">[],</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;repair&#34;</span><span class="p">:</span> <span class="s2">&#34;exo index&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Exit code 0 when synced, 1 when not, so scripts and CI can ask the question
too, much like <code>argocd app diff</code>, which exits non-zero when it finds a difference.</p>
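<p>The hash-and-diff idea fits in a few lines. This is a sketch under assumptions, not the real <code>exo</code> code: SHA-256 as the hash, and plain dicts standing in for the index and the vault on disk:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import hashlib


def content_hash(text: str) -> str:
    """Hash of a note's content; SHA-256 is an assumption, any stable hash works."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def vault_status(indexed: dict, vault: dict) -> dict:
    """Diff stored hashes (path to hash) against current notes (path to text)."""
    current = {path: content_hash(text) for path, text in vault.items()}
    modified = sorted(p for p in current if p in indexed and current[p] != indexed[p])
    new = sorted(p for p in current if p not in indexed)
    deleted = sorted(p for p in indexed if p not in current)
    drift = bool(modified or new or deleted)
    report = {
        "status": "OutOfSync" if drift else "Synced",
        "modified": modified,
        "new": new,
        "deleted": deleted,
    }
    if drift:
        report["repair"] = "exo index"
    return report


def exit_code(report: dict) -> int:
    """0 when synced, 1 when not, so shell scripts can branch on it."""
    return 0 if report["status"] == "Synced" else 1
</code></pre>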
<p><strong>Git hooks</strong> — the selfHeal. Versioned hooks (<code>core.hooksPath .githooks</code>) on
<code>post-commit</code> and <code>post-merge</code> rebuild the index after every commit and pull:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl"><span class="nb">command</span> -v exo &gt;/dev/null 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="o">||</span> <span class="nb">exit</span> <span class="m">0</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">EXO_ROOT</span><span class="o">=</span><span class="s2">&#34;</span><span class="k">$(</span>git rev-parse --show-toplevel<span class="k">)</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">exo index &gt;/dev/null 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">&#34;exo: index reconciled (Synced)&#34;</span>
</span></span></code></pre></div><p>Now every <code>git commit</code> in the vault prints <code>exo: index reconciled (Synced)</code>
on its way out. The rule didn&rsquo;t change — <em>files win</em> — but it stopped being
something agents must remember and became something a machine enforces.
That&rsquo;s the entire difference between configuration management and GitOps,
replayed at the knowledge layer.</p>
<h2 id="the-part-where-it-gets-a-little-strange">The part where it gets a little strange</h2>
<p>The reason I&rsquo;m writing this post at all: I didn&rsquo;t have this idea. A scheduled
agent did, on what I can only describe as an idle walk.</p>
<p>My vault has a weekly cron job — we call it the Wanderer — that samples pairs
of notes that are <em>far apart</em>: different folders, different months, almost no
shared vocabulary. A headless Claude gets the pairs with exactly one task:
<em>read both notes in full and say whether anything genuinely connects. &ldquo;Nothing
connects&rdquo; is a successful run.</em> That last sentence is load-bearing — the run
always reports its result either way, so the agent never needs to manufacture
a finding to have done its job.</p>
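<p>A rough sketch of how &ldquo;far apart&rdquo; could be sampled, using two of the three signals: different top-level folders and low vocabulary overlap (Jaccard similarity over word sets). The threshold, the tokenizer, and every name here are assumptions; the month check is left out for brevity:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import itertools
import random
import re


def vocabulary(text):
    """Lower-cased words of four or more letters, as a set."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def jaccard(a, b):
    """Share of words two notes have in common, from 0.0 to 1.0."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def distant_pairs(notes, max_overlap=0.05, k=3, seed=None):
    """notes maps path to text. Return up to k random pairs of notes that sit in
    different top-level folders and share almost no vocabulary."""
    vocab = {path: vocabulary(text) for path, text in notes.items()}
    candidates = [
        (a, b)
        for a, b in itertools.combinations(sorted(notes), 2)
        if a.split("/")[0] != b.split("/")[0]
        and max_overlap >= jaccard(vocab[a], vocab[b])
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))
</code></pre>
<p>Comparing every pair is quadratic in the number of notes, which is fine for a personal vault on a weekly schedule and would need sampling first on a much larger one.</p>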
<p>On its very first walk, it collided a work note about Terraform-driven S3
provisioning with the architecture map of the vault itself, and wrote: <em>same
sentence in different clothes — and the brain copy is missing its
reconciler.</em> Then it listed the two fixes you just read about.</p>
<p>Retrieval answers the questions you ask. Distant collisions surface the
questions you didn&rsquo;t know you had. It turns out my second brain didn&rsquo;t need
to get better at remembering — it needed to occasionally interrupt me.</p>
<h2 id="if-you-keep-a-vault">If you keep a vault</h2>
<p>Whatever your stack — Obsidian, org-mode, a folder of markdown — if anything
<em>derives</em> from your notes (an index, embeddings, a published site), then you
have source of truth and derived state, and the GitOps question applies: <strong>who
notices when they drift?</strong> If the answer is &ldquo;I do, hopefully,&rdquo; you&rsquo;re running
the sticky-note era. Give it a badge and a loop. It&rsquo;s an afternoon.</p>
]]></content:encoded></item><item><title>Mind the gap: I pointed monitoring at my own skill set</title><link>https://blog.hippotion.com/posts/mind-the-gap-skill-radar/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/mind-the-gap-skill-radar/</guid><description>A rejection isn&amp;rsquo;t actionable data. So an n8n workflow now extracts skill demand from live job listings, diffs it against what I can prove, and renders the gap as a dashboard — deployed like everything else here: via git push.</description><content:encoded><![CDATA[<p>A while back I applied for a senior platform role at n8n and didn&rsquo;t land it. Fair enough — but
&ldquo;fair enough&rdquo; isn&rsquo;t actionable. Rejections come with no logs, no metrics, no trace. For someone
who runs thirty-odd services with full observability, having <em>vibes</em> as the only instrumentation
on my own career felt architecturally embarrassing.</p>
<p>So I built <strong>mind-the-gap</strong>: a pipeline that measures what the market demands, diffs it against
what I can prove, and renders the gap as a private dashboard on my cluster. The job hunt is now a
monitored system. This post is about the non-obvious decisions.</p>
<h2 id="demand-an-llm-reads-job-listings-so-i-dont-have-to">Demand: an LLM reads job listings so I don&rsquo;t have to</h2>
<p>I already had <a href="/posts/ats-job-poller/">a job poller</a> — an n8n workflow that polls the public ATS
APIs (Greenhouse / Lever / Ashby) of ~33 companies plus a broad remote-jobs feed every six hours.
A sibling workflow now re-fetches the same boards and, for every listing that passes the
role+location gate, asks a small hosted LLM (Llama-3.1-8B) for a structured extraction:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;seniority&#34;</span><span class="p">:</span> <span class="s2">&#34;senior&#34;</span><span class="p">,</span> <span class="nt">&#34;skills&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;kubernetes&#34;</span><span class="p">,</span> <span class="nt">&#34;importance&#34;</span><span class="p">:</span> <span class="s2">&#34;must&#34;</span><span class="p">},</span> <span class="err">...</span><span class="p">]}</span>
</span></span></code></pre></div><p>One row per <em>(job, skill)</em> lands in an n8n Data Table. Decisions that mattered:</p>
<ul>
<li><strong>One LLM call per job, not one batch.</strong> Free-tier inference times out on batches; per-job calls
are slower but fail independently. A lesson the poller already paid for.</li>
<li><strong>Insert doubles as the processed-marker.</strong> A job whose extraction fails to parse produces no
rows — so it&rsquo;s retried next run, for free. No status column, no second table.</li>
<li><strong>Canonicalization in code, not in the prompt.</strong> The model says &ldquo;K8s&rdquo;, &ldquo;k3s&rdquo;, &ldquo;EKS&rdquo; on
different days regardless of instructions. A dumb alias map (<code>k8s→kubernetes</code>, <code>eks→aws</code>)
beats prompt engineering for consistency.</li>
<li><strong>8B is good enough — with a guard.</strong> It occasionally echoed the seniority enum back literally
(<code>&quot;junior|mid|senior|staff|lead|unspecified&quot;</code>). The fix is one line of validation, not a bigger
model.</li>
</ul>
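<p>The last two decisions, the alias map and the enum guard, are small enough to sketch together. This is Python for illustration only (the real workflow runs in n8n). The alias map holds only the two entries named above, and falling back to <code>unspecified</code> and keeping the first importance seen per skill are assumptions:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">ALIASES = {"k8s": "kubernetes", "eks": "aws"}
SENIORITY = {"junior", "mid", "senior", "staff", "lead", "unspecified"}


def canonical(name: str) -> str:
    """Lower-case a skill name and fold known aliases into one spelling."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def clean_extraction(raw: dict) -> dict:
    """Validate seniority against the enum and canonicalize skill names."""
    seniority = raw.get("seniority", "unspecified")
    if seniority not in SENIORITY:
        # Catches the model echoing "junior|mid|senior|..." back literally.
        seniority = "unspecified"
    skills = {}
    for skill in raw.get("skills", []):
        # Duplicates after canonicalization collapse; the first importance wins.
        skills.setdefault(canonical(skill["name"]), skill.get("importance"))
    return {
        "seniority": seniority,
        "skills": [{"name": n, "importance": i} for n, i in skills.items()],
    }
</code></pre>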
<h2 id="supply-no-artifact-no-credit">Supply: no artifact, no credit</h2>
<p>The other side of the diff is a skills registry — markdown in my knowledge vault, with a
machine-parseable YAML block. Every skill has a state, and the rule that keeps the whole thing
honest is brutal: <strong>a skill counts as <code>proven</code> only if an artifact exists</strong> — a public repo, a
blog post, documented production experience. Otherwise it&rsquo;s <code>claimed</code>, and claimed earns half
credit.</p>
<p>That rule immediately produced the most useful insight of the project: <strong>&ldquo;invisible skill&rdquo; is a
real category.</strong> Python turned out to be the market&rsquo;s #5 ask. I use it constantly — and could
point to nothing public that shows it. The cheapest score increase isn&rsquo;t learning something new;
it&rsquo;s a weekend making an existing skill visible. No gut-feeling gap analysis would have ranked
&ldquo;write about what you already do&rdquo; above &ldquo;learn the shiny thing.&rdquo;</p>
<h2 id="the-score-distinct-companies-not-mentions">The score: distinct companies, not mentions</h2>
<p>First naive aggregation: Canonical&rsquo;s listings mention Ubuntu <em>nine times, all marked must-have</em> —
suddenly Ubuntu looks like the hottest skill in Europe. Employer skew is the noise floor of small
samples. The fix: demand weight = <strong>distinct companies naming the skill</strong>, not total mentions.
One enthusiastic employer can&rsquo;t move the radar.</p>
<p>Two more scoring rules I&rsquo;d defend in review:</p>
<ul>
<li>Skills named by fewer than two companies don&rsquo;t count at all — single-listing noise stays out.</li>
<li>Demand the registry hasn&rsquo;t classified yet shows up as &ldquo;unreviewed&rdquo; and <strong>counts fully against
the score</strong>. An unreviewed market signal is a gap until proven otherwise; the dashboard nags me
to triage it.</li>
</ul>
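<p>Put together, the scoring rules might look like the sketch below. The distinct-company weighting, the two-company floor, half credit for <code>claimed</code>, and zero credit for unreviewed demand come from the text; combining them as a weighted average is an assumption about how the coverage percentage is computed:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">from collections import defaultdict

# Credit per registry state; anything else, including "unreviewed", earns nothing.
CREDIT = {"proven": 1.0, "claimed": 0.5}


def demand_weights(rows, min_companies=2):
    """rows holds (company, skill) pairs. Weight is the number of distinct companies."""
    companies = defaultdict(set)
    for company, skill in rows:
        companies[skill].add(company)
    return {s: len(c) for s, c in companies.items() if len(c) >= min_companies}


def coverage(weights, registry):
    """Demand-weighted share of skills backed by the registry, from 0.0 to 1.0."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    earned = sum(
        weight * CREDIT.get(registry.get(skill, "unreviewed"), 0.0)
        for skill, weight in weights.items()
    )
    return earned / total
</code></pre>
<p>Nine Ubuntu mentions from one employer collapse to a single company here, which then falls below the two-company floor and drops out entirely.</p>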
<h2 id="rendering-the-page-is-a-git-commit">Rendering: the page is a git commit</h2>
<p>The dashboard is a single static HTML file, and the pipeline that produces it never touches the
cluster. <code>render.js</code> lives in this repo as the single source of truth; a nightly n8n workflow
fetches it raw from GitLab, <code>eval()</code>s it against the Data Table rows and the registry, and — only
if the result differs from what&rsquo;s committed (timestamps stripped, or every night is a &ldquo;change&rdquo;) —
PUTs the new <code>index.html</code> back via the GitLab API.</p>
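<p>The &ldquo;only commit on real change&rdquo; check can be sketched as a comparison of normalized pages. It is shown in Python for illustration, and the ISO-8601 UTC timestamp format is an assumption about how the page stamps itself:</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import re

# Assumed timestamp shape, for example 2026-03-27T02:00:00Z.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def normalized(html: str) -> str:
    """Strip render timestamps so they cannot count as a content change."""
    return TIMESTAMP.sub("", html)


def needs_commit(rendered: str, committed: str) -> bool:
    """True only when something other than the timestamp differs."""
    return normalized(rendered) != normalized(committed)
</code></pre>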
<p>Serving is the same pattern as this blog: nginx plus a git-pull sidecar, deployed by Argo CD,
behind the cluster&rsquo;s OAuth middleware. The renderer has no kubeconfig, no SSH, no cluster access
of any kind. <strong>GitLab stays the only source of truth — even for a page that rewrites itself
nightly.</strong> If the workflow goes rogue, the worst it can do is a reviewable commit.</p>
<h2 id="day-one-verdict">Day-one verdict</h2>
<p>First run: 2,297 postings fetched, 25 in scope, 257 skill rows. Coverage score: <strong>63%</strong>.
Kubernetes and AWS tied at the top of demand — which means the AWS gap-closing project already in
flight stopped being a hunch and became the measured top of the market. Go is the only top-ten
demand with zero supply. The dashboard doesn&rsquo;t get anyone a job; it just makes sure every learning
Saturday is pointed where the data says, not where the hype does.</p>
<p>The job board rejected me. The data didn&rsquo;t.</p>
<hr>
<p><em>Workflows, render.js, and setup: <a href="https://github.com/janos-gyorgy/mind-the-gap">github.com/janos-gyorgy/mind-the-gap</a>.</em></p>
]]></content:encoded></item><item><title>🌱 My Second Brain Weeds Itself Now</title><link>https://blog.hippotion.com/posts/an-ai-gardener-for-your-second-brain/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/an-ai-gardener-for-your-second-brain/</guid><description>I gave my markdown knowledge base a nightly gardener — an AI that finds orphan notes and missing links and fixes them, every change a reviewable git commit. The fun part was the Kubernetes wall I hit on the way.</description><content:encoded><![CDATA[<p>A few weeks ago I <a href="/posts/a-second-brain-you-can-git-clone/">rebuilt my second brain as a folder of markdown in git</a> — vault is the source of truth, everything else (search index, graph, 3D viewer) is a derived layer I can delete and rebuild. I love it. But a knowledge base has a dirty secret: <strong>it rots.</strong></p>
<p>Not the files — those are fine. The <em>connections</em> rot. You capture a note at 11pm and never link it to anything, so it becomes an orphan floating off the graph. A project note&rsquo;s one-line summary describes what the project was three weeks ago. Two notes are obviously about the same thing and neither knows the other exists. Do this for a few months and you don&rsquo;t have a second brain, you have a junk drawer with good search.</p>
<p>The honest fix is to weed the garden regularly. The honest truth is that nobody does, including me.</p>
<p>So I stopped relying on myself and built a gardener.</p>
<h2 id="what-it-actually-does">What it actually does</h2>
<p>Every night at 3am, on my homelab box, a script runs:</p>
<ol>
<li><strong>Detect</strong> — <code>exo garden</code>, a plain query over the index, produces a report: here are the orphans, here are notes that should probably link to each other, here are summaries that look stale. <strong>No AI in this step.</strong> It&rsquo;s SQL and graph traversal. Deterministic, boring, trustworthy.</li>
<li><strong>Decide and write</strong> — that report gets piped to <code>claude -p</code> (Claude Code in headless mode). Claude reads the vault&rsquo;s operating contract, makes <em>only high-confidence</em> edits — add a <code>[[wikilink]]</code> between two genuinely related notes, refresh a stale summary — caps itself at ~10 notes a night, and writes a dated log note explaining exactly what it changed and what it deliberately skipped.</li>
<li><strong>Commit</strong> — the wrapper reindexes and lands everything as a single <code>garden: 2026-06-09 …</code> git commit, then pushes. My 3D graph viewer picks it up on the next sync.</li>
</ol>
<p>The first real run, it found one orphan (<code>90-meta/README</code>), linked it into the notes it actually indexes, and then — this is the part I liked — <em>declined</em> to touch the 12 &ldquo;stale summary&rdquo; candidates because, on inspection, every one of them was already accurate. It wrote: <em>&ldquo;flagged by length, not staleness; churning them would add noise.&rdquo;</em> A gardener that knows when <strong>not</strong> to prune is the one you can leave alone.</p>
<h2 id="isnt-this-a-solved-problem">&ldquo;Isn&rsquo;t this a solved problem?&rdquo;</h2>
<p>Mostly, no — but partly, yes, and I want to be straight about it. AI-assisted note-linking exists: Obsidian plugins like Smart Connections suggest related notes, and apps like Mem and Reflect auto-organize as you write. They&rsquo;re good.</p>
<p>Three things make this different enough to build:</p>
<ul>
<li><strong>Every change is a reviewable git diff, authored by a named agent.</strong> Not silent magic that rearranges your notes while you&rsquo;re not looking. <code>git log -p</code> shows you exactly what the gardener did last night; <code>git revert</code> undoes a bad night in one command. For something as personal as a knowledge base, &ldquo;show me the diff&rdquo; beats &ldquo;trust me.&rdquo;</li>
<li><strong>It&rsquo;s mine, end to end.</strong> Runs on my hardware, on my schedule, with a model I point at. No SaaS holds my brain hostage.</li>
<li><strong>The detection is deterministic; the model only acts.</strong> The LLM never decides <em>what&rsquo;s wrong</em> — a boring query does that. The model only decides <em>how to fix the things already found</em>. That split keeps the whole thing auditable and cheap.</li>
</ul>
<p>If you already live in a tool that does this and you trust it, great. I wanted the git-diff trail and the local control.</p>
<h2 id="the-part-i-actually-want-to-tell-you-about">The part I actually want to tell you about</h2>
<p>The plan was tidy: I run n8n on the same cluster, so n8n would be the scheduler — fire nightly, <strong>SSH into the node</strong>, run the gardener. Clean, visual, one workflow.</p>
<p>n8n could not reach the node. At all. Every port: <code>ECONNREFUSED</code>.</p>
<p>This sent me down a genuinely interesting hole, because the homelab runs <strong>Cilium</strong> for networking, and Cilium has opinions about your own node that plain Kubernetes does not.</p>
<p>First instinct: a NetworkPolicy allowing egress to the node&rsquo;s IP. Wrote it, synced it, still refused. The reason is a Cilium subtlety worth knowing: <strong>the node isn&rsquo;t a CIDR, it&rsquo;s an identity.</strong> Cilium classifies your cluster&rsquo;s own node as the special <code>host</code> identity, and ordinary <code>ipBlock</code> CIDR rules <em>do not match it</em> unless you flip a cluster-wide setting (<code>policy-cidr-match-mode: nodes</code>). My <code>192.168.0.109/32</code> rule was a no-op.</p>
<p>So I switched to the Cilium-native tool: a <code>CiliumNetworkPolicy</code> with <code>toEntities: [host]</code>. Confirmed it applied — I could see <code>reserved:host</code> allowed right there in the datapath&rsquo;s BPF policy map. I confirmed the node&rsquo;s IP really does resolve to identity <code>1</code> (host). I confirmed the host firewall was <em>disabled</em>. Everything said &ldquo;allowed.&rdquo;</p>
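<p>A policy of the shape described above can be sketched as follows. It is a minimal example under assumptions rather than the exact manifest used here: the policy name, the <code>n8n</code> namespace, the pod label, and port 22 are all illustrative.</p>
<pre tabindex="0"><code># Sketch only: name, namespace, label and port are illustrative assumptions
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: n8n-to-host-ssh
  namespace: n8n
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: n8n
  egress:
    - toEntities:
        - host            # the node itself, matched by identity rather than by CIDR
      toPorts:
        - ports:
            - port: &#34;22&#34;
              protocol: TCP
</code></pre>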
<p>Still <code>ECONNREFUSED</code>.</p>
<p>That&rsquo;s the wall. The packet leaves the pod with Cilium&rsquo;s blessing, hits the host&rsquo;s own network stack, and <em>something there</em> sends a reset — and I couldn&rsquo;t see what, because inspecting the host firewall needs root, and this automation deliberately doesn&rsquo;t have it. I could have kept digging with a password. But I stopped and asked a better question: <strong>why am I making a pod reach back into the host it&rsquo;s running on at all?</strong></p>
<p>That&rsquo;s an awkward direction. The work has to happen <em>on</em> the host (that&rsquo;s where the vault, git creds, and Claude live). A pod straining to SSH into its own node is fighting the grain of the platform.</p>
<p>So I inverted it. <strong>The node schedules itself</strong> — a plain cron entry, rock-solid, no network gymnastics. And n8n, instead of <em>triggering</em> the job, <em>receives</em> it: at the end of each run the node POSTs a summary to an n8n webhook. Node→n8n works perfectly (it&rsquo;s just an outbound HTTPS call to a URL). n8n keeps the run history and is the place I&rsquo;ll later wire a phone notification.</p>
<p>I lost nothing that mattered. n8n is still my dashboard; the schedule just lives where the work lives. And I deleted the SSH key and the network-policy hole I&rsquo;d opened — the cleanup felt better than the original plan would have.</p>
<h2 id="the-lesson-such-as-it-is">The lesson, such as it is</h2>
<p>Two, actually.</p>
<p><strong>One:</strong> when you&rsquo;re automating something to run unattended, the bug you want to find is the one that shows up in a <em>dry run at 2pm</em>, not at <em>3am three weeks from now</em>. I almost shipped a version where a brand-new note (untracked by git) was invisible to my change-detection and would&rsquo;ve been silently wiped each night. The dry run caught it. Always build the dry run.</p>
<p><strong>Two, the bigger one:</strong> I spent an hour trying to make a pod punch into its host because that was <em>my</em> plan, and the platform kept saying no in increasingly specific ways. The fix wasn&rsquo;t a cleverer NetworkPolicy. It was noticing I was pushing against the design and turning around. The node scheduling itself and <em>reporting up</em> to n8n is simpler, safer, and more honest about where the work actually lives.</p>
<p>My brain weeds itself now. Every morning there&rsquo;s maybe one small, sensible commit waiting — a link I&rsquo;d have never made, a summary nudged back to true — and I can read exactly what changed before my coffee&rsquo;s done. That&rsquo;s the whole dream of a second brain that isn&rsquo;t a junk drawer: it stays a garden, and I barely have to touch it.</p>
]]></content:encoded></item><item><title>🧠 A Second Brain You Can `git clone`</title><link>https://blog.hippotion.com/posts/a-second-brain-you-can-git-clone/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/a-second-brain-you-can-git-clone/</guid><description>My first second brain died the way most do — on multi-device sync. The rebuild: plain markdown as the source of truth, every clever layer derived and disposable, and an AI that tends it through reviewable git diffs.</description><content:encoded><![CDATA[<h2 id="the-graveyard-of-second-brains">The graveyard of second brains</h2>
<p>I had a second brain once. Obsidian vault, a CouchDB LiveSync backend, even a
weekly agent that summarised my notes. It worked — for a while. Then the sync
started fighting itself across my laptop, the homelab, and my phone, and the day
syncing becomes a chore is the day you stop opening the thing. The notes were
still there. I just never looked at them again.</p>
<p>That&rsquo;s how most second brains die. Not from bad notes — from the <em>plumbing</em>. The
sync breaks, or the upkeep outpaces the payoff, or the whole thing is trapped in
one app&rsquo;s database and moving it feels like surgery. The knowledge was never the
problem. The container was.</p>
<p>So when I rebuilt it, I started from the failure modes, not the features.</p>
<h2 id="what-i-actually-wanted">What I actually wanted</h2>
<p>Three things, none of them &ldquo;more notes&rdquo;:</p>
<ol>
<li><strong>Memory I share with my AIs.</strong> Every time I open a fresh Claude session, it
starts from zero — I re-explain my homelab, my projects, what we decided last
week. I wanted a place both of us read <em>and</em> write, so the context survives the
session.</li>
<li><strong>Something that outlives any tool.</strong> No lock-in. If the app of the month dies,
my brain shouldn&rsquo;t die with it.</li>
<li><strong>Sync that can&rsquo;t rot.</strong> The thing that killed v1.</li>
</ol>
<h2 id="the-one-decision-that-matters">The one decision that matters</h2>
<p><strong>The store and the intelligence are different layers, and only the store is
sacred.</strong></p>
<p>The store is a folder of plain markdown in git. That&rsquo;s it. Human-readable, diffable,
greppable, yours. Everything clever sits <em>above</em> it and is fully rebuildable:</p>
<pre tabindex="0"><code>L5  Visualisation   3D graph, Obsidian, whatever reads markdown
L4  Automation      scheduled &#34;gardener&#34; runs
L3  Agent interface MCP servers — search, graph, note CRUD
L2  Index           SQLite: full-text + vectors + materialised edges
L1  Structure       typed frontmatter + [[wikilinks]]
L0  Substrate       markdown files in git   ← the only thing that&#39;s truth
</code></pre><p>Delete L1–L5 and nothing is lost — you rebuild them from L0 with one command.
That property is the whole design. The index can corrupt, the embedding model can
change, the viewer can break (mine did, spectacularly — that&rsquo;s another post), and
the knowledge doesn&rsquo;t care. It&rsquo;s text in git.</p>
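<p>To make L1 concrete, a note could look something like the sketch below. The field names (<code>type</code>, <code>status</code>, <code>summary</code>, <code>tags</code>) and the linked note names are illustrative assumptions, not the vault&rsquo;s actual schema; the point is only that the structure lives in the file itself, as YAML frontmatter plus <code>[[wikilinks]]</code>.</p>
<pre tabindex="0"><code>---
# Hypothetical frontmatter: field names are assumptions for illustration
type: project
status: active
summary: One-line description that the index and the gardener can read
tags: [homelab, gitops]
---
Body text in plain markdown, linking out to [[another-note]] and [[a-decision-log]].
</code></pre>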
<p>And <strong>sync is just <code>git pull</code>.</strong> No LiveSync daemon to wedge itself, no proprietary
replication. The exact thing that killed v1 is now the most boring, battle-tested
part of the stack. Three devices, one <code>git pull</code>, done.</p>
<h2 id="search-that-explains-itself">Search that explains itself</h2>
<p>The retrieval layer is deliberately not &ldquo;throw it all at embeddings.&rdquo; It fuses
three signals — keyword (BM25), vector similarity, and graph expansion (pull in
the neighbours of strong hits) — and every result reports <em>which signals fired</em>.</p>
<pre tabindex="0"><code>exo search &#34;hybrid retrieval&#34;
→ hybrid-retrieval   matched_on: [bm25, graph]
</code></pre><p>That <code>matched_on</code> matters more than it looks. An embeddings-only system gives you
a ranked list and no reason — you can&rsquo;t tell a real match from a vibe. For a brain
I&rsquo;m supposed to trust over years, &ldquo;why did this surface?&rdquo; is a feature, not a
nicety.</p>
<h2 id="the-ai-is-a-librarian-not-a-hoarder">The AI is a librarian, not a hoarder</h2>
<p>Here&rsquo;s the part I care about most. The AI doesn&rsquo;t just <em>read</em> the brain — it
writes to it. Through an MCP server it can search, walk the graph, and author
notes. But under a hard rule: <strong>every write is a reviewable git diff.</strong></p>
<p>It searches before it writes (extend a note, don&rsquo;t spawn a duplicate). It links
instead of piling. A scheduled &ldquo;gardener&rdquo; pass finds orphaned notes and stale
summaries and proposes fixes — as commits I can read and <code>git revert</code> if it gets
something wrong. No black-box mutation of my memory. Just a librarian that files
things while I&rsquo;m asleep and leaves a paper trail.</p>
<p>So now &ldquo;what am I building?&rdquo; is a question with an instant, honest answer: a single
map note, kept current, that every project links into. I ask, the AI pulls it, and
neither of us has to remember.</p>
<h2 id="why-not-just">Why not just…</h2>
<ul>
<li><strong>Obsidian alone?</strong> It&rsquo;s a lovely <em>viewer</em> — and I still use it as one. But it
can&rsquo;t give an agent structured read/write or explainable retrieval, and its sync
is what burned me. Here Obsidian reads the same markdown; it&rsquo;s a window, not the
house.</li>
<li><strong>Embeddings RAG?</strong> Opaque and one-directional. It can rank, but it can&rsquo;t tell
you why, and it can&rsquo;t write back. This is transparent and bidirectional.</li>
<li><strong>Notion / a SaaS brain?</strong> Lock-in by design. <code>git clone</code> is my backup and any
text editor is my fallback.</li>
<li><strong>A graph database?</strong> Unnecessary infra. The graph lives in the wikilinks; SQLite
just materialises it. I&rsquo;ll add Neo4j the day my queries actually outgrow a single
file, and not a day sooner.</li>
</ul>
<h2 id="what-it-changes">What it changes</h2>
<p>The vault is small still — that&rsquo;s fine; it grows by use. But the loop already
pays off: I work, the AI checkpoints decisions into markdown, and the <em>next</em>
session — fresh model, no memory of its own — searches the brain and is caught up
in seconds. The knowledge stopped living only in my head and in dead chat logs.</p>
<p>I&rsquo;m a team of one. There&rsquo;s no colleague who remembers why I made a call six months
ago, no handover doc someone else maintains. Continuity isn&rsquo;t a nice-to-have; it&rsquo;s
the whole job. A second brain that the AI helps keep alive — and that I can
<code>git clone</code> onto any machine in thirty seconds — is the first version of this idea
that I actually trust to still be here in five years.</p>
<p>The notes from v1? They&rsquo;re sitting in a folder, waiting to be triaged into v2. This
time I&rsquo;ll still be opening it.</p>
]]></content:encoded></item><item><title>📦 Five Ways to Manage Kubernetes Manifests (and Why They're Not All Equal)</title><link>https://blog.hippotion.com/posts/gitops-manifest-approaches/</link><pubDate>Fri, 10 Oct 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/gitops-manifest-approaches/</guid><description>Raw YAML, Kustomize, Helm, Jsonnet — there&amp;rsquo;s more than one way to describe what you want running in a cluster. Here&amp;rsquo;s what each actually looks like in practice and where each one breaks.</description><content:encoded><![CDATA[<h2 id="the-problem-everyone-hits">The problem everyone hits</h2>
<p>You&rsquo;ve got a Kubernetes cluster. Now you need to describe what should run in it. You write some YAML, apply it, it works.</p>
<p>Then you need a second environment. Or a second service. Or someone else joins the project and asks &ldquo;how do I add an app to this?&rdquo; and you don&rsquo;t have a good answer.</p>
<p>This is the manifest management problem, and there are five common solutions — ranging from &ldquo;this works until it doesn&rsquo;t&rdquo; to &ldquo;this is what production platforms actually look like.&rdquo;</p>
<hr>
<h2 id="approach-1-raw-manifests">Approach 1: Raw manifests</h2>
<p>The starting point for almost everyone. Write a YAML file, <code>kubectl apply -f</code>, done.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:v1.2.3</span><span class="w">
</span></span></span></code></pre></div><p><strong>Where it works:</strong> one service, one environment, learning Kubernetes. The feedback loop is immediate — write YAML, see what happens.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>No templating.</strong> Want to change the image tag across ten services? Ten files, ten edits, ten chances to get it wrong.</li>
<li><strong>Live state leaks in.</strong> If you export existing resources with <code>kubectl get -o yaml</code>, you get server-managed fields such as <code>resourceVersion</code>, <code>generation</code>, <code>creationTimestamp</code>, <code>uid</code>, and a <code>status</code> block in the output (plus <code>managedFields</code> if you ask for it with <code>--show-managed-fields</code>). Commit that to Git and you&rsquo;ve created a permanent source of conflicts: ArgoCD compares what&rsquo;s in Git against what&rsquo;s in the cluster, sees stale version counters, and the diff never clears.</li>
<li><strong>Copy-paste hell.</strong> A Deployment, a Service, an IngressRoute, a ServiceAccount, a NetworkPolicy — five files per app. Add a new app, copy five files, change the names, forget to update one. This is how environments drift apart silently.</li>
</ul>
<p>The fix for the live-state problem is to commit only desired state: strip every field that Kubernetes manages internally and keep just the spec you actually wrote. It&rsquo;s tedious and easy to forget, which is exactly why people move on from raw manifests.</p>
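<p>As a rough illustration of what &ldquo;only desired state&rdquo; means, the sketch below marks the server-managed fields of an exported Deployment as comments. The values shown are placeholders, and the exact set of fields varies by resource kind.</p>
<pre tabindex="0"><code># Exported Deployment, annotated: commented-out fields are written by the cluster and should not be committed
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  # creationTimestamp: &#34;2025-01-01T00:00:00Z&#34;   (placeholder value)
  # generation: 4                              (placeholder value)
  # resourceVersion: &#34;123456&#34;                  (placeholder value)
  # uid: 00000000-0000-0000-0000-000000000000   (placeholder value)
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v1.2.3
# status: the whole block is observed state, so drop it as well
</code></pre>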
<hr>
<h2 id="approach-2-kustomize">Approach 2: Kustomize</h2>
<p>Kustomize is built into <code>kubectl</code> (<code>kubectl apply -k</code>) and natively supported by ArgoCD. The idea: you have a <code>base/</code> with your raw manifests, and overlays that patch on top of them for different environments.</p>
<pre tabindex="0"><code>app/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── staging/
    │   └── kustomization.yaml    # patches replicas to 1, image to :staging
    └── production/
        └── kustomization.yaml    # patches replicas to 3, image to :v1.2.3
</code></pre><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># overlays/production/kustomization.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="l">../../base</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">patches</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">patch</span><span class="p">:</span><span class="w"> </span><span class="p">|-</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">      - op: replace
</span></span></span><span class="line"><span class="cl"><span class="sd">        path: /spec/replicas
</span></span></span><span class="line"><span class="cl"><span class="sd">        value: 3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span></code></pre></div><p><strong>Where it works:</strong> multi-environment setups where the difference between environments is mostly configuration values, not structure. Kustomize is good at this — you write the base once and patch only what differs.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>No real parameterization.</strong> Kustomize patches are surgical edits, not templates. If your base structure needs to vary (different resource shapes per environment, conditional blocks), you&rsquo;re fighting the tool.</li>
<li><strong>Patching deep structures is ugly.</strong> JSON patches on nested YAML are verbose and hard to read. You end up writing more patch YAML than it would take to just copy the file.</li>
<li><strong>Still repetitive across apps.</strong> Each app still gets its own base directory. You&rsquo;re not abstracting the shared patterns across apps, only the differences between environments of the same app.</li>
</ul>
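<p>For the two most common overrides, replica count and image tag, Kustomize has dedicated <code>replicas</code> and <code>images</code> fields that avoid a JSON patch altogether. A minimal sketch, assuming the base Deployment is named <code>myapp</code> and uses an image called <code>myapp</code>:</p>
<pre tabindex="0"><code># overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
replicas:
  - name: myapp      # matches the Deployment&#39;s metadata.name
    count: 3
images:
  - name: myapp      # matches the image name used in the base
    newTag: v1.2.3
</code></pre>
<p>This only covers simple value swaps; anything structural still needs a patch.</p>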
<p>Kustomize is a significant step up from raw manifests for multi-environment setups. For complex templating or platform-level abstractions, it runs out of power quickly.</p>
<hr>
<h2 id="approach-3-helm">Approach 3: Helm</h2>
<p>Helm adds real templating. Charts are parameterized bundles — templates with variables, conditionals, and loops — and values files supply the parameters.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># templates/deployment.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.name }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Release.Namespace }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.replicas | default 1 }}</span><span class="w">
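</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.name }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.name }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">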
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.name }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.image.repository }}:{{ .Values.image.tag }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>{{- <span class="l">if .Values.resources }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">resources</span><span class="p">:</span><span class="w"> </span>{{<span class="w"> </span><span class="l">.Values.resources | toYaml | nindent 12 }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>{{- <span class="l">end }}</span><span class="w">
</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># values-production.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">image</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">repository</span><span class="p">:</span><span class="w"> </span><span class="l">myorg/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">tag</span><span class="p">:</span><span class="w"> </span><span class="l">v1.2.3</span><span class="w">
</span></span></span></code></pre></div><p>Helm renders the templates at deploy time. What lands in the cluster is clean rendered YAML — no internal state, no conflicts.</p>
<p><strong>Where it works:</strong> almost everywhere. Artifact Hub (the successor to the old Helm Hub) already lists charts for most common software. For custom apps, writing a chart once and parameterizing per-environment is straightforwardly better than copying YAML.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>Chart authoring is verbose.</strong> Writing a Helm chart from scratch involves a lot of Go templating boilerplate. For a simple app, it can feel like more scaffolding than application.</li>
<li><strong>Debugging rendered output is annoying.</strong> <code>helm template</code> is your friend, but errors in templates produce unhelpful messages. The indentation rules (<code>nindent</code>, <code>indent</code>, <code>toYaml</code>) have sharp edges.</li>
<li><strong>Values files still pile up.</strong> If every app has its own values file and there&rsquo;s no shared structure between them, you&rsquo;re back to copy-paste but now in YAML-that-configures-YAML.</li>
</ul>
<p>Helm is the right tool for most Kubernetes deployments. The ecosystem support alone (upstream charts for Postgres, Redis, Vault, and most CNCF projects) makes it the pragmatic default.</p>
<hr>
<h2 id="approach-4-jsonnet--cue">Approach 4: Jsonnet / CUE</h2>
<p>For teams that need programmatic config generation — actual code, not templates — Jsonnet and CUE are the serious alternatives.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-jsonnet" data-lang="jsonnet"><span class="line"><span class="cl"><span class="c1">// deployment.jsonnet
</span></span></span><span class="line"><span class="cl"><span class="k">local</span><span class="w"> </span><span class="nv">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">import</span><span class="w"> </span><span class="s">&#34;k.libsonnet&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">local</span><span class="w"> </span><span class="nf">deployment</span><span class="p">(</span><span class="nv">name</span><span class="p">,</span><span class="w"> </span><span class="nv">image</span><span class="p">,</span><span class="w"> </span><span class="nv">replicas</span><span class="o">=</span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nv">k</span><span class="p">.</span><span class="nv">apps</span><span class="p">.</span><span class="nv">v1</span><span class="p">.</span><span class="nv">deployment</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nv">name</span><span class="p">,</span><span class="w"> </span><span class="nv">replicas</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nv">k</span><span class="p">.</span><span class="nv">core</span><span class="p">.</span><span class="nv">v1</span><span class="p">.</span><span class="nv">container</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nv">name</span><span class="p">,</span><span class="w"> </span><span class="nv">image</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="p">]);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nv">&#34;deployment.yaml&#34;</span><span class="p">:</span><span class="w"> </span><span class="nf">deployment</span><span class="p">(</span><span class="s">&#34;myapp&#34;</span><span class="p">,</span><span class="w"> </span><span class="s">&#34;myorg/myapp:v1.2.3&#34;</span><span class="p">,</span><span class="w"> </span><span class="nv">replicas</span><span class="o">=</span><span class="mf">3</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p><strong>Where it works:</strong> large platforms where configuration is genuinely complex — many environments, many apps, deep interdependencies. Jsonnet lets you write real functions, share libraries, compose abstractions properly.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>Steep learning curve.</strong> Jsonnet is a full language. CUE even more so — it has types, schemas, and a constraint system that takes time to internalise.</li>
<li><strong>Small community.</strong> Excellent tooling, but you&rsquo;re solving problems that have fewer Stack Overflow answers.</li>
<li><strong>Overkill for most setups.</strong> If you&rsquo;re not managing hundreds of services across multiple clusters, Helm is simpler and has everything you need.</li>
</ul>
<p>Jsonnet is used seriously by infrastructure teams operating at very large scale and in some CNCF projects. For a homelab or a small-to-medium platform, it&rsquo;s the right answer to a question you probably aren&rsquo;t asking yet.</p>
<hr>
<h2 id="approach-5-app-of-apps-with-generated-application-crds">Approach 5: App-of-apps with generated Application CRDs</h2>
<p>This is the ArgoCD-native meta-layer. Instead of managing manifests, you manage <code>Application</code> resources — and potentially use a chart or tool to generate those too.</p>
<p>A naive version: commit a folder of <code>Application</code> YAML files to Git, one per service. ArgoCD watches the folder and deploys each app.</p>
<p>A more sophisticated version: one &ldquo;root app&rdquo; that points to a chart, which generates all the other <code>Application</code> resources dynamically from a single config file.</p>
<p><strong>Where it works:</strong> at the platform level, not the individual app level. App-of-apps is how you manage what ArgoCD manages, not how you write the service manifests themselves. Combined with Helm, it gives you centralized control over the entire cluster&rsquo;s structure.</p>
<p><strong>Where it breaks:</strong></p>
<ul>
<li><strong>Hand-written <code>Application</code> resources are painful.</strong> If you&rsquo;re maintaining a folder of hand-written <code>Application</code> YAML files, one per service, you&rsquo;ve traded manifest copy-paste for Application copy-paste. Each app needs its own <code>Application</code> resource with its repo URL, path, sync policy, and project reference.</li>
<li><strong>Sync ordering matters.</strong> The root app must exist before children can sync. Get the wave ordering wrong and apps try to deploy before their namespaces exist.</li>
</ul>
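<p>To make the copy-paste cost concrete, a single hand-written <code>Application</code> might look roughly like the sketch below. The service name, repo URL, path, project, and sync-wave value are placeholders for illustration, not values from any real setup.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp                      # hypothetical service name
  namespace: argocd
  annotations:
    # Higher waves sync later; an app that creates namespaces would use a lower wave.
    argocd.argoproj.io/sync-wave: "1"
spec:
  project: default                 # AppProject reference
  source:
    repoURL: https://example.com/myorg/gitops.git   # placeholder repo
    targetRevision: main
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
</code></pre>
<p>Between services, only the name, path, and namespace usually change. Everything else is duplicated in every file, which is the part a generator removes.</p>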
<hr>
<h2 id="how-this-homelab-compares">How this homelab compares</h2>
<p>My setup sits at the far end of approach 5, using Helm throughout.</p>
<p>There&rsquo;s a single <code>applications.yml</code> file that describes every service in the cluster. A root Helm chart reads it and generates all the ArgoCD <code>Application</code> and <code>AppProject</code> resources automatically. Adding a service means adding one entry to that file, not touching five different places across five different files.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># applications.yml — this is the entire service catalog</span><span class="w">
</span></span></span><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">profile</span><span class="p">:</span><span class="w"> </span><span class="l">web-app</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">applications</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">applicationCode</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/extra-objects</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">autoSync</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>That one entry generates: a Namespace, an ArgoCD AppProject, an ArgoCD Application, a set of Cilium NetworkPolicies (deny-all with ingress from Traefik and DNS/HTTPS egress), and a ServiceAccount. Nothing is written by hand.</p>
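<p>As a rough idea of what the generated network isolation could look like, here is a sketch of a <code>CiliumNetworkPolicy</code> for the <code>web-app</code> profile. The policy name, the <code>traefik</code> namespace, and the label selectors are assumptions. The chart&rsquo;s real output may be structured differently.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: web-app-default            # hypothetical name
  namespace: web-vaultwarden
spec:
  # Select every pod in the namespace; traffic not listed below is denied.
  endpointSelector: {}
  ingress:
    # Only the ingress controller may reach the pods (namespace name assumed).
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: traefik
  egress:
    # DNS lookups against the cluster DNS pods.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
    # Outbound HTTPS to anything outside the cluster.
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
</code></pre>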
<p>The actual service manifests live in an <code>extra-objects</code> chart — a thin wrapper that renders raw YAML from values files. No templating in the service manifests themselves (they&rsquo;re simple enough not to need it), but the infrastructure scaffolding around each app is entirely generated.</p>
<p>The result: every service gets the same operational properties. Same GitOps workflow, same secret management, same network isolation, same TLS termination. The platform work was done once. Adding a new app is writing manifests for the app&rsquo;s specific behavior, not recreating the scaffolding.</p>
<hr>
<h2 id="the-honest-spectrum">The honest spectrum</h2>
<table>
	<thead>
			<tr>
					<th>Approach</th>
					<th>Templating</th>
					<th>Abstraction</th>
					<th>Ecosystem</th>
					<th>Complexity</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Raw manifests</td>
					<td>None</td>
					<td>None</td>
					<td>None</td>
					<td>Low</td>
			</tr>
			<tr>
					<td>Kustomize</td>
					<td>Patches only</td>
					<td>Overlays</td>
					<td>Medium</td>
					<td>Low-medium</td>
			</tr>
			<tr>
					<td>Helm</td>
					<td>Full</td>
					<td>Per-chart</td>
					<td>Large</td>
					<td>Medium</td>
			</tr>
			<tr>
					<td>Jsonnet/CUE</td>
					<td>Full + typed</td>
					<td>Libraries</td>
					<td>Small</td>
					<td>High</td>
			</tr>
			<tr>
					<td>App-of-apps</td>
					<td>Depends</td>
					<td>Platform-level</td>
					<td>ArgoCD-native</td>
					<td>High</td>
			</tr>
	</tbody>
</table>
<p>Most setups should start at Helm. Kustomize if you&rsquo;re multi-environment and comfortable with patching. App-of-apps when you&rsquo;re managing the platform layer, not individual services. Jsonnet/CUE when you know you&rsquo;ve outgrown Helm — which is a specific and relatively rare problem to have.</p>
<p>Raw manifests are fine for learning. They&rsquo;re the wrong answer for anything you intend to maintain.</p>
<hr>
<p><em>More on how the homelab is structured: <a href="/posts/homelab-gitops/">My Homelab Runs on GitOps</a>.</em></p>
]]></content:encoded></item><item><title>🤖 Local LLM Inference on Kubernetes, No GPU Required</title><link>https://blog.hippotion.com/posts/local-llm-k8s-no-gpu/</link><pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/local-llm-k8s-no-gpu/</guid><description>A CPU-only self-hosted LLM stack running on k3s: llama.cpp as the inference server, Open WebUI as the chat interface, deployed as a single Git push.</description><content:encoded><![CDATA[<h2 id="the-gpu-assumption">The GPU assumption</h2>
<p>Most write-ups about self-hosting LLMs start with a GPU. A 3090, an A100, at minimum something with CUDA. The implication is that without one you&rsquo;re wasting your time — inference will be too slow to be useful.</p>
<p>That&rsquo;s not been my experience.</p>
<p>I&rsquo;ve been running a local LLM stack on a ThinkCentre mini PC (Intel N100, 16 GB RAM, no discrete GPU) for a few months. The model is Phi-3.5-mini-instruct, 3.8 billion parameters, 4-bit quantised. Generation speed is 3–6 tokens per second on CPU: slow enough that you notice it, fast enough that you use it. For the things I actually reach for a local model to do (rephrase something, summarise a document, explain a config option without sending it to an external API), the latency is fine.</p>
<p>The point isn&rsquo;t that CPU inference beats GPU inference. It&rsquo;s that &ldquo;good enough for personal use&rdquo; is a much lower bar than &ldquo;production LLM serving&rdquo;, and the hardware you already have probably clears it.</p>
<hr>
<h2 id="the-stack">The stack</h2>
<p>Two components:</p>
<p><strong>llama.cpp</strong> (<code>ghcr.io/ggml-org/llama.cpp:server</code>) — inference server that loads a GGUF model file and exposes an OpenAI-compatible REST API. No Python, no framework overhead, minimal memory footprint beyond the model itself.</p>
<p><strong>Open WebUI</strong> (<code>ghcr.io/open-webui/open-webui</code>) — a polished chat interface that speaks OpenAI API format. It points at the llama-server endpoint as its backend, handles conversation history, and supports RAG file uploads out of the box.</p>
<p>The architecture is simple on purpose:</p>
<pre tabindex="0"><code>Browser → Open WebUI (:80)
              │
              │  OpenAI-compatible API
              ▼
         llama-server (:8080)
              │
              │  reads GGUF model file
              ▼
         hostPath /srv/ai-models
</code></pre><p>Open WebUI doesn&rsquo;t know or care that the backend is llama.cpp running on CPU. It sees an OpenAI-compatible API. This matters: if I swap llama-server for Ollama, vLLM, or a cloud endpoint, the frontend doesn&rsquo;t change. The interface is the standard.</p>
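<p>In practice, that wiring can come down to a couple of environment variables on the Open WebUI container. A minimal sketch, assuming the llama-server Service is named <code>llama-server</code> in the same namespace and runs without an API key:</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml"># Fragment of the Open WebUI container spec
env:
  - name: OPENAI_API_BASE_URL
    # Hypothetical in-cluster Service name; llama-server exposes the API under /v1
    value: http://llama-server:8080/v1
  - name: OPENAI_API_KEY
    # Placeholder value, assuming llama-server was started without --api-key
    value: not-needed
</code></pre>
<p>Swapping the backend then means changing that one URL. Nothing else in the frontend needs to know.</p>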
<hr>
<h2 id="model-choice">Model choice</h2>
<p>GGUF models on Hugging Face are available at multiple quantisation levels. The trade-off is quality vs. RAM:</p>
<table>
	<thead>
			<tr>
					<th>Model</th>
					<th>Quant</th>
					<th>Size</th>
					<th>RAM at runtime</th>
					<th>Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Llama-3.2-3B</td>
					<td>Q4_K_M</td>
					<td>~2 GB</td>
					<td>~3 GB</td>
					<td>Fastest, lowest quality</td>
			</tr>
			<tr>
					<td>Phi-3.5-mini</td>
					<td>Q4_K_M</td>
					<td>~2.4 GB</td>
					<td>~3–4 GB</td>
					<td>Good balance — what I use</td>
			</tr>
			<tr>
					<td>Mistral-7B-Instruct</td>
					<td>Q4_K_M</td>
					<td>~4.1 GB</td>
					<td>~5–6 GB</td>
					<td>Noticeably better, needs more RAM</td>
			</tr>
			<tr>
					<td>Llama-3.1-8B</td>
					<td>Q4_K_M</td>
					<td>~4.7 GB</td>
					<td>~6–8 GB</td>
					<td>High quality, stretches 16 GB with other workloads</td>
			</tr>
	</tbody>
</table>
<p>On 16 GB RAM with a full k3s stack running alongside (Argo CD, Traefik, Vault, Prometheus, etc.), Phi-3.5-mini leaves enough headroom that the cluster stays stable. Mistral-7B works too, but it&rsquo;s tighter.</p>
<p>Models live in <code>/srv/ai-models</code> on the node, mounted into the pod as a <code>hostPath</code> volume. Single-node homelab, so there&rsquo;s no scheduling concern. Download once with <code>wget</code>, done.</p>
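<p>A sketch of how that mount could be declared in the pod spec. The container name and the <code>/models</code> mount point are assumptions. Only the host path comes from the setup described above.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml"># Fragment of the llama-server pod spec
containers:
  - name: llama-server
    image: ghcr.io/ggml-org/llama.cpp:server
    volumeMounts:
      - name: models
        mountPath: /models        # assumed mount point inside the container
        readOnly: true            # the server only reads the GGUF file
volumes:
  - name: models
    hostPath:
      path: /srv/ai-models
      type: Directory             # pod fails to start if the directory is missing
</code></pre>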
<hr>
<h2 id="key-configuration-choices">Key configuration choices</h2>
<p><strong>Context size (<code>--ctx-size 4096</code>):</strong> How many tokens the model holds in its attention window. Larger context = more RAM + slower inference. 4096 is fine for conversational use. If you&rsquo;re summarising long documents, bump to 8192 and watch your RAM usage.</p>
<p><strong>Max output tokens (<code>--n-predict 1024</code>):</strong> Hard cap on response length. llama.cpp will stop there even mid-sentence. 1024 is usually enough; increase if you find it cutting off long explanations.</p>
<p><strong>Parallel slots (<code>--parallel 1</code>):</strong> How many concurrent inference requests the server handles. On CPU there&rsquo;s no benefit to more than 1 — each slot competes for the same cores. Leave it at 1.</p>
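<p>Put together, the three flags above could be passed as container arguments along these lines. This assumes the image&rsquo;s entrypoint is the server binary and that the model is mounted at <code>/models</code>. The filename is a placeholder.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml"># Fragment of the llama-server container spec
args:
  - --model
  - /models/model.gguf   # placeholder; point at the GGUF you downloaded
  - --host
  - 0.0.0.0              # listen on all interfaces so the Service can reach it
  - --port
  - "8080"
  - --ctx-size
  - "4096"
  - --n-predict
  - "1024"
  - --parallel
  - "1"
</code></pre>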
<p><strong>Memory limits:</strong> Set the container limit to roughly 2× the model&rsquo;s file size. A 2.4 GB GGUF typically uses 3–4 GB at runtime with context loaded.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">requests</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="l">500m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">1Gi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">6Gi</span><span class="w">
</span></span></span></code></pre></div><p>No CPU limit. llama-server will use however many cores are available during inference — that&rsquo;s what makes it usable. A CPU limit would throttle inference to unusable speeds.</p>
<hr>
<h2 id="deployment-as-a-gitops-push">Deployment as a GitOps push</h2>
<p>The whole stack lives in one YAML values file, deployed through the <a href="https://github.com/janos-gyorgy/gitops-extra-objects-chart">extra-objects chart</a> that I use for raw manifests across the cluster. Argo CD watches the repo and reconciles automatically.</p>
<p>Nothing was <code>kubectl apply</code>-ed. The deployment happened by pushing to Git.</p>
<p>What that means in practice: when I bumped the Open WebUI image version, I changed one line, pushed, and Argo CD rolled the pod. No manual steps, no SSH, no <code>kubectl</code>. The same process I use for any other service in the cluster.</p>
<p>The namespace, network policies, service account, and RBAC are all generated from a single entry in <code>applications.yml</code>, same as every other app. The AI inference stack isn&rsquo;t special from an operations perspective.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># applications.yml excerpt</span><span class="w">
</span></span></span><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-ai-engine</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">applications</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">applicationCode</span><span class="p">:</span><span class="w"> </span><span class="l">web-ai-engine</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/extra-objects</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">autoSync</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="access-and-auth">Access and auth</h2>
<p>The service is exposed at <code>ai.hippotion.com</code> through the same dual-path ingress setup I use everywhere: Cloudflare Tunnel for external access, direct-to-server via Pi-hole DNS for local access, Traefik handling both with a wildcard Let&rsquo;s Encrypt cert. See <a href="/posts/homelab-dual-path-tls/">that post</a> for the full explanation.</p>
<p>Auth is handled by Traefik&rsquo;s ForwardAuth middleware pointing at an oauth2-proxy backed by GitLab. Open WebUI&rsquo;s own auth is disabled (<code>WEBUI_AUTH: false</code>) — the OAuth layer upstream handles it. One login covers every service in the cluster.</p>
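<p>For reference, a ForwardAuth middleware of this kind might be declared roughly as follows. The middleware name, the oauth2-proxy Service address, and the forwarded headers are assumptions. Older Traefik v2 installs use the <code>traefik.containo.us/v1alpha1</code> API group instead.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-forward-auth        # hypothetical name
  namespace: web-ai-engine
spec:
  forwardAuth:
    # Assumed in-cluster address of oauth2-proxy; adjust to your Service
    address: http://oauth2-proxy.auth.svc.cluster.local:4180/oauth2/auth
    trustForwardHeader: true
    # Identity headers passed on to the backend after a successful check
    authResponseHeaders:
      - X-Auth-Request-User
      - X-Auth-Request-Email
</code></pre>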
<p>The <code>WEBUI_SECRET_KEY</code> (used to sign Open WebUI sessions) comes from Vault via External Secrets Operator. Nothing sensitive in Git.</p>
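<p>A sketch of what such an <code>ExternalSecret</code> could look like. The store name, Vault path, and Secret name are assumptions. Recent External Secrets Operator releases also serve this kind under <code>external-secrets.io/v1</code>.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: open-webui-secret          # hypothetical name
  namespace: web-ai-engine
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                    # assumed store name
  target:
    name: open-webui-secret        # Kubernetes Secret the operator creates
  data:
    - secretKey: WEBUI_SECRET_KEY
      remoteRef:
        key: web-ai-engine/open-webui   # assumed Vault path
        property: WEBUI_SECRET_KEY
</code></pre>
<p>Git only ever holds this reference. The value itself stays in Vault.</p>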
<hr>
<h2 id="what-the-day-to-day-is-actually-like">What the day-to-day is actually like</h2>
<p>Slow is the obvious caveat. Phi-3.5-mini at 3–6 tok/s means a paragraph-length response takes 20–30 seconds. For coding help where you&rsquo;re reading what came before while it generates, that&rsquo;s fine. For quick factual lookups, it&rsquo;s a little tedious.</p>
<p>The useful cases for a local model, for me:</p>
<ul>
<li><strong>Rephrasing or editing text</strong> — paste something, ask it to tighten it. No data leaves the house.</li>
<li><strong>Config explanation</strong> — paste a Kubernetes manifest or a Traefik config block, ask what it does. Again, stays local.</li>
<li><strong>Quick summaries</strong> — short documents, log snippets, error messages.</li>
<li><strong>Experimentation</strong> — trying prompting techniques, testing system prompts, benchmarking quantisation levels without API costs.</li>
</ul>
<p>For longer reasoning tasks I use a cloud model. The local stack is for the cases where I want the answer to stay on-premises, or where I&rsquo;m iterating and don&rsquo;t want to pay per token.</p>
<hr>
<h2 id="the-starting-point-if-you-want-to-try-it">The starting point if you want to try it</h2>
<p>The manifests are on GitHub: <a href="https://github.com/janos-gyorgy/homelab-ai-inference-starter">homelab-ai-inference-starter</a></p>
<p>It includes the llama-server and Open WebUI deployments, resource configuration, and ingress options for Traefik and nginx. The README walks through downloading a model, applying the manifests, and the configuration knobs worth knowing.</p>
<p>No GPU required. The ThinkCentre in the corner of my desk does the job.</p>
]]></content:encoded></item><item><title>🔄 Someone kubectl apply'd a Hotfix Directly. How Do You Detect and Prevent It?</title><link>https://blog.hippotion.com/posts/k8s-config-drift/</link><pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-config-drift/</guid><description>Manual kubectl in production is the Kubernetes equivalent of SSH&amp;rsquo;ing into a server and editing files. It works until it doesn&amp;rsquo;t, and when it doesn&amp;rsquo;t, nobody knows why.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you prevent configuration drift in a Kubernetes cluster?&rdquo;</em></p>
<p>Configuration drift: the cluster&rsquo;s actual state diverges from what&rsquo;s declared in your source of truth. Someone runs <code>kubectl edit deployment myapp</code> to bump a memory limit during an incident. Someone adds a debug sidecar directly. Someone applies a YAML file from their laptop that was never committed to Git. The fix works. It goes undocumented. Six months later, a new deployment overwrites it. The incident recurs.</p>
<p>There are two distinct problems here that require different solutions:</p>
<ol>
<li><strong>Detection and remediation</strong>: how do you notice drift and revert it?</li>
<li><strong>Prevention</strong>: how do you stop non-compliant resources from being created in the first place?</li>
</ol>
<hr>
<h2 id="detection-and-remediation-argo-cd-selfheal">Detection and remediation: Argo CD selfHeal</h2>
<p>If you&rsquo;re using GitOps with Argo CD, detection and remediation are handled for you:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">syncPolicy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">automated</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">prune</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">selfHeal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p><code>selfHeal: true</code> means Argo CD continuously compares the cluster state to the Git repo and reverts any divergence. Someone runs <code>kubectl edit deployment myapp</code> and changes the replica count? Argo CD detects the diff no later than its next reconciliation cycle (default: every 3 minutes) and reverts it.</p>
<p><code>prune: true</code> means resources that exist in the cluster but not in Git are deleted. Someone <code>kubectl apply</code>&rsquo;d a debug pod directly? Gone on the next sync.</p>
<p>This is the audit trail story too. Every legitimate change is a Git commit with an author, a timestamp, and a commit message. Everything that isn&rsquo;t in Git doesn&rsquo;t survive past the next reconciliation. If you want to know what changed and when, <code>git log</code> is the answer.</p>
<hr>
<h2 id="the-gap-selfheal-doesnt-close">The gap selfHeal doesn&rsquo;t close</h2>
<p><code>selfHeal</code> reverts drift after the fact. There&rsquo;s a window — up to 3 minutes — where a drifted resource is serving traffic. For most changes, that&rsquo;s fine. For a bad resource (wrong RBAC, missing network policy, container running as root), 3 minutes is enough to be a problem.</p>
<p>The other gap: <code>selfHeal</code> doesn&rsquo;t tell you <em>who</em> made the change or generate an alert. It just silently fixes it. You need audit logging (<code>kube-apiserver --audit-log-path</code>) or an alerting rule on Argo CD&rsquo;s health events to know that drift happened.</p>
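<p>For the audit logging side, a minimal policy could record who touched workload objects and drop everything else. This is a sketch under assumptions, not a tuned production policy. It would be passed to the API server with <code>--audit-policy-file</code> alongside <code>--audit-log-path</code>.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Record user, verb, and object for writes to workloads, without request bodies
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: apps
        resources: ["deployments", "statefulsets", "daemonsets"]
  # Rules are evaluated in order; everything not matched above is not logged
  - level: None
</code></pre>
<p>With that in place, a manual <code>kubectl edit</code> shows up in the audit log with the user who ran it, even after <code>selfHeal</code> has reverted the change.</p>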
<hr>
<h2 id="prevention-kyverno">Prevention: Kyverno</h2>
<p>Kyverno is a policy engine that runs as a Kubernetes admission webhook. Every resource creation or modification goes through it before being persisted. If the resource violates a policy, Kyverno can reject it outright (enforce mode) or allow it with a warning (audit mode).</p>
<p>The policies are Kubernetes resources themselves — they live in Git, they&rsquo;re applied via GitOps, they&rsquo;re versioned. No separate policy language to learn.</p>
<p>A policy that requires readiness probes on all Deployments:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">require-readiness-probe</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-readiness-probe</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span>- <span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Deployments must define a readiness probe.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">(name)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                    </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">(httpGet | tcpSocket | exec)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span></code></pre></div><p>With this policy active: <code>kubectl apply -f deployment-without-probe.yaml</code> is rejected at the API server. The error message is the one you defined in <code>message</code>. The deployment never reaches etcd.</p>
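<p>For contrast, a Deployment that should pass the check might look like this. The names, image, port, and health endpoint are placeholders.</p>
<pre tabindex="0"><code class="language-yaml" data-lang="yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.2.3
          readinessProbe:
            httpGet:
              path: /healthz      # assumed health endpoint
              port: 8080
            periodSeconds: 10
</code></pre>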
<p>A policy that blocks containers running as root:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">disallow-root-containers</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-runAsNonRoot</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment, StatefulSet, DaemonSet]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Containers must not run as root.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">(name)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                    </span><span class="nt">securityContext</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">runAsNonRoot</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>A policy that enforces resource limits (common in multi-tenant clusters):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">require-resource-limits</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-limits</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;CPU and memory limits are required.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                        </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;?*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                        </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;?*&#34;</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="kyverno-can-also-mutate-and-generate">Kyverno can also mutate and generate</h2>
<p>Policies aren&rsquo;t only for validation. Kyverno can mutate incoming resources (add default labels, inject sidecars, set default resource requests) and generate new resources in response to events (create a NetworkPolicy whenever a new namespace is created).</p>
<p>Auto-add a standard label to every Deployment:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-labels</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-team-label</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">mutate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">patchStrategicMerge</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">managed-by</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno</span><span class="w">
</span></span></span></code></pre></div><p>Auto-create a default NetworkPolicy when a namespace is created:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-default-networkpolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Namespace]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">generate</span><span class="p">:</span><span class="w">
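</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">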
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny-all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{request.object.metadata.name}}&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="l">Egress</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="the-complete-drift-prevention-picture">The complete drift prevention picture</h2>
<pre tabindex="0"><code>Developer runs: kubectl apply -f bad-deployment.yaml
  → API server receives request
  → Kyverno admission webhook intercepts
  → Policy check: no readiness probe → Rejected
  → API server returns an error carrying Kyverno&#39;s message
  → Resource never reaches etcd

Developer runs: kubectl edit deployment myapp (valid change, just not via Git)
  → Edit succeeds (no policy violation)
  → Argo CD reconciliation fires (within 3 minutes)
  → Diff detected: cluster state ≠ Git state
  → selfHeal: revert to Git state
  → If audit logging enabled: event recorded with username and timestamp
</code></pre><p>Git is the audit trail for what <em>should</em> be there. kube-apiserver audit logs are the trail for what <em>was attempted</em>. Kyverno is the enforcer at admission time. Argo CD is the continuous reconciler. Four layers, each with a different job.</p>
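<p>The audit-log layer only exists if the API server is started with an audit policy. A minimal sketch, assuming you only care about writes to Deployments (the file path and the choice of resources are illustrative, not taken from any particular cluster):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># /etc/kubernetes/audit-policy.yaml (hypothetical path, passed via --audit-policy-file)
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Record who changed a Deployment and when, without request bodies
  - level: Metadata
    verbs: [&#34;create&#34;, &#34;update&#34;, &#34;patch&#34;, &#34;delete&#34;]
    resources:
      - group: &#34;apps&#34;
        resources: [&#34;deployments&#34;]
  # Drop everything else to keep the log small
  - level: None
</code></pre></div>
<p>The <code>Metadata</code> level records the user, timestamp, resource, and verb, which is enough to answer who ran the <code>kubectl edit</code> that Argo CD just reverted.</p>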
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;What&rsquo;s the difference between Kyverno and OPA Gatekeeper?&rdquo;</em></p>
<p>Both are admission webhook policy engines. The practical differences:</p>
<ul>
<li><strong>Kyverno</strong>: policies are k8s-native YAML, no separate language to learn. Generate and mutate policies built in. Easier to get started with.</li>
<li><strong>OPA Gatekeeper</strong>: policies are written in Rego, a purpose-built policy language that&rsquo;s more expressive but has a steeper learning curve. Better if you&rsquo;re already using OPA elsewhere (Terraform, microservice authorization).</li>
</ul>
<p>For a Kubernetes-only environment, Kyverno is the pragmatic choice. For a platform team that uses OPA across the stack, Gatekeeper gives you policy consistency.</p>
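<p>For comparison, here is roughly what the Gatekeeper side looks like. This is a sketch modeled on the required-labels example in the Gatekeeper documentation; the constraint name and the <code>team</code> label are placeholders. Note that the logic lives in Rego embedded in a template, and a second object applies it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # A violation is raised when any required label is missing
        violation[{&#34;msg&#34;: msg, &#34;details&#34;: {&#34;missing_labels&#34;: missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) &gt; 0
          msg := sprintf(&#34;you must provide labels: %v&#34;, [missing])
        }
---
# The constraint applies the template to Deployments (name and label are placeholders)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployments-must-have-team
spec:
  match:
    kinds:
      - apiGroups: [&#34;apps&#34;]
        kinds: [&#34;Deployment&#34;]
  parameters:
    labels: [&#34;team&#34;]
</code></pre></div>
<p>Two objects and a small program, versus one YAML pattern in Kyverno. That is the learning-curve difference in practice.</p>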
<p>The deeper follow-up: <em>&ldquo;How do you test policies before enforcing them?&rdquo;</em> Use <code>Audit</code> mode first (<code>validationFailureAction: Audit</code>). Violations are recorded in PolicyReport objects but requests aren&rsquo;t rejected. Review the reports, fix the existing violations, then switch to <code>Enforce</code>. Never flip directly to Enforce in production: workloads that already violate the policy keep running, but their next update or rollout gets rejected.</p>
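<p>As a sketch, the audit-first rollout of the resource-limits policy from earlier would start like this. The only change from the enforcing version is the failure action; <code>background: true</code> is spelled out here as an assumption about how you want existing workloads scanned (it is also Kyverno&rsquo;s default):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Audit   # report only; switch to Enforce once reports are clean
  background: true                 # also scan resources that already exist
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: [Deployment]
      validate:
        message: &#34;CPU and memory limits are required.&#34;
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: &#34;?*&#34;
                        cpu: &#34;?*&#34;
</code></pre></div>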
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-network-isolation/">network isolation between services</a>.</em></p>
]]></content:encoded></item><item><title>🔑 Deploy to Kubernetes Without Storing Any Cluster Credentials in CI</title><link>https://blog.hippotion.com/posts/k8s-cicd-no-credentials/</link><pubDate>Fri, 09 May 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-cicd-no-credentials/</guid><description>A common interview question in 2026. If your answer is &amp;lsquo;kubeconfig in a CI secret&amp;rsquo;, you&amp;rsquo;re not wrong — but you&amp;rsquo;re also not getting the job.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How would you design a CI/CD pipeline that deploys to Kubernetes without storing any cluster credentials anywhere?&rdquo;</em></p>
<p>The expected wrong answer: export your kubeconfig, base64-encode it, paste it into a CI secret named <code>KUBE_CONFIG</code>, and call it a day. This works. It also leaves a long-lived credential to your cluster sitting in your CI system, where any leak of CI secrets hands it over.</p>
<p>There are two correct answers in 2026, and which one you reach for depends on what you&rsquo;re actually deploying.</p>
<hr>
<h2 id="answer-1-gitops-the-one-your-interviewer-probably-wants">Answer 1: GitOps (the one your interviewer probably wants)</h2>
<p>In a GitOps setup, your CI pipeline never touches the cluster. It can&rsquo;t leak credentials it doesn&rsquo;t have.</p>
<p>The flow:</p>
<pre tabindex="0"><code>Developer pushes code
  → CI builds and tests
  → CI updates the image tag in the Git repo (a commit, not a kubectl command)
  → Argo CD detects the change
  → Argo CD applies it to the cluster
</code></pre><p>The cluster reaches out to Git. CI never reaches into the cluster. The only thing with cluster credentials is Argo CD itself — running inside the cluster, with no credentials to leak externally.</p>
<p>For self-hosted setups on Hetzner or Vultr, this is particularly clean because there&rsquo;s no cloud IAM to configure. You point Argo CD at your GitLab repo, tell it which branch to watch, and you&rsquo;re done.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># The Argo CD Application CRD — the only thing you need</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">argoproj.io/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Application</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">argocd</span><span class="w">
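</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">project</span><span class="p">:</span><span class="w"> </span><span class="l">default</span><span class="w">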
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">source</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">repoURL</span><span class="p">:</span><span class="w"> </span><span class="l">https://gitlab.example.com/myorg/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">targetRevision</span><span class="p">:</span><span class="w"> </span><span class="l">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">destination</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="l">https://kubernetes.default.svc</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">syncPolicy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">automated</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">prune</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">selfHeal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p><code>selfHeal: true</code> means that if someone changes a managed resource by hand (a manual <code>kubectl apply</code> or <code>kubectl edit</code>), Argo CD reverts it to what Git says. The Git repo is the only source of truth.</p>
<p>The CI image-tag update step looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># .gitlab-ci.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l">deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">      # Update the image tag in values.yaml and push
</span></span></span><span class="line"><span class="cl"><span class="sd">      sed -i &#34;s/tag: .*/tag: ${CI_COMMIT_SHORT_SHA}/&#34; values/myapp.yml
</span></span></span><span class="line"><span class="cl"><span class="sd">      git config user.email &#34;ci@example.com&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git config user.name &#34;CI&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git add values/myapp.yml
</span></span></span><span class="line"><span class="cl"><span class="sd">      git commit -m &#34;chore: bump myapp to ${CI_COMMIT_SHORT_SHA}&#34;
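</span></span></span><span class="line"><span class="cl"><span class="sd">      # GitLab CI checks out a detached HEAD, so name the target branch explicitly
</span></span></span><span class="line"><span class="cl"><span class="sd">      git push origin &#34;HEAD:${CI_COMMIT_BRANCH}&#34;</span><span class="w">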
</span></span></span></code></pre></div><p>CI needs write access to the Git repo — but that&rsquo;s a deploy key, not a cluster credential. If it leaks, someone can push code. You&rsquo;d rotate the deploy key and audit the commits. If a cluster credential leaks, someone owns your cluster.</p>
<hr>
<h2 id="answer-2-oidc-federation-for-when-you-genuinely-need-push-based">Answer 2: OIDC federation (for when you genuinely need push-based)</h2>
<p>Some operations don&rsquo;t fit the GitOps model. Infrastructure provisioning (<code>terraform apply</code>), one-off database migrations, or initial cluster bootstrapping — these need direct cluster access. The correct pattern here is OIDC federation.</p>
<p>The idea: your CI platform (GitLab, GitHub Actions) can issue a signed JWT to any job that requests one. The JWT is signed by the CI platform and carries claims such as which repo, which branch, and which pipeline triggered the job. You configure your Kubernetes API server to trust those JWTs, and the CI job authenticates directly with the token it was handed.</p>
<p>No stored credentials. Every job gets a fresh token. The token expires when the job ends.</p>
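<p>For orientation, a decoded GitLab ID token payload looks roughly like the sketch below. The claim names are ones GitLab documents for ID tokens; every value is made up for illustration, and a real token carries more claims than shown:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Decoded JWT payload, shown as YAML for readability (values are illustrative)
iss: https://gitlab.example.com        # must match the issuer the API server trusts
aud: your_client_id                    # must match the client ID the API server expects
sub: &#34;project_path:myorg/myapp:ref_type:branch:ref:main&#34;
project_path: myorg/myapp
namespace_path: myorg
ref: main
ref_type: branch
ref_protected: &#34;true&#34;
pipeline_source: push
</code></pre></div>
<p>The <code>sub</code> claim is the useful one for RBAC: it packs the project path and the ref into a single string you can bind a role to.</p>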
<p>For a self-hosted GitLab, configure your k8s API server to trust GitLab as an OIDC issuer:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># /etc/rancher/k3s/config.yaml (or kube-apiserver flags)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kube-apiserver-arg</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-issuer-url=https://gitlab.example.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-client-id=your_client_id&#34;</span><span class="w">
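</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-username-claim=sub&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="c"># &#34;-&#34; disables the default &#34;&lt;issuer URL&gt;#&#34; prefix on non-email username claims</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-username-prefix=-&#34;</span><span class="w">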
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-groups-claim=groups_direct&#34;</span><span class="w">
</span></span></span></code></pre></div><p>Then create a <code>ClusterRoleBinding</code> that maps a specific GitLab identity to a Kubernetes role:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRoleBinding</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">gitlab-ci-deployer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">subjects</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">User</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;project_path:myorg/myapp:ref_type:branch:ref:main&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">apiGroup</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">roleRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRole</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">deploy-role</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">apiGroup</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io</span><span class="w">
</span></span></span></code></pre></div><p>The subject name is the <code>sub</code> claim from the GitLab JWT — it encodes the repo path and branch. Only jobs running on <code>main</code> in <code>myorg/myapp</code> get this binding. A job on a feature branch gets nothing.</p>
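<p>The binding refers to a <code>deploy-role</code> ClusterRole that isn&rsquo;t shown. A minimal sketch, assuming the job only ever needs to restart Deployments (the rule set is a guess at least privilege for that one task, not a recommendation for every pipeline):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deploy-role
rules:
  # kubectl rollout restart reads the Deployment, then patches its pod template
  - apiGroups: [&#34;apps&#34;]
    resources: [&#34;deployments&#34;]
    verbs: [&#34;get&#34;, &#34;list&#34;, &#34;patch&#34;]
</code></pre></div>
<p>Because the binding is a <code>ClusterRoleBinding</code>, these permissions apply in every namespace. A <code>RoleBinding</code> in the <code>myapp</code> namespace that references the same ClusterRole would narrow it further.</p>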
<p>In the CI job:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l">deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">id_tokens</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">K8S_TOKEN</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">aud</span><span class="p">:</span><span class="w"> </span><span class="l">your_client_id</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="p">|</span><span class="sd">
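</span></span></span><span class="line"><span class="cl"><span class="sd">      # K8S_API_URL and K8S_CA_FILE are assumed CI variables (not secrets):
</span></span></span><span class="line"><span class="cl"><span class="sd">      # the API server address and a file holding its CA certificate
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-cluster mycluster \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --server=&#34;${K8S_API_URL}&#34; \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --certificate-authority=&#34;${K8S_CA_FILE}&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-credentials gitlab-ci \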
</span></span></span><span class="line"><span class="cl"><span class="sd">        --token=&#34;${K8S_TOKEN}&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-context deploy \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --cluster=mycluster \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --user=gitlab-ci
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config use-context deploy
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl rollout restart deployment/myapp -n myapp</span><span class="w">
</span></span></span></code></pre></div><p>The token in <code>K8S_TOKEN</code> is injected by GitLab. It expires with the job. The API server verifies the signature on every request using the public keys GitLab publishes at its JWKS endpoint, so the API server must be able to reach GitLab over HTTPS.</p>
<hr>
<h2 id="which-one-to-use">Which one to use</h2>
<table>
	<thead>
			<tr>
					<th></th>
					<th>GitOps</th>
					<th>OIDC federation</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>CI needs cluster access</td>
					<td>No</td>
					<td>Yes (short-lived token)</td>
			</tr>
			<tr>
					<td>Audit trail</td>
					<td>Git history</td>
					<td>kube-apiserver audit log</td>
			</tr>
			<tr>
					<td>Revocability</td>
					<td>Revert the commit</td>
					<td>Token expires with the job</td>
			</tr>
			<tr>
					<td>Self-hosted setup effort</td>
					<td>Low</td>
					<td>Moderate (OIDC config)</td>
			</tr>
			<tr>
					<td>Works for infra provisioning</td>
					<td>Not really</td>
					<td>Yes</td>
			</tr>
	</tbody>
</table>
<p>For application deployments: GitOps. The cluster reconciles continuously, drift gets reverted automatically, and CI is completely decoupled from cluster state.</p>
<p>For infrastructure provisioning or one-off operations: OIDC federation. Short-lived credentials, branch-scoped permissions, nothing to rotate.</p>
<p>What you should never do: store a kubeconfig or a long-lived ServiceAccount token in CI secrets. It&rsquo;s easy to make work. The problem is that the blast radius of a leak is unbounded, the audit log can&rsquo;t tell a legitimate pipeline from an attacker replaying the same credential, and there&rsquo;s no expiry. Everything that goes wrong with static secrets goes wrong eventually.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Next: <a href="/posts/k8s-gitops-secrets/">how to handle secrets in a GitOps repository</a>.</em></p>
]]></content:encoded></item><item><title>🤫 How Do You Handle Secrets in a GitOps Repository?</title><link>https://blog.hippotion.com/posts/k8s-gitops-secrets/</link><pubDate>Fri, 25 Apr 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-gitops-secrets/</guid><description>GitOps says Git is the source of truth. Secrets say don&amp;rsquo;t put them in Git. These two things appear to be in direct conflict. They&amp;rsquo;re not.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;You&rsquo;re using GitOps — everything goes through Git. How do you handle secrets?&rdquo;</em></p>
<p>The wrong answer: base64-encode them and commit them as Kubernetes <code>Secret</code> objects. Base64 is not encryption. Anyone with read access to the repo has your secrets. If the repo is public, everyone does.</p>
<p>The slightly better wrong answer: use a private repo and just not think about it. This works until a deploy key leaks, someone joins and then leaves the company, or you need to rotate one secret and have to find every place it&rsquo;s referenced.</p>
<p>There are three real answers. They make different tradeoffs.</p>
<hr>
<h2 id="the-constraint">The constraint</h2>
<p>The constraint is actually tighter than &ldquo;don&rsquo;t commit secrets&rdquo;. It&rsquo;s: <strong>your Git repo should be safe to make public at any point</strong>, and <strong>secrets must be rotatable without touching Git</strong>.</p>
<p>If rotating a password requires a new commit, someone has to be awake to merge and deploy it. That&rsquo;s not how you want to handle a 3am incident.</p>
<hr>
<h2 id="option-1-external-secrets-operator--vault">Option 1: External Secrets Operator + Vault</h2>
<p>This is the most robust pattern and the one worth knowing for interviews.</p>
<p>The idea: secrets live in a dedicated secret store (HashiCorp Vault, or a cloud equivalent). The External Secrets Operator (ESO) watches <code>ExternalSecret</code> objects in the cluster and syncs each referenced secret into a real Kubernetes <code>Secret</code>. The <code>ExternalSecret</code> manifest is safe to commit: it says where the secret lives, not what it is.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># This lives in Git — safe to commit</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ExternalSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">refreshInterval</span><span class="p">:</span><span class="w"> </span><span class="l">1h</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">secretStoreRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db-credentials  </span><span class="w"> </span><span class="c"># the k8s Secret it creates</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">secretKey</span><span class="p">:</span><span class="w"> </span><span class="l">DB_PASSWORD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">remoteRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l">secret/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">property</span><span class="p">:</span><span class="w"> </span><span class="l">db-password</span><span class="w">
</span></span></span></code></pre></div><p>Rotation: you update the secret in Vault. ESO syncs it to the cluster within <code>refreshInterval</code>. No Git commit, no deployment. A pod that reads the <code>Secret</code> through environment variables picks up the new value on its next restart; a pod that mounts it as a volume sees the file change after the kubelet&rsquo;s sync delay, without a restart, provided the app re-reads the file (on <code>SIGHUP</code>, for example).</p>
<p>Audit trail: Vault logs every read and write. You know exactly which service account read which secret at what time.</p>
<p>The cost: you&rsquo;re running Vault. For a homelab or small team, that&rsquo;s an extra thing to operate. For production, it&rsquo;s worth it.</p>
<p>Self-hosted setup:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># ClusterSecretStore — connects ESO to your Vault instance</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">provider</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">vault</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;http://sys-vault.sys-vault.svc.cluster.local:8200&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;secret&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;v2&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">auth</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">kubernetes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">mountPath</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;kubernetes&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;external-secrets&#34;</span><span class="w">
</span></span></span></code></pre></div><p>ESO authenticates to Vault using the pod&rsquo;s Kubernetes ServiceAccount token. Vault validates it against the cluster&rsquo;s token review endpoint. No static credentials anywhere.</p>
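<p>If ESO should log in with a specific ServiceAccount rather than its controller&rsquo;s own, the same auth block can name one. A minimal sketch of just the <code>auth</code> portion of the store above: the ServiceAccount name and namespace are assumptions for illustration, and the Vault role has to be bound to that account on the Vault side.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Fragment of spec.provider.vault in the ClusterSecretStore
auth:
  kubernetes:
    mountPath: &#34;kubernetes&#34;
    role: &#34;external-secrets&#34;
    serviceAccountRef:
      name: external-secrets            # hypothetical ServiceAccount name
      namespace: sys-external-secrets   # hypothetical; named explicitly because the store is cluster-scoped
</code></pre></div>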
<hr>
<h2 id="option-2-sealed-secrets">Option 2: Sealed Secrets</h2>
<p>Sealed Secrets uses asymmetric encryption. The cluster holds a private key. You use the <code>kubeseal</code> CLI to encrypt a secret with the cluster&rsquo;s public key. The resulting <code>SealedSecret</code> object is safe to commit — only the cluster can decrypt it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Encrypt a secret for committing to Git</span>
</span></span><span class="line"><span class="cl">kubectl create secret generic myapp-db --namespace myapp <span class="se">\
</span></span></span><span class="line"><span class="cl">  --from-literal<span class="o">=</span><span class="nv">DB_PASSWORD</span><span class="o">=</span>hunter2 <span class="se">\
</span></span></span><span class="line"><span class="cl">  --dry-run<span class="o">=</span>client -o yaml <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="p">|</span> kubeseal --format yaml <span class="se">\
</span></span></span><span class="line"><span class="cl">  &gt; sealed-secrets/myapp-db.yaml
</span></span></code></pre></div><p>The resulting YAML looks like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">bitnami.com/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">SealedSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">encryptedData</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...</span><span class="w">
</span></span></span></code></pre></div><p>This gets committed. The Sealed Secrets controller in the cluster decrypts it and creates the real <code>Secret</code> automatically.</p>
<p>The tradeoff: rotation means re-sealing. You need the cluster&rsquo;s public key (which is public) and access to the plaintext secret. You commit a new <code>SealedSecret</code>. That&rsquo;s a Git commit, which means a review, a merge, and a deploy. For a 3am incident, that&rsquo;s a lot of friction.</p>
<p>Also: if the cluster&rsquo;s private key is lost, you can&rsquo;t decrypt any of your sealed secrets. Back up the private key.</p>
<p>Good fit for: small teams, homelab, situations where secrets change rarely and the GitOps review process is actually desirable.</p>
<hr>
<h2 id="option-3-sops">Option 3: SOPS</h2>
<p>SOPS (Secrets OPerationS) encrypts files at rest using age keys or cloud KMS. You commit encrypted files. CI decrypts them during deployment using a key it holds in memory (not stored in Git).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Encrypt a file for Git</span>
</span></span><span class="line"><span class="cl">sops --encrypt --age age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q <span class="se">\
</span></span></span><span class="line"><span class="cl">  secrets/myapp.yaml &gt; secrets/myapp.enc.yaml
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># In CI: decrypt to stdout and pipe straight into kubectl; nothing is written to disk</span>
</span></span><span class="line"><span class="cl">sops --decrypt secrets/myapp.enc.yaml <span class="p">|</span> kubectl apply -f -
</span></span></code></pre></div><p>The difference from Sealed Secrets: SOPS encrypts at the file level, not the k8s object level. You can use it outside of Kubernetes (application configs, Terraform variables). The key can live in the CI environment, a cloud KMS, or a personal age key.</p>
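<p>SOPS can also pick up the recipient from a <code>.sops.yaml</code> file at the repo root instead of a command-line flag. A sketch, assuming the <code>secrets/</code> layout from the command above and that the files are Kubernetes <code>Secret</code> manifests; the <code>encrypted_regex</code> rule is an optional choice that leaves metadata readable in diffs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># .sops.yaml (assumed to sit at the repository root)
creation_rules:
  - path_regex: secrets/.*\.yaml$
    # Only encrypt the values under data/stringData; names and metadata stay in plaintext
    encrypted_regex: ^(data|stringData)$
    # Same age recipient as in the command above
    age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q
</code></pre></div>
<p>With a rule like this in place, <code>sops --encrypt secrets/myapp.yaml</code> needs no <code>--age</code> flag, which keeps the recipient list in one reviewed file.</p>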
<p>The tradeoff: CI needs the decryption key, which puts you back in &ldquo;secret in CI&rdquo; territory — just for the encryption key rather than the actual secrets. If you use a cloud KMS, OIDC federation handles that (no stored key). If you use an age key, it lives in CI secrets.</p>
<p>Good fit for: teams already using Helm with the <code>helm-secrets</code> plugin, polyglot environments where not everything is Kubernetes, small teams where Vault feels like overengineering.</p>
<hr>
<h2 id="comparison">Comparison</h2>
<table>
	<thead>
			<tr>
					<th></th>
					<th>ESO + Vault</th>
					<th>Sealed Secrets</th>
					<th>SOPS</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Rotation without Git commit</td>
					<td>Yes</td>
					<td>No</td>
					<td>Depends</td>
			</tr>
			<tr>
					<td>Audit trail</td>
					<td>Full (Vault)</td>
					<td>None</td>
					<td>Depends on KMS</td>
			</tr>
			<tr>
					<td>Complexity</td>
					<td>High</td>
					<td>Low</td>
					<td>Medium</td>
			</tr>
			<tr>
					<td>Works outside k8s</td>
					<td>With effort</td>
					<td>No</td>
					<td>Yes</td>
			</tr>
			<tr>
					<td>Recovery if key lost</td>
					<td>Vault backup</td>
					<td>Lose all secrets</td>
					<td>Key backup</td>
			</tr>
			<tr>
					<td>CI needs secret material</td>
					<td>No</td>
					<td>No</td>
					<td>Yes (decrypt key)</td>
			</tr>
	</tbody>
</table>
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The interesting follow-up question is: <em>&ldquo;How do you rotate a secret without downtime?&rdquo;</em></p>
<p>The answer requires you to understand that a container reads <code>Secret</code>-backed environment variables once, at startup. Updating the <code>Secret</code> in Kubernetes changes neither those variables nor the pod&rsquo;s lifecycle: nothing restarts automatically. Your options are:</p>
<ol>
<li>Mount the secret as a volume and have the app watch for file changes (good)</li>
<li>Restart the deployment after rotation (<code>kubectl rollout restart</code>, automatable)</li>
<li>Use a sidecar such as Vault Agent (added by the Vault Agent Injector) that refreshes the secret files inside the pod (complex but zero-restart)</li>
</ol>
<p>The correct answer depends on the app. An API key that can be rotated gradually is different from a database password where the old one is invalidated immediately.</p>
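<p>Option 1 might look like the fragment below. It is a sketch, not a full Deployment: the container name, image, and mount path are placeholders, and the <code>Secret</code> name is borrowed from the ExternalSecret example earlier in this post. The kubelet refreshes the mounted files some time after the <code>Secret</code> changes, but not for files mounted with <code>subPath</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Pod template fragment (spec.template.spec of a Deployment)
containers:
  - name: myapp                             # placeholder
    image: registry.example.com/myapp:1.0   # placeholder
    volumeMounts:
      - name: db-credentials
        mountPath: /etc/myapp/secrets       # the app re-reads /etc/myapp/secrets/DB_PASSWORD
        readOnly: true
volumes:
  - name: db-credentials
    secret:
      secretName: myapp-db-credentials      # the Secret that ESO creates
</code></pre></div>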
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-cicd-no-credentials/">deploying without cluster credentials</a>. Next: <a href="/posts/k8s-zero-downtime/">zero-downtime deployments</a>.</em></p>
]]></content:encoded></item><item><title>🏗️ My Homelab Runs on GitOps. Here's What That Actually Means.</title><link>https://blog.hippotion.com/posts/homelab-gitops/</link><pubDate>Fri, 28 Mar 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/homelab-gitops/</guid><description>I wanted to learn production-grade Kubernetes patterns without breaking production. One node, a full GitOps stack, and a hard rule: no manual kubectl after bootstrap.</description><content:encoded><![CDATA[<h2 id="why-this-exists">Why this exists</h2>
<p>I&rsquo;ve been working in DevOps and platform engineering long enough to know what I don&rsquo;t know. The patterns that separate robust infrastructure from &ldquo;it works on my machine&rdquo; infrastructure — GitOps, admission policies, network segmentation, secrets management — are easy to read about. They&rsquo;re harder to actually internalise without running them yourself.</p>
<p>So I built a homelab. An old ThinkCentre I had sitting around, k3s, and a rule I set for myself before writing a single line of configuration: <strong>GitLab is the only source of truth. No manual <code>kubectl</code> after bootstrap. All changes go through <code>git push</code>.</strong></p>
<p>That rule turned out to be more consequential than I expected.</p>
<hr>
<h2 id="the-stack">The stack</h2>
<p>The cluster runs about thirty services across two categories: infrastructure that makes the platform work, and applications that actually do things.</p>
<p>Infrastructure:</p>
<ul>
<li><strong>k3s</strong> — lightweight Kubernetes, single-node</li>
<li><strong>Cilium</strong> — CNI with NetworkPolicy support (Flannel, k3s&rsquo;s default CNI, does not enforce NetworkPolicies itself; stock k3s relies on a bundled kube-router controller for that)</li>
<li><strong>Argo CD</strong> — GitOps reconciler, watches the repo, applies changes</li>
<li><strong>Traefik</strong> — ingress controller, two entrypoints</li>
<li><strong>Cloudflare tunnel</strong> — external access without open ports</li>
<li><strong>cert-manager</strong> — wildcard TLS cert via Let&rsquo;s Encrypt DNS-01</li>
<li><strong>oauth2-proxy</strong> — GitLab SSO protecting everything by default</li>
<li><strong>Vault + External Secrets Operator</strong> — secrets management</li>
<li><strong>Pi-hole</strong> — local DNS for <code>*.hippotion.com</code></li>
</ul>
<p>Applications: a media server (Jellyfin, *arr stack), Immich for photos, Vaultwarden for passwords, Home Assistant, n8n for automation, a Hugo blog, Obsidian via browser-based KasmVNC, and a few custom-built things I&rsquo;ll get to below.</p>
<hr>
<h2 id="traffic-reaches-the-cluster-in-two-ways">Traffic reaches the cluster in two ways</h2>
<p>External traffic (from anywhere on the internet) goes through a Cloudflare tunnel. The cloudflared pod dials out to Cloudflare — no open ports on the server, no firewall rules, no exposed IP. Cloudflare terminates TLS and forwards plain HTTP to Traefik on port 7080. Cloudflare handles the certificate for external visitors.</p>
<p>Local traffic (home WiFi) goes through Pi-hole, which resolves <code>*.hippotion.com</code> to the server&rsquo;s LAN IP. Traefik receives HTTPS on port 443, served with a wildcard certificate that cert-manager issues from Let&rsquo;s Encrypt via DNS-01 challenge. Port 80 redirects to 443; the <code>cloudflare</code> entrypoint on 7080 does not redirect, because it&rsquo;s already receiving plain HTTP from cloudflared.</p>
<p>The result: the same IngressRoute handles both paths.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">entryPoints</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">cloudflare  </span><span class="w"> </span><span class="c"># plain HTTP from the cloudflared pod</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">websecure   </span><span class="w"> </span><span class="c"># local HTTPS with wildcard cert</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">routes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">match</span><span class="p">:</span><span class="w"> </span><span class="l">Host(`myapp.hippotion.com`)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Rule</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">middlewares</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">oauth-auth</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">sys-oauth2-gitlab</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span></code></pre></div><p>Every IngressRoute has both entrypoints. If you forget one, the service is unreachable from half your access paths. Learned that the first time I added an app and couldn&rsquo;t reach it from the phone.</p>
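<p>The entrypoints described above could be declared in Traefik&rsquo;s static configuration roughly as follows. The ports and the names <code>cloudflare</code> and <code>websecure</code> come from this post; the name <code>web</code> for port 80 is an assumption, and a Helm-based install would express the same thing through chart values rather than this file.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Traefik static configuration (sketch)
entryPoints:
  web:
    address: &#34;:80&#34;
    http:
      redirections:
        entryPoint:
          to: websecure      # port 80 redirects to HTTPS
          scheme: https
  websecure:
    address: &#34;:443&#34;          # local HTTPS with the wildcard cert
  cloudflare:
    address: &#34;:7080&#34;         # plain HTTP from cloudflared, no redirect
</code></pre></div>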
<hr>
<h2 id="one-file-generates-everything">One file generates everything</h2>
<p>The centrepiece of the setup is <code>applications.yml</code> — a single file that is the complete list of everything running in the cluster. Every entry generates a Namespace, an Argo CD AppProject, an Application, NetworkPolicies, and RBAC. Nothing is created anywhere else.</p>
<p>An entry looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">- <span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">profile</span><span class="p">:</span><span class="w"> </span><span class="l">web-app</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">applications</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">applicationCode</span><span class="p">:</span><span class="w"> </span><span class="l">web-vaultwarden</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/extra-objects</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">autoSync</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>Seven lines. That deploys a namespace, an Argo CD app that watches <code>helm-charts/extra-objects/values-web-vaultwarden.yml</code>, a full set of Cilium NetworkPolicies based on the <code>web-app</code> profile (deny-all, plus ingress from Traefik and egress to DNS and external HTTPS), and a ServiceAccount. Adding a new service to the cluster means one entry in this file plus a values file with the actual Kubernetes manifests.</p>
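<p>For a sense of what one entry expands to, the generated Argo CD Application might look roughly like this. It is an illustrative sketch rather than the generator&rsquo;s actual output: the repo URL, the project name, and the <code>argocd</code> namespace are assumptions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-vaultwarden
  namespace: argocd                   # assumed Argo CD namespace
spec:
  project: web-vaultwarden            # assumed: one AppProject per entry
  source:
    repoURL: https://gitlab.example.com/homelab/cluster.git   # hypothetical
    targetRevision: HEAD
    path: helm-charts/extra-objects
    helm:
      valueFiles:
        - values-web-vaultwarden.yml
  destination:
    server: https://kubernetes.default.svc
    namespace: web-vaultwarden
  syncPolicy:
    automated:                        # corresponds to autoSync: true
      prune: true
      selfHeal: true
</code></pre></div>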
<p>The <code>profile: web-app</code> notation deserves a word. Raw NetworkPolicy YAML is repetitive and error-prone — every namespace needs a deny-all base plus specific allows. I template it. A Helm chart maps profile names to concrete policy sets. <code>web-app</code> means: deny all ingress except from the ingress namespace, deny all egress except DNS and external HTTPS. <code>web-app-internal</code> means the same but no external egress — suitable for services that only talk to other in-cluster services. <code>media-server</code> adds port 6881 for BitTorrent. The policies are generated; no one writes them by hand.</p>
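<p>As an illustration of the intent, the <code>web-app</code> profile could render to something like the plain Kubernetes NetworkPolicy below. The chart&rsquo;s real output may differ (it may use Cilium-specific policy kinds, for instance): the Traefik namespace name is hypothetical, and the excluded CIDRs are the k3s default pod and service ranges.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-profile               # hypothetical name
  namespace: web-vaultwarden
spec:
  podSelector: {}                     # every pod in the namespace
  policyTypes: [Ingress, Egress]      # anything not allowed below is denied
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: sys-traefik   # hypothetical ingress namespace
  egress:
    - ports:                          # DNS
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:                             # external HTTPS only
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.42.0.0/16          # k3s default pod CIDR
              - 10.43.0.0/16          # k3s default service CIDR
      ports:
        - protocol: TCP
          port: 443
</code></pre></div>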
<hr>
<h2 id="secrets-without-storing-them-in-git">Secrets without storing them in Git</h2>
<p>Kubernetes <code>Secret</code> objects are not secrets. They&rsquo;re base64-encoded blobs in etcd, and base64 is not encryption. Committing them to a Git repo — even a private one — is the wrong answer.</p>
<p>The setup here uses HashiCorp Vault as the actual secret store, with External Secrets Operator syncing Vault paths to Kubernetes Secrets. What lives in Git is an <code>ExternalSecret</code> CRD:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ExternalSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">secretStoreRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">secretKey</span><span class="p">:</span><span class="w"> </span><span class="l">DB_PASSWORD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">remoteRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l">secret/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">property</span><span class="p">:</span><span class="w"> </span><span class="l">db-password</span><span class="w">
</span></span></span></code></pre></div><p>This is safe to commit. It says where the secret lives, not what it is. Vault contains the actual value. ESO syncs it to the cluster and refreshes every hour. Rotation means updating the value in Vault — no Git commit, no deployment.</p>
<p>Vault runs in-cluster with a sidecar that auto-unseals on restart. Not production-grade (the unseal key is on the same PVC as Vault itself), but pragmatic for a homelab where availability matters more than a sophisticated key management ceremony.</p>
<hr>
<h2 id="three-things-i-built-that-were-worth-building">Three things I built that were worth building</h2>
<h3 id="local-ai-inference">Local AI inference</h3>
<p>The cluster runs a local LLM. The <code>web-ai-engine</code> namespace has Open WebUI fronting a llama-server serving Phi-3.5 Mini in GGUF format. The model file lives on the node&rsquo;s filesystem, mounted as a hostPath volume.</p>
<p><code>web-openclaw</code> is a personal AI assistant UI that can route requests to either external providers (via NVIDIA&rsquo;s API) or the local llama-server, depending on the task. The local model handles things that don&rsquo;t need to leave the house; the external API handles things that do. The network policy for <code>web-openclaw</code> explicitly allows egress to <code>web-ai-engine</code> and nowhere else for local inference.</p>
<p>Running a 3.8B parameter model on homelab hardware is genuinely useful and costs nothing per query. It&rsquo;s not GPT-4, but for summarisation, first drafts, and things you don&rsquo;t want to send to a third-party API, it&rsquo;s more than good enough.</p>
<h3 id="brew-buddy">Brew Buddy</h3>
<p>I make kombucha. I was tracking fermentation batches in a notes app and getting annoyed at not being able to see history across batches. So I built a tracker.</p>
<p>Brew Buddy is a React frontend and a Go API backed by PostgreSQL, all running in the <code>web-brew-buddy</code> namespace. The images are built locally and imported into the cluster&rsquo;s container runtime with <code>k3s ctr images import</code>. It&rsquo;s deployed like any other app — a values file, an entry in <code>applications.yml</code>, a Vault secret for the database password.</p>
<p>The point isn&rsquo;t the app. The point is that the platform handles a custom hobby project with the same operational properties as Vaultwarden or Immich. Same GitOps workflow, same secret management, same network isolation, same TLS termination. Adding an app to this cluster takes an afternoon of writing manifests and a few seconds of git push. The platform work was done once.</p>
<h3 id="qr-device-login">QR device login</h3>
<p>This one has <a href="/posts/qr-device-login/">its own post</a> because it took three days and four complete rewrites of oauth2-proxy&rsquo;s session format to get right.</p>
<p>The short version: the Homer dashboard on the living room TV needed a way to log in without typing credentials on a TV keyboard. I built a device-flow OAuth service — phone scans QR, phone authenticates with GitLab, TV session is created. End session from the phone kills the TV&rsquo;s session immediately by deleting the oauth2-proxy Redis ticket.</p>
<p>It&rsquo;s the most overengineered solution to a problem I have, and I don&rsquo;t regret a minute of it.</p>
<hr>
<h2 id="what-operating-this-way-actually-changes">What operating this way actually changes</h2>
<p>The no-manual-kubectl rule makes a larger practical difference than it sounds.</p>
<p><strong>The audit trail is automatic.</strong> Every change to the cluster is a git commit with an author, a timestamp, and a diff. There&rsquo;s no &ldquo;what did I change last Tuesday?&rdquo; — I know exactly what changed last Tuesday, and I can revert it with <code>git revert</code>. The Argo CD UI shows the diff between what&rsquo;s in Git and what&rsquo;s running. If there&rsquo;s a diff, something went wrong.</p>
<p><strong>New services are cheap to add.</strong> The platform does the repetitive work — namespace, RBAC, network policies, TLS termination, OAuth protection. Adding a new app is writing the manifests and updating <code>applications.yml</code>. The infrastructure concerns are handled.</p>
<p><strong>Recovery is straightforward.</strong> If I rebuild the node (which I&rsquo;ve done), I run two bootstrap scripts, apply one Argo CD manifest, and the cluster reconciles itself from Git over the next few minutes. The only things that require manual work are the secrets that can&rsquo;t live in Git — two OAuth credentials and the Cloudflare tunnel token, all recreated by <code>scripts/create-secrets.sh</code>.</p>
<p><strong>Experimentation is safe.</strong> Apps I&rsquo;m not sure I&rsquo;ll keep run as <code>toggleable: true</code> entries. Turning one off is removing its entry from <code>applications.yml</code> and pushing. Turning it back on is adding the entry back.</p>
<hr>
<h2 id="what-it-doesnt-solve">What it doesn&rsquo;t solve</h2>
<p>Bootstrap is manual. The first <code>kubectl apply -f argocd/root-app.yaml</code> happens outside of GitOps by definition. The three bootstrap secrets can&rsquo;t be in Git. This is unavoidable — you need to trust something before GitOps can take over, and that something is a short manual procedure.</p>
<p>Some things fight the model. k3s&rsquo;s built-in addon controller rewrites the metrics-server Deployment on every k3s restart, removing a patch needed for Cilium compatibility. The fix is a pod that watches for the revert and reapplies the patch. It works, but it&rsquo;s a workaround for a component I don&rsquo;t control.</p>
<p>Single-node means single point of failure. For a homelab, that&rsquo;s acceptable. For anything important, it&rsquo;s not.</p>
<hr>
<h2 id="the-honest-summary">The honest summary</h2>
<p>I set out to learn production-grade Kubernetes patterns, and I did. The GitOps constraint turned out to be the best engineering decision in the project — not because it made things easier in the short term (it didn&rsquo;t), but because it forced every change through a path that is auditable, reversible, and consistent.</p>
<p>The cluster is a single ThinkCentre running about thirty services, secured by Cilium network policies, authenticated via GitLab SSO, with secrets managed by Vault and all configuration in a Git repo that I could hand to someone tomorrow and they&rsquo;d understand what&rsquo;s running and why.</p>
<p>That&rsquo;s the goal. For a homelab, I&rsquo;ll call it achieved.</p>
]]></content:encoded></item></channel></rss>