<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>N8n on hippotion</title><link>https://blog.hippotion.com/tags/n8n/</link><description>Recent content in N8n on hippotion</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 15 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.hippotion.com/tags/n8n/index.xml" rel="self" type="application/rss+xml"/><item><title>VoteWatch: How Your Representatives Voted — and Whether You'd Agree</title><link>https://blog.hippotion.com/posts/votewatch/</link><pubDate>Fri, 15 May 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/votewatch/</guid><description>Parliamentary roll-call votes are public, machine-readable, and almost completely unread. I built a thing that scrapes them, distills each decision into one plain-language question, shows which party voted which way, and lets you register whether you agree — then puts your answer next to how parliament actually voted. The rule that keeps it honest: the AI writes the summary, but it never decides a fact.</description><content:encoded><![CDATA[<h2 id="open-data-nobody-opens">Open data nobody opens</h2>
<p>Every vote in the European Parliament and the Slovak National Council is
public. The EU even ships it as a clean API. And almost nobody reads it,
because the raw record is unreadable: <em>&ldquo;Návrh poslanca… ktorým sa dopĺňa zákon
č. 581/2004 Z. z. … (tlač 1259) — tretie čítanie, hlasovanie o návrhu zákona
ako o celku.&rdquo;</em> Multiply that by a few hundred votes a sitting. Transparency
that no human can parse is transparency on paper only.</p>
<p>So I built <strong>VoteWatch</strong> — a small site on my homelab that turns the record
into something a citizen can actually use: <em>what was decided, who voted, and
do you agree?</em></p>
<figure>
    <img loading="lazy" src="sk-overview.png"
         alt="VoteWatch SK in plain-language mode"/> <figcaption>
            <p>VoteWatch SK: each decision summarised in plain language, which parties voted how, and a Yes/No question whose live citizen tally sits next to how parliament actually voted — labelled <em>agree</em> or <em>gap</em>.</p>
        </figcaption>
</figure>

<h2 id="two-halves-one-lopsided">Two halves, one lopsided</h2>
<p>The EU half was easy. <a href="https://howtheyvote.eu">HowTheyVote.eu</a> already did the
hard work and publishes roll-call votes as a clean, open-licensed API. You
consume it; you don&rsquo;t scrape it.</p>
<p>The Slovak half is where the real work lives — and the real value. <code>nrsr.sk</code>
has <strong>no API</strong>. The HTML is the contract: a results listing, and per-vote
pages where each MP appears next to a one-letter code (<code>[Z]</code> za, <code>[P]</code> proti,
<code>[?]</code> zdržal sa). So the national half is a genuine scraper — the unglamorous
kind that nobody maintains, which is exactly why a gap exists to fill. The
unglamorous part <em>is</em> the moat.</p>
<h2 id="from-ten-votes-to-one-question">From ten votes to one question</h2>
<p>A single bill generates a pile of procedural roll-calls — shorten the debate,
move to third reading, amendment block A, amendment block B, the bill as a
whole. Ten rows that are really one decision. Nobody wants ten rows.</p>
<p>So the pipeline groups votes by bill, then asks an LLM (llama-3.3-70b on
NVIDIA NIM) to do exactly one job: turn the bureaucratic titles into a plain
headline, two sentences of summary, and <strong>one neutral Yes/No question</strong> a
person can actually answer. Seven votes on the health-insurer bill collapse
into: <em>&ldquo;Changes to the health-insurance law&rdquo;</em> → <em>&ldquo;Do you agree with the
health-insurance bill?&rdquo;</em></p>
<h2 id="the-rule-that-keeps-it-honest">The rule that keeps it honest</h2>
<p>Here&rsquo;s the line I won&rsquo;t cross, and it&rsquo;s the whole reason I trust the result:
<strong>the AI writes the prose, but it never decides a fact.</strong></p>
<ul>
<li>Which votes belong to one bill? Deterministic — parsed from the bill number.</li>
<li>Did it pass? Deterministic — read from the result row.</li>
<li>Which parties voted for, against, abstained? Deterministic — tallied from
the per-MP record, shown as <em>Za: SMER-SD, HLAS-SD, SNS · Zdržali sa: PS, KDH,
SaS</em>.</li>
</ul>
<p>The model only touches language: the headline, the summary, the question. If
it hallucinates, you get an awkward sentence — never a wrong vote count. And
if the model fails entirely, the card falls back to the raw title. The facts
come from the record; the model just makes the record legible. For civic data,
that separation isn&rsquo;t a nice-to-have — it&rsquo;s the difference between a tool and a
liability. (Every card says so out loud: <em>summaries are AI-generated; the raw
record prevails.</em>)</p>
<h2 id="the-part-that-closes-the-loop">The part that closes the loop</h2>
<p>Showing people how their representatives voted is only half a feedback loop.
The other half is letting them answer.</p>
<p>Each decision carries its one distilled question and two buttons — <strong>Áno / Nie</strong>.
You vote, and the site shows the citizen tally <em>next to</em> how parliament
actually decided, with the honest verdict on top: <em>&quot;✓ Citizens and Parliament
agree&quot;</em> or <em>&quot;⚖ Gap between citizens and Parliament.&quot;</em> That gap is the entire
point. It&rsquo;s the thesis behind a side project of mine called
<a href="https://veracracy.hippotion.com">veracracy</a> — governance measured against
verified knowledge and the actual will of the governed — made concrete enough
to click.</p>
<figure>
    <img loading="lazy" src="eu-overview.png"
         alt="VoteWatch EU overview mode"/> <figcaption>
            <p>The same loop on the European Parliament — dossiers consolidated, political-group stances (EPP, S&amp;D, PfE…), and the citizen poll under each topic.</p>
        </figcaption>
</figure>

<p>The backend is deliberately boring. The site is static (git-synced nginx,
same as this blog). Votes can&rsquo;t POST to a static page, so they go to a public
<a href="https://n8n.hippotion.com">n8n</a> webhook that records to a data table and
returns live tallies — no new service, no database, just the automation box I
already run. Vote keys are namespaced so EU and Slovak polls share one store
without colliding.</p>
<h2 id="the-honest-caveat">The honest caveat</h2>
<p>Dedup is browser-local. It stops casual double-voting, but behind a Cloudflare
tunnel every request shares one IP, so this is an <strong>indicative signal, not a
secured ballot</strong>. That&rsquo;s the right altitude for &ldquo;let people express an
opinion.&rdquo; The day it needs to mean more than that, it needs real identity
first — and I&rsquo;d rather ship the honest version than fake the robust one.</p>
<p>It&rsquo;s live at <a href="https://votewatch.hippotion.com">votewatch.hippotion.com</a> — the
EU parliament and the Slovak NR SR, every MEP and every poslanec, in plain
language, with a button that asks the only question that matters after a vote:
<strong>would you have voted the same way?</strong></p>
<p>A neutral record — what was decided and who decided it — not a villain list.
Data © <a href="https://howtheyvote.eu">HowTheyVote.eu</a> (ODbL) and <code>nrsr.sk</code>.</p>
]]></content:encoded></item><item><title>Mind the gap: I pointed monitoring at my own skill set</title><link>https://blog.hippotion.com/posts/mind-the-gap-skill-radar/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/mind-the-gap-skill-radar/</guid><description>A rejection isn&amp;rsquo;t actionable data. So an n8n workflow now extracts skill demand from live job listings, diffs it against what I can prove, and renders the gap as a dashboard — deployed like everything else here: via git push.</description><content:encoded><![CDATA[<p>A while back I applied for a senior platform role at n8n and didn&rsquo;t land it. Fair enough — but
&ldquo;fair enough&rdquo; isn&rsquo;t actionable. Rejections come with no logs, no metrics, no trace. For someone
who runs thirty-odd services with full observability, having <em>vibes</em> as the only instrumentation
on my own career felt architecturally embarrassing.</p>
<p>So I built <strong>mind-the-gap</strong>: a pipeline that measures what the market demands, diffs it against
what I can prove, and renders the gap as a private dashboard on my cluster. The job hunt is now a
monitored system. This post is about the non-obvious decisions.</p>
<h2 id="demand-an-llm-reads-job-listings-so-i-dont-have-to">Demand: an LLM reads job listings so I don&rsquo;t have to</h2>
<p>I already had <a href="/posts/ats-job-poller/">a job poller</a> — an n8n workflow that polls the public ATS
APIs (Greenhouse / Lever / Ashby) of ~33 companies plus a broad remote-jobs feed every six hours.
A sibling workflow now re-fetches the same boards and, for every listing that passes the
role+location gate, asks a small hosted LLM (Llama-3.1-8B) for a structured extraction:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;seniority&#34;</span><span class="p">:</span> <span class="s2">&#34;senior&#34;</span><span class="p">,</span> <span class="nt">&#34;skills&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;kubernetes&#34;</span><span class="p">,</span> <span class="nt">&#34;importance&#34;</span><span class="p">:</span> <span class="s2">&#34;must&#34;</span><span class="p">},</span> <span class="err">...</span><span class="p">]}</span>
</span></span></code></pre></div><p>One row per <em>(job, skill)</em> lands in an n8n Data Table. Decisions that mattered:</p>
<ul>
<li><strong>One LLM call per job, not one batch.</strong> Free-tier inference times out on batches; per-job calls
are slower but fail independently. A lesson the poller already paid for.</li>
<li><strong>Insert doubles as the processed-marker.</strong> A job whose extraction fails to parse produces no
rows — so it&rsquo;s retried next run, for free. No status column, no second table.</li>
<li><strong>Canonicalization in code, not in the prompt.</strong> The model says &ldquo;K8s&rdquo;, &ldquo;k3s&rdquo;, &ldquo;EKS&rdquo; on
different days regardless of instructions. A dumb alias map (<code>k8s→kubernetes</code>, <code>eks→aws</code>)
beats prompt engineering for consistency.</li>
<li><strong>8B is good enough — with a guard.</strong> It occasionally echoed the seniority enum back literally
(<code>&quot;junior|mid|senior|staff|lead|unspecified&quot;</code>). The fix is one line of validation, not a bigger
model.</li>
</ul>
<h2 id="supply-no-artifact-no-credit">Supply: no artifact, no credit</h2>
<p>The other side of the diff is a skills registry — markdown in my knowledge vault, with a
machine-parseable YAML block. Every skill has a state, and the rule that keeps the whole thing
honest is brutal: <strong>a skill counts as <code>proven</code> only if an artifact exists</strong> — a public repo, a
blog post, documented production experience. Otherwise it&rsquo;s <code>claimed</code>, and claimed earns half
credit.</p>
<p>That rule immediately produced the most useful insight of the project: <strong>&ldquo;invisible skill&rdquo; is a
real category.</strong> Python turned out to be the market&rsquo;s #5 ask. I use it constantly — and could
point to nothing public that shows it. The cheapest score increase isn&rsquo;t learning something new;
it&rsquo;s a weekend making an existing skill visible. No gut-feeling gap analysis would have ranked
&ldquo;write about what you already do&rdquo; above &ldquo;learn the shiny thing.&rdquo;</p>
<h2 id="the-score-distinct-companies-not-mentions">The score: distinct companies, not mentions</h2>
<p>First naive aggregation: Canonical&rsquo;s listings mention Ubuntu <em>nine times, all marked must-have</em> —
suddenly Ubuntu looks like the hottest skill in Europe. Employer skew is the noise floor of small
samples. The fix: demand weight = <strong>distinct companies naming the skill</strong>, not total mentions.
One enthusiastic employer can&rsquo;t move the radar.</p>
<p>Two more scoring rules I&rsquo;d defend in review:</p>
<ul>
<li>Skills named by fewer than two companies don&rsquo;t count at all — single-listing noise stays out.</li>
<li>Demand the registry hasn&rsquo;t classified yet shows up as &ldquo;unreviewed&rdquo; and <strong>counts fully against
the score</strong>. An unreviewed market signal is a gap until proven otherwise; the dashboard nags me
to triage it.</li>
</ul>
<h2 id="rendering-the-page-is-a-git-commit">Rendering: the page is a git commit</h2>
<p>The dashboard is a single static HTML file, and the pipeline that produces it never touches the
cluster. <code>render.js</code> lives in this repo as the single source of truth; a nightly n8n workflow
fetches it raw from GitLab, <code>eval()</code>s it against the Data Table rows and the registry, and — only
if the result differs from what&rsquo;s committed (timestamps stripped, or every night is a &ldquo;change&rdquo;) —
PUTs the new <code>index.html</code> back via the GitLab API.</p>
<p>Serving is the same pattern as this blog: nginx plus a git-pull sidecar, deployed by Argo CD,
behind the cluster&rsquo;s OAuth middleware. The renderer has no kubeconfig, no SSH, no cluster access
of any kind. <strong>GitLab stays the only source of truth — even for a page that rewrites itself
nightly.</strong> If the workflow goes rogue, the worst it can do is a reviewable commit.</p>
<h2 id="day-one-verdict">Day-one verdict</h2>
<p>First run: 2,297 postings fetched, 25 in scope, 257 skill rows. Coverage score: <strong>63%</strong>.
Kubernetes and AWS tied at the top of demand — which means the AWS gap-closing project already in
flight stopped being a hunch and became the measured top of the market. Go is the only top-ten
demand with zero supply. The dashboard doesn&rsquo;t get anyone a job; it just makes sure every learning
Saturday is pointed where the data says, not where the hype does.</p>
<p>The job board rejected me. The data didn&rsquo;t.</p>
<hr>
<p><em>Workflows, render.js, and setup: <a href="https://github.com/janos-gyorgy/mind-the-gap">github.com/janos-gyorgy/mind-the-gap</a>.</em></p>
]]></content:encoded></item><item><title>🎯 Know the Market Without Job-Hunting: An LLM-Scored Job Poller in n8n</title><link>https://blog.hippotion.com/posts/ats-job-poller/</link><pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/ats-job-poller/</guid><description>You don&amp;rsquo;t have to be job-hunting to want to know your market — what&amp;rsquo;s out there, what it pays, where you&amp;rsquo;d fit. So I built an n8n workflow: it polls the public ATS APIs (Greenhouse/Lever/Ashby) plus a broad remote-jobs feed, filters for remote-EU infra roles, scores each posting against my CV with an LLM, and emails me only the 80%+ matches. No database, no scraping.</description><content:encoded><![CDATA[<p>You don&rsquo;t have to be about to change jobs to want to know the landscape. What&rsquo;s being built, what it
pays, where you&rsquo;d actually fit — staying current on the market (and your own worth) is just good
professional hygiene. The trouble is that <em>checking</em> is tedious, so most of us don&rsquo;t, until we&rsquo;re
already job-hunting and starting cold.</p>
<p>So I automated mine. An <a href="https://n8n.io">n8n</a> workflow on my homelab polls job boards every six hours,
scores each new posting against my profile with an LLM, and emails me only the strong matches — the
ones scoring 80%+. When it&rsquo;s quiet, it&rsquo;s silent. When something genuinely fits, I know the same day.
Here&rsquo;s what I learned building it. Repo at the bottom.</p>
<h2 id="three-apis-cover-most-of-the-market">Three APIs cover most of the market</h2>
<p>Company career pages look bespoke, but underneath, the vast majority run on one of three ATS — and
all three hand you the jobs as unauthenticated JSON:</p>
<ul>
<li><strong>Greenhouse</strong> — <code>boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true</code></li>
<li><strong>Lever</strong> — <code>api.lever.co/v0/postings/{token}?mode=json</code></li>
<li><strong>Ashby</strong> — <code>api.ashbyhq.com/posting-api/job-board/{token}?includeCompensation=true</code></li>
</ul>
<p>No scraping, no headless browser. You poll the API the page itself calls, normalize the three
shapes into one <code>{ company, title, location, remote, url, posted_at, description, external_id }</code>, and
you&rsquo;re done with the hard part.</p>
<h2 id="resolve-the-token-is-half-the-battle">&ldquo;Resolve the token&rdquo; is half the battle</h2>
<p>The naive assumption — <em>the token is the company name, and everyone&rsquo;s on one of the three</em> — is half
right. When I probed my initial wishlist, <strong>roughly half 404&rsquo;d everywhere</strong>: HashiCorp (now under
IBM → Workday), SUSE (SuccessFactors), Aiven (Teamtailor), Hugging Face. They&rsquo;re on a fourth or fifth
system entirely. The honest move was to ship the ~33 that actually resolve and leave the rest as
disabled config stubs. Verify before you trust a slug.</p>
<h2 id="dedup-without-a-database">Dedup without a database</h2>
<p>I didn&rsquo;t want to stand up Postgres just to remember which jobs I&rsquo;d already seen. n8n&rsquo;s <strong>Data Tables</strong>
handle it natively: a <code>seen_jobs</code> table, an <code>external_id</code> namespaced <code>{ats}:{company}:{id}</code>, and the
<code>rowNotExists</code> operation drops anything already recorded. State lives inside n8n, backed up with it.
Zero extra infrastructure.</p>
<p>The ordering matters: <strong>notify first, mark seen second.</strong> The insert only happens after the email
sends, so a failed send retries next run instead of silently swallowing a posting.</p>
<h2 id="the-location-filter-is-a-trap">The location filter is a trap</h2>
<p>My first version kept everything that wasn&rsquo;t explicitly US-based. The inbox filled with <em>&ldquo;Senior
Platform Engineer — Spain (Remote)&rdquo;</em> and <em>&quot;… — United Kingdom (Remote)&quot;</em>. Those aren&rsquo;t remote-for-me
— they&rsquo;re remote <em>if you live in Spain</em>. Useless from where I sit.</p>
<p>The fix was to invert the logic. Keep only three things:</p>
<ul>
<li>globally-remote / worldwide / anywhere,</li>
<li>pan-EU (EMEA / Europe / EU / EEA),</li>
<li>my own country.</li>
</ul>
<p>…and <strong>drop single-country remote</strong>, even EU ones. Region and home matches win over the country
deny-list, ambiguous locations are kept (a missed match is worse than one extra line to skim). That
one change cut the noise more than anything else.</p>
<h2 id="let-an-llm-read-the-actual-job">Let an LLM read the actual job</h2>
<p>Keyword + location filtering gets you a candidate list, but it can&rsquo;t tell a &ldquo;Platform Engineer&rdquo; who
herds Kubernetes from a &ldquo;Platform Engineer&rdquo; who owns a Figma design system. The job description can.</p>
<p>So the last step scores each new posting against my CV. My first version batched all of them into
<strong>one</strong> big LLM call — which promptly timed out on the free tier. The fix was the opposite: <strong>one
small call per job</strong>, which also means a single slow or rate-limited job never sinks the batch. Each
call asks a <a href="https://build.nvidia.com">NVIDIA NIM</a> model (Llama 3.1 8B, OpenAI-compatible) for one
number and a reason:</p>
<blockquote>
<p>Score this job 0–100 for fit against my profile. Return <code>{score, reason}</code>.</p>
</blockquote>
<p>That score is what lets me <strong>widen the net instead of narrowing it.</strong> On top of the curated company
list I pull a broad remote-jobs feed (every company, all categories); the cheap keyword + location
filters do the first pass, then I <strong>only email the roles scoring 80%+.</strong> Casting wide is fine when a
model is the bar at the door. A line ends up looking like:</p>
<blockquote>
<p><strong>92%</strong> — <em>Grafana Labs</em> — Senior Platform Engineer (Remote, EMEA) — <em>strong k8s/GitOps overlap</em> — link</p>
</blockquote>
<p>Scoring is fail-safe: if a call hiccups, that job is just skipped, and every posting gets marked seen
either way — so nothing re-scores forever, and a rare bad run never floods or stalls the inbox.</p>
<h2 id="the-unglamorous-bits-that-make-it-trustworthy">The unglamorous bits that make it trustworthy</h2>
<ul>
<li><strong>One bad source can&rsquo;t kill the run</strong> — every fetch is wrapped; failures become a <code>⚠️ N sources failing</code> footer so a company quietly changing ATS is visible, not invisible.</li>
<li><strong>A prime run</strong> seeds the table silently the first time, so I&rsquo;m not buried under every currently-open
role on day one.</li>
<li><strong>Everything tunable lives in one Config node</strong> — companies, keywords, location lists, the profile,
the model — so adding a company is a one-line edit, not a graph safari.</li>
</ul>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>The &ldquo;scrape job boards&rdquo; problem mostly isn&rsquo;t a scraping problem — it&rsquo;s three public APIs and a
normalizer.</li>
<li>For personal automation, reach for the boring-but-correct primitive: native dedup state beats a
database you have to operate.</li>
<li>An LLM works best here as the <strong>bar at the door</strong>: cheap deterministic filters keep the candidate
set (and the cost) small, then the model gates on real fit — which is what lets you cast a wide net
without drowning in it.</li>
</ul>
<p>Workflow JSON, the full node-by-node breakdown, and setup notes:
<strong><a href="https://github.com/janos-gyorgy/ats-job-poller">github.com/janos-gyorgy/ats-job-poller</a></strong>.</p>
]]></content:encoded></item><item><title>🧱 How Do You Isolate Two n8n Tenants on Kubernetes — and Prove Each Wall Holds?</title><link>https://blog.hippotion.com/posts/n8n-multitenant/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/n8n-multitenant/</guid><description>Multi-tenant isolation is easy to assert and hard to verify. Three walls — network, secret, resource — and the actual 403s, timeouts, and admission rejections that prove each one holds.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;You&rsquo;re running n8n for multiple customers on the same Kubernetes cluster. What stops Customer A from reading Customer B&rsquo;s API keys, calling Customer B&rsquo;s services, or starving Customer B&rsquo;s workflows by burning the whole node?&rdquo;</em></p>
<p>Three different walls, three different mechanisms. Most articles I&rsquo;ve read on K8s multi-tenancy list the primitives — namespaces, NetworkPolicies, ResourceQuotas, RBAC — without showing what each one actually catches when you try to cross it. This post does the second part. The receipts are the point.</p>
<p>The setup: two namespaces, <code>web-tenant-acme</code> and <code>web-tenant-globex</code>, each running their own n8n instance on the same node. The only thing keeping them apart is the walls we build around each namespace.</p>
<hr>
<h2 id="the-mental-model-subtractive-isolation">The mental model: subtractive isolation</h2>
<p>Kubernetes is a flat network with shared everything by default. You don&rsquo;t <em>add</em> isolation by writing allow rules. You <em>subtract</em> trust by adding default-deny rules, and then carefully allow back only the connections each tenant actually needs.</p>
<p>A tenant doesn&rsquo;t have access to another tenant because there is <em>no rule allowing it</em>. The absence of an allow rule is the wall.</p>
<p>Three of these absences make up the picture:</p>
<table>
	<thead>
			<tr>
					<th>Wall</th>
					<th>Primitive</th>
					<th>Failure mode when crossed</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Network</td>
					<td>Cilium NetworkPolicy, default-deny egress</td>
					<td>Connection times out (silent drop)</td>
			</tr>
			<tr>
					<td>Secret</td>
					<td>Vault Kubernetes-auth, per-tenant policy</td>
					<td><code>403 permission denied</code> from Vault itself</td>
			</tr>
			<tr>
					<td>Resource</td>
					<td>ResourceQuota + LimitRange</td>
					<td>Pod rejected at admission time</td>
			</tr>
	</tbody>
</table>
<p>Different layers, different error messages. That&rsquo;s how you can tell what stopped you.</p>
<hr>
<h2 id="wall-1--network-cilium-networkpolicy">Wall 1 — Network: Cilium NetworkPolicy</h2>
<p>n8n in <code>web-tenant-acme</code> can reach <code>whoami.web-tenant-acme.svc.cluster.local</code> (its own service in its own namespace) but not <code>whoami.web-tenant-globex.svc.cluster.local</code>. The same DNS shape, the same cluster, the same node. One succeeds, the other hangs.</p>
<p>The primitive is a default-deny egress policy applied to every pod in the namespace, with two narrow exceptions: intra-namespace traffic (so n8n can still reach its own service) and DNS to <code>kube-system</code> (otherwise nothing resolves anything).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Effective policy on every pod in web-tenant-acme:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Egress, Ingress]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">                                     </span><span class="c"># intra-namespace traffic OK</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">                                     </span><span class="c"># DNS to kube-dns OK</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">namespaceSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kubernetes.io/metadata.name</span><span class="p">:</span><span class="w"> </span><span class="l">kube-system</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>{<span class="nt">port: 53, protocol</span><span class="p">:</span><span class="w"> </span><span class="l">UDP}]</span><span class="w">
</span></span></span></code></pre></div><p>There is no rule for <code>web-tenant-globex</code>. Cilium&rsquo;s eBPF datapath drops the SYN packet on the way out.</p>
<p><strong>The receipt</strong> — an n8n HTTP node configured to GET <code>http://whoami.web-tenant-globex.svc.cluster.local/</code>. It hangs for the full timeout, then errors with <code>AxiosError: timeout of 5000ms exceeded</code> / <code>code: ECONNABORTED</code>.</p>
<p>The interesting bit: <strong>DNS still works.</strong> kube-dns is allowed, so the cross-namespace Service still resolves. The TCP handshake is what gets dropped. That&rsquo;s a useful signal in real incident response — &ldquo;DNS resolves but the connection hangs&rdquo; almost always means a NetworkPolicy is the cause.</p>
<hr>
<h2 id="wall-2--secret-vault-kubernetes-auth--eso">Wall 2 — Secret: Vault Kubernetes-auth + ESO</h2>
<p>Now imagine Acme&rsquo;s n8n misbehaves: somebody pushes a workflow that tries to read Globex&rsquo;s API keys via an <code>ExternalSecret</code>. The network isn&rsquo;t the issue — both tenants need to reach Vault, so they both have an egress rule for <code>sys-vault</code>. The wall has to be at the identity layer.</p>
<p>Each tenant gets three things:</p>
<ol>
<li>A dedicated <code>ServiceAccount</code> (<code>n8n-acme</code>, <code>n8n-globex</code>).</li>
<li>A Vault Kubernetes-auth <code>role</code> bound to that SA in that namespace, mapped to a Vault <code>policy</code> that grants <code>read</code> on <em>only its own</em> KV path.</li>
<li>A namespaced External Secrets <code>SecretStore</code> that authenticates as the SA via the Kubernetes TokenRequest API.</li>
</ol>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="cl"><span class="c1"># Vault policy: tenant-acme can read its own secrets, nothing else.
</span></span></span><span class="line"><span class="cl"><span class="n">path &#34;secret/data/web-tenant-acme&#34;     { capabilities</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;read&#34;</span><span class="p">]</span> }
</span></span><span class="line"><span class="cl"><span class="n">path &#34;secret/metadata/web-tenant-acme&#34; { capabilities</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;read&#34;</span><span class="p">]</span> }
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">vault write auth/kubernetes/role/tenant-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">bound_service_account_names</span><span class="o">=</span>n8n-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">bound_service_account_namespaces</span><span class="o">=</span>web-tenant-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">policies</span><span class="o">=</span>tenant-acme <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="nv">ttl</span><span class="o">=</span>1h
</span></span></code></pre></div><p>When Acme&rsquo;s n8n tries an <code>ExternalSecret</code> pointing at <code>secret/web-tenant-globex/...</code>, ESO authenticates fine (the SA is valid), Vault recognises the caller, looks up the <code>tenant-acme</code> policy, and answers with the most satisfying line in this whole demo:</p>
<pre tabindex="0"><code>URL: GET http://sys-vault.sys-vault.svc.cluster.local:8200/v1/secret/data/web-tenant-globex
Code: 403. Errors:
* permission denied
</code></pre><p>This is the bit that separates &ldquo;namespace isolation&rdquo; from real multi-tenant secret isolation. Plain Kubernetes Secrets + RBAC stop a tenant from <em>listing</em> another tenant&rsquo;s Secret objects, but the moment you go upstream — to Vault, to a cloud KMS, to an SSM Parameter Store — the secret store needs to enforce identity itself. The network said yes; the secret store still says no.</p>
<hr>
<h2 id="wall-3--resource-resourcequota--limitrange">Wall 3 — Resource: ResourceQuota + LimitRange</h2>
<p>The third concern is the noisy neighbour: Acme&rsquo;s runaway workflow allocating a 4Gi pod and OOM-killing everything else on the node. The network policy doesn&rsquo;t catch this (no network call), and Vault doesn&rsquo;t catch this (no secret request). The kernel will, <em>eventually</em> — but you don&rsquo;t want eventually. You want admission-time rejection.</p>
<p>Two primitives:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ResourceQuota</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">name: tenant-quota, namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-tenant-acme }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">hard</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">requests.cpu</span><span class="p">:</span><span class="w">    </span><span class="s2">&#34;1&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">requests.memory</span><span class="p">:</span><span class="w"> </span><span class="l">1Gi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">limits.cpu</span><span class="p">:</span><span class="w">      </span><span class="s2">&#34;2&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">limits.memory</span><span class="p">:</span><span class="w">   </span><span class="l">2Gi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">pods</span><span class="p">:</span><span class="w">            </span><span class="s2">&#34;10&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nn">---</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">LimitRange</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">name: tenant-limits, namespace</span><span class="p">:</span><span class="w"> </span><span class="l">web-tenant-acme }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">Container</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">default</span><span class="p">:</span><span class="w">        </span>{<span class="w"> </span><span class="nt">cpu: 500m, memory</span><span class="p">:</span><span class="w"> </span><span class="l">512Mi }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">defaultRequest</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">cpu: 50m,  memory</span><span class="p">:</span><span class="w"> </span><span class="l">128Mi }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">max</span><span class="p">:</span><span class="w">            </span>{<span class="w"> </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;2&#34;</span><span class="nt">,  memory</span><span class="p">:</span><span class="w"> </span><span class="l">1Gi }</span><span class="w">
</span></span></span></code></pre></div><p><code>ResourceQuota</code> caps the namespace total. <code>LimitRange</code> bounds any <em>individual</em> container and supplies defaults so pods that don&rsquo;t declare requests/limits still get reasonable ones — important because a missing limit on a single container can blow past the quota in one allocation.</p>
<p><strong>The receipt</strong> — a server-side dry-run of a single 4Gi pod, which never gets created:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ kubectl apply -n web-tenant-acme --dry-run<span class="o">=</span>server -f noisy-neighbor.yaml
</span></span><span class="line"><span class="cl">Error from server <span class="o">(</span>Forbidden<span class="o">)</span>: error when creating <span class="s2">&#34;STDIN&#34;</span>:
</span></span><span class="line"><span class="cl">pods <span class="s2">&#34;noisy-neighbor&#34;</span> is forbidden:
</span></span><span class="line"><span class="cl">  maximum memory usage per Container is 1Gi, but limit is 4Gi
</span></span></code></pre></div><p>Not a kernel OOMKill. Not a pod stuck in <code>Pending</code>. A flat refusal from the API server before the scheduler even sees the request.</p>
<hr>
<h2 id="what-this-does-not-prove">What this does <em>not</em> prove</h2>
<p>A homelab demo on one node with two synthetic tenants is not n8n Cloud. The honest gaps:</p>
<ul>
<li><strong>Execution sandboxing.</strong> A workflow can still run arbitrary code via the <code>Code</code> node or shell-outs. These walls stop <em>infrastructure</em> leakage; they don&rsquo;t sandbox what n8n itself executes. Real n8n Cloud needs more than namespace walls for that — gVisor / Firecracker / per-tenant worker pools are the usual answers, and n8n&rsquo;s <a href="https://docs.n8n.io/hosting/scaling/queue-mode/">queue mode</a> lends itself to the last.</li>
<li><strong>Pooled worker queues.</strong> Queue mode runs main/webhook/worker as separate deployments backed by Redis + Postgres. Two tenants sharing a worker pool need additional checks at the job-routing layer to keep workflows from accessing the wrong tenant&rsquo;s binary data. Out of scope for the homelab demo.</li>
<li><strong>Control plane.</strong> Both tenants reach the same API server. A cluster-admin-equivalent compromise breaks everything. This is the assumption every shared K8s setup makes.</li>
<li><strong>Node-level.</strong> Same kernel. Container escape, CPU side channels, the usual list — all apply. For paranoid tenants the answer is dedicated nodes via taints/tolerations or separate clusters entirely.</li>
</ul>
<p>The demo proves the <em>namespace-shaped</em> walls hold. It does not prove the whole stack is safe against a determined attacker already running code inside a tenant. That&rsquo;s a different post.</p>
<hr>
<p><em>Part of a Kubernetes-on-the-homelab series — previously: <a href="/posts/k8s-network-isolation/">preventing a compromised pod from calling your database</a>, <a href="/posts/k8s-gitops-secrets/">GitOps secrets</a>.</em></p>
]]></content:encoded></item><item><title>🍵 I A/B-Tested Cloud vs Local LLMs in One n8n Agent. The Local One Faked It.</title><link>https://blog.hippotion.com/posts/n8n-agent-cloud-vs-local/</link><pubDate>Fri, 07 Nov 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/n8n-agent-cloud-vs-local/</guid><description>I built an AI agent in self-hosted n8n over my kombucha-tracking app, then gave it two brains — NVIDIA&amp;rsquo;s 70B and a local Phi-3.5 — sharing the same tools. The cloud model called the tools and answered from real data. The local one couldn&amp;rsquo;t, so it made things up.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p>I run <a href="https://n8n.io">n8n</a> on my k3s homelab. Not docker-compose on a NUC — the full treatment: GitOps-reconciled, Vault-backed secrets, default-deny networking. The same boring platform everything else here runs on.</p>
<p>But &ldquo;I have n8n running&rdquo; proves nothing. I wanted to know if I actually understood it as an <em>agent platform</em>, and to answer a question I kept dodging: <strong>for agent work, do I need a cloud model, or is my local one good enough?</strong></p>
<p>So I built a real agent and gave it two brains.</p>
<h2 id="what-i-built">What I built</h2>
<p>A chat assistant over brew-buddy, my homemade kombucha-tracking app (React + a small API + Postgres). You ask it things in plain language; it calls the app&rsquo;s API and answers. The twist: the same question runs through <strong>two agents in parallel</strong> — one backed by NVIDIA&rsquo;s hosted <strong>Llama-3.3-70B</strong>, one by a local <strong>Phi-3.5-mini</strong> on CPU — and the workflow prints both answers side by side.</p>
<pre tabindex="0"><code>Chat ──▶ Agent (cloud: NVIDIA 70B) ──┐   tools (shared):
     └─▶ Agent (local: Phi-3.5)   ──┤     • get_all_batches
                                    │     • get_batch_detail
                                    │     • brewing_statistics
            (Merge) ──▶ both replies, labeled     • add_batch_log   ⟵ write
                                                  • create_batch    ⟵ write
</code></pre><p>Both agents share the same read tools. The two <em>write</em> tools are wired to the cloud agent only — more on that below.</p>
<p><img alt="The kombucha agent in n8n: a chat trigger fans out to two AI Agent nodes (cloud and local), both wired to the same brew-buddy tools, then merged so the two answers print side by side." loading="lazy" src="/posts/n8n-agent-cloud-vs-local/n8n.png"></p>
<p>The nice part: I didn&rsquo;t write a line of glue. n8n&rsquo;s stock <strong>OpenAI Chat Model</strong> node talks to anything OpenAI-compatible if you override the credential&rsquo;s Base URL — so one node points at <code>https://integrate.api.nvidia.com/v1</code>, the other at <code>http://llama-server.&lt;ns&gt;.svc:8080/v1</code> for the local server. Same node, two endpoints.</p>
<h2 id="the-infra-that-keeps-it-honest">The infra that keeps it honest</h2>
<p>I won&rsquo;t re-explain the platform here — it&rsquo;s in earlier posts: <a href="/posts/homelab-gitops/">GitOps</a>, <a href="/posts/k8s-gitops-secrets/">Vault-backed secrets</a>, <a href="/posts/k8s-network-isolation/">default-deny networking</a>, <a href="/posts/homelab-dual-path-tls/">dual-path TLS ingress</a>. But building the agent made one of them <em>tangible</em>.</p>
<p>n8n is, by design, a thing that makes arbitrary HTTP calls on a schedule. That&rsquo;s exactly what you want behind a default-deny network policy. n8n couldn&rsquo;t reach the brew-buddy API at all until I declared it — one line:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># n8n&#39;s namespace</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">allowEgressToNamespaces</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">web-ai-engine, web-brew-buddy]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c">#                                          ^ added this for the agent</span><span class="w">
</span></span></span></code></pre></div><p>(plus a matching ingress-allow on brew-buddy&rsquo;s side). That&rsquo;s the posture working as intended: the blast radius of a workflow tool is whatever I&rsquo;ve explicitly granted, and not one namespace more. Adding a capability is a reviewable one-liner in Git; Argo reconciles it. No <code>kubectl</code>, no guessing what n8n can reach.</p>
<h2 id="the-ab-same-agent-same-tools-two-brains">The A/B: same agent, same tools, two brains</h2>
<p><strong>Plain &ldquo;hi&rdquo;.</strong> Cloud answers in ~0.5s. Local takes noticeably longer — because even for &ldquo;hi&rdquo;, the agent feeds the model the full system prompt <em>plus the JSON schemas for every tool</em>, and Phi-3.5 has to chew through all of it on CPU before it can say a word. So far, the boring expected result: local is slower.</p>
<p>Then I asked a real question, and the result flipped in a way I didn&rsquo;t expect.</p>
<p><strong>&ldquo;What batches do I have?&rdquo;</strong></p>
<p>Cloud (70B) called <code>get_all_batches</code>, got the real rows, and answered:</p>
<blockquote>
<p>You have two batches: 2026-04-09-A (cold-crash, 3L) and 2026-04-09-W (cold-crash, 3L).</p>
</blockquote>
<p>Local (Phi-3.5) <strong>never called the tool.</strong> It didn&rsquo;t seem to realise it <em>had</em> tools. Instead it confidently explained how <em>I</em> could go find the data myself:</p>
<blockquote>
<p>To list all batches: 1. Access the brew-buddy app. 2. Look for a button labeled &ldquo;List Batches&rdquo;… <code>def get_all_batches(): …</code> … Remember, I&rsquo;m unable to directly interact with apps or databases.</p>
</blockquote>
<p>Fake instructions. Fake code. A polite apology. Everything except the actual answer it was sitting on top of.</p>
<p><strong>Writing data.</strong> I asked both to <em>log</em> an observation. Cloud called <code>add_batch_log</code> and wrote a real row to Postgres (&ldquo;I have recorded the observation…&rdquo;). Local bluffed again — &ldquo;here&rsquo;s how <em>you</em> can log it yourself.&rdquo;</p>
<h2 id="why-it-matters-capability-not-latency">Why it matters: capability, not latency</h2>
<p>The interesting finding isn&rsquo;t &ldquo;the big model is better.&rdquo; It&rsquo;s <em>how</em> the small one fails.</p>
<p>With a ~3.8B model on CPU, the bottleneck for agent work isn&rsquo;t speed — it&rsquo;s <strong>capability</strong>. Phi-3.5 couldn&rsquo;t reliably emit tool calls, so n8n&rsquo;s tools never fired, and the model degraded into a chatbot that <strong>hallucinates a plausible answer instead of fetching the real one.</strong> That failure mode is worse than an error: an error you catch, a confident wrong answer you ship.</p>
<p>A couple of measurements that sharpened it:</p>
<ul>
<li>NVIDIA 70B, <strong>plain chat</strong>: ~0.5s.</li>
<li>NVIDIA 70B, <strong>function-calling</strong> (with tool schemas): ~8.6s per round-trip — and an agent makes several round-trips per answer. That&rsquo;s real latency you have to budget a timeout for. (It&rsquo;s also why the cloud side initially <em>timed out</em> in n8n until I raised the model node&rsquo;s timeout — the model was fine, n8n was cutting it off.)</li>
</ul>
<p>So the snappy-vs-slow comparison <strong>flips depending on whether the question triggers tools</strong>. Plain chat: cloud wins on speed. Tool use: the local model is &ldquo;fast&rdquo; only because it skips the tools and makes something up. Speed was never the real axis.</p>
<p>The honest caveat: this is <em>this</em> small general model in a multi-tool agent loop. Purpose-built small models with tool-calling fine-tunes do better at narrow tasks — I run a 1.7B one elsewhere that emits a single structured tool call just fine. But for &ldquo;pick the right tool from several and chain them,&rdquo; 70B was in a different league.</p>
<h2 id="the-trust-boundary">The trust boundary</h2>
<p>I gave the write tools (<code>add_batch_log</code>, <code>create_batch</code>) to the cloud agent <strong>only</strong>. The local agent is read-only — not by instruction, by wiring. Even if Phi-3.5 <em>did</em> decide to call a write tool, the connection isn&rsquo;t there. The reliable model is the only one allowed to mutate real data, and that&rsquo;s enforced structurally, not by trusting a prompt.</p>
<h2 id="whats-toy-and-whats-real">What&rsquo;s toy and what&rsquo;s real</h2>
<p>Worth being straight: this is a <strong>single-node homelab</strong>. The agent and both model paths share one box. Running n8n on Kubernetes and swapping models isn&rsquo;t novel — <a href="https://docs.n8n.io/hosting/scaling/queue-mode/">n8n&rsquo;s own docs</a> cover queue mode, where a main instance fans work out to a pool of worker pods you scale horizontally, with external Postgres for state. That&rsquo;s the real production shape. Mine is one replica with an emptyDir&rsquo;s worth of ambition.</p>
<p>What I think <em>is</em> worth sharing is the finding (the capability cliff, and that its failure mode is confident fabrication) and the boring thing underneath it: because the platform is default-deny and GitOps-reconciled, running this experiment cost me one reviewable egress line and zero risk to anything else.</p>
<h2 id="the-boring-part-is-the-point">The boring part is the point</h2>
<p>The AI was the fun bit. But the reason I could bolt an agent onto a live cluster, point it at a real app, give it write access to one model and not the other, and tear it all down again — without worrying what it might touch — is that the infrastructure was already boring. Default-deny. Secrets out of Git. <code>git push</code>, Argo reconciles.</p>
<p>The model picks the tools. The platform decides what the tools can reach. Keep those two honest about each other and self-hosting an agent stops being scary and starts being just another app.</p>
]]></content:encoded></item><item><title>🕵️ Privacy-Preserving LLM Pipelines: Anonymize Before You Send</title><link>https://blog.hippotion.com/posts/llm-anonymizer-privacy-pipeline/</link><pubDate>Fri, 12 Sep 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/llm-anonymizer-privacy-pipeline/</guid><description>Replace PII with semantically realistic fakes before sending to a cloud LLM, then restore the originals from the response. Started with a general model and prompt engineering — then upgraded to a purpose-built 1.7B fine-tune via Ollama.</description><content:encoded><![CDATA[<h2 id="the-problem-with-blocking">The problem with blocking</h2>
<p>The <a href="/posts/ai-pii-guardrail-proxy/">PII guardrail proxy I built last week</a> works by classifying prompts and blocking the sensitive ones. That&rsquo;s fine for a chat interface where a human can rephrase. It doesn&rsquo;t work for automated pipelines.</p>
<p>If a Jira ticket contains someone&rsquo;s name and an internal hostname, you don&rsquo;t want the agent to fail — you want it to process the ticket without exposing that data. Blocking is the wrong primitive for pipelines. Anonymization is the right one.</p>
<h2 id="the-pattern">The pattern</h2>
<pre tabindex="0"><code>Input text
  → anonymizer: extract PII, replace with semantic fakes
  → &#34;Nathan Chen from DataSoft LLC needs ProjectX fixed on dev.internal.net&#34;
  + mapping: {&#34;Nathan Chen&#34; → &#34;John Smith&#34;, &#34;DataSoft LLC&#34; → &#34;ACME&#34;, ...}
  → cloud LLM: processes coherent text, never sees real values
  → &#34;Nathan Chen should check the ProjectX docs with the DataSoft LLC team&#34;
  → string substitution with reverse mapping
  → &#34;John Smith should check the OAuth docs with the ACME team&#34;
</code></pre><p>Two things that make this work:</p>
<p><strong>Deanonymization needs no LLM.</strong> Once you have the mapping, restoring is pure string substitution. The model call only happens on the way in.</p>
<p><strong>Semantic fakes beat placeholder tokens.</strong> An earlier version of this used <code>[PERSON_1]</code>, <code>[ORG_1]</code> tokens. The problem: cloud models see bracketed text and subtly change behaviour — shorter responses, hedging, dropped context. When the cloud model sees <code>Nathan Chen from DataSoft LLC</code>, it treats it as real text and responds naturally. Quality is noticeably better.</p>
<h2 id="prior-art--what-already-exists">Prior art — what already exists</h2>
<p>This is a well-established pattern. Worth knowing what&rsquo;s out there:</p>
<p><strong><a href="https://llm-guard.com/output_scanners/deanonymize/">LLM Guard</a></strong> (Protect AI) — the most complete open-source implementation. Anonymize + Deanonymize scanner pair with a Vault for the mapping. Production-grade, actively maintained. Start here if you&rsquo;re building this for anything serious.</p>
<p><strong><a href="https://techcommunity.microsoft.com/blog/azuredevcommunityblog/introducing-pii-shield-a-privacy-proxy-for-every-llm-call/4514726">Microsoft PII Shield</a></strong> — session-based proxy. Returns a session ID with the anonymized text, uses it to deanonymize the response.</p>
<p><strong><a href="https://github.com/fsndzomga/anonLLM">anonLLM</a></strong> — uses GLiNER (a proper NER model) + Faker for realistic replacements. Better accuracy than a general chat model.</p>
<p><strong><a href="https://ieeexplore.ieee.org/document/11140717/">REDACT</a></strong> — IEEE paper describing a system using Ollama for PII redaction in documents.</p>
<p><strong><a href="https://huggingface.co/blog/pratyushrt/anonymizerslm">HuggingFace Anonymizer SLM series</a></strong> — purpose-built models (0.6B/1.7B/4B) fine-tuned specifically for anonymization. 9.20/10 quality score for 1.7B, close to GPT-4.1&rsquo;s 9.77.</p>
<p>That last one is what this implementation actually uses.</p>
<h2 id="the-model-anonymizer-17b">The model: Anonymizer-1.7B</h2>
<p><a href="https://huggingface.co/eternisai/Anonymizer-1.7B">eternisai/Anonymizer-1.7B</a> is a Qwen3-1.7B fine-tune trained on ~30k anonymization samples using GRPO with GPT-4.1 as judge. It outputs structured tool calls instead of free text:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;replace_entities&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;arguments&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;replacements&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">      <span class="p">{</span><span class="nt">&#34;original&#34;</span><span class="p">:</span> <span class="s2">&#34;John Smith&#34;</span><span class="p">,</span> <span class="nt">&#34;replacement&#34;</span><span class="p">:</span> <span class="s2">&#34;Nathan Chen&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">      <span class="p">{</span><span class="nt">&#34;original&#34;</span><span class="p">:</span> <span class="s2">&#34;ACME Corp&#34;</span><span class="p">,</span> <span class="nt">&#34;replacement&#34;</span><span class="p">:</span> <span class="s2">&#34;DataSoft LLC&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">      <span class="p">{</span><span class="nt">&#34;original&#34;</span><span class="p">:</span> <span class="s2">&#34;auth.acme.internal&#34;</span><span class="p">,</span> <span class="nt">&#34;replacement&#34;</span><span class="p">:</span> <span class="s2">&#34;dev.internal.net&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>No prompt engineering needed. The model knows exactly what it&rsquo;s doing and outputs a structured contract. Compare that to the first version of this service, which sent a long JSON-format prompt to Phi-3.5-mini and hoped the output parsed correctly.</p>
<p>The model runs via Ollama (which handles the Qwen3 chat template and tool calling natively), pointed at the GGUF version from HuggingFace: <code>hf.co/gabriellarson/Anonymizer-1.7B-GGUF</code>.</p>
<h2 id="the-implementation">The implementation</h2>
<p><code>llm-anonymizer</code> is a FastAPI service with two endpoints.</p>
<p><strong><code>POST /anonymize</code></strong> — calls Ollama with the tool definition, parses the response:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">TOOLS</span> <span class="o">=</span> <span class="p">[{</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;function&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;function&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;replace_entities&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;description&#34;</span><span class="p">:</span> <span class="s2">&#34;Replace PII entities with anonymized versions&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;parameters&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="s2">&#34;properties&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="s2">&#34;replacements&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;array&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="s2">&#34;items&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                        <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                        <span class="s2">&#34;properties&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                            <span class="s2">&#34;original&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">                            <span class="s2">&#34;replacement&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;string&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">                        <span class="p">},</span>
</span></span><span class="line"><span class="cl">                        <span class="s2">&#34;required&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;original&#34;</span><span class="p">,</span> <span class="s2">&#34;replacement&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">                    <span class="p">},</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">            <span class="p">},</span>
</span></span><span class="line"><span class="cl">            <span class="s2">&#34;required&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;replacements&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="cl"><span class="p">}]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">resp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">OLLAMA_BASE</span><span class="si">}</span><span class="s2">/api/chat&#34;</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="n">MODEL</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">SYSTEM_PROMPT</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="n">text</span> <span class="o">+</span> <span class="s2">&#34;</span><span class="se">\n</span><span class="s2">/no_think&#34;</span><span class="p">},</span>  <span class="c1"># skip Qwen3 thinking mode</span>
</span></span><span class="line"><span class="cl">    <span class="p">],</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;tools&#34;</span><span class="p">:</span> <span class="n">TOOLS</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;stream&#34;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">})</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">tool_calls</span> <span class="o">=</span> <span class="n">resp</span><span class="o">.</span><span class="n">json</span><span class="p">()[</span><span class="s2">&#34;message&#34;</span><span class="p">][</span><span class="s2">&#34;tool_calls&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="n">replacements</span> <span class="o">=</span> <span class="n">tool_calls</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s2">&#34;function&#34;</span><span class="p">][</span><span class="s2">&#34;arguments&#34;</span><span class="p">][</span><span class="s2">&#34;replacements&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Build reverse mapping: replacement → original (for deanonymization)</span>
</span></span><span class="line"><span class="cl"><span class="n">anonymized</span> <span class="o">=</span> <span class="n">text</span>
</span></span><span class="line"><span class="cl"><span class="n">mapping</span> <span class="o">=</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">replacements</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">anonymized</span> <span class="o">=</span> <span class="n">anonymized</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">pair</span><span class="p">[</span><span class="s2">&#34;original&#34;</span><span class="p">],</span> <span class="n">pair</span><span class="p">[</span><span class="s2">&#34;replacement&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">    <span class="n">mapping</span><span class="p">[</span><span class="n">pair</span><span class="p">[</span><span class="s2">&#34;replacement&#34;</span><span class="p">]]</span> <span class="o">=</span> <span class="n">pair</span><span class="p">[</span><span class="s2">&#34;original&#34;</span><span class="p">]</span>
</span></span></code></pre></div><p>The <code>/no_think</code> suffix tells the model to skip its chain-of-thought — faster response, same accuracy for this task.</p>
<p><strong><code>POST /deanonymize</code></strong> — no model call, just substitution:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">for</span> <span class="n">replacement</span><span class="p">,</span> <span class="n">original</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">mapping</span><span class="o">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">text</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">replacement</span><span class="p">,</span> <span class="n">original</span><span class="p">)</span>
</span></span></code></pre></div><p>Sorted by length descending so longer tokens don&rsquo;t get partially overwritten by shorter ones.</p>
<h2 id="the-kubernetes-stack">The Kubernetes stack</h2>
<p>Ollama runs as a separate deployment in the same namespace as everything else (<code>web-ai-engine</code>). Intra-namespace traffic is always allowed — no new network policies.</p>
<pre tabindex="0"><code>llm-anonymizer (FastAPI) → Ollama (port 11434) → Anonymizer-1.7B GGUF
</code></pre><p>One-time model pull after first deploy:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl <span class="nb">exec</span> -n web-ai-engine deploy/ollama -- <span class="se">\
</span></span></span><span class="line"><span class="cl">  ollama pull hf.co/gabriellarson/Anonymizer-1.7B-GGUF
</span></span></code></pre></div><p>Ollama caches it on a 10Gi PVC, so pod restarts don&rsquo;t re-download.</p>
<h2 id="the-n8n-pipeline">The n8n pipeline</h2>
<p>Five-node chain triggered by webhook:</p>
<pre tabindex="0"><code>Webhook → /anonymize → NVIDIA NIM → /deanonymize → Respond
</code></pre><p>The NVIDIA NIM call includes a system prompt instructing it to treat the text as normal input. No mention of tokens, no special handling — because the text looks like real text.</p>
<p>Wire any upstream source to the webhook: Jira event, Slack slash command, a scheduled job that processes internal docs. The pipeline is source-agnostic.</p>
<h2 id="the-caveats">The caveats</h2>
<p><strong>1.7B isn&rsquo;t GPT-4.1.</strong> The model scores 9.20/10 on the benchmark — which means roughly 1 in 10 cases has a missed or incorrect entity. Test with real examples from your domain before depending on it.</p>
<p><strong>Deanonymization breaks on heavy rephrasing.</strong> If the cloud model restructures a sentence enough that the fake value no longer appears verbatim, the substitution silently misses it. The prompt helps but doesn&rsquo;t eliminate the risk.</p>
<p><strong>Ollama adds a deployment.</strong> It&rsquo;s ~500MB image + the model weights (~1GB Q4). On a constrained single-node cluster that&rsquo;s real overhead. llama-server already covers general chat; Ollama is purely for this model&rsquo;s tool-calling support.</p>
<h2 id="source">Source</h2>
<p><a href="https://github.com/janos-gyorgy/llm-anonymizer">github.com/janos-gyorgy/llm-anonymizer</a> — MIT licensed, Kubernetes manifests and n8n workflow included.</p>
]]></content:encoded></item></channel></rss>