n8n on hippotion

VoteWatch: How Your Representatives Voted — and Whether You'd Agree

Fri, 15 May 2026 00:00:00 +0000

Open data nobody opens

Every vote in the European Parliament and the Slovak National Council is public. The EU even ships it as a clean API. And almost nobody reads it, because the raw record is unreadable: “Návrh poslanca… ktorým sa dopĺňa zákon č. 581/2004 Z. z. … (tlač 1259) — tretie čítanie, hlasovanie o návrhu zákona ako o celku.” (Roughly: “MP’s bill… amending Act No. 581/2004 Coll. … (print 1259), third reading, vote on the bill as a whole.”) Multiply that by a few hundred votes a sitting. Transparency that no human can parse is transparency on paper only.

So I built VoteWatch — a small site on my homelab that turns the record into something a citizen can actually use: what was decided, who voted, and do you agree?

VoteWatch SK: each decision summarised in plain language, which parties voted how, and a Yes/No question whose live citizen tally sits next to how parliament actually voted — labelled agree or gap.

Two halves, one lopsided

The EU half was easy. HowTheyVote.eu already did the hard work and publishes roll-call votes as a clean, open-licensed API. You consume it; you don’t scrape it.

The Slovak half is where the real work lives, and the real value. nrsr.sk has no API. The HTML is the contract: a results listing, and per-vote pages where each MP appears next to a one-letter code ([Z] za, for; [P] proti, against; [?] zdržal sa, abstained). So the national half is a genuine scraper, the unglamorous kind that nobody maintains, which is exactly why there is a gap to fill. The unglamorous part is the moat.
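Here is a minimal sketch of the per-MP parsing step. It assumes the per-vote page has already been reduced to text in which each MP appears as `[CODE] Name`. The real nrsr.sk markup may differ, so the regex and the sample names are hypothetical. Only the three codes come from the post; any other code falls into an `other` bucket instead of being guessed.

```python
import re
from collections import Counter

# The three codes described in the post; anything else is bucketed as "other".
CODES = {"Z": "for", "P": "against", "?": "abstained"}

# Hypothetical: assumes "[CODE] Name" runs in the page text.
MP_VOTE = re.compile(r"\[([^\]]+)\]\s*([^\[]+)")


def parse_mp_votes(text: str) -> list[tuple[str, str]]:
    """Return (mp_name, stance) pairs from a per-vote page's text."""
    return [
        (name.strip(), CODES.get(code.strip(), "other"))
        for code, name in MP_VOTE.findall(text)
    ]


votes = parse_mp_votes("[Z] Example A [P] Example B [?] Example C [Z] Example D")
print(Counter(stance for _, stance in votes))
```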

From ten votes to one question

A single bill generates a pile of procedural roll-calls — shorten the debate, move to third reading, amendment block A, amendment block B, the bill as a whole. Ten rows that are really one decision. Nobody wants ten rows.

So the pipeline groups votes by bill, then asks an LLM (llama-3.3-70b on NVIDIA NIM) to do exactly one job: turn the bureaucratic titles into a plain headline, two sentences of summary, and one neutral Yes/No question a person can actually answer. Seven votes on the health-insurer bill collapse into: “Changes to the health-insurance law” → “Do you agree with the health-insurance bill?”
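The grouping step needs no model. The sketch below assumes, as in the title quoted at the top, that every roll-call title carries the print number as `tlač NNNN`. The `id`/`title` field names and the `tlac:`/`vote:` group keys are hypothetical. Each resulting group would then get exactly one LLM call.

```python
import re
from collections import defaultdict

# Assumption: the parliamentary print number appears as "tlač NNNN" in the title.
PRINT_NO = re.compile(r"\btlač\s+(\d+)", re.IGNORECASE)


def group_by_bill(votes: list[dict]) -> dict[str, list[dict]]:
    """Group roll-calls by bill; a vote without a print number stands alone."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for vote in votes:
        match = PRINT_NO.search(vote["title"])
        key = f"tlac:{match.group(1)}" if match else f"vote:{vote['id']}"
        groups[key].append(vote)
    return dict(groups)
```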

The rule that keeps it honest

Here’s the line I won’t cross, and it’s the whole reason I trust the result: the AI writes the prose, but it never decides a fact.

Which votes belong to one bill? Deterministic — parsed from the bill number.
Did it pass? Deterministic — read from the result row.
Which parties voted for, against, or abstained? Deterministic — tallied from the per-MP record, shown as Za: SMER-SD, HLAS-SD, SNS · Zdržali sa: PS, KDH, SaS (za = for, zdržali sa = abstained).
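The post does not say exactly how the party line is derived. One plausible rule, sketched here as an assumption, places each party under the stance most of its MPs took. Ties go to whichever stance was recorded first, because `Counter.most_common` keeps first-encountered order for equal counts. `Proti` (against) is added alongside the two labels from the example.

```python
from collections import Counter, defaultdict

# Display order and Slovak labels; "Proti" is added for completeness.
LABELS = {"for": "Za", "against": "Proti", "abstained": "Zdržali sa"}


def party_stances(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """records: (party, stance) per MP. Returns {stance: [parties]}, by majority."""
    per_party: dict[str, Counter] = defaultdict(Counter)
    for party, stance in records:
        per_party[party][stance] += 1
    stances: dict[str, list[str]] = defaultdict(list)
    for party in sorted(per_party):
        stance, _count = per_party[party].most_common(1)[0]
        stances[stance].append(party)
    return dict(stances)


def render(stances: dict[str, list[str]]) -> str:
    """Render in a fixed order, e.g. 'Za: ... · Zdržali sa: ...'."""
    return " · ".join(f"{LABELS[s]}: {', '.join(stances[s])}" for s in LABELS if s in stances)
```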

The model only touches language: the headline, the summary, the question. If it hallucinates, you get an awkward sentence — never a wrong vote count. And if the model fails entirely, the card falls back to the raw title. The facts come from the record; the model just makes the record legible. For civic data, that separation isn’t a nice-to-have — it’s the difference between a tool and a liability. (Every card says so out loud: summaries are AI-generated; the raw record prevails.)
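The fallback can be enforced by the code's structure instead of left to the model. This sketch assumes a hypothetical `llm_call` that returns the model's reply as a JSON string with `headline`, `summary` and `question` keys. Any other outcome, including an exception, keeps the raw title. The record-derived facts are merged last, so nothing the model returns can overwrite them.

```python
import json
from typing import Callable

PROSE_KEYS = ("headline", "summary", "question")


def build_card(raw_title: str, facts: dict, llm_call: Callable[[str], str]) -> dict:
    """Facts come from the record; the model may only supply the three prose fields."""
    prose = {"headline": raw_title, "summary": "", "question": ""}
    ai_generated = False
    try:
        data = json.loads(llm_call(raw_title))
        if isinstance(data, dict) and all(
            isinstance(data.get(k), str) and data[k].strip() for k in PROSE_KEYS
        ):
            prose = {k: data[k].strip() for k in PROSE_KEYS}
            ai_generated = True
    except Exception:  # timeout, network error, invalid JSON: all mean "use the raw title"
        pass
    # Facts go in last, so nothing from the model can overwrite them.
    return {**prose, "ai_generated": ai_generated, **facts}
```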

The part that closes the loop

Showing people how their representatives voted is only half a feedback loop. The other half is letting them answer.

Each decision carries its one distilled question and two buttons: Áno / Nie (Yes / No). You vote, and the site shows the citizen tally next to how parliament actually decided, with the honest verdict on top: "✓ Citizens and Parliament agree" or "⚖ Gap between citizens and Parliament." That gap is the entire point. It’s the thesis behind a side project of mine called veracracy — governance measured against verified knowledge and the actual will of the governed — made concrete enough to click.
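The verdict is a plain comparison. This sketch rests on two assumptions the post does not state. First, a Yes answer is read as support for the bill, which holds only because the question is phrased that way. Second, a tied tally produces no verdict.

```python
def verdict(yes: int, no: int, parliament_passed: bool) -> str:
    """Compare the citizen majority with the parliamentary outcome."""
    if yes == no:
        return "No citizen majority yet"  # assumption: ties get no verdict
    citizens_in_favour = yes > no
    if citizens_in_favour == parliament_passed:
        return "✓ Citizens and Parliament agree"
    return "⚖ Gap between citizens and Parliament"
```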

The same loop on the European Parliament — dossiers consolidated, political-group stances (EPP, S&D, PfE…), and the citizen poll under each topic.

The backend is deliberately boring. The site is static (git-synced nginx, same as this blog). Votes can’t POST to a static page, so they go to a public n8n webhook that records to a data table and returns live tallies — no new service, no database, just the automation box I already run. Vote keys are namespaced so EU and Slovak polls share one store without colliding.
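Namespacing the keys is what lets one table serve both halves. The sketch below uses an in-memory list as a stand-in for the n8n Data Table behind the webhook. The `eu:`/`sk:` key format and the `VoteStore` class are hypothetical, and browser-local dedup is not modeled.

```python
from collections import Counter

SOURCES = {"eu", "sk"}


def vote_key(source: str, decision_id: str) -> str:
    """Namespaced key, so an EU and a Slovak decision with the same id never collide."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return f"{source}:{decision_id}"


class VoteStore:
    """In-memory stand-in for the Data Table: append a row, return the live tally."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def record(self, source: str, decision_id: str, answer: str) -> dict:
        if answer not in ("yes", "no"):
            raise ValueError(f"invalid answer: {answer!r}")
        key = vote_key(source, decision_id)
        self.rows.append({"key": key, "answer": answer})
        return self.tally(key)

    def tally(self, key: str) -> dict:
        counts = Counter(row["answer"] for row in self.rows if row["key"] == key)
        return {"yes": counts["yes"], "no": counts["no"]}
```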

The honest caveat

Dedup is browser-local. It stops casual double-voting, but behind a Cloudflare tunnel every request shares one IP, so this is an indicative signal, not a secured ballot. That’s the right altitude for “let people express an opinion.” The day it needs to mean more than that, it needs real identity first — and I’d rather ship the honest version than fake the robust one.

It’s live at votewatch.hippotion.com — the EU parliament and the Slovak NR SR, every MEP and every poslanec (Slovak MP), in plain language, with a button that asks the only question that matters after a vote: would you have voted the same way?

A neutral record — what was decided and who decided it — not a villain list. Data © HowTheyVote.eu (ODbL) and nrsr.sk.

Mind the gap: I pointed monitoring at my own skill set

Fri, 27 Mar 2026 00:00:00 +0000

A while back I applied for a senior platform role at n8n and didn’t land it. Fair enough — but “fair enough” isn’t actionable. Rejections come with no logs, no metrics, no trace. For someone who runs thirty-odd services with full observability, having vibes as the only instrumentation on my own career felt architecturally embarrassing.

So I built mind-the-gap: a pipeline that measures what the market demands, diffs it against what I can prove, and renders the gap as a private dashboard on my cluster. The job hunt is now a monitored system. This post is about the non-obvious decisions.

Demand: an LLM reads job listings so I don’t have to

I already had a job poller — an n8n workflow that polls the public ATS APIs (Greenhouse / Lever / Ashby) of ~33 companies plus a broad remote-jobs feed every six hours. A sibling workflow now re-fetches the same boards and, for every listing that passes the role+location gate, asks a small hosted LLM (Llama-3.1-8B) for a structured extraction:

{"seniority": "senior", "skills": [{"name": "kubernetes", "importance": "must"}, ...]}

One row per (job, skill) lands in an n8n Data Table. Decisions that mattered:

One LLM call per job, not one batch. Free-tier inference times out on batches; per-job calls are slower but fail independently. A lesson the poller already paid for.
Insert doubles as the processed-marker. A job whose extraction fails to parse produces no rows — so it’s retried next run, for free. No status column, no second table.
Canonicalization in code, not in the prompt. The model says “K8s”, “k3s”, “EKS” on different days regardless of instructions. A dumb alias map (k8s→kubernetes, eks→aws) beats prompt engineering for consistency.
8B is good enough — with a guard. It occasionally echoed the seniority enum back literally ("junior|mid|senior|staff|lead|unspecified"). The fix is one line of validation, not a bigger model.
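The alias map and the enum guard fit in a few lines. This sketch assumes the extraction has already been parsed into a dict of the shape shown above. `k8s→kubernetes` and `eks→aws` come from the post; `k3s→kubernetes` is an assumed extra entry, and the output row fields are hypothetical. Collapsing aliases also means "K8s" and "kubernetes" in one listing yield a single (job, skill) row.

```python
ALIASES = {
    "k8s": "kubernetes",
    "k3s": "kubernetes",  # assumption: not in the post's alias list
    "eks": "aws",
}
SENIORITY = {"junior", "mid", "senior", "staff", "lead", "unspecified"}


def canonical(name: str) -> str:
    key = name.strip().lower()
    return ALIASES.get(key, key)


def to_rows(job_id: str, extraction: dict) -> list[dict]:
    """Turn one LLM extraction into (job, skill) rows with a validated seniority."""
    seniority = extraction.get("seniority")
    if not isinstance(seniority, str) or seniority not in SENIORITY:
        seniority = "unspecified"  # e.g. the whole enum echoed back as one string
    rows, seen = [], set()
    for skill in extraction.get("skills", []):
        name = canonical(skill.get("name", ""))
        if not name or name in seen:  # first mention of a canonical skill wins
            continue
        seen.add(name)
        rows.append({
            "job_id": job_id,
            "skill": name,
            "importance": skill.get("importance", "unspecified"),
            "seniority": seniority,
        })
    return rows
```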

Supply: no artifact, no credit

The other side of the diff is a skills registry — markdown in my knowledge vault, with a machine-parseable YAML block. Every skill has a state, and the rule that keeps the whole thing honest is brutal: a skill counts as proven only if an artifact exists — a public repo, a blog post, documented production experience. Otherwise it’s claimed, and claimed earns half credit.

That rule immediately produced the most useful insight of the project: “invisible skill” is a real category. Python turned out to be the market’s #5 ask. I use it constantly — and could point to nothing public that shows it. The cheapest score increase isn’t learning something new; it’s a weekend making an existing skill visible. No gut-feeling gap analysis would have ranked “write about what you already do” above “learn the shiny thing.”

The score: distinct companies, not mentions

First naive aggregation: Canonical’s listings mention Ubuntu nine times, all marked must-have — suddenly Ubuntu looks like the hottest skill in Europe. Employer skew is the noise floor of small samples. The fix: demand weight = distinct companies naming the skill, not total mentions. One enthusiastic employer can’t move the radar.

Two more scoring rules I’d defend in review:

Skills named by fewer than two companies don’t count at all — single-listing noise stays out.
Demand the registry hasn’t classified yet shows up as “unreviewed” and counts fully against the score. An unreviewed market signal is a gap until proven otherwise; the dashboard nags me to triage it.
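Taken together, the scoring rules reduce to a weighted average. This sketch is one plausible reading of them:

- A skill's weight is the number of distinct companies naming it.
- Skills named by fewer than two companies are dropped.
- Proven earns full credit and claimed earns half.
- Anything unreviewed, or in any other state, earns zero but still counts in the denominator.

The row and registry shapes are hypothetical.

```python
from collections import defaultdict

CREDIT = {"proven": 1.0, "claimed": 0.5}  # any other state, incl. unreviewed, earns 0
MIN_COMPANIES = 2


def coverage_score(rows: list[dict], registry: dict[str, str]) -> float:
    """rows: one {"skill", "company"} per mention; registry: skill -> state."""
    companies: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        companies[row["skill"]].add(row["company"])
    demand = supplied = 0.0
    for skill, names in companies.items():
        weight = len(names)  # distinct companies, not mentions
        if weight < MIN_COMPANIES:
            continue  # single-company noise stays out
        demand += weight
        supplied += weight * CREDIT.get(registry.get(skill, "unreviewed"), 0.0)
    return supplied / demand if demand else 0.0
```

In the test data, nine Ubuntu rows from one company never enter the score, while unreviewed Go demand pulls it down.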

Rendering: the page is a git commit

The dashboard is a single static HTML file, and the pipeline that produces it never touches the cluster. render.js lives in this repo as the single source of truth; a nightly n8n workflow fetches it raw from GitLab, eval()s it against the Data Table rows and the registry, and — only if the result differs from what’s committed (timestamps stripped, or every night is a “change”) — PUTs the new index.html back via the GitLab API.
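The "only if it differs" check is what keeps the commit history quiet. Below is a sketch in Python, although the real renderer is JavaScript. It assumes the page's only volatile content is ISO-8601 timestamps; the pattern is hypothetical and would need to match whatever the renderer actually stamps. The PUT itself would go to GitLab's repository files API and is not shown.

```python
import re

# Assumption: ISO-8601 timestamps are the only part of the page that changes nightly.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?"
)


def normalize(html: str) -> str:
    return TIMESTAMP.sub("<timestamp>", html)


def needs_commit(rendered: str, committed: str) -> bool:
    """True only if something other than a timestamp changed."""
    return normalize(rendered) != normalize(committed)
```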

Serving is the same pattern as this blog: nginx plus a git-pull sidecar, deployed by Argo CD, behind the cluster’s OAuth middleware. The renderer has no kubeconfig, no SSH, no cluster access of any kind. GitLab stays the only source of truth — even for a page that rewrites itself nightly. If the workflow goes rogue, the worst it can do is a reviewable commit.

Day-one verdict

First run: 2,297 postings fetched, 25 in scope, 257 skill rows. Coverage score: 63%. Kubernetes and AWS tied at the top of demand — which means the AWS gap-closing project already in flight stopped being a hunch and became the measured top of the market. Go is the only skill in the top ten of demand with zero supply. The dashboard doesn’t get anyone a job; it just makes sure every learning Saturday is pointed where the data says, not where the hype does.

The job board rejected me. The data didn’t.

Workflows, render.js, and setup: github.com/janos-gyorgy/mind-the-gap.

🎯 Know the Market Without Job-Hunting: An LLM-Scored Job Poller in n8n

Fri, 13 Feb 2026 00:00:00 +0000

You don’t have to be about to change jobs to want to know the landscape. What’s being built, what it pays, where you’d actually fit — staying current on the market (and your own worth) is just good professional hygiene. The trouble is that checking is tedious, so most of us don’t, until we’re already job-hunting and starting cold.

So I automated mine. An n8n workflow on my homelab polls job boards every six hours, scores each new posting against my profile with an LLM, and emails me only the strong matches — the ones scoring 80%+. When it’s quiet, it’s silent. When something genuinely fits, I know the same day. Here’s what I learned building it. Repo at the bottom.

Three APIs cover most of the market

Company career pages look bespoke, but underneath, the vast majority run on one of three applicant tracking systems (ATSs), and all three hand you the jobs as unauthenticated JSON:

Greenhouse — boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true
Lever — api.lever.co/v0/postings/{token}?mode=json
Ashby — api.ashbyhq.com/posting-api/job-board/{token}?includeCompensation=true

No scraping, no headless browser. You poll the API the page itself calls, normalize the three shapes into one { company, title, location, remote, url, posted_at, description, external_id }, and you’re done with the hard part.
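Here is a sketch of the normalizer. The field names are the ones I understand each public API to return:

- Greenhouse: `absolute_url` and `updated_at`.
- Lever: `text`, `hostedUrl`, and `createdAt` in epoch milliseconds.
- Ashby: `jobUrl`, `isRemote` and `publishedAt`.

Treat these as assumptions and check them against a live response. The `external_id` follows the `{ats}:{company}:{id}` format used for dedup.

```python
from datetime import datetime, timezone


def _looks_remote(location: str) -> bool:
    return "remote" in location.lower()


def normalize_job(ats: str, company: str, job: dict) -> dict:
    """Map one raw posting from any of the three ATS APIs onto a single shape."""
    if ats == "greenhouse":
        location = (job.get("location") or {}).get("name") or ""
        fields = {
            "title": job.get("title", ""),
            "location": location,
            "remote": _looks_remote(location),
            "url": job.get("absolute_url", ""),
            "posted_at": job.get("updated_at", ""),
            "description": job.get("content", ""),
        }
    elif ats == "lever":
        location = (job.get("categories") or {}).get("location") or ""
        created_ms = job.get("createdAt")  # epoch milliseconds
        fields = {
            "title": job.get("text", ""),
            "location": location,
            "remote": job.get("workplaceType") == "remote" or _looks_remote(location),
            "url": job.get("hostedUrl", ""),
            "posted_at": (
                datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc).isoformat()
                if created_ms is not None else ""
            ),
            "description": job.get("descriptionPlain", ""),
        }
    elif ats == "ashby":
        location = job.get("location") or ""
        fields = {
            "title": job.get("title", ""),
            "location": location,
            "remote": bool(job.get("isRemote")) or _looks_remote(location),
            "url": job.get("jobUrl", ""),
            "posted_at": job.get("publishedAt", ""),
            "description": job.get("descriptionPlain", ""),
        }
    else:
        raise ValueError(f"unsupported ATS: {ats!r}")
    return {"company": company, **fields, "external_id": f"{ats}:{company}:{job['id']}"}
```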

“Resolve the token” is half the battle

The naive assumption — the token is the company name, and everyone’s on one of the three — is half right. When I probed my initial wishlist, roughly half 404’d everywhere: HashiCorp (now under IBM → Workday), SUSE (SuccessFactors), Aiven (Teamtailor), Hugging Face. They’re on a fourth or fifth system entirely. The honest move was to ship the ~33 that actually resolve and leave the rest as disabled config stubs. Verify before you trust a slug.

Dedup without a database

I didn’t want to stand up Postgres just to remember which jobs I’d already seen. n8n’s Data Tables handle it natively: a seen_jobs table, an external_id namespaced {ats}:{company}:{id}, and the rowNotExists operation drops anything already recorded. State lives inside n8n, backed up with it. Zero extra infrastructure.

The ordering matters: notify first, mark seen second. The insert only happens after the email sends, so a failed send retries next run instead of silently swallowing a posting.
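The ordering rule can be sketched with a Python set standing in for the `seen_jobs` Data Table and a hypothetical `send_email` callable that raises on failure. Marking happens only after the send returns, so a failed send leaves every posting unseen for the next run.

```python
from typing import Callable


def process(jobs: list[dict], seen: set[str], send_email: Callable[[list[dict]], None]) -> int:
    """Notify first, mark seen second. Returns how many postings were sent."""
    new = [job for job in jobs if job["external_id"] not in seen]  # rowNotExists stand-in
    if not new:
        return 0  # quiet run: no email at all
    send_email(new)  # if this raises, nothing below runs and the jobs stay unseen
    for job in new:
        seen.add(job["external_id"])
    return len(new)
```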

The location filter is a trap

My first version kept everything that wasn’t explicitly US-based. The inbox filled with “Senior Platform Engineer — Spain (Remote)” and "… — United Kingdom (Remote)". Those aren’t remote-for-me — they’re remote if you live in Spain. Useless from where I sit.

The fix was to invert the logic. Keep only three things:

globally-remote / worldwide / anywhere,
pan-EU (EMEA / Europe / EU / EEA),
my own country.

…and drop single-country remote, even EU ones. Region and home matches win over the country deny-list; ambiguous locations are kept (a missed match is worse than one extra line to skim). That one change cut the noise more than anything else.
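Here is the inverted filter as a sketch. The term lists are illustrative and incomplete. `home` is a parameter because the post does not name the country. Whole-word matching stops `eu` from matching inside longer words. Keep-rules are checked before the deny-list, and anything matching neither is kept.

```python
import re

GLOBAL_TERMS = ("worldwide", "anywhere", "global")
REGION_TERMS = ("emea", "europe", "european union", "eu", "eea")
# Illustrative, not exhaustive: single countries that mean "remote if you live there".
DENY_TERMS = ("united states", "usa", "us", "united kingdom", "uk", "spain", "germany", "canada")


def _has(text: str, terms: tuple[str, ...]) -> bool:
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)


def keep_location(location: str, home: str) -> bool:
    text = location.lower()
    # Global, pan-EU and home matches win, even if a country is also named.
    if _has(text, GLOBAL_TERMS) or _has(text, REGION_TERMS) or _has(text, (home.lower(),)):
        return True
    if _has(text, DENY_TERMS):
        return False
    return True  # ambiguous: keep it, a missed match costs more than a skim
```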

Let an LLM read the actual job

Keyword + location filtering gets you a candidate list, but it can’t tell a “Platform Engineer” who herds Kubernetes from a “Platform Engineer” who owns a Figma design system. The job description can.

So the last step scores each new posting against my CV. My first version batched all of them into one big LLM call, which promptly timed out on the free tier. The fix was the opposite: one small call per job, which also means a single slow or rate-limited job never sinks the batch. Each call asks an NVIDIA NIM model (Llama 3.1 8B, OpenAI-compatible) for one number and a reason:

Score this job 0–100 for fit against my profile. Return {score, reason}.

That score is what lets me widen the net instead of narrowing it. On top of the curated company list I pull a broad remote-jobs feed (every company, all categories); the cheap keyword + location filters do the first pass, then I only email the roles scoring 80%+. Casting wide is fine when a model is the bar at the door. A line ends up looking like:

92% — Grafana Labs — Senior Platform Engineer (Remote, EMEA) — strong k8s/GitOps overlap — link

Scoring is fail-safe: if a call hiccups, that job is just skipped, and every posting gets marked seen either way — so nothing re-scores forever, and a rare bad run never floods or stalls the inbox.
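Here is a sketch of the gate. It assumes the model's reply text contains a JSON object with `score` and `reason`. Small models often wrap JSON in prose, so the parser extracts the outermost braces. An unparseable or out-of-range reply returns `None`, which here means "skip this job", not "retry". The digest line resembles the example above but does not reproduce the real format.

```python
import json
import re

THRESHOLD = 80


def parse_score(reply: str):
    """Return (score, reason), or None if the reply can't be trusted."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 100:
        return None
    return int(round(score)), str(data.get("reason", "")).strip()


def digest_line(job: dict, reply: str):
    parsed = parse_score(reply)
    if parsed is None:
        return None  # fail-safe: a bad reply skips the job, it never blocks the run
    score, reason = parsed
    if score < THRESHOLD:
        return None
    return f"{score}% | {job['company']} | {job['title']} ({job['location']}) | {reason} | {job['url']}"
```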

The unglamorous bits that make it trustworthy

One bad source can’t kill the run — every fetch is wrapped; failures become a ⚠️ N sources failing footer so a company quietly changing ATS is visible, not invisible.
A prime run seeds the table silently the first time, so I’m not buried under every currently-open role on day one.
Everything tunable lives in one Config node — companies, keywords, location lists, the profile, the model — so adding a company is a one-line edit, not a graph safari.
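The first two bullets can be sketched together. Each source is fetched inside its own `try`, and a failure only feeds the footer. A prime run records everything as seen and returns nothing to email. `fetch`, the source strings and the job shape are hypothetical.

```python
from typing import Callable, Iterable


def poll(sources: Iterable[str], fetch: Callable[[str], list[dict]],
         seen: set[str], prime: bool = False) -> tuple[list[dict], str]:
    jobs, failing = [], []
    for source in sources:
        try:
            jobs.extend(fetch(source))
        except Exception:  # one bad source must not kill the run
            failing.append(source)
    new = [job for job in jobs if job["external_id"] not in seen]
    if prime:
        # First run: remember everything currently open, notify about nothing.
        seen.update(job["external_id"] for job in new)
        new = []
    footer = f"⚠️ {len(failing)} sources failing" if failing else ""
    return new, footer
```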

Takeaways

The “scrape job boards” problem mostly isn’t a scraping problem — it’s three public APIs and a normalizer.
For personal automation, reach for the boring-but-correct primitive: native dedup state beats a database you have to operate.
An LLM works best here as the bar at the door: cheap deterministic filters keep the candidate set (and the cost) small, then the model gates on real fit — which is what lets you cast a wide net without drowning in it.

Workflow JSON, the full node-by-node breakdown, and setup notes: github.com/janos-gyorgy/ats-job-poller.

🧱 How Do You Isolate Two n8n Tenants on Kubernetes — and Prove Each Wall Holds?

Fri, 19 Dec 2025 00:00:00 +0000

The question

“You’re running n8n for multiple customers on the same Kubernetes cluster. What stops Customer A from reading Customer B’s API keys, calling Customer B’s services, or starving Customer B’s workflows by burning the whole node?”

Three different walls, three different mechanisms. Most articles I’ve read on K8s multi-tenancy list the primitives — namespaces, NetworkPolicies, ResourceQuotas, RBAC — without showing what each one actually catches when you try to cross it. This post does the second part. The receipts are the point.

The setup: two namespaces, web-tenant-acme and web-tenant-globex, each running their own n8n instance on the same node. The only thing keeping them apart is the walls we build around each namespace.

The mental model: subtractive isolation

By default, Kubernetes is a flat network where everything is shared. You don’t add isolation by writing allow rules. You subtract trust by adding default-deny rules, and then carefully allow back only the connections each tenant actually needs.

A tenant doesn’t have access to another tenant because there is no rule allowing it. The absence of an allow rule is the wall.

Three of these absences make up the picture:

Wall	Primitive	Failure mode when crossed
Network	Cilium NetworkPolicy, default-deny egress	Connection times out (silent drop)
Secret	Vault Kubernetes-auth, per-tenant policy	`403 permission denied` from Vault itself
Resource	ResourceQuota + LimitRange	Pod rejected at admission time

Different layers, different error messages. That’s how you can tell what stopped you.

Wall 1 — Network: Cilium NetworkPolicy

n8n in web-tenant-acme can reach whoami.web-tenant-acme.svc.cluster.local (its own service in its own namespace) but not whoami.web-tenant-globex.svc.cluster.local. The same DNS shape, the same cluster, the same node. One succeeds, the other hangs.

The primitive is a default-deny egress policy applied to every pod in the namespace, with two narrow exceptions: intra-namespace traffic (so n8n can still reach its own service) and DNS to kube-system (otherwise nothing resolves anything).

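# Effective policy on every pod in web-tenant-acme
# (tenant-specific extras, such as the sys-vault egress rule from Wall 2, are omitted):
spec:
  podSelector: {}                             # every pod in the namespace
  policyTypes: [Egress, Ingress]
  ingress:
    - from:                                   # inbound only from same-namespace pods
        - podSelector: {}
  egress:
    - to:                                     # intra-namespace traffic OK
        - podSelector: {}
    - to:                                     # DNS to kube-dns OK
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - {port: 53, protocol: UDP}
        - {port: 53, protocol: TCP}           # DNS falls back to TCP for large answers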

There is no rule for web-tenant-globex. Cilium’s eBPF datapath drops the SYN packet on the way out.

The receipt — an n8n HTTP node configured to GET http://whoami.web-tenant-globex.svc.cluster.local/. It hangs for the full timeout, then errors with AxiosError: timeout of 5000ms exceeded / code: ECONNABORTED.

The interesting bit: DNS still works. kube-dns is allowed, so the cross-namespace Service still resolves. The TCP handshake is what gets dropped. That’s a useful signal in real incident response — “DNS resolves but the connection hangs” almost always means a NetworkPolicy is the cause.
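That heuristic can be turned into a quick probe. The sketch uses only the standard library: it resolves the name first, then attempts a TCP connect, and reports which step failed. A timeout after successful resolution matches the dropped-SYN pattern above. A refusal means packets got through but nothing was listening. The result is a diagnostic hint, not proof of a NetworkPolicy.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a connection attempt: dns-failed, connected, timeout, or refused."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failed"
    address = infos[0][4][:2]  # (ip, port) for both IPv4 and IPv6
    try:
        with socket.create_connection(address, timeout=timeout):
            return "connected"
    except socket.timeout:
        return "dns-ok-connect-timeout"  # the silent-drop signature
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```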

Wall 2 — Secret: Vault Kubernetes-auth + ESO

Now imagine Acme’s n8n misbehaves: somebody pushes a workflow that tries to read Globex’s API keys via an ExternalSecret. The network isn’t the issue — both tenants need to reach Vault, so they both have an egress rule for sys-vault. The wall has to be at the identity layer.

Each tenant gets three things:

A dedicated ServiceAccount (n8n-acme, n8n-globex).
A Vault Kubernetes-auth role bound to that SA in that namespace, mapped to a Vault policy that grants read on only its own KV path.
A namespaced External Secrets SecretStore that authenticates as the SA via the Kubernetes TokenRequest API.

# Vault policy: tenant-acme can read its own secrets, nothing else.
path "secret/data/web-tenant-acme"     { capabilities = ["read"] }
path "secret/metadata/web-tenant-acme" { capabilities = ["read"] }

# Bind the tenant-acme policy to the tenant's ServiceAccount in its own namespace:
vault write auth/kubernetes/role/tenant-acme \
  bound_service_account_names=n8n-acme \
  bound_service_account_namespaces=web-tenant-acme \
  policies=tenant-acme \
  ttl=1h

When Acme’s n8n tries an ExternalSecret pointing at secret/web-tenant-globex/..., ESO authenticates fine (the SA is valid), Vault recognises the caller, looks up the tenant-acme policy, and answers with the most satisfying line in this whole demo:

URL: GET http://sys-vault.sys-vault.svc.cluster.local:8200/v1/secret/data/web-tenant-globex
Code: 403. Errors:
* permission denied

This is the bit that separates “namespace isolation” from real multi-tenant secret isolation. Plain Kubernetes Secrets + RBAC stop a tenant from listing another tenant’s Secret objects, but the moment you go upstream — to Vault, to a cloud KMS, to an SSM Parameter Store — the secret store needs to enforce identity itself. The network said yes; the secret store still says no.
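The "absence of an allow rule" logic on the Vault side can be modeled in a few lines. This is a deliberately simplified model of Vault's ACL matching. It covers only exact paths and a trailing `*` prefix glob. Real Vault also supports `+` segment wildcards and `deny`, and it picks the most specific matching rule.

```python
# Policy from the post, as path -> capabilities.
POLICIES = {
    "tenant-acme": {
        "secret/data/web-tenant-acme": {"read"},
        "secret/metadata/web-tenant-acme": {"read"},
    },
}


def allowed(policy: str, path: str, capability: str) -> bool:
    """Simplified: exact match, or prefix match for rules ending in '*'."""
    for rule, capabilities in POLICIES.get(policy, {}).items():
        matches = path.startswith(rule[:-1]) if rule.endswith("*") else path == rule
        if matches and capability in capabilities:
            return True
    return False  # no rule grants it: denied by absence
```

Exact-path rules matter here. A glob such as `secret/data/web-tenant-acme*` would also match `secret/data/web-tenant-acme-evil`, whereas `secret/data/web-tenant-acme/*` stops at the separator.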

Wall 3 — Resource: ResourceQuota + LimitRange

The third concern is the noisy neighbour: Acme’s runaway workflow allocating a 4Gi pod and OOM-killing everything else on the node. The network policy doesn’t catch this (no network call), and Vault doesn’t catch this (no secret request). The kernel will, eventually — but you don’t want eventually. You want admission-time rejection.

Two primitives:

apiVersion: v1
kind: ResourceQuota
metadata: { name: tenant-quota, namespace: web-tenant-acme }
spec:
  hard:
    requests.cpu:    "1"
    requests.memory: 1Gi
    limits.cpu:      "2"
    limits.memory:   2Gi
    pods:            "10"
---
apiVersion: v1
kind: LimitRange
metadata: { name: tenant-limits, namespace: web-tenant-acme }
spec:
  limits:
    - type: Container
      default:        { cpu: 500m, memory: 512Mi }
      defaultRequest: { cpu: 50m,  memory: 128Mi }
      max:            { cpu: "2",  memory: 1Gi }

ResourceQuota caps the namespace total. LimitRange bounds any individual container and supplies defaults for pods that don’t declare requests/limits. Those defaults matter in both directions: once a quota covers limits.memory, the API server rejects any pod that doesn’t set one, and without a limit at all a single container could keep allocating until the node runs out of memory.

The receipt — a server-side dry-run of a single 4Gi pod, which never gets created:

$ kubectl apply -n web-tenant-acme --dry-run=server -f - < noisy-neighbor.yaml
Error from server (Forbidden): error when creating "STDIN":
pods "noisy-neighbor" is forbidden:
  maximum memory usage per Container is 1Gi, but limit is 4Gi

Not a kernel OOMKill. Not a pod stuck in Pending. A flat refusal from the API server before the scheduler even sees the request.
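This is a simplified model of the two LimitRange behaviours that matter here. It fills in a default limit when a container declares none, and it rejects a limit above `max`. The quantity parser handles only plain numbers and the common binary/decimal suffixes, and only memory is modeled. It sketches the rule, not the admission plugin.

```python
import re

_FACTORS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
            "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _FACTORS[match.group(2) or ""])


def admit_container(limits: dict, default_memory: str = "512Mi", max_memory: str = "1Gi"):
    """Return (admitted, effective_limit_or_error) for one container's memory limit."""
    memory = limits.get("memory", default_memory)  # LimitRange default fills the gap
    if to_bytes(memory) > to_bytes(max_memory):
        return False, f"maximum memory usage per Container is {max_memory}, but limit is {memory}"
    return True, memory
```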

What this does not prove

A homelab demo on one node with two synthetic tenants is not n8n Cloud. The honest gaps:

Execution sandboxing. A workflow can still run arbitrary code via the Code node or shell-outs. These walls stop infrastructure leakage; they don’t sandbox what n8n itself executes. Real n8n Cloud needs more than namespace walls for that — gVisor / Firecracker / per-tenant worker pools are the usual answers, and n8n’s queue mode lends itself to the last.
Pooled worker queues. Queue mode runs main/webhook/worker as separate deployments backed by Redis + Postgres. Two tenants sharing a worker pool need additional checks at the job-routing layer to keep workflows from accessing the wrong tenant’s binary data. Out of scope for the homelab demo.
Control plane. Both tenants reach the same API server. A cluster-admin-equivalent compromise breaks everything. This is the assumption every shared K8s setup makes.
Node-level. Same kernel. Container escape, CPU side channels, the usual list — all apply. For paranoid tenants the answer is dedicated nodes via taints/tolerations or separate clusters entirely.

The demo proves the namespace-shaped walls hold. It does not prove the whole stack is safe against a determined attacker already running code inside a tenant. That’s a different post.

Part of a Kubernetes-on-the-homelab series — previously: preventing a compromised pod from calling your database, GitOps secrets.

🍵 I A/B-Tested Cloud vs Local LLMs in One n8n Agent. The Local One Faked It.

Fri, 07 Nov 2025 00:00:00 +0000

The question

I run n8n on my k3s homelab. Not docker-compose on a NUC — the full treatment: GitOps-reconciled, Vault-backed secrets, default-deny networking. The same boring platform everything else here runs on.

But “I have n8n running” proves nothing. I wanted to know if I actually understood it as an agent platform, and to answer a question I kept dodging: for agent work, do I need a cloud model, or is my local one good enough?

So I built a real agent and gave it two brains.

What I built

A chat assistant over brew-buddy, my homemade kombucha-tracking app (React + a small API + Postgres). You ask it things in plain language; it calls the app’s API and answers. The twist: the same question runs through two agents in parallel — one backed by NVIDIA’s hosted Llama-3.3-70B, one by a local Phi-3.5-mini on CPU — and the workflow prints both answers side by side.

Chat ──▶ Agent (cloud: NVIDIA 70B) ──┐
     └─▶ Agent (local: Phi-3.5)   ──┴──▶ Merge ──▶ both replies, labeled

tools (shared):                  tools (cloud agent only):
  • get_all_batches                • add_batch_log   ⟵ write
  • get_batch_detail               • create_batch    ⟵ write
  • brewing_statistics

Both agents share the same read tools. The two write tools are wired to the cloud agent only — more on that below.

The nice part: I didn’t write a line of glue. n8n’s stock OpenAI Chat Model node talks to anything OpenAI-compatible if you override the credential’s Base URL — so one node points at https://integrate.api.nvidia.com/v1, the other at http://llama-server..svc:8080/v1 for the local server. Same node, two endpoints.
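The Base URL trick works because both servers speak the same OpenAI-style `POST {base}/chat/completions` route. The sketch below uses the standard library to build, but not send, the same request for both targets. The NVIDIA URL is from the post. The model ids and the `localhost:8080` stand-in for the in-cluster llama-server Service are assumptions.

```python
import json
import urllib.request

ENDPOINTS = {
    # Base URL from the post; model id assumed to be NIM's name for Llama 3.3 70B.
    "cloud": ("https://integrate.api.nvidia.com/v1", "meta/llama-3.3-70b-instruct"),
    # Placeholder for the in-cluster llama-server; substitute the real Service DNS name.
    "local": ("http://localhost:8080/v1", "phi-3.5-mini"),
}


def chat_request(target: str, prompt: str, api_key: str = "unused") -> urllib.request.Request:
    """Same OpenAI-compatible body for either backend; only base URL and model differ."""
    base_url, model = ENDPOINTS[target]
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
```

Sending is `urllib.request.urlopen(req, timeout=...)`. That timeout is the same knob the tool-calling latency forces you to raise.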

The infra that keeps it honest

I won’t re-explain the platform here — it’s in earlier posts: GitOps, Vault-backed secrets, default-deny networking, dual-path TLS ingress. But building the agent made one of them tangible.

n8n is, by design, a thing that makes arbitrary HTTP calls on a schedule. That’s exactly what you want behind a default-deny network policy. n8n couldn’t reach the brew-buddy API at all until I declared it — one line:

# n8n's namespace
allowEgressToNamespaces: [web-ai-engine, web-brew-buddy]
#                                          ^ added this for the agent

(plus a matching ingress-allow on brew-buddy’s side). That’s the posture working as intended: the blast radius of a workflow tool is whatever I’ve explicitly granted, and not one namespace more. Adding a capability is a reviewable one-liner in Git; Argo reconciles it. No kubectl, no guessing what n8n can reach.

The A/B: same agent, same tools, two brains

Plain “hi”. Cloud answers in ~0.5s. Local takes noticeably longer — because even for “hi”, the agent feeds the model the full system prompt plus the JSON schemas for every tool, and Phi-3.5 has to chew through all of it on CPU before it can say a word. So far, the boring expected result: local is slower.

Then I asked a real question, and the result flipped in a way I didn’t expect.

“What batches do I have?”

Cloud (70B) called get_all_batches, got the real rows, and answered:

You have two batches: 2026-04-09-A (cold-crash, 3L) and 2026-04-09-W (cold-crash, 3L).

Local (Phi-3.5) never called the tool. It didn’t seem to realise it had tools. Instead it confidently explained how I could go find the data myself:

To list all batches: 1. Access the brew-buddy app. 2. Look for a button labeled “List Batches”… def get_all_batches(): … … Remember, I’m unable to directly interact with apps or databases.

Fake instructions. Fake code. A polite apology. Everything except the actual answer it was sitting on top of.

Writing data. I asked both to log an observation. Cloud called add_batch_log and wrote a real row to Postgres (“I have recorded the observation…”). Local bluffed again — “here’s how you can log it yourself.”

Why it matters: capability, not latency

The interesting finding isn’t “the big model is better.” It’s how the small one fails.

With a ~3.8B model on CPU, the bottleneck for agent work isn’t speed — it’s capability. Phi-3.5 couldn’t reliably emit tool calls, so n8n’s tools never fired, and the model degraded into a chatbot that hallucinates a plausible answer instead of fetching the real one. That failure mode is worse than an error: an error you catch, a confident wrong answer you ship.
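If you call the model API directly, that failure mode can be turned back into an error. This sketch assumes an OpenAI-compatible chat response. A model that actually decided to use a tool returns `tool_calls` on the message, each with a function name and JSON-encoded arguments. A data question answered without any tool call is treated as a failure instead of being shipped. n8n's agent node runs this loop itself; the sketch shows the check underneath, not n8n's code.

```python
import json


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Pull (name, arguments) pairs out of an OpenAI-compatible chat response."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call.get("function", {})
        arguments = function.get("arguments") or "{}"
        if isinstance(arguments, str):  # OpenAI-style: arguments arrive as a JSON string
            arguments = json.loads(arguments)
        calls.append((function.get("name", ""), arguments))
    return calls


def require_tool_use(response: dict) -> list[tuple[str, dict]]:
    """Turn 'answered from imagination' into an error you can catch."""
    calls = extract_tool_calls(response)
    if not calls:
        raise RuntimeError("model answered a data question without calling any tool")
    return calls
```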

A couple of measurements that sharpened it:

NVIDIA 70B, plain chat: ~0.5s.
NVIDIA 70B, function-calling (with tool schemas): ~8.6s per round-trip — and an agent makes several round-trips per answer. That’s real latency you have to budget a timeout for. (It’s also why the cloud side initially timed out in n8n until I raised the model node’s timeout — the model was fine, n8n was cutting it off.)

So the snappy-vs-slow comparison flips depending on whether the question triggers tools. Plain chat: cloud wins on speed. Tool use: the local model is “fast” only because it skips the tools and makes something up. Speed was never the real axis.

The honest caveat: this finding is about this particular small general model in a multi-tool agent loop. Purpose-built small models with tool-calling fine-tunes do better at narrow tasks — I run a 1.7B one elsewhere that emits a single structured tool call just fine. But for “pick the right tool from several and chain them,” 70B was in a different league.

The trust boundary

I gave the write tools (add_batch_log, create_batch) to the cloud agent only. The local agent is read-only — not by instruction, by wiring. Even if Phi-3.5 did decide to call a write tool, the connection isn’t there. The reliable model is the only one allowed to mutate real data, and that’s enforced structurally, not by trusting a prompt.
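Here is enforcement by wiring in miniature. Each agent's tool set is fixed up front, and the model is only ever shown schemas for the tools it is wired to. The dispatcher refuses anything else, whatever the model asks for. The tool names are from the post; the dispatcher and handlers are hypothetical.

```python
from typing import Callable

READ_TOOLS = frozenset({"get_all_batches", "get_batch_detail", "brewing_statistics"})
WRITE_TOOLS = frozenset({"add_batch_log", "create_batch"})
WIRING = {"cloud": READ_TOOLS | WRITE_TOOLS, "local": READ_TOOLS}


def tools_for(agent: str) -> list[str]:
    """The only tool schemas this agent's model is ever shown."""
    return sorted(WIRING[agent])


def dispatch(agent: str, tool: str, handlers: dict[str, Callable], **kwargs):
    """Execute a tool call only if this agent is wired to it."""
    if tool not in WIRING[agent]:
        raise PermissionError(f"{agent} agent is not wired to {tool}")
    return handlers[tool](**kwargs)
```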

What’s toy and what’s real

Worth being straight: this is a single-node homelab. The agent and both model paths share one box. Running n8n on Kubernetes and swapping models isn’t novel — n8n’s own docs cover queue mode, where a main instance fans work out to a pool of worker pods you scale horizontally, with external Postgres for state. That’s the real production shape. Mine is one replica with an emptyDir’s worth of ambition.

What I think is worth sharing is the finding (the capability cliff, and that its failure mode is confident fabrication) and the boring thing underneath it: because the platform is default-deny and GitOps-reconciled, running this experiment cost me one reviewable egress line and zero risk to anything else.

The boring part is the point

The AI was the fun bit. But the reason I could bolt an agent onto a live cluster, point it at a real app, give it write access to one model and not the other, and tear it all down again — without worrying what it might touch — is that the infrastructure was already boring. Default-deny. Secrets out of Git. git push, Argo reconciles.

The model picks the tools. The platform decides what the tools can reach. Keep those two honest about each other and self-hosting an agent stops being scary and starts being just another app.

🕵️ Privacy-Preserving LLM Pipelines: Anonymize Before You Send

Fri, 12 Sep 2025 00:00:00 +0000

The problem with blocking

The PII guardrail proxy I built last week works by classifying prompts and blocking the sensitive ones. That’s fine for a chat interface where a human can rephrase. It doesn’t work for automated pipelines.

If a Jira ticket contains someone’s name and an internal hostname, you don’t want the agent to fail — you want it to process the ticket without exposing that data. Blocking is the wrong primitive for pipelines. Anonymization is the right one.

The pattern

Input text
  → anonymizer: extract PII, replace with semantic fakes
  → "Nathan Chen from DataSoft LLC needs ProjectX fixed on dev.internal.net"
  + reverse mapping (fake → real): {"Nathan Chen" → "John Smith", "DataSoft LLC" → "ACME", ...}
  → cloud LLM: processes coherent text, never sees real values
  → "Nathan Chen should check the ProjectX docs with the DataSoft LLC team"
  → string substitution with reverse mapping
  → "John Smith should check the OAuth docs with the ACME team"

Two things that make this work:

Deanonymization needs no LLM. Once you have the mapping, restoring is pure string substitution. The model call only happens on the way in.

Semantic fakes beat placeholder tokens. An earlier version of this used [PERSON_1], [ORG_1] tokens. The problem: cloud models see bracketed text and subtly change behaviour — shorter responses, hedging, dropped context. When the cloud model sees Nathan Chen from DataSoft LLC, it treats it as real text and responds naturally. Quality is noticeably better.
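Restoring is a single regex pass over the response. This sketch assumes `mapping` goes from fake value to real value. Longest keys are tried first, so overlapping fakes don't clobber each other. Because it is a single pass, a restored value is never itself substituted again. The `ProjectX` to `OAuth` pair is inferred from the example above.

```python
import re


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace every fake value with its real value in one left-to-right pass."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)  # longest first wins on overlap
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```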

Prior art — what already exists

This is a well-established pattern. Worth knowing what’s out there:

LLM Guard (Protect AI) — the most complete open-source implementation. Anonymize + Deanonymize scanner pair with a Vault for the mapping. Production-grade, actively maintained. Start here if you’re building this for anything serious.

Microsoft PII Shield — session-based proxy. Returns a session ID with the anonymized text, uses it to deanonymize the response.

anonLLM — uses GLiNER (a proper NER model) + Faker for realistic replacements. Better accuracy than a general chat model.

REDACT — IEEE paper describing a system using Ollama for PII redaction in documents.

HuggingFace Anonymizer SLM series — purpose-built models (0.6B/1.7B/4B) fine-tuned specifically for anonymization. 9.20/10 quality score for 1.7B, close to GPT-4.1’s 9.77.

That last one is what this implementation actually uses.

The model: Anonymizer-1.7B

eternisai/Anonymizer-1.7B is a Qwen3-1.7B fine-tune trained on ~30k anonymization samples using GRPO with GPT-4.1 as judge. It outputs structured tool calls instead of free text:

{
  "name": "replace_entities",
  "arguments": {
    "replacements": [
      {"original": "John Smith", "replacement": "Nathan Chen"},
      {"original": "ACME Corp", "replacement": "DataSoft LLC"},
      {"original": "auth.acme.internal", "replacement": "dev.internal.net"}
    ]
  }
}

Almost no prompt engineering is needed: the task is trained into the weights, and the output is a structured contract rather than free text. Compare that to the first version of this service, which sent a long JSON-format prompt to Phi-3.5-mini and hoped the output parsed correctly.
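Because the output is a structured contract, it can be validated before anything gets substituted. A minimal sketch, assuming the `replace_entities` shape above; `parse_replacements` is a hypothetical helper, and its checks (right tool name, a list of string pairs, drop empty or no-op pairs) are illustrative choices rather than anything the model requires. It also tolerates `arguments` arriving as a JSON string, which is how OpenAI-style APIs encode them; Ollama returns an object.

```python
import json
from typing import Any


def parse_replacements(call: dict[str, Any]) -> list[tuple[str, str]]:
    """Validate a replace_entities call and return (original, replacement) pairs."""
    if call.get("name") != "replace_entities":
        raise ValueError(f"unexpected tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    if isinstance(args, str):  # OpenAI-style servers send arguments as a JSON string
        args = json.loads(args)
    items = args.get("replacements")
    if not isinstance(items, list):
        raise ValueError("replacements must be a list")
    pairs: list[tuple[str, str]] = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"malformed pair: {item!r}")
        orig, repl = item.get("original"), item.get("replacement")
        if not isinstance(orig, str) or not isinstance(repl, str):
            raise ValueError(f"malformed pair: {item!r}")
        if orig and orig != repl:  # skip empty strings and no-op pairs
            pairs.append((orig, repl))
    return pairs
```

Raising instead of guessing keeps the service failing closed: a malformed answer never turns into a half-anonymized prompt.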

The model runs via Ollama (which handles the Qwen3 chat template and tool calling natively), pointed at the GGUF version from HuggingFace: hf.co/gabriellarson/Anonymizer-1.7B-GGUF.

The implementation

llm-anonymizer is a FastAPI service with two endpoints.

POST /anonymize — calls Ollama with the tool definition, parses the response:

TOOLS = [{
    "type": "function",
    "function": {
        "name": "replace_entities",
        "description": "Replace PII entities with anonymized versions",
        "parameters": {
            "type": "object",
            "properties": {
                "replacements": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "original": {"type": "string"},
                            "replacement": {"type": "string"},
                        },
                        "required": ["original", "replacement"],
                    },
                }
            },
            "required": ["replacements"],
        },
    },
}]

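import httpx

# OLLAMA_BASE, MODEL and SYSTEM_PROMPT are service config, defined elsewhere.

async def anonymize(text: str) -> tuple[str, dict[str, str]]:
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(f"{OLLAMA_BASE}/api/chat", json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text + "\n/no_think"},  # skip Qwen3 thinking mode
            ],
            "tools": TOOLS,
            "stream": False,
        })
    resp.raise_for_status()

    tool_calls = resp.json()["message"].get("tool_calls")
    if not tool_calls:
        # Fail closed: without a tool call we can't tell "no PII" from "model failed"
        raise ValueError("anonymizer returned no tool call")
    replacements = tool_calls[0]["function"]["arguments"]["replacements"]

    # Replace longest originals first so "John" can't clobber part of "John Smith".
    # Build reverse mapping: replacement → original (for deanonymization)
    anonymized = text
    mapping = {}
    for pair in sorted(replacements, key=lambda p: len(p["original"]), reverse=True):
        anonymized = anonymized.replace(pair["original"], pair["replacement"])
        mapping[pair["replacement"]] = pair["original"]
    return anonymized, mapping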

The /no_think suffix tells the model to skip its chain-of-thought — faster response, same accuracy for this task.

POST /deanonymize — no model call, just substitution:

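def deanonymize(text: str, mapping: dict[str, str]) -> str:
    for replacement, original in sorted(mapping.items(), key=lambda x: len(x[0]), reverse=True):
        text = text.replace(replacement, original)  # mapping is replacement → original
    return text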

Fake values are restored longest first, so a short fake that is a substring of a longer one (say, a bare “Nathan” inside “Nathan Chen”) can’t overwrite part of it.
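Longest-first ordering prevents partial overwrites, but sequential `str.replace` calls can still chain: if a restored value happens to contain another key, a later pass rewrites it again. A single-pass alternative, sketched here with Python's `re` module (a swapped-in technique, not what the service currently does), compiles one alternation of all keys, longest first, and replaces each match exactly once. `substitute_once` is a hypothetical name; the same function works in both directions, given the forward mapping to anonymize or the reverse mapping to deanonymize.

```python
import re


def substitute_once(text: str, mapping: dict[str, str]) -> str:
    """Replace every key of `mapping` in a single left-to-right pass."""
    if not mapping:
        return text
    # Longest keys first: regex alternation takes the first alternative that matches,
    # so "Nathan Chen" must be tried before a shorter key like "Nathan".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is replaced once; substituted text is never re-scanned.
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

With the mapping `{"Nathan": "Tom", "Tom": "Ann"}`, chained `str.replace` turns "Nathan met Tom" into "Ann met Ann", while `substitute_once` returns "Tom met Ann".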

The Kubernetes stack

Ollama runs as a separate deployment in the same namespace as everything else (web-ai-engine). The cluster’s network policies already allow intra-namespace traffic, so no new policies were needed.

llm-anonymizer (FastAPI) → Ollama (port 11434) → Anonymizer-1.7B GGUF

One-time model pull after first deploy:

kubectl exec -n web-ai-engine deploy/ollama -- \
  ollama pull hf.co/gabriellarson/Anonymizer-1.7B-GGUF

Ollama caches it on a 10Gi PVC, so pod restarts don’t re-download.

The n8n pipeline

Five-node chain triggered by webhook:

Webhook → /anonymize → NVIDIA NIM → /deanonymize → Respond

The NVIDIA NIM call includes a system prompt instructing it to treat the text as normal input. There’s no mention of placeholder tokens and no special handling, because the text already reads like real text.

Wire any upstream source to the webhook: Jira event, Slack slash command, a scheduled job that processes internal docs. The pipeline is source-agnostic.
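For testing the chain outside n8n, the same steps can be expressed as one function. A minimal sketch with the three network calls injected as plain callables, so nothing here assumes a particular HTTP client or payload format; `run_pipeline` is a hypothetical name, and in the real workflow each step is an n8n node calling `/anonymize`, NVIDIA NIM, and `/deanonymize`.

```python
from typing import Callable

Anonymize = Callable[[str], tuple[str, dict[str, str]]]
Complete = Callable[[str], str]
Deanonymize = Callable[[str, dict[str, str]], str]


def run_pipeline(text: str, anonymize: Anonymize, complete: Complete,
                 deanonymize: Deanonymize) -> str:
    """Webhook body in, restored answer out. Only `complete` leaves the cluster."""
    safe_text, mapping = anonymize(text)  # POST /anonymize
    answer = complete(safe_text)          # cloud LLM sees only fakes
    return deanonymize(answer, mapping)   # POST /deanonymize, no model call
```

Swapping `complete` for a stub that records its input is a cheap way to assert, in CI, that real names never reach the cloud call.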

The caveats

1.7B isn’t GPT-4.1. The model scores 9.20/10 on the benchmark. That’s an average quality score, not a per-case error rate, but it does mean some outputs miss or mangle an entity. Test with real examples from your domain before depending on it.
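One way to do that testing: keep a small set of realistic samples, each listing the PII you expect to be caught, and measure how often any of it survives anonymization. A minimal sketch; `leak_report` and the sample format are hypothetical, `anonymize` is whatever callable wraps the service, and it only catches missed entities, not badly chosen fakes.

```python
from typing import Callable


def leak_report(samples: list[tuple[str, list[str]]],
                anonymize: Callable[[str], str]) -> dict[str, object]:
    """For each (text, expected_pii) sample, check that no expected PII survives."""
    leaks: list[tuple[int, str]] = []
    for i, (text, pii) in enumerate(samples):
        out = anonymize(text)
        # Case-insensitive: a lower-cased name leaking is still a leak.
        leaks += [(i, p) for p in pii if p.lower() in out.lower()]
    leaked_samples = len({i for i, _ in leaks})
    return {
        "samples": len(samples),
        "samples_with_leaks": leaked_samples,
        "leak_rate": leaked_samples / len(samples) if samples else 0.0,
        "leaks": leaks,
    }
```

The `leaks` list points at exactly which entity slipped through in which sample, which is more actionable than a single score.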

Deanonymization breaks on heavy rephrasing. If the cloud model restructures a sentence enough that the fake value no longer appears verbatim, the substitution silently misses it. The prompt helps but doesn’t eliminate the risk.
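The miss is silent, but it can be made loud. After deanonymization, any fake value still present, or a distinctive word from one (a bare “Nathan” after the model shortened “Nathan Chen”), means substitution didn’t restore everything. A minimal sketch, assuming the reverse mapping returned by `/anonymize`; `leftover_fakes` is a hypothetical helper, and splitting fakes on whitespace and dots into fragments is a heuristic that will need tuning for your data.

```python
import re


def leftover_fakes(text: str, mapping: dict[str, str], min_len: int = 4) -> set[str]:
    """Return fake values, or fragments of them, still present after deanonymization.

    `mapping` is the reverse mapping (fake -> original). A non-empty result means
    the cloud model rephrased something that plain substitution could not restore.
    """
    hits: set[str] = set()
    # Words that also occur in real values are expected in the output; ignore them.
    real_words = {w.lower() for v in mapping.values() for w in re.split(r"[\s.]+", v)}
    for fake in mapping:
        if fake in text:
            hits.add(fake)
            continue
        for word in re.split(r"[\s.]+", fake):
            if len(word) >= min_len and word.lower() not in real_words:
                if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE):
                    hits.add(word)
    return hits
```

A non-empty result is a cue to log the item or route it for review rather than return a reply that still mentions a person who doesn’t exist.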

Ollama adds a deployment. That’s a ~500MB image plus ~1GB of model weights (Q4), which is real overhead on a constrained single-node cluster. llama-server already covers general chat; Ollama is here purely for this model’s tool-calling support.

Source

github.com/janos-gyorgy/llm-anonymizer — MIT licensed, Kubernetes manifests and n8n workflow included.