Security on hippotion

Two Birds That Read the Web for Me: One Hoards, One Scatters

Fri, 12 Jun 2026 00:00:00 +0000

I have a vault of markdown notes that I treat as a second brain, and I run GitOps over it like it’s production infrastructure. It already has agents that work on it from the inside: a nightly gardener that weeds orphans and suggests links, and a Wanderer that collides random pairs of my own notes looking for connections I missed.

The obvious next move is to point an agent at the outside — let it read the web and tell me what matters. That move is also a small landmine, and most “AI reads the internet for you” tooling steps right on it. So this week I built two of them instead of one, named them after corvids, and the reason there are two is the entire point of this post.

Meet the Magpie and the Blue Jay.

The same fear, twice

Before either bird got a name, both inherited a single non-negotiable rule, and it’s worth saying plainly because it’s the part everyone skips:

An agent that reads the internet and writes to your notes is a prompt-injection pipeline aimed straight at your trust root.

My vault isn’t just storage. Every other agent (the gardener, the Wanderer, the search that answers “what am I building?”) reads it as trusted context. So the moment one agent ingests a GitHub README or a news headline (attacker-influenceable text) and is allowed to write a note, a stranger on the internet gets to whisper instructions into the thing my whole system believes. “Structured API” narrows that surface. It does not close it.

Both birds are built on the same chassis as the gardener, and that chassis enforces the fear rather than trusting the model to behave:

Two phases, hard split. A wrapper-owned FETCH step pulls the external text in plain Bash — Claude is not in the loop, can’t be talked into anything, because it isn’t running yet. Then a COLLIDE step starts claude -p with the fetched text handed in as inline data, and that process gets only Read / Glob / Grep / Write. No Bash, no git, no network, no MCP. While untrusted text is in the context window, the agent has no tool that can reach the outside world or rewrite history.
Allowlist, not the open web. Each bird reads a short, named list of sources. Nothing else.
Quarantine, not the vault. Findings land in quarantine//, which lives outside vault/. The indexer never sees it. Nothing it writes is ever auto-wikilinked into the graph. Promotion to a real note is a thing I do, by hand, after reading it.
Blast radius is checked, not assumed. A run may modify only its quarantine directory. Anything written anywhere else is discarded and reported as a violation.
“Nothing found” is a successful run. Neither bird has a quota. This is the honesty contract I stole from the Wanderer — an agent under pressure to produce N findings will manufacture N findings, and manufactured insight is worse than silence.
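
The blast-radius rule is the easiest one to sketch. This is a minimal illustration, not the wrapper's actual code: it assumes the wrapper can produce a list of paths a run touched (for example from a diff), and the names `find_violations`, `repo_root` and the `quarantine/magpie` directory are hypothetical.

```python
from pathlib import Path


def find_violations(changed_paths: list[str], repo_root: str, quarantine_dir: str) -> list[str]:
    """Return every changed path that falls outside the run's quarantine directory."""
    root = Path(repo_root).resolve()
    allowed = (root / quarantine_dir).resolve()
    violations = []
    for rel in changed_paths:
        # resolve() collapses "../" segments, so a path that starts inside
        # quarantine but climbs back out is still reported.
        target = (root / rel).resolve()
        if not target.is_relative_to(allowed):
            violations.append(rel)
    return violations


changed = [
    "quarantine/magpie/finding-01.md",
    "vault/projects/audiobook.md",
    "quarantine/magpie/../../vault/sneaky.md",
]
for path in find_violations(changed, "/tmp/repo", "quarantine/magpie"):
    print("VIOLATION:", path)
```

The point of resolving paths before comparing them is that the check works on where a write actually lands, not on what the path string looks like.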

That’s the shared spine. Now the interesting part: given the same security model, the two birds do almost opposite things, and trying to make one bird do both jobs would have quietly ruined it.

The Magpie hoards what’s already shiny

A magpie collects shiny objects and keeps them close. Mine watches my own GitHub stars.

The premise is slow public signal × private context. I starred some repo three weeks ago, forgot about it, and moved on. Meanwhile my projects shifted. The Magpie runs weekly, pulls my starred repos through one allowlisted endpoint (gh api user/starred), and collides each one against what I’m actively building right now — the live projects, the open hubs.

Its output contract is a tight one: it is a relevance filter. It fires only when a star actually touches live work, and every finding has to name three concrete things — the repo, the project it connects to, and one “so what.” A vague “these are thematically related” doesn’t count as a hit. It’s a watchdog on the dials, not a newsletter.
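
The three-part contract can be expressed as a small structural check. This is only a sketch under assumptions: the `Finding` type and `accept` function are hypothetical names, and a check like this can only enforce that all three parts are present, not that the "so what" is any good.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    repo: str      # the starred repository
    project: str   # the live project it connects to
    so_what: str   # one concrete consequence


def accept(finding: Finding) -> bool:
    """A finding counts only if all three parts are non-empty."""
    parts = (finding.repo, finding.project, finding.so_what)
    return all(part.strip() for part in parts)


print(accept(Finding("supertonic", "Hungarian audiobook", "supports Hungarian TTS")))
print(accept(Finding("some-repo", "", "thematically related")))
```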

The supervised proof run, over 28 stars, surfaced exactly two real hits and refused to invent a third:

supertonic (on-device multilingual TTS) × my Hungarian-audiobook voice-cloning project — a possible escape from a TTS fight I’d been losing. I checked: it genuinely supports Hungarian. That’s a hit with a so-what.
agentmemory × the exocortex itself — prior art for persistent AI memory, notably with benchmarks my own notes lacked. (And if you’ve read about the time I benchmarked my own search and it lost, you’ll know how much I needed that nudge.)

The other ~26 stars mapped to tidy thematic clusters and were correctly not reported. That restraint is the feature.

The Blue Jay scatters acorns and forgets where

Here’s the bird that explains why there are two.

Blue jays don’t hoard close like magpies. They cache acorns far and wide and forget where they buried some — and the forgotten ones grow into oak trees. Ecologists think blue jays are why oak forests spread north after the last ice age. Seed dispersal, by way of a bad memory. That is exactly the job I wanted for the second bird, and the metaphor was too good to pass up.

The Blue Jay reads an allowlist of eight RSS feeds, picked so tech and science cross-pollinate:

Tech: Hacker News (high-score front page), lobste.rs, Ars Technica
Science & ideas: phys.org, Quanta, Aeon, Nautilus
Wildcard: Medium — but scoped to specific tag feeds, never the raw firehose of crypto and self-help

Quanta, Aeon, and Nautilus are on that list on purpose: they’re the connective tissue, the feeds where “huh, that’s weirdly similar to…” happens before my vault even gets involved.

And its output contract is the opposite of the Magpie’s. The Blue Jay is a serendipity filter. Its job is to surface the connection that isn’t in my projects yet — the distant idea, the acorn worth burying. If I ran it through the Magpie’s “only fire on a live-work hit” rule, I would strangle the one thing it exists to do. Relevance and serendipity pull in opposite directions, and you can’t tune a single agent to maximize both.

One more load-bearing detail, half design and half security: the Blue Jay collides on the RSS summary only — title, abstract, link. It never pulls the full article body into context. That’s simultaneously the lower-injection path and the right cognitive shape (a headline is a seed; I click through myself from quarantine if the seed is interesting). The narrow input is doing double duty.
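
A rough sketch of what "summary only" could look like for an RSS 2.0 feed, using the standard library. The function name `summaries` and the `max_chars` cap are assumptions, not part of the actual FETCH step. For genuinely untrusted feeds a hardened parser such as defusedxml is worth considering, since `xml.etree` is not protected against entity-expansion attacks, and Atom feeds would need different tag names.

```python
import xml.etree.ElementTree as ET


def summaries(rss_xml: str, max_chars: int = 500) -> list[dict[str, str]]:
    """Extract only title, description and link from each RSS 2.0 <item>."""
    root = ET.fromstring(rss_xml)
    out = []
    for item in root.iter("item"):
        out.append({
            "title": (item.findtext("title") or "").strip(),
            # The description is capped; full bodies such as content:encoded
            # are never read at all.
            "summary": (item.findtext("description") or "").strip()[:max_chars],
            "link": (item.findtext("link") or "").strip(),
        })
    return out
```

Anything not named here, including an embedded full article, never reaches the model's context.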

Why two birds and not one with a flag

I genuinely considered making this one agent with a --mode=relevance|serendipity switch. I’m glad I didn’t, and the reasoning generalizes past birds:

	Magpie	Blue Jay
Source	my GitHub stars (structured API)	8 RSS feeds (open prose)
Injection risk	low	the highest frontier
Fires when	a star hits live work	a summary sparks a distant idea
Output	relevance: repo → project → so-what	serendipity: the not-yet-relevant connection
Failure mode it guards against	noise / false relevance	being strangled into silence

Two things made the split non-negotiable. First, the output contracts are too different to share one brain — “only speak on a hit” and “speak about the thing that isn’t a hit yet” are contradictory prompts, and a single agent told to do both does neither well. Second, open news is a higher injection frontier than a structured stars API, so the riskier bird deserves its own enforced blast-radius wrapper, not a code path bolted onto the safe one. When two jobs disagree on both what good output is and how dangerous the input is, that’s not a flag. That’s two programs.

So now my vault has two more agents reading the world on a cron. The Magpie runs Saturday at 06:00 and tells me when something I bookmarked finally became relevant. The Blue Jay runs Saturday at 07:00 and buries acorns in a quarantine folder, most of which I’ll ignore — but I only need one of them to grow into an oak.

Both are on probation for their first few runs, because I don’t trust a thing that reads the internet until I’ve watched it behave. But the part I’m actually happy about isn’t the agents. It’s that building the second one forced me to say out loud what the first one was secretly assuming — and the names made the difference impossible to forget. A magpie hoards. A blue jay scatters. You want both, and you do not want them to be the same bird.

Is Anyone Knocking? A Security Pass on My Homelab

Fri, 22 May 2026 00:00:00 +0000

The question I actually had

It started as a nervous-Sunday kind of question: is a third party trying to get into my server — over SSH, or some other way? I run a single-node Kubernetes homelab that hosts a couple dozen little apps, some of them public. You read about credential-stuffing bots and you start to wonder who’s been rattling the handle while you slept.

So I did the audit. The good news came first, and it’s worth saying plainly because it’s the part most homelabs get wrong: the front door is solid. Nothing is reachable from the internet except through a Cloudflare Tunnel — an outbound-only connection, zero open inbound ports on my router. Almost every service sits behind OAuth. The cluster has 140 network policies doing real east-west segmentation. And the login history? Eleven straight weeks where every single shell login came from one IP — my own workstation on the LAN. No strangers. No 3 a.m. logins from a VPS in another hemisphere.

I could have stopped there feeling good. That would have been a mistake.

The scary finding wasn’t an attacker

The useful question turned out not to be “is someone knocking?” but “if someone got in, would anything tell me?” And when I traced that wire, it ended in the dark.

I have a full monitoring stack — Prometheus, Grafana, Alertmanager, the works. Alertmanager was running. It was also configured to notify exactly no one: no receivers, and upstream, no alert rules at all. It was a smoke detector with the battery taken out and, for good measure, no smoke sensor either. If an attacker had walked in, the alarm would have stayed perfectly, silently green.

That reframed the whole job. Three gaps, in priority order.

Gap 1 — an alarm with no one to call

I built the missing chain end to end. A small exporter on the host parses the SSH journal and fail2ban state and writes metrics into node_exporter’s textfile collector — so it rides the monitoring I already had instead of adding a new moving part. On top sit the alert rules that were never there. The one that matters most is blunt:

A shell login succeeded from a non-LAN IP.

That should be impossible in normal life, so if it ever fires, I want it shouting. It now emails me the instant it happens, alongside quieter alerts for brute-force spikes, distributed scans, fail2ban going down, and — the meta-alert I’m fondest of — the watchdog itself going stale, because a security monitor that silently dies is worse than none. And fail2ban now actually bans the bots, with escalating ban times and my LAN permanently on the allow-list.
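
To make the exporter idea concrete, here is a minimal sketch of the parsing half. It is not the exporter described above: the LAN range, the metric name `ssh_accepted_logins` and the `scope` label are assumptions, and it only handles sshd's usual "Accepted ... from IP port N" log lines. The rendered text is what would be written to a `.prom` file in the textfile collector directory.

```python
import ipaddress
import re

# Assumption: the LAN range; adjust to your own network.
LAN = ipaddress.ip_network("192.168.0.0/16")

# Matches sshd's "Accepted <method> for <user> from <ip> port <n>" lines.
ACCEPTED = re.compile(r"Accepted \S+ for \S+ from (\S+) port \d+")


def count_logins(log_lines: list[str]) -> dict[str, int]:
    """Count successful SSH logins, split into LAN and non-LAN sources."""
    counts = {"lan": 0, "non_lan": 0}
    for line in log_lines:
        match = ACCEPTED.search(line)
        if not match:
            continue
        try:
            is_lan = ipaddress.ip_address(match.group(1)) in LAN
        except ValueError:
            # Unparseable source: count it as non-LAN so the error leans toward alerting.
            is_lan = False
        counts["lan" if is_lan else "non_lan"] += 1
    return counts


def render_metrics(counts: dict[str, int]) -> str:
    """Render the counts in the Prometheus text exposition format."""
    lines = [
        "# HELP ssh_accepted_logins Successful SSH logins in the parsed window.",
        "# TYPE ssh_accepted_logins gauge",
    ]
    for scope, value in sorted(counts.items()):
        lines.append(f'ssh_accepted_logins{{scope="{scope}"}} {value}')
    return "\n".join(lines) + "\n"
```

An alert rule on the `non_lan` series being above zero would then be the "shell login from a non-LAN IP" alarm.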

The honest lesson: I’d been treating “I have Prometheus” as if it meant “I have monitoring.” Dashboards you have to remember to look at are not monitoring. Monitoring is the thing that interrupts you. Until an alert can reach your phone, you don’t have a security alarm — you have a security museum.

Gap 2 — there was a web terminal on the open internet

This is the one that made me wince. Among my public hostnames was ttyd — a browser-based shell. A full terminal on my server, reachable from anywhere, sitting behind a single OAuth proxy. One misconfiguration, one OAuth bypass, and that’s not “an app is compromised,” that’s root on the box from a browser tab.

The fix here isn’t more locks. It’s the realization that the strongest control is not exposing the thing at all. I deleted the web terminal entirely — app, manifests, dashboard tile, all of it. Then I went down the public hostname list and pulled everything with no business being public off the tunnel: the secrets UI, the ingress dashboard, Prometheus, Alertmanager, the network-observability console, the DNS admin. They still work — on my LAN, over the same wildcard cert — they’re just not the internet’s business anymore. A service that isn’t exposed has no attack surface to harden.

Gap 3 — no floor under the blast radius

The network policies limit how far a compromised pod can talk sideways. But nothing stopped a workload from running as root, mounting the host filesystem, or grabbing the host network in the first place. So I turned on Kubernetes' built-in Pod Security Admission: every namespace now at least reports baseline violations, and the clean app namespaces enforce baseline — meaning a compromised app there simply cannot request privileged mode or a hostPath mount. It’s a floor. Floors are underrated.

What the audit was really about

I went looking for an intruder and didn’t find one — the logs were clean, the front door held. What I found instead was that I’d built something secure at the perimeter and then never asked the uncomfortable follow-up: what happens after the perimeter? The answer had been “nothing happens, and no one is told,” and I just hadn’t looked.

Three principles I’m taking with me:

An alarm that can’t reach you is decoration. Wire the notification first; the rules are easy once something is listening.
Don’t expose it beats add more auth. Every hostname you take off the public internet is a class of attack you no longer have to be clever about.
Give the blast radius a floor. Assume one thing gets popped, and decide in advance how far it gets.

The best part: all of it is GitOps. The intrusion alerts, the un-exposing, the pod-security floor — every change is a commit, reviewable and revertible, and my cluster reconciles itself to match. The audit didn’t just make the homelab safer. It wrote down why it’s safer, in a form the next version of me can read.

Now if someone knocks, I’ll know. And the web terminal isn’t answering the door anymore — because it’s gone.

🚩 I Built a Usage Dashboard and Tripped Claude Fable 5's Safety Net

Fri, 24 Apr 2026 00:00:00 +0000

The thing I was actually building

I wanted a small web page on my homelab that shows my Claude usage — the 5-hour session window, the weekly limits, the per-model split. There’s a nice Electron widget out there that does this on the desktop, but I don’t want a desktop app; I want a URL behind my own OAuth that I can glance at from my phone.

The mechanics are unremarkable. The claude.ai web app reads those numbers from a couple of undocumented endpoints using your logged-in session cookie. So a self-hosted version does the same thing server-side: hold the session token as a secret, replay the same calls, cache the result, render some bars. An afternoon’s work. I was pairing with Claude Fable 5 on it — Anthropic’s newest model, and the one that ships with extra safety measures around dual-use capability.

Then, partway through, I got the message: Fable 5 flagged something in this session and switched to a more conservative model. It dropped me to Opus 4.8 for the rest of the conversation. Safe conversations sometimes trip it, the notice said. Send feedback.

I wasn’t doing anything wrong. That’s the interesting part.

My first reaction was the obvious one — what did I say? But I knew exactly what I’d built, and none of it was sketchy. It was my account, my usage data, my hardware, my OAuth in front of it.

So I went looking at the request the way a classifier would — not “what did he mean” but “what does this look like.” And from that angle it’s a different picture entirely. Stack up the surface features:

🔑 capturing a session token and storing it to replay later
🌐 sending it to an undocumented API that isn’t meant for third parties
🕵️ spoofing a browser User-Agent so the request blends in
🧱 detecting and working around a Cloudflare bot challenge

Read that list cold, with no context. That’s not a usage dashboard. That’s the exact signature of credential theft and scraping tooling. Every individual move is one a malicious script would also make. The only thing separating my afternoon project from the bad version is whose account it touches and why — and intent is precisely the part that doesn’t show up in the tokens.

Surface vs. intent

This is the part worth sitting with, because it’s not a Claude quirk — it’s the shape of every content classifier, every WAF rule, every fraud model I’ve ever run in production.

A detector scores what it can see. It cannot see intent; it sees features. And the features of “monitor my own usage” and “harvest someone else’s session” overlap almost completely, because the technique is identical — the difference lives entirely in context the model has been deliberately built not to over-trust. You can’t tune that gap away. You can only pick where to sit on the precision/recall curve, and Fable 5 — being the high-capability model with the extra dual-use measures bolted on — sits where it catches the pattern even when it costs some false positives, then hands off to Opus 4.8. I was the false positive. The system did roughly the right thing for roughly the right reason; it just doesn’t feel that way when it’s pointed at you.

The honest engineering takeaway is the one I keep relearning: if a benign task has the silhouette of an abusive one, expect to get treated like the silhouette. Not just by AI — by rate limiters, by bot detection, by the fraud team. The fix isn’t to be offended. It’s to recognize the silhouette, and where it matters, make the legitimate context legible up front.

What I’d do differently

Practically, very little — the project was fine, and it downshifted to a model that finished the job. But the framing changed how I built it. I leaned harder into the parts that make intent visible in the design: the session token never leaves the server, it lives in Vault and arrives as an injected secret, the whole thing sits behind OAuth, and it polls on a leash instead of hammering. Not because a classifier made me, but because those are the same choices that make it obviously a personal dashboard and not a harvesting bot — to a reviewer, to future-me, and yes, to a model reading over my shoulder.

The widget rides your credential on your desktop. Mine keeps it server-side behind my own front door. Turns out building it the trustworthy way and building it the legibly trustworthy way are the same work — and getting flagged is what made me notice the difference.

🔒 Building a PII Guardrail Proxy for Cloud LLM Calls

Fri, 26 Sep 2025 00:00:00 +0000

The problem with cloud LLM access

Running a local model is great for privacy. But local models hit a ceiling — for the heavy lifting, you want a cloud API like NVIDIA NIM with Llama 3.3 70B.

The moment you open that channel, you have a new risk: what if someone (or some automation) accidentally pastes a password, a private key, or someone’s personal data into the chat? It leaves the cluster. It’s logged somewhere you don’t control.

The standard answer is “train your users.” I’d rather have a technical control.

The architecture

Open WebUI → ai-guard proxy
                 │
        ┌────────┴────────┐
        │                 │
  llama-server       if SAFE:
  (classify)         forward to NVIDIA NIM
        │
   if SENSITIVE:
   block + explain

Every request to NVIDIA NIM goes through ai-guard first. ai-guard pulls the user message, sends it to the local llama.cpp server with a classification prompt, and makes a binary decision:

SAFE → forward to NVIDIA NIM with the real API key (which ai-guard holds, not the client)
SENSITIVE: <reason> → return HTTP 400, log the block, nothing leaves the cluster

The local model is already running for inference — this reuses it as a privacy gatekeeper at zero extra infrastructure cost.

The implementation

The proxy is ~150 lines of FastAPI. The classifier call:

CLASSIFIER_PROMPT = """You are a data security classifier. Check if the text below contains sensitive information:
passwords, API keys, tokens, credentials, personal identifiable information (names, emails, phone numbers, SSNs, addresses), financial data (card numbers, bank accounts), or private keys.

Reply with ONLY one of:
SAFE
SENSITIVE: <reason>

Text to check:
"""

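import httpx  # LLAMA_BASE (the local llama-server base URL) is defined elsewhere in the service

async def classify(text: str) -> tuple[bool, str]:
    """Ask the local model for a verdict; returns (is_sensitive, reason)."""
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(
            f"{LLAMA_BASE}/chat/completions",
            json={
                "model": "phi-3.5-mini",
                # Only the first 3000 characters are classified; anything past that is not checked.
                "messages": [{"role": "user", "content": CLASSIFIER_PROMPT + text[:3000]}],
                "max_tokens": 30,
                "temperature": 0,
                "stream": False,
            },
            headers={"Authorization": "Bearer sk-no-key"},
        )
    # Turn HTTP errors into exceptions so the caller's error handling sees them.
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"].strip()
    if answer.upper().startswith("SENSITIVE"):
        # Everything after the first colon is the model's stated reason.
        reason = answer.split(":", 1)[1].strip() if ":" in answer else "sensitive content detected"
        return True, reason
    return False, ""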

temperature=0 and max_tokens=30 keep the response deterministic and fast. The model only needs to output one word or one line.

The main handler:

@app.post("/v1/chat/completions")
async def proxy_chat(request: Request):
    body = await request.json()
    user_text = extract_user_text(body.get("messages", []))

    if user_text.strip():
        try:
            is_sensitive, reason = await classify(user_text)
        except Exception as exc:
            log.error("classifier error: %s — allowing request through", exc)
            is_sensitive = False

        if is_sensitive:
            return JSONResponse(status_code=400, content={
                "error": {
                    "message": f"Request blocked by ai-guard: {reason}. Remove sensitive content before sending to external models.",
                    "type": "content_policy_violation",
                }
            })

    # Safe — forward to upstream with streaming support
    ...

Fail-open: if the classifier itself errors (llama-server down, timeout), the request goes through and the error is logged. Fail-closed would be safer for high-stakes environments, but this is a homelab and I’d rather not block all cloud LLM access because the local model is warming up.
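
If you wanted the fail-open versus fail-closed choice to be an explicit setting rather than something baked into the handler, it could look roughly like this. The wrapper name `classify_with_policy` and the `fail_closed` flag are hypothetical; the only thing taken from the code above is the `(is_sensitive, reason)` return shape.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type matching classify(): text in, (is_sensitive, reason) out.
Classifier = Callable[[str], Awaitable[tuple[bool, str]]]


async def classify_with_policy(
    text: str, classify: Classifier, fail_closed: bool = False
) -> tuple[bool, str]:
    """Run the classifier and apply an explicit policy when it errors."""
    try:
        return await classify(text)
    except Exception as exc:
        if fail_closed:
            # Treat "could not check" as "do not send".
            return True, f"classifier unavailable ({type(exc).__name__})"
        # Fail-open: let the request through, with an empty reason.
        return False, ""


async def broken(_text: str) -> tuple[bool, str]:
    raise TimeoutError("llama-server not ready")


print(asyncio.run(classify_with_policy("hello", broken)))
print(asyncio.run(classify_with_policy("hello", broken, fail_closed=True)))
```

Returning a reason on both paths also means the caller never has to worry about which variables exist after an error.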

Kubernetes deployment

ai-guard runs in the same namespace as llama-server and Open WebUI (web-ai-engine). My Cilium policies already allow intra-namespace traffic, so no new network policy is needed.

Open WebUI uses semicolon-separated lists for multiple API backends:

- name: OPENAI_API_BASE_URLS
  value: "http://llama-server.web-ai-engine.svc:8080/v1;http://ai-guard.web-ai-engine.svc:8080/v1"
- name: OPENAI_API_KEYS
  value: "sk-no-key;sk-no-key"

The second entry is ai-guard. Open WebUI passes sk-no-key as the API key — ai-guard ignores it and uses its own UPSTREAM_API_KEY from a Kubernetes Secret (pulled from Vault via External Secrets Operator). The real NVIDIA API key never touches the client.

The latency tradeoff

The classification step adds 5–15 seconds on CPU inference. That’s the cost of keeping the check fully private — the classifier never sends data anywhere.

For a personal homelab assistant, this is fine. For a high-throughput production setup, you’d want the classifier on a GPU or a dedicated smaller model purpose-built for classification.

What it catches

The classifier prompt targets:

Passwords, API keys, tokens, credentials
PII: names, emails, phone numbers, SSNs, addresses
Financial data: card numbers, bank accounts
Private keys

False negatives are possible — no classifier is perfect. This is a first line of defense, not a compliance control. The value is catching the obvious, accidental leaks.

Source

github.com/janos-gyorgy/ai-guard — MIT licensed, Kubernetes manifests included.

🕵️ Privacy-Preserving LLM Pipelines: Anonymize Before You Send

Fri, 12 Sep 2025 00:00:00 +0000

The problem with blocking

The PII guardrail proxy I built last week works by classifying prompts and blocking the sensitive ones. That’s fine for a chat interface where a human can rephrase. It doesn’t work for automated pipelines.

If a Jira ticket contains someone’s name and an internal hostname, you don’t want the agent to fail — you want it to process the ticket without exposing that data. Blocking is the wrong primitive for pipelines. Anonymization is the right one.

The pattern

Input text
  → anonymizer: extract PII, replace with semantic fakes
  → "Nathan Chen from DataSoft LLC needs ProjectX fixed on dev.internal.net"
  + mapping: {"Nathan Chen" → "John Smith", "DataSoft LLC" → "ACME", ...}
  → cloud LLM: processes coherent text, never sees real values
  → "Nathan Chen should check the ProjectX docs with the DataSoft LLC team"
  → string substitution with reverse mapping
  → "John Smith should check the OAuth docs with the ACME team"

Two things that make this work:

Deanonymization needs no LLM. Once you have the mapping, restoring is pure string substitution. The model call only happens on the way in.

Semantic fakes beat placeholder tokens. An earlier version of this used [PERSON_1], [ORG_1] tokens. The problem: cloud models see bracketed text and subtly change behaviour — shorter responses, hedging, dropped context. When the cloud model sees Nathan Chen from DataSoft LLC, it treats it as real text and responds naturally. Quality is noticeably better.

Prior art — what already exists

This is a well-established pattern. Worth knowing what’s out there:

LLM Guard (Protect AI) — the most complete open-source implementation. Anonymize + Deanonymize scanner pair with a Vault for the mapping. Production-grade, actively maintained. Start here if you’re building this for anything serious.

Microsoft PII Shield — session-based proxy. Returns a session ID with the anonymized text, uses it to deanonymize the response.

anonLLM — uses GLiNER (a proper NER model) + Faker for realistic replacements. Better accuracy than a general chat model.

REDACT — IEEE paper describing a system using Ollama for PII redaction in documents.

HuggingFace Anonymizer SLM series — purpose-built models (0.6B/1.7B/4B) fine-tuned specifically for anonymization. 9.20/10 quality score for 1.7B, close to GPT-4.1’s 9.77.

That last one is what this implementation actually uses.

The model: Anonymizer-1.7B

eternisai/Anonymizer-1.7B is a Qwen3-1.7B fine-tune trained on ~30k anonymization samples using GRPO with GPT-4.1 as judge. It outputs structured tool calls instead of free text:

{
  "name": "replace_entities",
  "arguments": {
    "replacements": [
      {"original": "John Smith", "replacement": "Nathan Chen"},
      {"original": "ACME Corp", "replacement": "DataSoft LLC"},
      {"original": "auth.acme.internal", "replacement": "dev.internal.net"}
    ]
  }
}

No prompt engineering needed. The model knows exactly what it’s doing and outputs a structured contract. Compare that to the first version of this service, which sent a long JSON-format prompt to Phi-3.5-mini and hoped the output parsed correctly.

The model runs via Ollama (which handles the Qwen3 chat template and tool calling natively), pointed at the GGUF version from HuggingFace: hf.co/gabriellarson/Anonymizer-1.7B-GGUF.

The implementation

llm-anonymizer is a FastAPI service with two endpoints.

POST /anonymize — calls Ollama with the tool definition, parses the response:

TOOLS = [{
    "type": "function",
    "function": {
        "name": "replace_entities",
        "description": "Replace PII entities with anonymized versions",
        "parameters": {
            "type": "object",
            "properties": {
                "replacements": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "original": {"type": "string"},
                            "replacement": {"type": "string"},
                        },
                        "required": ["original", "replacement"],
                    },
                }
            },
            "required": ["replacements"],
        },
    },
}]

resp = await client.post(f"{OLLAMA_BASE}/api/chat", json={
    "model": MODEL,
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text + "\n/no_think"},  # skip Qwen3 thinking mode
    ],
    "tools": TOOLS,
    "stream": False,
})

tool_calls = resp.json()["message"]["tool_calls"]
replacements = tool_calls[0]["function"]["arguments"]["replacements"]

# Build reverse mapping: replacement → original (for deanonymization)
anonymized = text
mapping = {}
for pair in replacements:
    anonymized = anonymized.replace(pair["original"], pair["replacement"])
    mapping[pair["replacement"]] = pair["original"]

The /no_think suffix tells the model to skip its chain-of-thought — faster response, same accuracy for this task.
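
One thing the snippet above takes on faith is that the response always contains a tool call. If the model answers in plain text instead (for example on input with no PII), those key lookups would raise. A more defensive parse might look like the sketch below; `extract_replacements` is a hypothetical helper, and the string branch is there only because OpenAI-style servers serialize arguments as a JSON string rather than an object.

```python
import json


def extract_replacements(payload: dict) -> list[dict[str, str]]:
    """Pull well-formed replacement pairs out of a chat response, or return []."""
    calls = payload.get("message", {}).get("tool_calls") or []
    pairs = []
    for call in calls:
        fn = call.get("function", {})
        if fn.get("name") != "replace_entities":
            continue
        args = fn.get("arguments", {})
        if isinstance(args, str):
            # Some servers return the arguments as a JSON string.
            args = json.loads(args)
        for pair in args.get("replacements", []):
            if pair.get("original") and pair.get("replacement"):
                pairs.append({"original": pair["original"], "replacement": pair["replacement"]})
    return pairs
```

An empty list then simply means "nothing to replace", and the text passes through unchanged with an empty mapping.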

POST /deanonymize — no model call, just substitution:

for replacement, original in sorted(mapping.items(), key=lambda x: len(x[0]), reverse=True):
    text = text.replace(replacement, original)

Sorted by length descending so longer tokens don’t get partially overwritten by shorter ones.
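
A small worked example shows why the ordering matters. The mapping here is made up for illustration: one fake value is a prefix of another, which is exactly the case where order decides the outcome.

```python
def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore originals, longest fake first so overlapping fakes stay intact."""
    for fake, original in sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True):
        text = text.replace(fake, original)
    return text


def deanonymize_unsorted(text: str, mapping: dict[str, str]) -> str:
    """Same substitution in insertion order, to show what goes wrong."""
    for fake, original in mapping.items():
        text = text.replace(fake, original)
    return text


# Hypothetical mapping where one fake is a prefix of another.
mapping = {"Nathan": "John", "Nathan Chen": "John Smith"}
reply = "Nathan Chen called Nathan"

print(deanonymize(reply, mapping))           # John Smith called John
print(deanonymize_unsorted(reply, mapping))  # John Chen called John
```

In the unsorted version the short fake rewrites the first half of the long one, and the long fake then never matches.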

The Kubernetes stack

Ollama runs as a separate deployment in the same namespace as everything else (web-ai-engine). Intra-namespace traffic is always allowed — no new network policies.

llm-anonymizer (FastAPI) → Ollama (port 11434) → Anonymizer-1.7B GGUF

One-time model pull after first deploy:

kubectl exec -n web-ai-engine deploy/ollama -- \
  ollama pull hf.co/gabriellarson/Anonymizer-1.7B-GGUF

Ollama caches it on a 10Gi PVC, so pod restarts don’t re-download.

The n8n pipeline

Five-node chain triggered by webhook:

Webhook → /anonymize → NVIDIA NIM → /deanonymize → Respond

The NVIDIA NIM call includes a system prompt instructing it to treat the text as normal input. No mention of tokens, no special handling — because the text looks like real text.

Wire any upstream source to the webhook: Jira event, Slack slash command, a scheduled job that processes internal docs. The pipeline is source-agnostic.

The caveats

1.7B isn’t GPT-4.1. The model scores 9.20/10 on the benchmark against GPT-4.1’s 9.77. That is a judged quality score, not a per-entity accuracy, so it doesn’t tell you how often an entity gets missed or replaced incorrectly. Test with real examples from your domain before depending on it.

Deanonymization breaks on heavy rephrasing. If the cloud model restructures a sentence enough that the fake value no longer appears verbatim, the substitution silently misses it. The prompt helps but doesn’t eliminate the risk.

Ollama adds a deployment. It’s ~500MB image + the model weights (~1GB Q4). On a constrained single-node cluster that’s real overhead. llama-server already covers general chat; Ollama is purely for this model’s tool-calling support.
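
For the heavy-rephrasing caveat, one cheap mitigation is to look for fragments of fake values that survive in the reply when the full fake does not. This is a heuristic sketch, not part of llm-anonymizer: `suspect_fragments` is a hypothetical name, and the three-character minimum is an arbitrary choice to skip noise like initials.

```python
import re


def suspect_fragments(response: str, mapping: dict[str, str]) -> dict[str, list[str]]:
    """For each fake, list the parts of it that appear outside a verbatim match."""
    # Blank out verbatim fakes first so their own words are not counted as fragments.
    remainder = response
    for fake in sorted(mapping, key=len, reverse=True):
        remainder = remainder.replace(fake, " ")
    # Keep dots and hyphens inside tokens (hostnames), but strip them from the edges.
    words = {w.strip(".-") for w in re.findall(r"[\w.-]+", remainder)}
    suspects = {}
    for fake in mapping:
        hits = [part for part in fake.split() if len(part) >= 3 and part in words]
        if hits:
            suspects[fake] = hits
    return suspects


mapping = {"Nathan Chen": "John Smith", "DataSoft LLC": "ACME Corp"}
reply = "Mr. Chen should ask the DataSoft LLC team."
print(suspect_fragments(reply, mapping))
```

A non-empty result means the reply probably still contains a piece of a fake name that plain substitution will not restore, which is a reasonable trigger for a retry or a human look.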

Source

github.com/janos-gyorgy/llm-anonymizer — MIT licensed, Kubernetes manifests and n8n workflow included.

🔄 Someone kubectl apply'd a Hotfix Directly. How Do You Detect and Prevent It?

Fri, 06 Jun 2025 00:00:00 +0000

The question

“How do you prevent configuration drift in a Kubernetes cluster?”

Configuration drift: the cluster’s actual state diverges from what’s declared in your source of truth. Someone runs kubectl edit deployment myapp to bump a memory limit during an incident. Someone adds a debug sidecar directly. Someone applies a YAML file from their laptop that was never committed to Git. The fix works. It goes undocumented. Six months later, a new deployment overwrites it. The incident recurs.

There are two distinct problems here that require different solutions:

Detection and remediation: how do you notice drift and revert it?
Prevention: how do you stop non-compliant resources from being created in the first place?

Detection and remediation: Argo CD selfHeal

If you’re using GitOps with Argo CD, detection and remediation are handled for you:

syncPolicy:
  automated:
    prune: true
    selfHeal: true

selfHeal: true means Argo CD continuously compares the cluster state to the Git repo and reverts any divergence. Someone runs kubectl edit deployment myapp and changes the replica count? Argo CD detects the diff on its next reconciliation cycle (default: every 3 minutes) and reverts it.

prune: true means resources that exist in the cluster but not in Git are deleted. Someone kubectl apply’d a debug pod directly? Gone on the next sync.
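The two flags are easier to reason about as a single reconciliation pass. The following is a toy model under loud assumptions: resources are plain dicts keyed by name, and the function is hypothetical, not Argo CD's implementation. It only shows which flag is responsible for which kind of correction.

```python
def reconcile(desired: dict, live: dict, prune: bool = True, self_heal: bool = True) -> dict:
    """Return the cluster state after one pass, given Git (desired) and cluster (live)."""
    result = dict(live)
    for name, spec in desired.items():
        if name not in live:
            result[name] = spec      # in Git but missing from the cluster: create it
        elif live[name] != spec and self_heal:
            result[name] = spec      # drifted from Git: selfHeal reverts it
    if prune:
        for name in live:
            if name not in desired:
                del result[name]     # in the cluster but not in Git: prune deletes it
    return result


git = {"deployment/myapp": {"replicas": 3}}
cluster = {"deployment/myapp": {"replicas": 5}, "pod/debug": {"image": "busybox"}}

healed = reconcile(git, cluster)
untouched = reconcile(git, cluster, prune=False, self_heal=False)
print(healed)
print(untouched)
```

With both flags on, the hand-edited replica count and the stray debug pod both disappear. With both off, the drift survives the pass.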

This is the audit trail story too. Every legitimate change is a Git commit with an author, a timestamp, and a commit message. Everything that isn’t in Git doesn’t survive past the next reconciliation. If you want to know what changed and when, git log is the answer.

The gap selfHeal doesn’t close

selfHeal reverts drift after the fact. There’s a window — up to 3 minutes — where a drifted resource is serving traffic. For most changes, that’s fine. For a bad resource (wrong RBAC, missing network policy, container running as root), 3 minutes is enough to be a problem.

The other gap: selfHeal doesn’t tell you who made the change or generate an alert. It just silently fixes it. You need audit logging (kube-apiserver --audit-log-path) or an alerting rule on Argo CD’s health events to know that drift happened.

Prevention: Kyverno

Kyverno is a policy engine that runs as a Kubernetes admission webhook. Every resource creation or modification goes through it before being persisted. If the resource violates a policy, Kyverno can reject it outright (enforce mode) or allow it with a warning (audit mode).

The policies are Kubernetes resources themselves — they live in Git, they’re applied via GitOps, they’re versioned. No separate policy language to learn.

A policy that requires readiness probes on all Deployments:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-readiness-probe
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-readiness-probe
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "Deployments must define a readiness probe."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - readinessProbe:
                      periodSeconds: ">0"

With this policy active: kubectl apply -f deployment-without-probe.yaml is rejected at the API server. The error message is the one you defined in message. The deployment never reaches etcd.
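Conceptually, the admission check is a small predicate over the incoming object. The sketch below mirrors the rule in plain Python so the logic can be read and tested in isolation. It is an illustration only: the function names are hypothetical and this is not how Kyverno evaluates patterns internally.

```python
MESSAGE = "Deployments must define a readiness probe."


def missing_readiness_probes(deployment: dict) -> list[str]:
    """Return the names of containers that have no readinessProbe."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", [])
    return [c.get("name", "<unnamed>") for c in containers if not c.get("readinessProbe")]


def admit(deployment: dict) -> tuple[bool, str]:
    """Mimic an Enforce-mode decision: (allowed, message)."""
    if missing_readiness_probes(deployment):
        return False, MESSAGE
    return True, ""


bad = {"spec": {"template": {"spec": {"containers": [{"name": "web", "image": "nginx"}]}}}}
good = {"spec": {"template": {"spec": {"containers": [
    {"name": "web", "image": "nginx", "readinessProbe": {"httpGet": {"path": "/", "port": 80}}},
]}}}}

print(admit(bad))
print(admit(good))
```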

A policy that blocks containers running as root:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-root-containers
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-runAsNonRoot
      match:
        any:
          - resources:
              kinds: [Deployment, StatefulSet, DaemonSet]
      validate:
        message: "Containers must not run as root."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - (name): "*"
                    securityContext:
                      runAsNonRoot: true

A policy that enforces resource limits (common in multi-tenant clusters):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: [Deployment]
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: "?*"
                        cpu: "?*"

Kyverno can also mutate and generate

Policies aren’t only for validation. Kyverno can mutate incoming resources (add default labels, inject sidecars, set default resource requests) and generate new resources in response to events (create a NetworkPolicy whenever a new namespace is created).

Auto-add a standard label to every Deployment:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-labels
spec:
  rules:
    - name: add-team-label
      match:
        any:
          - resources:
              kinds: [Deployment]
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              managed-by: kyverno

Auto-create a default NetworkPolicy when a namespace is created:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-networkpolicy
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds: [Namespace]
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress

The complete drift prevention picture

Developer runs: kubectl apply -f bad-deployment.yaml
  → API server receives request
  → Kyverno admission webhook intercepts
  → Policy check: no readiness probe → Rejected
  → API server rejects the request and returns Kyverno's message
  → Resource never reaches etcd

Developer runs: kubectl edit deployment myapp (valid change, just not via Git)
  → Edit succeeds (no policy violation)
  → Argo CD reconciliation fires (within 3 minutes)
  → Diff detected: cluster state ≠ Git state
  → selfHeal: revert to Git state
  → If audit logging enabled: event recorded with username and timestamp

Git is the audit trail for what should be there. kube-apiserver audit logs are the trail for what was attempted. Kyverno is the enforcer at admission time. Argo CD is the continuous reconciler. Four layers, each with a different job.

What interviewers are actually testing

The follow-up is usually: “What’s the difference between Kyverno and OPA Gatekeeper?”

Both are admission webhook policy engines. The practical differences:

Kyverno: policies are k8s-native YAML, no separate language to learn. Generate and mutate policies built in. Easier to get started with.
OPA Gatekeeper: policies are written in Rego, a purpose-built policy language that’s more expressive but has a steeper learning curve. Better if you’re already using OPA elsewhere (Terraform, microservice authorization).

For a Kubernetes-only environment, Kyverno is the pragmatic choice. For a platform team that uses OPA across the stack, Gatekeeper gives you policy consistency.

The deeper follow-up: “How do you test policies before enforcing them?” Use Audit mode first (validationFailureAction: Audit). Violations are logged as PolicyReport objects but requests aren’t rejected. Review the reports, fix the existing violations, then switch to Enforce. Never flip directly to Enforce in production — you’ll break things that were already running.

This is part of a series on Kubernetes interview questions. Previously: network isolation between services.

🛡️ How Do You Prevent a Compromised Pod From Calling Your Database?

Fri, 23 May 2025 00:00:00 +0000

The question

“How do you enforce network isolation between services in a Kubernetes cluster?”

The default Kubernetes network model is flat. Every pod can reach every other pod, in any namespace, on any port. There are no firewalls, no ACLs, no segmentation. A compromised frontend pod can connect directly to your PostgreSQL port, your Redis port, your internal admin API, and every other service in the cluster.

This is intentional — Kubernetes doesn’t assume you want isolation, because not everyone does. But if you do want it, you need to add it.

NetworkPolicy: the primitive

A NetworkPolicy is a Kubernetes resource that selects a set of pods and defines what traffic is allowed to reach them (ingress) and what traffic they’re allowed to send (egress). Traffic that isn’t explicitly allowed is dropped.

The catch: NetworkPolicy resources have no effect unless something in your cluster enforces them. Calico, Cilium, and Canal do. Plain Flannel does not; k3s, whose default CNI is Flannel, closes the gap with an embedded network policy controller. If you’re running Flannel without such a controller and you apply a NetworkPolicy, it will be silently ignored: no error, no warning.

The default-deny pattern

The correct starting point is a default-deny policy that blocks everything, applied to the namespace. You then add explicit allow policies for the traffic you actually need.

# Block all ingress and egress in this namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: myapp
spec:
  podSelector: {}        # matches all pods in the namespace
  policyTypes:
    - Ingress
    - Egress

With this in place, your pods can’t receive traffic and can’t send traffic. You then add back what you need.

Allowing specific traffic

Allow the web frontend to receive traffic from the ingress controller:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-traefik
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: sys-traefik

Allow the backend to talk to PostgreSQL:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-postgres
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
          protocol: TCP

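After these two policies: the frontend receives traffic from Traefik, and the backend is allowed to send traffic to Postgres on 5432. Because default-deny also covers the Postgres pods, that connection only succeeds once Postgres has a matching ingress rule allowing the backend. The frontend cannot reach Postgres. The backend cannot receive traffic from the ingress controller. Neither can call anything else.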

The DNS gotcha

Once you add a default-deny egress policy, DNS stops working. Your pods can no longer resolve service names because they can’t reach kube-dns in the kube-system namespace.

You need to explicitly allow it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-dns
  namespace: myapp
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

Missing this is the most common reason “everything broke after I added NetworkPolicies”. Add it to every namespace that has a default-deny policy.

Cilium: the same model with more power

Cilium implements the standard NetworkPolicy API and adds its own CiliumNetworkPolicy CRD with L7 capabilities.

Standard NetworkPolicy works at L3/L4 — IP addresses and ports. Cilium’s CRD adds:

L7 HTTP filtering: allow specific HTTP methods and paths, not just port 8080.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-api-reads
  namespace: myapp
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"

DNS-based egress: allow egress to github.com by hostname rather than IP address. This matters for external services with dynamic IPs.

egress:
  - toFQDNs:
      - matchName: "github.com"
    toPorts:
      - ports:
          - port: "443"
            protocol: TCP

Identity-based policies: Cilium assigns each pod a numeric security identity derived from its labels. Policies are enforced by identity, not IP address. Pod restarts (which change IPs) don’t break policy enforcement.

What a real namespace policy set looks like

For a typical web app with frontend, backend, and database:

Namespace: myapp
├── default-deny-all (ingress + egress, all pods)
├── allow-egress-dns (egress, all pods, port 53)
├── allow-ingress-frontend (ingress frontend, from sys-traefik namespace)
├── allow-egress-frontend-to-backend (egress frontend, to backend:8080)
├── allow-ingress-backend (ingress backend, from frontend)
├── allow-egress-backend-to-postgres (egress backend, to postgres:5432)
└── allow-ingress-postgres (ingress postgres, from backend)

Seven policies. The database has exactly one inbound path: from the backend. The frontend has no path to the database at all. A compromised frontend pod cannot scan the internal network, because egress to arbitrary destinations is blocked.
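The policy set can be sanity-checked with a toy evaluator. The sketch below is a simplified model, assuming pods are identified by a single app name and that a connection needs both an egress allow on the sender and an ingress allow on the receiver. The `Allow` tuple and `can_connect` are hypothetical helpers, not a Kubernetes API.

```python
from typing import NamedTuple, Optional


class Allow(NamedTuple):
    direction: str        # "ingress" or "egress"
    pod: str              # pod the policy selects ("*" = every pod in the namespace)
    peer: str             # the other end of the connection
    port: Optional[int]   # None = any port


# default-deny-all is implicit: anything not listed here is dropped.
ALLOWS = [
    Allow("egress", "*", "kube-dns", 53),
    Allow("ingress", "frontend", "sys-traefik", None),
    Allow("egress", "frontend", "backend", 8080),
    Allow("ingress", "backend", "frontend", None),
    Allow("egress", "backend", "postgres", 5432),
    Allow("ingress", "postgres", "backend", None),
]
NAMESPACE_PODS = {"frontend", "backend", "postgres"}


def _permitted(direction: str, pod: str, peer: str, port: int) -> bool:
    if pod not in NAMESPACE_PODS:
        return True  # endpoints outside myapp are not governed by these policies
    return any(
        a.direction == direction and a.pod in ("*", pod)
        and a.peer == peer and a.port in (None, port)
        for a in ALLOWS
    )


def can_connect(src: str, dst: str, port: int) -> bool:
    """Both sides must agree: egress on the sender, ingress on the receiver."""
    return _permitted("egress", src, dst, port) and _permitted("ingress", dst, src, port)


print(can_connect("backend", "postgres", 5432))
print(can_connect("frontend", "postgres", 5432))
```

The same helper shows why the DNS rule matters: remove the first `Allow` and `can_connect("frontend", "kube-dns", 53)` flips to False.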

What interviewers are actually testing

The follow-up is usually: “How do you manage this at scale? Writing NetworkPolicies for every namespace by hand doesn’t scale.”

The answer: you don’t write them by hand. You template them. In a GitOps setup, your namespace configuration declares what network access the service needs in a structured form, and a Helm chart or operator generates the actual NetworkPolicy resources from those declarations.

For example, an applications.yml entry might look like:

networkPolicies:
  denyAll: true
  allowIngressFromIngress: true
  allowEgressToNamespaces: ["sys-postgres"]

And a Helm chart translates that into four concrete NetworkPolicy objects. The developer declares intent; the platform enforces it. No one writes raw YAML for each namespace.
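The translation step can be sketched outside Helm to make the mapping explicit. This is a minimal sketch, not the chart itself: `render_policies` is a hypothetical function, the policy names are invented, and it assumes that `denyAll` always ships with the DNS exception, which is one way the example declaration expands to four objects.

```python
def render_policies(namespace: str, cfg: dict, ingress_ns: str = "sys-traefik") -> list[dict]:
    """Expand a declarative networkPolicies block into NetworkPolicy manifests."""

    def policy(name: str, spec: dict) -> dict:
        return {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": name, "namespace": namespace},
            "spec": {"podSelector": {}, **spec},  # {} selects every pod in the namespace
        }

    def ns_peer(ns: str) -> dict:
        return {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": ns}}}

    policies = []
    if cfg.get("denyAll"):
        policies.append(policy("default-deny-all", {"policyTypes": ["Ingress", "Egress"]}))
        # Assumption: default-deny is always paired with the DNS exception.
        dns_ports = [{"port": 53, "protocol": "UDP"}, {"port": 53, "protocol": "TCP"}]
        policies.append(policy("allow-egress-dns", {
            "policyTypes": ["Egress"],
            "egress": [{"to": [ns_peer("kube-system")], "ports": dns_ports}],
        }))
    if cfg.get("allowIngressFromIngress"):
        policies.append(policy("allow-ingress-from-ingress", {
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [ns_peer(ingress_ns)]}],
        }))
    for ns in cfg.get("allowEgressToNamespaces", []):
        policies.append(policy(f"allow-egress-to-{ns}", {
            "policyTypes": ["Egress"],
            "egress": [{"to": [ns_peer(ns)]}],
        }))
    return policies


rendered = render_policies("myapp", {
    "denyAll": True,
    "allowIngressFromIngress": True,
    "allowEgressToNamespaces": ["sys-postgres"],
})
names = [p["metadata"]["name"] for p in rendered]
print(names)
```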

The second follow-up: “What about east-west traffic between services in the same namespace?” Add allowIntraNamespace: true as a flag that generates a policy allowing all pod-to-pod traffic within the namespace, while still blocking cross-namespace traffic.

This is part of a series on Kubernetes interview questions. Previously: zero-downtime deployments. Next: preventing configuration drift.

🔑 Deploy to Kubernetes Without Storing Any Cluster Credentials in CI

Fri, 09 May 2025 00:00:00 +0000

The question

“How would you design a CI/CD pipeline that deploys to Kubernetes without storing any cluster credentials anywhere?”

The expected wrong answer: export your kubeconfig, base64-encode it, paste it into a CI secret named KUBE_CONFIG, and call it a day. This works. It also leaves a long-lived cluster credential one leaked CI variable away from an attacker.

There are two correct answers in 2025, and which one you reach for depends on what you’re actually deploying.

Answer 1: GitOps (the one your interviewer probably wants)

In a GitOps setup, your CI pipeline never touches the cluster. It can’t leak credentials it doesn’t have.

The flow:

Developer pushes code
  → CI builds and tests
  → CI updates the image tag in the Git repo (a commit, not a kubectl command)
  → Argo CD detects the change
  → Argo CD applies it to the cluster

The cluster reaches out to Git. CI never reaches into the cluster. The only thing with cluster credentials is Argo CD itself — running inside the cluster, with no credentials to leak externally.

For self-hosted setups on Hetzner or Vultr, this is particularly clean because there’s no cloud IAM to configure. You point Argo CD at your GitLab repo, tell it which branch to watch, and you’re done.

# The Argo CD Application CRD — the only thing you need
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.example.com/myorg/myapp
    targetRevision: main
    path: helm-charts/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

selfHeal: true means if someone manually kubectl applys something, Argo CD reverts it. The Git repo is the only source of truth.

The CI image-tag update step looks like this:

# .gitlab-ci.yml
deploy:
  stage: deploy
  script:
    - |
      # Update the image tag in values.yaml and push
      sed -i "s/tag: .*/tag: ${CI_COMMIT_SHORT_SHA}/" values/myapp.yml
      git config user.email "ci@example.com"
      git config user.name "CI"
      git add values/myapp.yml
      git commit -m "chore: bump myapp to ${CI_COMMIT_SHORT_SHA}"
      # GitLab CI checks out a detached HEAD, so name the target branch explicitly
      git push origin HEAD:main

CI needs write access to the Git repo — but that’s a deploy key, not a cluster credential. If it leaks, someone can push code. You’d rotate the deploy key and audit the commits. If a cluster credential leaks, someone owns your cluster.

Answer 2: OIDC federation (for when you genuinely need push-based)

Some operations don’t fit the GitOps model. Infrastructure provisioning (terraform apply), one-off database migrations, or initial cluster bootstrapping — these need direct cluster access. The correct pattern here is OIDC federation.

The idea: your CI platform (GitLab, GitHub Actions) already issues JWT tokens to every job. These JWTs are signed by the CI platform and contain claims like which repo, which branch, which pipeline triggered the job. You configure your Kubernetes API server to trust those JWTs, and the CI job authenticates directly using the token it already has.

No stored credentials. Every job gets a fresh token. The token expires when the job ends.

For a self-hosted GitLab, configure your k8s API server to trust GitLab as an OIDC issuer:

# /etc/rancher/k3s/config.yaml (or kube-apiserver flags)
kube-apiserver-arg:
  - "oidc-issuer-url=https://gitlab.example.com"
  - "oidc-client-id=your_client_id"
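  - "oidc-username-claim=sub"
  # Disable the default "<issuer-url>#" username prefix so the binding subject matches the bare sub claim
  - "oidc-username-prefix=-"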
  - "oidc-groups-claim=groups_direct"

Then create a ClusterRoleBinding that maps a specific GitLab identity to a Kubernetes role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-ci-deployer
subjects:
  - kind: User
    name: "project_path:myorg/myapp:ref_type:branch:ref:main"
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: deploy-role
  apiGroup: rbac.authorization.k8s.io

The subject name is the sub claim from the GitLab JWT — it encodes the repo path and branch. Only jobs running on main in myorg/myapp get this binding. A job on a feature branch gets nothing.
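When a binding does not match, it helps to look at what the token actually claims. The sketch below decodes a JWT payload without verifying it (for debugging only, never for authorization) and splits the sub claim into its parts. The helper names and the unsigned sample token are hypothetical. The regular expression assumes the `project_path:...:ref_type:...:ref:...` layout described above.

```python
import base64
import json
import re

SUB_RE = re.compile(
    r"^project_path:(?P<project>.+):ref_type:(?P<ref_type>branch|tag):ref:(?P<ref>.+)$"
)


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT checking the signature. Debugging aid only."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def parse_sub(sub: str) -> dict:
    """Split a GitLab-style sub claim into project, ref_type and ref."""
    match = SUB_RE.match(sub)
    if match is None:
        raise ValueError(f"unexpected sub claim: {sub!r}")
    return match.groupdict()


def _b64(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Hypothetical unsigned token, built here only so the example is self-contained.
fake_token = ".".join([
    _b64({"alg": "none"}),
    _b64({"sub": "project_path:myorg/myapp:ref_type:branch:ref:main", "aud": "your_client_id"}),
    "",
])

claims = jwt_claims(fake_token)
parsed = parse_sub(claims["sub"])
print(parsed)
```

In a real job you would feed it the value of `K8S_TOKEN` and compare the printed sub against the subject name in the ClusterRoleBinding.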

In the CI job:

deploy:
  stage: deploy
  id_tokens:
    K8S_TOKEN:
      aud: your_client_id
  script:
    - |
      kubectl config set-credentials gitlab-ci \
        --token="${K8S_TOKEN}"
      kubectl config set-context deploy \
        --cluster=mycluster \
        --user=gitlab-ci
      kubectl config use-context deploy
      kubectl rollout restart deployment/myapp -n myapp

The token in K8S_TOKEN is injected by GitLab. It expires with the job. The API server validates the signature against GitLab’s JWKS endpoint on every request.

Which one to use

	GitOps	OIDC federation
CI needs cluster access	No	Yes (short-lived token)
Audit trail	Git history	kube-apiserver audit log
Revocability	Revert the commit	Token expires with the job
Self-hosted setup effort	Low	Moderate (OIDC config)
Works for infra provisioning	Not really	Yes

For application deployments: GitOps. The cluster reconciles continuously, drift is reverted automatically, and CI is completely decoupled from cluster state.

For infrastructure provisioning or one-off operations: OIDC federation. Short-lived credentials, branch-scoped permissions, nothing to rotate.

What you should never do: store a kubeconfig or a long-lived ServiceAccount token in CI secrets. Not because it’s hard to make work — it’s easy — but because the blast radius of a leak is unbounded, there’s no audit trail, and there’s no expiry. Everything that goes wrong with static secrets goes wrong eventually.

This is part of a series on Kubernetes interview questions. Next: how to handle secrets in a GitOps repository.

🤫 How Do You Handle Secrets in a GitOps Repository?

Fri, 25 Apr 2025 00:00:00 +0000

The question

“You’re using GitOps — everything goes through Git. How do you handle secrets?”

The wrong answer: base64-encode them and commit them as Kubernetes Secret objects. Base64 is not encryption. Anyone with read access to the repo has your secrets. If the repo is public, everyone does.

The slightly better wrong answer: use a private repo and just not think about it. This works until a deploy key leaks, someone joins and then leaves the company, or you need to rotate one secret and have to find every place it’s referenced.

There are three real answers. They make different tradeoffs.

The constraint

The constraint is actually tighter than “don’t commit secrets”. It’s: your Git repo should be safe to make public at any point, and secrets must be rotatable without touching Git.

If rotating a password requires a new commit, someone has to be awake to merge and deploy it. That’s not how you want to handle a 3am incident.

Option 1: External Secrets Operator + Vault

This is the most robust pattern and the one worth knowing for interviews.

The idea: secrets live in a dedicated secret store (HashiCorp Vault, or a cloud equivalent). The External Secrets Operator (ESO) runs in the cluster, watches ExternalSecret objects, and syncs each referenced secret into a real Kubernetes Secret. The ExternalSecret manifest is safe to commit: it says where the secret lives, not what it is.

# This lives in Git — safe to commit
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-db-credentials
  namespace: myapp
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: myapp-db-credentials   # the k8s Secret it creates
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: secret/myapp
        property: db-password

Rotation: you update the secret in Vault. ESO syncs it to the cluster within refreshInterval. No Git commit, no deployment. The pod picks up the updated Secret on the next restart, or without a restart if the Secret is mounted as a volume and the app re-reads the file. Environment variables are fixed at container start and never update in place.

Audit trail: Vault logs every read and write. You know exactly which service account read which secret at what time.

The cost: you’re running Vault. For a homelab or small team, that’s an extra thing to operate. For production, it’s worth it.

Self-hosted setup:

# ClusterSecretStore — connects ESO to your Vault instance
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "http://sys-vault.sys-vault.svc.cluster.local:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"

ESO authenticates to Vault using the pod’s Kubernetes ServiceAccount token. Vault validates it against the cluster’s token review endpoint. No static credentials anywhere.

Option 2: Sealed Secrets

Sealed Secrets uses asymmetric encryption. The cluster holds a private key. You use the kubeseal CLI to encrypt a secret with the cluster’s public key. The resulting SealedSecret object is safe to commit — only the cluster can decrypt it.

# Encrypt a secret for committing to Git
kubectl create secret generic myapp-db \
  --namespace myapp \
  --from-literal=DB_PASSWORD=hunter2 \
  --dry-run=client -o yaml \
  | kubeseal --format yaml \
  > sealed-secrets/myapp-db.yaml

The resulting YAML looks like:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myapp-db
  namespace: myapp
spec:
  encryptedData:
    DB_PASSWORD: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...

This gets committed. The Sealed Secrets controller in the cluster decrypts it and creates the real Secret automatically.

The tradeoff: rotation means re-sealing. You need the cluster’s public key (which is public) and access to the plaintext secret. You commit a new SealedSecret. That’s a Git commit, which means a review, a merge, and a deploy. For a 3am incident, that’s a lot of friction.

Also: if the cluster’s private key is lost, you can’t decrypt any of your sealed secrets. Back up the private key.

Good fit for: small teams, homelab, situations where secrets change rarely and the GitOps review process is actually desirable.

Option 3: SOPS

SOPS (Secrets OPerationS) encrypts files at rest using age keys or cloud KMS. You commit encrypted files. CI decrypts them during deployment using a key it holds in memory (not stored in Git).

# Encrypt a file for Git
sops --encrypt --age age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q \
  secrets/myapp.yaml > secrets/myapp.enc.yaml

# In CI: decrypt to stdout and pipe straight into kubectl (no plaintext file on disk)
sops --decrypt secrets/myapp.enc.yaml | kubectl apply -f -

The difference from Sealed Secrets: SOPS encrypts at the file level, not the k8s object level. You can use it outside of Kubernetes (application configs, Terraform variables). The key can live in the CI environment, a cloud KMS, or a personal age key.

The tradeoff: CI needs the decryption key, which puts you back in “secret in CI” territory — just for the encryption key rather than the actual secrets. If you use a cloud KMS, OIDC federation handles that (no stored key). If you use an age key, it lives in CI secrets.

Good fit for: teams already using Helm and Helm Secrets, polyglot environments where not everything is Kubernetes, small teams where Vault feels like overengineering.

Comparison

	ESO + Vault	Sealed Secrets	SOPS
Rotation without Git commit	Yes	No	Depends
Audit trail	Full (Vault)	None	Depends on KMS
Complexity	High	Low	Medium
Works outside k8s	With effort	No	Yes
Recovery if key lost	Vault backup	Lose all secrets	Key backup
CI needs secret material	No	No	Yes (decrypt key)

What interviewers are actually testing

The interesting follow-up question is: “How do you rotate a secret without downtime?”

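The answer requires you to understand how pods consume Secret objects. Environment variables are read once at container start, and a mounted volume is refreshed by the kubelet but only helps if the app re-reads the file. Updating the Secret in Kubernetes doesn’t automatically restart the pod. Your options are: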

Mount the secret as a volume and have the app watch for file changes (good)
Restart the deployment after rotation (kubectl rollout restart, automatable)
Use a sidecar like Vault Agent Injector that handles refresh in-process (complex but zero-restart)

The correct answer depends on the app. An API key that can be rotated gradually is different from a database password where the old one is invalidated immediately.
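The first option, watching a mounted secret for changes, can be as simple as re-reading the file on a timer. The sketch below is a minimal illustration under assumptions: the class name is hypothetical, the demo uses a temporary file in place of a real mount such as a path under /etc/secrets, and the app is expected to call `poll()` periodically and rebuild its connections when it returns True.

```python
import tempfile
from pathlib import Path


class MountedSecret:
    """Tracks the value of a secret file and notices when it changes."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.value = self.path.read_text().strip()

    def poll(self) -> bool:
        """Re-read the file; return True if the value changed since the last read."""
        latest = self.path.read_text().strip()
        changed = latest != self.value
        self.value = latest
        return changed


# Demo with a temp file standing in for the mounted Secret.
with tempfile.TemporaryDirectory() as tmp:
    secret_path = Path(tmp) / "DB_PASSWORD"
    secret_path.write_text("old-password\n")

    secret = MountedSecret(str(secret_path))
    first_poll = secret.poll()            # nothing rotated yet

    secret_path.write_text("new-password\n")  # simulated rotation
    second_poll = secret.poll()
    current = secret.value

print(first_poll, second_poll, current)
```

Comparing content rather than timestamps keeps the check independent of how the file gets replaced on disk.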

This is part of a series on Kubernetes interview questions. Previously: deploying without cluster credentials. Next: zero-downtime deployments.

📱 Building a QR Code Login for a Homelab (And Learning oauth2-proxy's Session Format the Hard Way)

Fri, 14 Mar 2025 00:00:00 +0000

The problem

My homelab runs a single-node k3s cluster with a full GitOps stack — Argo CD, Traefik, oauth2-proxy for GitLab SSO, the usual over-engineered personal project. One thing that always bothered me: when I want to show the Homer dashboard on the living room TV, I have to type my credentials on a keyboard that wasn’t designed for the living room.

The obvious fix is a QR code. Phone scans it, phone authenticates, TV unlocks. Conceptually simple. In practice, a two-day debugging adventure that took me deep into oauth2-proxy’s source code.

The design

The flow I wanted:

TV opens qr.hippotion.com, shows a QR code and polls for completion
Phone scans, opens the device URL, taps “Continue with GitLab”
Phone completes GitLab OAuth
Server marks the session as ready
TV’s poll fires, gets redirected to Homer
Later: phone taps “End Session”, TV locks immediately

This is the OAuth 2.0 Device Authorization Grant pattern adapted for a single trusted user. I wrote it in Go with Redis for session storage. The service generates a device token, stores it with a 5-minute TTL, and uses it as the OAuth state parameter. The phone completes GitLab OAuth and the callback handler links the resulting session to the device token. The TV’s poll loop picks it up and redirects.
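The device-token lifecycle is small enough to sketch in full. The version below is an in-memory Python stand-in, not the Go and Redis service described here: the class and method names are hypothetical, a dict replaces Redis, and the clock is injectable so expiry can be tested without sleeping.

```python
import secrets
import time


class DeviceSessions:
    """In-memory stand-in for the device-token store, with a TTL per token."""

    def __init__(self, ttl_seconds: float = 300, clock=time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # token -> [expires_at, ready]

    def create(self) -> str:
        """TV side: mint a token to embed in the QR code and the OAuth state."""
        token = secrets.token_urlsafe(32)
        self._entries[token] = [self.clock() + self.ttl, False]
        return token

    def mark_ready(self, token: str) -> bool:
        """Callback side: the phone finished OAuth for this token."""
        entry = self._live(token)
        if entry is None:
            return False
        entry[1] = True
        return True

    def poll(self, token: str) -> str:
        """TV side: 'pending', 'ready', or 'expired'."""
        entry = self._live(token)
        if entry is None:
            return "expired"
        return "ready" if entry[1] else "pending"

    def _live(self, token: str):
        entry = self._entries.get(token)
        if entry is None or self.clock() >= entry[0]:
            self._entries.pop(token, None)  # unknown or past its TTL
            return None
        return entry


now = [0.0]                                # fake clock for the demo
store = DeviceSessions(ttl_seconds=300, clock=lambda: now[0])
token = store.create()
before = store.poll(token)
store.mark_ready(token)
after = store.poll(token)
now[0] = 301.0                             # five minutes pass
late = store.poll(token)
print(before, after, late)
```

Treating unknown and expired tokens identically means the poll endpoint never reveals whether a guessed token ever existed.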

That part was straightforward. The hard part was making the TV’s session work for all protected apps on the domain, not just the QR page.

The oauth2-proxy problem

My homelab uses oauth2-proxy as a ForwardAuth backend for Traefik. Every protected app (home.hippotion.com, argo.hippotion.com, grafana.hippotion.com, etc.) sends unauthenticated requests through oauth2-proxy, which redirects to GitLab if no valid _oauth2_proxy session cookie is present.

The QR auth service creates its own session cookie (qr_session), but oauth2-proxy knows nothing about it. After QR login, clicking any link from Homer would immediately ask for GitLab credentials again.

The obvious solution: after the phone authenticates, set a valid _oauth2_proxy cookie on the TV’s browser. If I can forge a cookie that oauth2-proxy accepts, all apps work instantly.

How hard can it be?

Attempt 1: AES-GCM + JSON

I looked at the oauth2-proxy source and found what looked like the session format: a JSON struct with short field names ("e" for email, "ca" for created-at, etc.), encrypted with AES-GCM, base64url-encoded.

type oauthSession struct {
    CreatedAt *time.Time `json:"ca"`
    ExpiresOn *time.Time `json:"ea"`
    Email     string     `json:"e"`
    User      string     `json:"u"`
}

SHA256-hash the cookie secret → 32-byte AES key → GCM encrypt → base64url encode. Set as _oauth2_proxy cookie. Clean, simple, wrong.

oauth2-proxy returned 302 every time. I added debug logging to print the cookie value, copied it, and tested it directly against the ForwardAuth endpoint with curl. The logs revealed everything:

Error loading cookied session: cookie signature not valid, removing session

Cookie signature not valid. Not “decryption failed”, not “session expired”. A signature check.

Finding the real format

The error came from pkg/middleware/stored_session.go:94. I fetched the source:

val, _, ok := encryption.Validate(c, secret, s.Cookie.Expire)
if !ok {
    return nil, errors.New("cookie signature not valid")
}

encryption.Validate splits the cookie value on | and expects three parts. Looking at utils.go:

func Validate(cookie *http.Cookie, seed string, expiration time.Duration) (value []byte, t time.Time, ok bool) {
    parts := strings.Split(cookie.Value, "|")
    if len(parts) != 3 {
        return
    }
    if checkSignature(parts[2], seed, cookie.Name, parts[0], parts[1]) {
        // ...
    }
}

The cookie format is encryptedValue|timestamp|hmac. My cookie was just encryptedValue. Three-part, not one. First problem found.

For the HMAC, I needed to verify against a real cookie to get the key format right. oauth2-proxy sets _oauth2_proxy_csrf cookies during the login flow — I captured one from a 302 response and reverse-engineered it in Python:

key = secret_raw.encode()  # raw string, not decoded
data = (cookie_name + enc_val + ts).encode()  # concatenated, NO separators
sig = base64.urlsafe_b64encode(hmac.new(key, data, hashlib.sha256).digest())

Two surprises: the HMAC key is the raw cookie secret string (not base64-decoded), and the input is a bare concatenation with no | separators between fields.
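Put together, the three-part value and its signature can be sketched as a self-contained sign and verify pair. This follows the format as reverse-engineered above and is not oauth2-proxy's own code: the function names are hypothetical, and the timestamp being Unix seconds as a decimal string is an assumption.

```python
import base64
import hashlib
import hmac
import time


def sign(cookie_name: str, enc_val: str, ts: str, secret: str) -> str:
    """HMAC-SHA256 over name + value + timestamp, keyed with the raw secret string."""
    key = secret.encode()                          # raw string, not base64-decoded
    data = (cookie_name + enc_val + ts).encode()   # bare concatenation, no separators
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def build_cookie_value(cookie_name: str, enc_val: str, secret: str, now=None) -> str:
    """Assemble encryptedValue|timestamp|hmac."""
    ts = str(int(time.time() if now is None else now))  # assumed: Unix seconds
    return "|".join([enc_val, ts, sign(cookie_name, enc_val, ts, secret)])


def verify_cookie_value(cookie_name: str, value: str, secret: str) -> bool:
    parts = value.split("|")
    if len(parts) != 3:
        return False
    enc_val, ts, sig = parts
    expected = sign(cookie_name, enc_val, ts, secret)
    return hmac.compare_digest(sig, expected)      # constant-time comparison


# Hypothetical inputs, only to exercise the round trip.
value = build_cookie_value("_oauth2_proxy", "ZW5jcnlwdGVk", "not-a-real-secret", now=1700000000)
print(verify_cookie_value("_oauth2_proxy", value, "not-a-real-secret"))
print(verify_cookie_value("_oauth2_proxy_csrf", value, "not-a-real-secret"))
```

Because the cookie name is part of the signed data, a value signed for one cookie does not verify under another name.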

I ran the test. The CSRF cookie’s signature matched. I had the format.

But oauth2-proxy still rejected the session.

The wrong cipher

I kept AES-GCM for the payload, wrapped it in the three-part signed format, and tried again. Still 302. cookie signature not valid again.

Wait — was it even getting to the signature check? If decryption failed first, it wouldn’t reach that error. I added more debug logging to print the full cookie value and tested it with Python’s cryptography library:

candidates = {
    '24-byte std-b64 decode':  base64.b64decode(secret_str),
    '32-byte raw string':      secret_str.encode(),
    '32-byte sha256 of b64':   hashlib.sha256(base64.b64decode(secret_str)).digest(),
    ...
}
for label, key in candidates.items():
    try:
        pt = AESGCM(key).decrypt(nonce, ct_tag, None)
        print(f'SUCCESS [{label}]: {pt.decode()}')
    except Exception as e:
        print(f'FAIL    [{label}]: {e}')

The 24-byte base64-decoded key decrypted successfully, so my own encrypt and decrypt steps agreed with each other. oauth2-proxy still rejected the cookie. Which meant my pipeline was self-consistent but did not match what oauth2-proxy actually does with the secret and the payload.

I went back to the source. session_store.go → NewCookieSessionStore:

cipher, err := encryption.NewCFBCipher(encryption.SecretBytes(secret))

AES-CFB. Not GCM. The cookie session store uses CFB. GCM exists in the codebase for a different purpose (the Redis ticket store, which I hadn’t discovered yet). I had been encrypting with the wrong cipher the entire time.

And SecretBytes — a function I’d been reading but not understanding:

func SecretBytes(secret string) []byte {
    b, err := base64.RawURLEncoding.DecodeString(strings.TrimRight(secret, "="))
    if err == nil {
        for _, i := range []int{16, 24, 32} {
            if len(b) == i {
                return b
            }
        }
    }
    return []byte(secret)  // fallback: raw string
}

The cookie secret q7OF9sK2/Pnt9QKNoBBmxWRL3GAbWzvj contains /. That’s valid standard base64 but not URL-safe base64 — RawURLEncoding fails. Fallback to raw string: 32 bytes, valid AES-256 key. My Python test had used standard base64 decoding, which did succeed (and produced a different 24-byte key). My Go implementation had done the same. Both were deriving the wrong key.
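Here is a rough Python port of that key derivation, written only to show why the two decodings disagree. It is a sketch, not oauth2-proxy's code: secret_bytes is my own name, and the explicit alphabet check is there because Python's base64 module is more lenient than Go's RawURLEncoding and would otherwise accept the "/".

```python
import base64
import re

# Go's RawURLEncoding accepts only the URL-safe alphabet, with no padding.
_URLSAFE_ALPHABET = re.compile(r"[A-Za-z0-9_-]*")


def secret_bytes(secret: str) -> bytes:
    """Approximate SecretBytes: URL-safe decode if possible, else the raw string."""
    trimmed = secret.rstrip("=")
    # A length of 1 mod 4 can never be valid base64, so treat it as a decode error.
    if _URLSAFE_ALPHABET.fullmatch(trimmed) and len(trimmed) % 4 != 1:
        padded = trimmed + "=" * (-len(trimmed) % 4)
        decoded = base64.urlsafe_b64decode(padded)
        if len(decoded) in (16, 24, 32):
            return decoded
    return secret.encode()  # fallback: the raw string is the key


cookie_secret = "q7OF9sK2/Pnt9QKNoBBmxWRL3GAbWzvj"
print(len(secret_bytes(cookie_secret)))        # key length oauth2-proxy ends up with
print(len(base64.b64decode(cookie_secret)))    # key length my first attempts used
```

With this secret the first length is 32 (the raw string) and the second is 24 (the standard base64 decode), which is exactly the mismatch described above.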

I rewrote the cipher to AES-CFB with the raw-string key. New test. Same error. Still rejecting.

MessagePack and LZ4

Back to the source. EncodeSessionState:

func (s *SessionState) EncodeSessionState(c encryption.Cipher, compress bool) ([]byte, error) {
    packed, err := msgpack.Marshal(s)
    // ... (error handling; when compress is false, packed is encrypted as-is)
    compressed, err := lz4Compress(packed)
    // ...
    return c.Encrypt(compressed)
}

MessagePack. LZ4 compression. Then AES-CFB.

I had been encrypting raw JSON. The whole time.

The struct tags confirmed it:

type SessionState struct {
    CreatedAt   *time.Time `msgpack:"ca,omitempty"`
    ExpiresOn   *time.Time `msgpack:"eo,omitempty"` // "eo", not "ea" as I'd assumed
    AccessToken string     `msgpack:"at,omitempty"`
    Email       string     `msgpack:"e,omitempty"`
    User        string     `msgpack:"u,omitempty"`
}

Even the ExpiresOn field name was different from what I’d guessed ("eo" not "ea").

I added the vmihailenco/msgpack and pierrec/lz4 dependencies, rewrote the encoding pipeline: msgpack → lz4 → AES-CFB(raw-string key) → base64url(encrypted) → sign with HMAC.

Ran the curl test. HTTP 200.

After three days and four complete rewrites of the encoding logic, oauth2-proxy accepted the forged session.

The access token problem

Celebrating was premature. The curl test worked, but real ForwardAuth requests kept failing intermittently. Looking at the logs:

Error loading cookied session: session is invalid

This came from validateSession in the storedSessionLoader — after successfully loading the session, it was calling the provider’s ValidateSession method and getting false back. I checked the GitLab provider:

func (p *GitLabProvider) ValidateSession(ctx context.Context, s *sessions.SessionState) bool {
    return validateToken(ctx, p, s.AccessToken, makeOIDCHeader(s.IDToken))
}

oauth2-proxy calls GitLab’s /oauth/token/info endpoint with the access token to verify the session is still active. My forged session had an empty AccessToken field. Empty access token → validateToken returns false immediately → session rejected.

The fix: during the phone’s GitLab OAuth flow, exchangeCode was already calling GitLab’s token endpoint and receiving an access token, but I’d been discarding it. I changed the function signature to return it, stored it in the session, and included it in the forged session’s at field.

The token was issued for my qr-auth GitLab app, not oauth2-proxy’s app. But GitLab’s /oauth/token/info endpoint doesn’t check the issuing application — it just validates the token is active and returns 200. oauth2-proxy only checks for a 200 response. The token worked.
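The behaviour I was relying on comes down to two conditions, which can be sketched like this. This is my reading of the behaviour, not oauth2-proxy's actual code: validate_token and fetch_status are hypothetical names, and fetch_status stands in for the HTTP call to GitLab's /oauth/token/info so the example runs without a network.

```python
from typing import Callable, List


def validate_token(access_token: str, fetch_status: Callable[[str], int]) -> bool:
    """Return True only if a token is present and the info endpoint answers 200."""
    if not access_token:
        # An empty token is rejected before any request is made.
        return False
    return fetch_status(access_token) == 200


# A stand-in for the token info endpoint: it knows one active token
# and records every call so we can see when it was (not) contacted.
calls: List[str] = []


def fake_token_info(token: str) -> int:
    calls.append(token)
    return 200 if token == "active-token" else 401


print(validate_token("", fake_token_info))              # forged session, empty "at"
print(validate_token("active-token", fake_token_info))  # token from my own OAuth app
print(validate_token("revoked-token", fake_token_info))
print(calls)
```

Nothing in this check looks at which application the token was issued to, which is why a token from the qr-auth app was enough.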

Everything worked.

The End Session problem — three attempts

Attempt 1: Delete qr_session, lock the QR page

The first End Session implementation deleted the qr_session key from Redis. To make this actually lock the screen, I restored the Homer proxy at qr.hippotion.com — the TV would show Homer via an ExternalName Kubernetes service pointing at the Homer pod, guarded by a Traefik ForwardAuth middleware that checked the qr_session cookie. Homer makes status API calls every ~30 seconds, which re-triggered ForwardAuth, and deleting qr_session meant the screen would lock within 30 seconds automatically.

This worked for qr.hippotion.com, but the _oauth2_proxy cookie was stateless — a signed, self-contained encrypted blob in the browser. There was no server-side record to delete. Other apps (argo.hippotion.com, grafana.hippotion.com, etc.) kept working until the 8-hour cookie expiry.

The TV screen was locked. The session wasn’t.

Attempt 2: Shorter cookie TTL

The tempting quick fix: reduce the forged cookie’s TTL from 8 hours to something shorter, like 30 minutes. End Session would lock the TV immediately. Other apps would expire within 30 minutes on their own.

Rejected. 30 minutes of residual access on a shared TV is too long, and the TTL is arbitrary — it doesn’t match what End Session is supposed to mean.

Attempt 3: Redis-backed oauth2-proxy sessions

The correct fix is what oauth2-proxy calls persistence tickets. Instead of encoding the entire session into the cookie, oauth2-proxy stores the session in Redis and puts only a ticket reference in the cookie. When the ticket is deleted from Redis, the session is gone on the next request.

The ticket format, from pkg/sessions/persistence/ticket.go:

// ticketID format: "_oauth2_proxy-" + hex(rawID)
ticketID := fmt.Sprintf("%s-%s", cookieOpts.Name, hex.EncodeToString(rawID))

// ticket string in the cookie: "v2." + base64url(id) + "." + base64url(secret)
func (t *ticket) encodeTicket() string {
    return fmt.Sprintf("v2.%s.%s",
        base64.RawURLEncoding.EncodeToString([]byte(t.id)),
        base64.RawURLEncoding.EncodeToString(t.secret))
}

// session stored in Redis, encrypted with the *ticket* secret (not the cookie secret)
func (t *ticket) saveSession(s *sessions.SessionState, saver saveFunc) error {
    c, err := encryption.NewGCMCipher(t.secret)  // GCM, not CFB
    // ...
    ciphertext, err := s.EncodeSessionState(c, false)  // msgpack, NO lz4
    return saver(t.id, ciphertext, t.options.Expire)
}

This is a completely different format from the cookie session:

Property	Cookie session	Redis session (ticket)
Cipher	AES-CFB	AES-128-GCM
Key	cookie secret (raw string)	per-session ticket secret
Serialization	msgpack	msgpack
Compression	lz4	none
Storage	in the cookie	Redis, keyed by ticket ID
Revocable	no	yes

I rewrote the session creation to generate a random ticket ID and secret, encrypt the msgpack session with AES-GCM using the ticket secret, store it in Redis, and set the signed ticket reference as the _oauth2_proxy cookie.
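The ticket half of that rewrite can be sketched with the standard library alone. This follows the format quoted from ticket.go above and leaves out the GCM encryption, the Redis write, and the cookie signature. The function names are mine, and the 16-byte sizes for the raw ID and the secret are assumptions based on the example ID below and the AES-128 key size.

```python
import base64
import secrets
from typing import Tuple


def b64url_nopad(raw: bytes) -> str:
    """Unpadded URL-safe base64, the Python equivalent of Go's RawURLEncoding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def b64url_nopad_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def new_ticket(cookie_name: str = "_oauth2_proxy") -> Tuple[str, bytes]:
    """Generate a ticket ID (cookie name + hex of 16 random bytes) and a 16-byte secret."""
    ticket_id = f"{cookie_name}-{secrets.token_hex(16)}"
    ticket_secret = secrets.token_bytes(16)
    return ticket_id, ticket_secret


def encode_ticket(ticket_id: str, ticket_secret: bytes) -> str:
    """Build the unsigned cookie value: version, encoded ID, encoded secret."""
    return f"v2.{b64url_nopad(ticket_id.encode())}.{b64url_nopad(ticket_secret)}"


def decode_ticket(value: str) -> Tuple[str, bytes]:
    """Split a cookie value back into the Redis key and the per-session secret."""
    version, id_part, secret_part = value.split(".")
    if version != "v2":
        raise ValueError(f"unexpected ticket version: {version}")
    return b64url_nopad_decode(id_part).decode(), b64url_nopad_decode(secret_part)


ticket_id, ticket_secret = new_ticket()
cookie_value = encode_ticket(ticket_id, ticket_secret)
print(ticket_id)
print(decode_ticket(cookie_value) == (ticket_id, ticket_secret))
```

The point of the design shows up in decode_ticket: the browser holds the Redis key and the decryption secret, but never the session itself, so deleting the Redis entry is enough to end it.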

I stored the ticket ID alongside the qr_session in Redis:

{
  "email": "user@example.com",
  "username": "username",
  "access_token": "...",
  "oauth2_ticket_id": "_oauth2_proxy-eeeb18501625dee77f344c0a6193d0bc"
}

End Session now does two Redis deletes:

func handleLogout(w http.ResponseWriter, r *http.Request) {
    sessionID := r.FormValue("session_id")
    ctx := r.Context()
    if raw, err := rdb.Get(ctx, "session:"+sessionID).Result(); err == nil {
        var sd sessionData
        if json.Unmarshal([]byte(raw), &sd) == nil && sd.OAuth2TicketID != "" {
            rdb.Del(ctx, sd.OAuth2TicketID)  // kills oauth2-proxy session
        }
    }
    rdb.Del(ctx, "session:"+sessionID)  // kills qr session
}
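To see the revocation path in isolation, here is the same two-delete logic sketched in Python, with a plain dict standing in for Redis. The end_session name and the sample data are hypothetical; the key layout ("session:" plus the session ID, and the oauth2_ticket_id field) is the one shown above.

```python
import json
from typing import Dict


def end_session(store: Dict[str, str], session_id: str) -> None:
    """Delete the oauth2-proxy ticket (if recorded) and then the qr session."""
    raw = store.get("session:" + session_id)
    if raw is not None:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            data = {}
        ticket_id = data.get("oauth2_ticket_id", "") if isinstance(data, dict) else ""
        if ticket_id:
            store.pop(ticket_id, None)  # kills the oauth2-proxy session
    store.pop("session:" + session_id, None)  # kills the qr session


# A dict standing in for Redis, holding one qr session and its ticket.
ticket_key = "_oauth2_proxy-eeeb18501625dee77f344c0a6193d0bc"
store = {
    "session:abc123": json.dumps({"email": "user@example.com", "oauth2_ticket_id": ticket_key}),
    ticket_key: "encrypted-session-bytes",
}

end_session(store, "abc123")
print(store)
```

After the call the store is empty, and calling end_session again for the same ID is a harmless no-op, which is the behaviour you want from a logout button that might be tapped twice.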

I configured oauth2-proxy to use Redis session storage pointing at the same Redis instance, added the Cilium network policy to allow ingress from the oauth2-proxy namespace, and removed the Homer proxy from qr.hippotion.com — it was no longer needed.

One final gotcha: session_store_type = "redis" in oauth2-proxy’s legacy config file does nothing. There’s no error, no warning. It silently ignores the option. The flag only works when passed as an actual CLI argument via extraArgs in the Helm chart values:

extraArgs:
  session-store-type: redis
  redis-connection-url: "redis://qr-auth-redis:6379"

After that, End Session worked correctly. The phone taps the button, the ticket is deleted from Redis, and the next ForwardAuth request for any app on the domain immediately redirects to the QR lock screen.

What the final architecture looks like

Phone: scan QR
  → /device?token=xxx → intermediate page ("Continue with GitLab")
  → GitLab OAuth on phone (already logged in → direct callback)
  → /callback: exchange code → get email + access token
  → create Redis ticket: AES-128-GCM(msgpack(session), ticketSecret)
  → store ticket in Redis at "_oauth2_proxy-{hex}"
  → mark device token as authed, store ticketID in qr session

TV: poll fires
  → read qr session from Redis (has email, accessToken, ticketID)
  → set _oauth2_proxy cookie: signed ticket reference
  → set qr_session cookie
  → redirect to home.hippotion.com

Any protected app (home, argo, grafana, ...):
  → Traefik ForwardAuth → oauth2-proxy
  → oauth2-proxy reads _oauth2_proxy cookie → decodes ticket
  → looks up "_oauth2_proxy-{hex}" in Redis → decrypts session
  → validates email, access token → 200 OK

Phone: "End Session"
  → POST /logout with session_id
  → delete "session:{session_id}" from Redis (qr session gone)
  → delete "_oauth2_proxy-{hex}" from Redis (oauth2 ticket gone)
  → next ForwardAuth on TV: Redis lookup fails → redirect to login

The intermediate page on the phone (“Continue with GitLab” button instead of auto-redirect) was an unexpected requirement. Mobile browsers opened by the camera app often don’t share sessions with the browser where GitLab is logged in. When you auto-redirect to GitLab in a browser with no existing session, GitLab redirects to the sign-in page. The OAuth state is stored in a session cookie that GitLab sets during the initial authorize request. On mobile, the sign-in form submission can lose this cookie due to SameSite restrictions — after sign-in, GitLab can’t resume the OAuth flow and falls back to /users/sign_in with no further redirect. The intermediate page gives the user a visible moment to confirm they’re in a browser with an active GitLab session before initiating the OAuth redirect.

Lessons

Read the source, not the docs. The docs say “AES encryption” without specifying the mode or how the key is derived. The source has the answer in twenty lines.

Test at the boundary. The curl test against the ForwardAuth endpoint was the most valuable debugging step. It isolated exactly which layer was failing and gave me the real error message instead of a browser redirect loop. Without it, I’d still be guessing.

Format assumptions are fragile. I assumed JSON because JSON is the default for everything. oauth2-proxy uses MessagePack because it produces smaller cookies. LZ4 because it decompresses fast. AES-CFB because that’s what was chosen when the code was written. None of this is unreasonable, but none of it is obvious from the outside.

Two formats, same codebase. Cookie sessions and Redis ticket sessions use different ciphers, different compression, different key derivation. The GCM cipher I found first is correct — but for Redis sessions, not cookie sessions. The CFB cipher is for cookie sessions. I had the right code in the wrong place.

Config files can silently ignore options. session_store_type = "redis" in oauth2-proxy’s legacy config file does nothing. --session-store-type=redis on the command line works. No error, no warning, no indication that the option was parsed but not applied.

Revocability requires server-side state. A self-contained encrypted cookie cannot be revoked without adding a denylist (which has its own scaling problems). If you need End Session to mean something, you need a server-side session store. oauth2-proxy supports Redis sessions precisely for this reason — the ticket design is clean and the revocation path is a single Redis delete.

The code is at github.com/janos-gyorgy/qr-device-login.
