hippotion.com — the hippo sheltering the hedgehog, and the tagline

🦛 What Is Hippotion?

A name that works in two languages, hides two animals, and started as a kombucha label.

📦 Five Ways to Manage Kubernetes Manifests (and Why They're Not All Equal)

Raw YAML, Kustomize, Helm, Jsonnet — there’s more than one way to describe what you want running in a cluster. Here’s what each actually looks like in practice and where each one breaks.

the little robot stands guard at a doorway like a friendly bouncer, holding up a hand to check a stack of papers, while the boy in the cap watches; a shield symbol floats above them, protective and watchful

🔒 Building a PII Guardrail Proxy for Cloud LLM Calls

A local model classifies every prompt before it leaves the cluster. Sensitive prompts are blocked; clean ones are forwarded to NVIDIA NIM. 150 lines of FastAPI, deployed on k3s.

🕵️ Privacy-Preserving LLM Pipelines: Anonymize Before You Send

Replace PII with semantically realistic fakes before sending to a cloud LLM, then restore the originals from the response. It started with a general model and prompt engineering, then upgraded to a purpose-built 1.7B fine-tune via Ollama.

📈 Observing Local LLM Inference: llama.cpp's Built-in Prometheus Metrics

llama.cpp’s inference server ships a /metrics endpoint. One flag, Prometheus scraping, a Grafana dashboard loaded via ConfigMap sidecar — AI observability without a proxy layer.

🤖 Local LLM Inference on Kubernetes, No GPU Required

A CPU-only, self-hosted LLM stack running on k3s: llama.cpp as the inference server, Open WebUI as the chat interface, deployed with a single Git push.

🚨 Don't Restart the Node. Quarantine It First.

Rebooting a misbehaving node feels productive. It isn’t. You’re erasing your evidence and skipping the lesson.

Attack efficiency heatmap — win rate by attacker and defender dice count

📊 I Added a Stats Service to My Game to Answer One Question. It Multiplied.

Building a telemetry backend for Dice & Shrines — every attack logged, every guardian tracked, every die rolled accounted for. What the data revealed about balance, luck, and how people actually play.

Dice & Shrines mid-game — eight factions fighting over a procedurally generated hex map

🎲 I Built a Browser Game to Learn AI Coding Tools. It Turned Into Something Else.

What started as a Claude Code / Codex sandbox became a territory conquest game with five asymmetric guardians, procedurally generated hex maps, and a stats service to balance them. Here’s what happened.

⚡ Your Deployment Causes 30 Seconds of Downtime. What Went Wrong?

Kubernetes rolling updates don’t give you zero downtime for free. There are four separate things you have to get right, and most clusters get at least one wrong.