
🔒 Building a PII Guardrail Proxy for Cloud LLM Calls
A local model classifies every prompt before it leaves the cluster. If it’s sensitive, it’s blocked. If it’s clean, it goes to NVIDIA NIM. 150 lines of FastAPI, deployed on k3s.

A local model classifies every prompt before it leaves the cluster. If it’s sensitive, it’s blocked. If it’s clean, it goes to NVIDIA NIM. 150 lines of FastAPI, deployed on k3s.

Replace PII with semantically realistic fakes before sending to a cloud LLM, then restore the originals from the response. Started with a general model and prompt engineering — then upgraded to a purpose-built 1.7B fine-tune via Ollama.

Building a telemetry backend for Dice & Shrines — every attack logged, every guardian tracked, every die rolled accounted for. What the data revealed about balance, luck, and how people actually play.