I gave OpenAI’s Codex full read access to my k3s GitOps homelab — both repos: the applications.yml that generates every namespace, AppProject and NetworkPolicy; the umbrella charts; the Traefik IngressRoutes; the External Secrets config; the host scripts.

Anyone can run “AI, review my cluster” and get a list of confident findings. The work is what you do with them — which you accept, which you recalibrate, which the model got wrong. That’s the part most “I secured my X with AI” posts skip, so I’ll start there.

What I rejected

It flagged “committed secrets” — high-entropy strings next to keys like publicAccessToken. Wrong: those were secretRefKey mappings (names of keys in a Secret that External Secrets already syncs from Vault), not values. The entropy heuristic tripped on the key names. Real committed secrets there: zero.
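A minimal sketch of why an entropy-only check trips here, and how a little context fixes it. Only secretRefKey comes from my config; the function names, the threshold and the sample key name are illustrative assumptions, not what Codex actually runs.

```python
import math
from collections import Counter

# Keys whose values name an entry in a Secret rather than hold a secret.
# "secretRefKey" is from the post; extend the set for your own charts.
REFERENCE_KEYS = {"secretRefKey"}


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def naive_flag(value: str, threshold: float = 3.0) -> bool:
    """Entropy-only heuristic: long, varied strings look like secrets."""
    return len(value) >= 12 and shannon_entropy(value) >= threshold


def context_aware_flag(key: str, value: str, threshold: float = 3.0) -> bool:
    """Same heuristic, but skip values that are references by construction."""
    if key in REFERENCE_KEYS:
        return False
    return naive_flag(value, threshold)
```

A long descriptive key name such as the hypothetical "trigger-public-access-token" clears the entropy bar on its own, which is exactly the false positive: the string is varied, but it is a pointer, not a credential.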

It also said the blast-radius guard on my scheduled agent runners had the same flaw in all five, and recommended the same fix for each. True for three; the other two regenerate a local index file every run, so a blanket “nothing may change outside the output dir” guard would flag their own tooling as a breach nightly. Right instinct, wrong generalization: strict single-file enforcement for the three, an index/ allow-list for the two.
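The two guard policies can be sketched as one function with an optional directory allow-list. This is a sketch under assumptions: the function name and the example paths are hypothetical, and the real guards live in the runners’ own scripts.

```python
from pathlib import PurePosixPath


def find_breaches(changed, allowed_files, allowed_dirs=()):
    """Return the changed paths a runner was not permitted to touch.

    changed       -- repo-relative paths modified during the run
    allowed_files -- exact files the runner may write
    allowed_dirs  -- directories whose contents the runner may rewrite
    """
    files = {PurePosixPath(f) for f in allowed_files}
    dirs = [PurePosixPath(d) for d in allowed_dirs]
    breaches = []
    for raw in changed:
        path = PurePosixPath(raw)
        # Refuse parent-directory hops so "index/../x" cannot pass as "index/".
        if ".." in path.parts:
            breaches.append(raw)
            continue
        if path in files:
            continue
        if any(d in path.parents for d in dirs):
            continue
        breaches.append(raw)
    return breaches
```

With changes to out/report.md and index/cache.json, a strict runner (no allowed_dirs) reports index/cache.json as a breach, while a runner configured with allowed_dirs=("index",) reports nothing.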

That’s the loop. A reviewer with no context catches what you’ve stopped seeing; you catch where that missing context makes it overconfident. Skip the second half and you’ve got autocomplete, not an audit.

Expose only what must be exposed

My n8n IngressRoute matched the whole host on one rule, behind only n8n’s own login. n8n runs arbitrary code (Code nodes), reaches the internal network, stores credentials — a public editor on one password is RCE-and-credential-theft waiting on a weak password or an auth-bypass CVE.

The fix fell out of a quick threat model:

  • Assets: stored credentials, code execution, internal reach.
  • Actors: an internet scanner; the holder of a leaked API key.
  • Decisions: the editor is a human surface → SSO at the edge. The API auths by key, not a cookie → edge OAuth would only break it, so move it off the public edge (in-cluster only). Webhooks must be public but carry no privileged auth → leave exactly those path-prefixes open.

One wide-open hostname → three webhook prefixes public, editor behind SSO, API reachable only in-cluster.
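The three decisions reduce to a small path classifier. This models the policy only; the real enforcement is Traefik IngressRoute rules. The prefixes below are assumptions based on n8n’s default endpoint names, not a copy of my routes.

```python
# Assumed prefixes: n8n's default webhook endpoints and its public API root.
PUBLIC_WEBHOOK_PREFIXES = ("/webhook/", "/webhook-test/", "/webhook-waiting/")
API_PREFIX = "/api/"


def edge_policy(path: str) -> str:
    """Classify a request path arriving at the public edge.

    "public"          -- routed with no edge auth (webhooks)
    "in-cluster only" -- no public route; reachable via cluster DNS only
    "sso"             -- everything else, i.e. the editor UI
    """
    if path.startswith(PUBLIC_WEBHOOK_PREFIXES):
        return "public"
    if path.startswith(API_PREFIX):
        return "in-cluster only"
    return "sso"
```

The trailing slashes matter: matching on "/webhook" alone would also open any path that merely starts with those characters. Defaulting to "sso" means a path nobody thought about lands behind authentication rather than in the open.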

Stop routing internal traffic through the internet

Internal callers were reaching in-cluster services through the public hostname. Why it happened: the public hostname was easiest, and split-horizon DNS hid the cost. Pi-hole resolves the public name to the local ingress, so an internal caller on the public URL still works; it just hairpins out to the tunnel and back for nothing.

Fix: in-cluster producers use cluster DNS (<svc>.<ns>.svc.cluster.local); host-side cron scripts can’t resolve cluster DNS, so they hit the pinned ClusterIP (this cluster runs kube-proxy, so the node routes service IPs natively). No service mesh — a decision: on one node, internal hops are plain HTTP inside the trust boundary, and mTLS isn’t worth a control plane to operate. TLS at the edge, plain internal. Multi-node changes that answer.
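As a sketch, the addressing rule looks like this. The helper and every concrete value in the usage note (service, namespace, port, IP) are hypothetical.

```python
def internal_url(service, namespace, port, *, in_cluster, cluster_ip=None, path="/"):
    """Build a plain-HTTP URL that never leaves the node.

    In-cluster callers use the cluster DNS name. Host-side callers cannot
    resolve it, so they must pass the Service's pinned ClusterIP.
    """
    if in_cluster:
        host = f"{service}.{namespace}.svc.cluster.local"
    else:
        if cluster_ip is None:
            raise ValueError("host-side callers need the pinned ClusterIP")
        host = cluster_ip
    return f"http://{host}:{port}{path}"
```

For example, internal_url("n8n", "automation", 5678, in_cluster=True) yields http://n8n.automation.svc.cluster.local:5678/, while a host cron script would call it with in_cluster=False and cluster_ip="10.43.0.50".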

Secrets: the manager is the control, the Secret is a projection

One genuinely committed secret turned up, rendered into a ConfigMap: the worse hiding spot, because it looks managed. It moved to Vault (encrypted at rest, audited) → External Secrets Operator → a Kubernetes Secret consumed via envFrom. The detail that matters: a Kubernetes Secret is base64-encoded, not encrypted, so it is only as private as the datastore at rest. So Vault is the control; the Secret is a projection ESO keeps in sync (rotate in Vault → re-sync → restart). I passed on SOPS and Sealed Secrets because those keep ciphertext in Git; I wanted plaintext to never touch the repo, plus an access audit trail.
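To make the base64 point concrete: anyone who can read a Secret’s data field can recover the values with one standard-library call. The key name and value here are made up for illustration.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the base64 values of a Secret's `data` map back to plaintext."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# What a Secret's data field looks like once you can read it.
stored = {"API_KEY": base64.b64encode(b"hunter2").decode("ascii")}
recovered = decode_secret_data(stored)
```

No key is involved anywhere in that round trip, which is why the protection has to come from who can read the datastore and the API, not from the encoding.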

Three lessons it earned me

  1. A rule you don’t audit is a rule you don’t have. I had “no secrets in Git” and a secret manager, and still leaked one into a ConfigMap.
  2. Grade against a named threat, not a vibe. “Secure against a scanner,” “against a leaked backup,” and “against a shell on the box” are three problems; a control for one can be nothing for another.
  3. The interesting part of an AI review is which findings you can defend rejecting. If you can’t independently validate every recommendation, false positives included, you ran autocomplete with good PR, not an audit.
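Lesson 1 can become a check that runs on every render. A minimal sketch, assuming the manifests are already rendered (for example with helm template) and parsed into dictionaries; the pattern and function name are illustrative, and a key-name match is a prompt to look, not proof of a leak.

```python
import re

# Key names that suggest a credential; tune for your own charts.
SECRET_KEY_PATTERN = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|private[_-]?key)",
    re.IGNORECASE,
)


def suspicious_configmap_keys(manifests):
    """Return (configmap_name, key) pairs whose key names look like secrets."""
    findings = []
    for doc in manifests:
        # Skip empty documents and everything that is not a ConfigMap.
        if not isinstance(doc, dict) or doc.get("kind") != "ConfigMap":
            continue
        name = (doc.get("metadata") or {}).get("name", "<unnamed>")
        for key in doc.get("data") or {}:
            if SECRET_KEY_PATTERN.search(key):
                findings.append((name, key))
    return findings
```

Wired into CI, a non-empty result fails the build, so “no secrets in Git” stops depending on anyone remembering it.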