The question I actually had
It started as a nervous-Sunday kind of question: is a third party trying to get into my server — over SSH, or some other way? I run a single-node Kubernetes homelab that hosts a couple dozen little apps, some of them public. You read about credential-stuffing bots and you start to wonder who’s been rattling the handle while you slept.
So I did the audit. The good news came first, and it’s worth saying plainly because it’s the part most homelabs get wrong: the front door is solid. Nothing is reachable from the internet except through a Cloudflare Tunnel — an outbound-only connection, zero open inbound ports on my router. Almost every service sits behind OAuth. The cluster has 140 network policies doing real east-west segmentation. And the login history? Eleven straight weeks where every single shell login came from one IP — my own workstation on the LAN. No strangers. No 3 a.m. logins from a VPS in another hemisphere.
I could have stopped there feeling good. That would have been a mistake.
The scary finding wasn’t an attacker
The useful question turned out not to be “is someone knocking?” but “if someone got in, would anything tell me?” And when I traced that wire, it ended in the dark.
I have a full monitoring stack — Prometheus, Grafana, Alertmanager, the works. Alertmanager was running. It was also configured to notify exactly no one: no receivers, and upstream, no alert rules at all. It was a smoke detector with the battery taken out and, for good measure, no smoke sensor either. If an attacker had walked in, the alarm would have stayed perfectly, silently green.
That reframed the whole job. Three gaps, in priority order.
Gap 1 — an alarm with no one to call
I built the missing chain end to end. A small exporter on the host parses the SSH journal and fail2ban state and writes metrics into node_exporter’s textfile collector, so it rides the monitoring I already had instead of adding a new moving part. On top sit the alert rules that were never there. The one that matters most is blunt:
A shell login succeeded from a non-LAN IP.
That should be impossible in normal life, so if it ever fires, I want it shouting. It now emails me the instant it happens, alongside quieter alerts for brute-force spikes, distributed scans, fail2ban going down, and the meta-alert I’m fondest of: the watchdog itself going stale, because a security monitor that silently dies is worse than none. And fail2ban now actually bans the bots, with escalating ban times and my LAN permanently on the allow-list.
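To make the exporter idea concrete, here is a minimal Python sketch of the shape such a script could take. It is an illustration under stated assumptions, not the script running on my host: the LAN range, the metric names, and the output path are all hypothetical, and it only handles the common OpenSSH "Accepted ..." and "Failed ..." log lines. It leaves out fail2ban state and assumes the lines have already been pulled from the journal.

```python
import ipaddress
import os
import re
import tempfile
import time

# Assumption: the LAN is 192.168.1.0/24. Substitute your own range.
LAN = ipaddress.ip_network("192.168.1.0/24")

# Common OpenSSH log lines look like:
#   Accepted publickey for alice from 192.168.1.10 port 51234 ssh2
#   Failed password for invalid user admin from 203.0.113.5 port 40022 ssh2
ACCEPTED = re.compile(r"Accepted \S+ for \S+ from (\S+) port \d+")
FAILED = re.compile(r"Failed \S+ for (?:invalid user )?\S+ from (\S+) port \d+")


def count_events(lines, lan=LAN):
    """Count accepted logins (LAN vs. non-LAN) and failed attempts."""
    counts = {"lan": 0, "non_lan": 0, "failed": 0}
    for line in lines:
        accepted = ACCEPTED.search(line)
        if accepted:
            try:
                ip = ipaddress.ip_address(accepted.group(1))
            except ValueError:
                continue  # source is not an IP literal; skip rather than guess
            counts["lan" if ip in lan else "non_lan"] += 1
        elif FAILED.search(line):
            counts["failed"] += 1
    return counts


def render_metrics(counts, now):
    """Render counts in the Prometheus text format (metric names are hypothetical)."""
    return (
        "# TYPE ssh_accepted_logins gauge\n"
        f'ssh_accepted_logins{{source="lan"}} {counts["lan"]}\n'
        f'ssh_accepted_logins{{source="non_lan"}} {counts["non_lan"]}\n'
        "# TYPE ssh_failed_logins gauge\n"
        f'ssh_failed_logins {counts["failed"]}\n'
        # The timestamp lets an alert notice when this script stops running.
        "# TYPE ssh_watch_last_run_timestamp_seconds gauge\n"
        f"ssh_watch_last_run_timestamp_seconds {int(now)}\n"
    )


def write_textfile(path, text):
    """Write via a temp file and rename so a half-written file is never scraped."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; the collector must be able to read it
    os.replace(tmp, path)


if __name__ == "__main__":
    sample = [
        "Accepted publickey for alice from 192.168.1.10 port 51234 ssh2",
        "Failed password for invalid user admin from 203.0.113.5 port 40022 ssh2",
    ]
    print(render_metrics(count_events(sample), time.time()))
```

With metrics shaped like these, the blunt rule reduces to an expression along the lines of `ssh_accepted_logins{source="non_lan"} > 0`, and the staleness meta-alert to something like `time() - ssh_watch_last_run_timestamp_seconds > 600`. The ten-minute threshold is another assumption; pick whatever matches how often the script runs.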
The honest lesson: I’d been treating “I have Prometheus” as if it meant “I have monitoring.” Dashboards you have to remember to look at are not monitoring. Monitoring is the thing that interrupts you. Until an alert can reach your phone, you don’t have a security alarm — you have a security museum.
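The "can this alarm reach anyone?" question is itself checkable. The sketch below assumes the Alertmanager config has already been parsed into a plain dict (for example with a YAML library) and uses the real top-level keys `route.receiver` and `receivers`, where each receiver carries integration lists such as `email_configs` or `webhook_configs`. The function name is hypothetical, and it only inspects the default route, not child routes.

```python
def unreachable_reasons(config):
    """Return reasons why the default route cannot notify anyone (empty list = OK)."""
    reasons = []
    receivers = {r.get("name"): r for r in (config.get("receivers") or [])}
    default = (config.get("route") or {}).get("receiver")

    if not default:
        reasons.append("route names no default receiver")
    elif default not in receivers:
        reasons.append(f"default receiver {default!r} is not defined")
    else:
        # Integrations live under keys like email_configs, slack_configs, ...
        channels = [
            key
            for key, value in receivers[default].items()
            if key.endswith("_configs") and value
        ]
        if not channels:
            reasons.append(f"default receiver {default!r} has no notification channel")
    return reasons


if __name__ == "__main__":
    # A receiver that exists but delivers nowhere: the "security museum" case.
    silent = {"route": {"receiver": "null"}, "receivers": [{"name": "null"}]}
    print(unreachable_reasons(silent))
```

Run against a config whose only receiver has no integrations, this returns a non-empty list. That is the cue to wire up email, a webhook, or whatever actually interrupts you before writing a single rule.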
Gap 2 — there was a web terminal on the open internet
This is the one that made me wince. Among my public hostnames was ttyd, a browser-based shell. A full terminal on my server, reachable from anywhere, sitting behind a single OAuth proxy. One misconfiguration, one OAuth bypass, and that’s not “an app is compromised,” that’s root on the box from a browser tab.
The fix here isn’t more locks. It’s the realization that the strongest control is not exposing the thing at all. I deleted the web terminal entirely: app, manifests, dashboard tile, all of it. Then I went down the public hostname list and pulled everything with no business being public off the tunnel: the secrets UI, the ingress dashboard, Prometheus, Alertmanager, the network-observability console, the DNS admin. They still work on my LAN, over the same wildcard cert; they’re just not the internet’s business anymore. A service that isn’t reachable from the internet has no internet-facing attack surface to harden.
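Walking the public hostname list can be turned into a repeatable check. This is a minimal sketch that assumes the tunnel's routing is available as a list of ingress rules shaped like cloudflared's config file (`hostname` plus `service`, ending in a catch-all rule with no hostname). The allow-list, the hostnames, and the function name are hypothetical.

```python
# Hypothetical allow-list: the only hostnames meant to be reachable publicly.
INTENTIONALLY_PUBLIC = {"blog.example.com", "photos.example.com"}


def unexpected_public_hostnames(ingress_rules, allowed=INTENTIONALLY_PUBLIC):
    """Return tunnel hostnames that are exposed but not on the allow-list."""
    exposed = {rule["hostname"] for rule in ingress_rules if "hostname" in rule}
    return sorted(exposed - set(allowed))


if __name__ == "__main__":
    rules = [
        {"hostname": "blog.example.com", "service": "http://blog:80"},
        {"hostname": "ttyd.example.com", "service": "http://ttyd:7681"},
        {"hostname": "prometheus.example.com", "service": "http://prometheus:9090"},
        {"service": "http_status:404"},  # catch-all rule carries no hostname
    ]
    for name in unexpected_public_hostnames(rules):
        print(f"public but not on the allow-list: {name}")
```

The design choice is that public exposure becomes opt-in. Anything not named on the allow-list gets flagged, so a new admin UI cannot drift onto the tunnel unnoticed.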
Gap 3 — no floor under the blast radius
The network policies limit how far a compromised pod can talk sideways. But nothing stopped a workload from running as root, mounting the host filesystem, or grabbing the host network in the first place. So I turned on Kubernetes’ built-in Pod Security Admission: every namespace now at least reports baseline violations, and the clean app namespaces enforce baseline, meaning a compromised app there simply cannot request privileged mode, the host network, or a hostPath mount. (Baseline does not forbid running as root inside the container; that takes the stricter restricted profile.) It’s a floor. Floors are underrated.
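Pod Security Admission is driven by namespace labels, so the two tiers described above come down to which labels a namespace carries. The sketch below builds those labels in Python. The `pod-security.kubernetes.io/*` keys are the real ones, but mapping "reports" to both `warn` and `audit`, the namespace names, and the helper names are my assumptions for illustration.

```python
import json

PSA_PREFIX = "pod-security.kubernetes.io"


def psa_labels(enforce_baseline):
    """Labels for one namespace: always report baseline, optionally enforce it."""
    labels = {
        f"{PSA_PREFIX}/warn": "baseline",   # warning shown to the client on apply
        f"{PSA_PREFIX}/audit": "baseline",  # annotation recorded in the audit log
    }
    if enforce_baseline:
        labels[f"{PSA_PREFIX}/enforce"] = "baseline"  # violating pods are rejected
    return labels


def namespace_manifest(name, enforce_baseline):
    """A Namespace object as a dict, ready to be serialized into a Git repo."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": psa_labels(enforce_baseline)},
    }


if __name__ == "__main__":
    # Hypothetical split: app namespaces enforce, legacy ones only report.
    for name, enforce in [("apps", True), ("legacy", False)]:
        # JSON is valid YAML, so this output can be committed or applied as-is.
        print(json.dumps(namespace_manifest(name, enforce), indent=2))
```

Starting every namespace in report-only mode and promoting the clean ones to `enforce` means nothing breaks on day one, and the warnings tell you which workloads need fixing before their namespace can be promoted.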
What the audit was really about
I went looking for an intruder and didn’t find one — the logs were clean, the front door held. What I found instead was that I’d built something secure at the perimeter and then never asked the uncomfortable follow-up: what happens after the perimeter? The answer had been “nothing happens, and no one is told,” and I just hadn’t looked.
Three principles I’m taking with me:
- An alarm that can’t reach you is decoration. Wire the notification first; the rules are easy once something is listening.
- “Don’t expose it” beats “add more auth.” Every hostname you take off the public internet is a class of attack you no longer have to be clever about.
- Give the blast radius a floor. Assume one thing gets popped, and decide in advance how far it gets.
The best part: all of it is GitOps. The intrusion alerts, the un-exposing, the pod-security floor — every change is a commit, reviewable and revertible, and my cluster reconciles itself to match. The audit didn’t just make the homelab safer. It wrote down why it’s safer, in a form the next version of me can read.
Now if someone knocks, I’ll know. And the web terminal isn’t answering the door anymore — because it’s gone.
