Claude on hippotion

Every Robot in My House Can Text Me Now

Fri, 29 May 2026 00:00:00 +0000

The silence

My house runs on quiet little robots. A tracker watches my kombucha ferment. A job narrates kids’ books in Hungarian. A media stack pulls and files things. Home Assistant minds the sensors. A dozen services, all doing their jobs, all completely mute. When a batch finished or an import failed, I found out the same way every time: by going to look.

Then the silence got expensive. Claude Code stopped dead in the middle of a task because I’d burned through my plan’s usage window — no warning, no countdown, just a wall. The information existed; a dashboard in my own cluster was already polling it. It just had no way to reach my pocket.

So I built one thing: a push bus. One place anything in the cluster can POST to, that actually buzzes my phone. And the first job I gave it was to warn me before my AI assistant goes dark.

The boring part (said honestly)

The bus is ntfy — a self-hosted pub/sub notifier. Picking it took about five minutes, because self-hosting ntfy for a homelab is a thoroughly solved problem. There are at least three off-the-shelf bridges from Prometheus Alertmanager to ntfy. I’m not going to pretend the bus is the clever bit.

What I did do deliberately:

📦 Deployed it GitOps-native — one entry in my app-of-apps, reconciled by Argo CD, no docker run anywhere.
🔒 Locked it to deny-all auth with bearer tokens. Security alerts ride this bus; a world-readable topic on a public URL was a non-starter. (Which also means it sits outside my usual OAuth gate — the phone app can’t do an interactive login flow, so ntfy does its own token auth.)
🏷️ Topics by severity: hl-crit, hl-warn, hl-info, hl-event. Subscribe and mute by how much I care.
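
A minimal sketch of what a producer's publish call could look like under those choices, using only the Python standard library. The topic names and the high/urgent priorities for hl-warn and hl-crit come from this post; the priorities for hl-info and hl-event, the function names, and the example host are assumptions for illustration. It relies on ntfy's plain HTTP publishing: the message is the request body, and the Authorization, Priority, and Title headers carry the rest.

```python
import urllib.request

# hl-warn -> high and hl-crit -> urgent follow the post.
# The hl-info and hl-event priorities are assumptions.
TOPIC_PRIORITY = {
    "hl-crit": "urgent",
    "hl-warn": "high",
    "hl-info": "default",
    "hl-event": "low",
}


def build_publish_request(base_url, topic, message, token, title=None):
    """Build (but do not send) an ntfy publish request for one topic."""
    if topic not in TOPIC_PRIORITY:
        raise ValueError(f"unknown topic: {topic}")
    headers = {
        # Deny-all auth: every publish carries a bearer token.
        "Authorization": f"Bearer {token}",
        "Priority": TOPIC_PRIORITY[topic],
    }
    if title:
        headers["Title"] = title
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def publish(base_url, topic, message, token, title=None, timeout=5):
    """Send the notification and return the HTTP status code."""
    req = build_publish_request(base_url, topic, message, token, title)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Splitting the request builder from the sender keeps the header logic checkable without a running server.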

Then the interesting parts showed up at the edges, where they always do.

Edge one: my own firewall 403’d me

First test, the usage producer POSTing to https://ntfy.hippotion.com:

HTTP 403 Forbidden
error code: 1010

That 1010 looks like ntfy rejecting my token. It isn’t. It’s Cloudflare. Error 1010 means the site owner has banned your access based on your browser’s signature: Cloudflare’s bot protection took one look at a Python script’s urllib User-Agent and slammed the door.

My own producer couldn’t reach my own bus, because the request left the cluster, went all the way out to my own edge, and got flagged as a bot on the way back in.

The fix is the architecture I should’ve had from the start: in-cluster producers POST to the internal service address and never touch the public internet at all.

# wrong: out to Cloudflare and back, gets bot-blocked
https://ntfy.hippotion.com/hl-warn

# right: stays inside the cluster
http://ntfy.web-ntfy.svc.cluster.local/hl-warn

The phone still uses the public URL happily — the real ntfy app carries a signature Cloudflare trusts. Only scripts trip 1010. Lesson: your own edge is not your friend when you’re a script. Keep cluster traffic in the cluster.
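
One way to keep that rule from depending on memory is to let the producer pick its base URL from where it runs. This is a sketch under assumptions: the two URLs are the ones shown above, KUBERNETES_SERVICE_HOST is the variable Kubernetes sets inside pods, and NTFY_BASE_URL is a hypothetical override invented for this example.

```python
import os

INTERNAL_BASE = "http://ntfy.web-ntfy.svc.cluster.local"
PUBLIC_BASE = "https://ntfy.hippotion.com"


def ntfy_base_url(environ=None):
    """Pick the ntfy base URL for the current environment."""
    environ = os.environ if environ is None else environ
    # Hypothetical escape hatch for local testing.
    override = environ.get("NTFY_BASE_URL")
    if override:
        return override.rstrip("/")
    # Inside a pod: stay on the cluster network, never cross the edge.
    if environ.get("KUBERNETES_SERVICE_HOST"):
        return INTERNAL_BASE
    return PUBLIC_BASE


def topic_url(topic, environ=None):
    """Full publish URL for a topic."""
    return f"{ntfy_base_url(environ)}/{topic}"
```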

Edge two: the obvious data source was lying

To warn me about Claude usage, the naïve move is to parse Claude Code’s local logs — they sit right there in ~/.claude/projects/.../*.jsonl, token counts and all.

Don’t. Those counts are unreliable for accounting: they are known to undercount wildly, in some cases by ~100x. Every tool that parses that JSONL inherits the bug.

The number that’s actually true lives in the claude.ai usage API — the same five_hour and seven_day windows your plan enforces against. And I already had a service polling exactly that. So the producer is just a tiny sidecar on that existing pod, reading its /api/usage over localhost (same pod — no network policy to negotiate, no second credential, nothing else hammering claude.ai):

📈 ≥80% of a window → hl-warn (high).
🚨 ≥95% → hl-crit (urgent).
🔁 One ping per level per window per reset cycle, escalating warn→crit, keyed on the reset timestamp so it never spams.
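
The rules above can be sketched as a small decision function. This is an illustration, not the sidecar's actual code: it assumes the caller has already extracted a percentage and a reset timestamp for each window (the shape of the /api/usage response is not shown here), and it reads the dedupe rule as at most one warn and one crit per window per reset cycle. All function and variable names are hypothetical.

```python
WARN_AT = 80.0
CRIT_AT = 95.0
LEVELS = {"warn": 1, "crit": 2}


def level_for(percent):
    """Map a utilization percentage to 'warn', 'crit', or None."""
    if percent >= CRIT_AT:
        return "crit"
    if percent >= WARN_AT:
        return "warn"
    return None


def decide(state, window, percent, resets_at):
    """Return the topic to ping ('hl-warn' or 'hl-crit') or None.

    state maps window name -> (resets_at, highest level already sent)
    and is updated in place whenever a ping is due.
    """
    level = level_for(percent)
    if level is None:
        return None
    last_reset, last_level = state.get(window, (None, None))
    if last_reset != resets_at:
        last_level = None  # new reset cycle: earlier pings no longer count
    if last_level is not None and LEVELS[level] <= LEVELS[last_level]:
        return None  # this level (or a higher one) was already sent
    state[window] = (resets_at, level)
    return f"hl-{level}"
```

Keying on the reset timestamp rather than a timer means a restart of the producer only needs the persisted state to avoid re-sending, and a new cycle re-arms the alerts on its own.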

The first time it mattered, my phone buzzed at 80% with hours of runway left instead of a brick wall mid-task.

What I’d tell past me

Three things, none of them about ntfy:

Reuse the signal you already have. I didn’t build a usage poller — I bolted a sidecar onto the one already running. The smallest producer is one that reads localhost.
Your own edge can betray you. A firewall that protects you from bots will happily block your own automation. In-cluster talks in-cluster.
Check whether your data source is telling the truth before you build an alert on it. An alert you don’t trust is worse than no alert — you’ll learn to ignore it, and then it’ll be right once.

Next, the high-leverage move: point Prometheus Alertmanager at the same bus, and every infra alert I have — plus every one I’ll ever add — lands on the phone through one bridge. The kombucha ping can wait. The disk-full one can’t.

The house is still full of quiet robots. The difference is now they know my number.

🚩 I Built a Usage Dashboard and Tripped Claude Fable 5's Safety Net

Fri, 24 Apr 2026 00:00:00 +0000

The thing I was actually building

I wanted a small web page on my homelab that shows my Claude usage — the 5-hour session window, the weekly limits, the per-model split. There’s a nice Electron widget out there that does this on the desktop, but I don’t want a desktop app; I want a URL behind my own OAuth that I can glance at from my phone.

The mechanics are unremarkable. The claude.ai web app reads those numbers from a couple of undocumented endpoints using your logged-in session cookie. So a self-hosted version does the same thing server-side: hold the session token as a secret, replay the same calls, cache the result, render some bars. An afternoon’s work. I was pairing with Claude Fable 5 on it — Anthropic’s newest model, and the one that ships with extra safety measures around dual-use capability.

Then, partway through, I got the message: Fable 5 flagged something in this session and switched to a more conservative model. It dropped me to Opus 4.8 for the rest of the conversation. Safe conversations sometimes trip it, the notice said. Send feedback.

I wasn’t doing anything wrong. That’s the interesting part.

My first reaction was the obvious one — what did I say? But I knew exactly what I’d built, and none of it was sketchy. It was my account, my usage data, my hardware, my OAuth in front of it.

So I went looking at the request the way a classifier would — not “what did he mean” but “what does this look like.” And from that angle it’s a different picture entirely. Stack up the surface features:

🔑 capturing a session token and storing it to replay later
🌐 sending it to an undocumented API that isn’t meant for third parties
🕵️ spoofing a browser User-Agent so the request blends in
🧱 detecting and working around a Cloudflare bot challenge

Read that list cold, with no context. That’s not a usage dashboard. That’s the exact signature of credential theft and scraping tooling. Every individual move is one a malicious script would also make. The only thing separating my afternoon project from the bad version is whose account it touches and why — and intent is precisely the part that doesn’t show up in the tokens.

Surface vs. intent

This is the part worth sitting with, because it’s not a Claude quirk — it’s the shape of every content classifier, every WAF rule, every fraud model I’ve ever run in production.

A detector scores what it can see. It cannot see intent; it sees features. And the features of “monitor my own usage” and “harvest someone else’s session” overlap almost completely, because the technique is identical — the difference lives entirely in context the model has been deliberately built not to over-trust. You can’t tune that gap away. You can only pick where to sit on the precision/recall curve, and Fable 5 — being the high-capability model with the extra dual-use measures bolted on — sits where it catches the pattern even when it costs some false positives, then hands off to Opus 4.8. I was the false positive. The system did roughly the right thing for roughly the right reason; it just doesn’t feel that way when it’s pointed at you.

The honest engineering takeaway is the one I keep relearning: if a benign task has the silhouette of an abusive one, expect to get treated like the silhouette. Not just by AI — by rate limiters, by bot detection, by the fraud team. The fix isn’t to be offended. It’s to recognize the silhouette, and where it matters, make the legitimate context legible up front.

What I’d do differently

Practically, very little. The project was fine, and the session downshifted to a model that finished the job. But the framing changed how I built it. I leaned harder into the parts that make intent visible in the design: the session token never leaves the server, it lives in Vault and arrives as an injected secret, the whole thing sits behind OAuth, and it polls on a leash instead of hammering. Not because a classifier made me, but because those are the same choices that make it obviously a personal dashboard and not a harvesting bot: to a reviewer, to future-me, and yes, to a model reading over my shoulder.
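
"Polls on a leash" can be as simple as a cache that refuses to call upstream more than once per interval. A minimal sketch, assuming a caller-supplied fetch function; the class name, the 300-second default, and the injectable clock are all choices made for this example rather than details of the dashboard.

```python
import time


class LeashedPoller:
    """Serve a cached value and refetch at most once per interval."""

    def __init__(self, fetch, min_interval_s=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._cached = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        fresh = (
            self._fetched_at is not None
            and now - self._fetched_at < self._min_interval_s
        )
        if not fresh:
            # Only here does a request go upstream.
            self._cached = self._fetch()
            self._fetched_at = now
        return self._cached
```

Every page load calls get(), but the upstream sees at most one request per interval no matter how often the page is refreshed.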

The widget rides your credential on your desktop. Mine keeps it server-side behind my own front door. Turns out building it the trustworthy way and building it the legibly trustworthy way are the same work — and getting flagged is what made me notice the difference.