🚩 I Built a Usage Dashboard and Tripped Claude Fable 5's Safety Net

Fri, 24 Apr 2026 00:00:00 +0000

The thing I was actually building

I wanted a small web page on my homelab that shows my Claude usage — the 5-hour session window, the weekly limits, the per-model split. There’s a nice Electron widget out there that does this on the desktop, but I don’t want a desktop app; I want a URL behind my own OAuth that I can glance at from my phone.

The mechanics are unremarkable. The claude.ai web app reads those numbers from a couple of undocumented endpoints using your logged-in session cookie. So a self-hosted version does the same thing server-side: hold the session token as a secret, replay the same calls, cache the result, render some bars. An afternoon’s work. I was pairing with Claude Fable 5 on it — Anthropic’s newest model, and the one that ships with extra safety measures around dual-use capability.
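The "replay the calls, cache the result" part can be sketched roughly as follows. This is an illustrative sketch, not the code from the project: `CachedUsage`, the `fetch` callable, and the five-minute TTL are all assumptions, and the upstream call itself is deliberately left as a function you pass in.

```python
import time
from typing import Callable, Optional


class CachedUsage:
    """Serve usage data from memory, refreshing at most once per TTL.

    `fetch` is a hypothetical callable that performs the upstream request
    and returns the parsed usage numbers as a dict.
    """

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl_seconds: float = 300.0,  # assumed refresh interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        stale = self._value is None or now - self._fetched_at >= self._ttl
        if stale:
            # Only this branch touches the upstream service.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Every page load calls `get()`, so refreshing the dashboard a hundred times inside the TTL still costs a single upstream request.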

Then, partway through, I got the message: Fable 5 flagged something in this session and switched to a more conservative model. It dropped me to Opus 4.8 for the rest of the conversation. Safe conversations sometimes trip it, the notice said. Send feedback.

I wasn’t doing anything wrong. That’s the interesting part.

My first reaction was the obvious one — what did I say? But I knew exactly what I’d built, and none of it was sketchy. It was my account, my usage data, my hardware, my OAuth in front of it.

So I went looking at the request the way a classifier would — not “what did he mean” but “what does this look like.” And from that angle it’s a different picture entirely. Stack up the surface features:

🔑 capturing a session token and storing it to replay later
🌐 sending it to an undocumented API that isn’t meant for third parties
🕵️ spoofing a browser User-Agent so the request blends in
🧱 detecting and working around a Cloudflare bot challenge

Read that list cold, with no context. That’s not a usage dashboard. That’s the exact signature of credential theft and scraping tooling. Every individual move is one a malicious script would also make. The only thing separating my afternoon project from the bad version is whose account it touches and why — and intent is precisely the part that doesn’t show up in the tokens.
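To make the overlap concrete, here is a toy model of the list above. It is not how any real classifier works; the field names and the `surface_features` helper are invented for illustration. The point is only that two scripts can differ in the one field a surface-level detector never gets to read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptDescription:
    # What the code visibly does (hypothetical feature names).
    stores_session_token: bool
    calls_undocumented_api: bool
    sets_browser_user_agent: bool
    handles_bot_challenge: bool
    # Context that lives outside the code itself.
    token_belongs_to_operator: bool


def surface_features(script: ScriptDescription) -> tuple:
    """Return only the features visible in the request text."""
    return (
        script.stores_session_token,
        script.calls_undocumented_api,
        script.sets_browser_user_agent,
        script.handles_bot_challenge,
    )


dashboard = ScriptDescription(True, True, True, True, token_belongs_to_operator=True)
harvester = ScriptDescription(True, True, True, True, token_belongs_to_operator=False)

# Different projects, identical silhouette.
same_silhouette = surface_features(dashboard) == surface_features(harvester)
```

Here `same_silhouette` is true even though `dashboard != harvester`: the only distinguishing field is the one that never appears in the features.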

Surface vs. intent

This is the part worth sitting with, because it’s not a Claude quirk — it’s the shape of every content classifier, every WAF rule, every fraud model I’ve ever run in production.

A detector scores what it can see. It cannot see intent; it sees features. And the features of “monitor my own usage” and “harvest someone else’s session” overlap almost completely, because the technique is identical. The difference lives entirely in context the model has been deliberately built not to over-trust. You can’t tune that gap away. You can only pick where to sit on the precision/recall curve, and Fable 5, being the high-capability model with the extra dual-use measures bolted on, sits toward the high-recall end: it catches the pattern even when that costs some false positives, then hands off to Opus 4.8. I was the false positive. The system did roughly the right thing for roughly the right reason; it just doesn’t feel that way when it’s pointed at you.
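The precision/recall trade can be shown with a few lines of arithmetic. The scores below are made up for illustration (they are not measurements from any real detector), and `precision_recall` is a hypothetical helper; the 0.85 benign sample stands in for a project like mine.

```python
def precision_recall(scored, threshold):
    """Compute (precision, recall) when flagging every score >= threshold.

    `scored` is a list of (score, is_abusive) pairs.
    """
    tp = sum(1 for score, bad in scored if score >= threshold and bad)
    fp = sum(1 for score, bad in scored if score >= threshold and not bad)
    fn = sum(1 for score, bad in scored if score < threshold and bad)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Invented data: (detector score, actually abusive?)
SAMPLES = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True),
    (0.60, False), (0.55, True), (0.30, False), (0.10, False),
]

cautious = precision_recall(SAMPLES, threshold=0.5)   # flags more, misses nothing
lenient = precision_recall(SAMPLES, threshold=0.88)   # flags less, misses two
```

With this data the cautious threshold catches all four abusive samples but also flags two benign ones (precision 4/6, recall 1.0), while the lenient threshold flags no benign samples but misses half the abusive ones (precision 1.0, recall 0.5). No threshold gets both right, because the benign 0.85 sample scores higher than two of the abusive ones.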

The honest engineering takeaway is the one I keep relearning: if a benign task has the silhouette of an abusive one, expect to get treated like the silhouette. Not just by AI — by rate limiters, by bot detection, by the fraud team. The fix isn’t to be offended. It’s to recognize the silhouette, and where it matters, make the legitimate context legible up front.

What I’d do differently

Practically, very little. The project was fine, and the session downshifted to a model that finished the job. But the framing changed how I built it. I leaned harder into the parts that make intent visible in the design: the session token never leaves the server; it lives in Vault and arrives as an injected secret; the whole thing sits behind OAuth; and it polls on a leash instead of hammering the endpoints. Not because a classifier made me, but because those are the same choices that make it obviously a personal dashboard and not a harvesting bot: to a reviewer, to future-me, and yes, to a model reading over my shoulder.
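Two of those choices, the injected secret and the leash, fit in a small startup check. This is a sketch under assumptions rather than the project's code: the environment variable names `USAGE_SESSION_TOKEN` and `USAGE_POLL_SECONDS`, the 60-second floor, and the idea that the secret arrives as an environment variable are all hypothetical.

```python
import os

MIN_POLL_SECONDS = 60  # assumed floor; the service never polls faster than this


class ConfigError(RuntimeError):
    """Raised when the service is started without a usable configuration."""


def load_config(env=None) -> dict:
    """Read the injected secret and poll interval, enforcing the floor."""
    env = os.environ if env is None else env

    # The token is injected at deploy time, never hard-coded or committed.
    token = env.get("USAGE_SESSION_TOKEN", "").strip()
    if not token:
        raise ConfigError("USAGE_SESSION_TOKEN is not set; refusing to start")

    try:
        requested = int(env.get("USAGE_POLL_SECONDS", "300"))
    except ValueError as exc:
        raise ConfigError("USAGE_POLL_SECONDS must be an integer") from exc

    # A misconfigured interval is clamped up, so it cannot turn into hammering.
    return {"token": token, "poll_seconds": max(requested, MIN_POLL_SECONDS)}


def redact(secret: str) -> str:
    """Return a log-safe placeholder that reveals only the secret's length."""
    return f"<redacted:{len(secret)} chars>"
```

Clamping rather than rejecting a too-small interval is a judgment call; failing loudly would be just as defensible.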

The widget rides your credential on your desktop. Mine keeps it server-side behind my own front door. Turns out building it the trustworthy way and building it the legibly trustworthy way are the same work, and getting flagged is what made me notice that legibility was part of the job.

Self-Hosted on hippotion
