<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Self-Hosted on hippotion</title><link>https://blog.hippotion.com/tags/self-hosted/</link><description>Recent content in Self-Hosted on hippotion</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 24 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.hippotion.com/tags/self-hosted/index.xml" rel="self" type="application/rss+xml"/><item><title>🚩 I Built a Usage Dashboard and Tripped Claude Fable 5's Safety Net</title><link>https://blog.hippotion.com/posts/when-claude-flagged-my-own-dashboard/</link><pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/when-claude-flagged-my-own-dashboard/</guid><description>I asked Claude Fable 5 to help me self-host a dashboard for my own Claude usage. Halfway through, its dual-use safety measures flagged the conversation and downshifted me to Opus 4.8. Nothing I did was wrong — the request just had the shape of something that is. That gap, between what a thing looks like and what it&amp;rsquo;s for, turns out to be the whole story.</description><content:encoded><![CDATA[<h2 id="the-thing-i-was-actually-building">The thing I was actually building</h2>
<p>I wanted a small web page on my homelab that shows my Claude usage — the 5-hour
session window, the weekly limits, the per-model split. There&rsquo;s a nice Electron
widget out there that does this on the desktop, but I don&rsquo;t want a desktop app; I
want a URL behind my own OAuth that I can glance at from my phone.</p>
<p>The mechanics are unremarkable. The claude.ai web app reads those numbers from a
couple of undocumented endpoints using your logged-in session cookie. So a
self-hosted version does the same thing server-side: hold the session token as a
secret, replay the same calls, cache the result, render some bars. An afternoon&rsquo;s
work. I was pairing with <strong>Claude Fable 5</strong> on it — Anthropic&rsquo;s newest model, and
the one that ships with extra safety measures around dual-use capability.</p>
<p>Then, partway through, I got the message: <em>Fable 5 flagged something in this
session and switched to a more conservative model.</em> It dropped me to <strong>Opus 4.8</strong>
for the rest of the conversation. Safe conversations sometimes trip it, the notice
said. Send feedback.</p>
<h2 id="i-wasnt-doing-anything-wrong-thats-the-interesting-part">I wasn&rsquo;t doing anything wrong. That&rsquo;s the interesting part.</h2>
<p>My first reaction was the obvious one — <em>what did I say?</em> But I knew exactly what
I&rsquo;d built, and none of it was sketchy. It was my account, my usage data, my
hardware, my OAuth in front of it.</p>
<p>So I went looking at the request the way a classifier would — not &ldquo;what did he
mean&rdquo; but &ldquo;what does this look like.&rdquo; And from that angle it&rsquo;s a different
picture entirely. Stack up the surface features:</p>
<ul>
<li>🔑 capturing a <strong>session token</strong> and storing it to replay later</li>
<li>🌐 sending it to an <strong>undocumented API</strong> that isn&rsquo;t meant for third parties</li>
<li>🕵️ spoofing a <strong>browser User-Agent</strong> so the request blends in</li>
<li>🧱 detecting and working around a <strong>Cloudflare bot challenge</strong></li>
</ul>
<p>Read that list cold, with no context. That&rsquo;s not a usage dashboard. That&rsquo;s the
exact signature of credential theft and scraping tooling. Every individual move
is one a malicious script would also make. The only thing separating my afternoon
project from the bad version is <em>whose</em> account it touches and <em>why</em> — and intent
is precisely the part that doesn&rsquo;t show up in the tokens.</p>
<h2 id="surface-vs-intent">Surface vs. intent</h2>
<p>This is the part worth sitting with, because it&rsquo;s not a Claude quirk — it&rsquo;s the
shape of every content classifier, every WAF rule, every fraud model I&rsquo;ve ever
run in production.</p>
<p>A detector scores what it can see. It cannot see intent; it sees features. And
the features of &ldquo;monitor my own usage&rdquo; and &ldquo;harvest someone else&rsquo;s session&rdquo;
overlap almost completely, because the <em>technique</em> is identical — the difference
lives entirely in context the model has been deliberately built not to over-trust.
You can&rsquo;t tune that gap away. You can only pick where to sit on the
precision/recall curve, and Fable 5 — being the high-capability model with the
extra dual-use measures bolted on — sits where it catches the pattern even when it
costs some false positives, then hands off to Opus 4.8. I was the false positive.
The system did roughly the right thing for roughly the right reason; it just
doesn&rsquo;t feel that way when it&rsquo;s pointed at you.</p>
<p>The honest engineering takeaway is the one I keep relearning: <strong>if a benign task
has the silhouette of an abusive one, expect to get treated like the silhouette.</strong>
Not just by AI — by rate limiters, by bot detection, by the fraud team. The fix
isn&rsquo;t to be offended. It&rsquo;s to recognize the silhouette, and where it matters,
make the legitimate context legible up front.</p>
<h2 id="what-id-do-differently">What I&rsquo;d do differently</h2>
<p>Practically, very little — the project was fine, and it downshifted to a model
that finished the job. But the framing changed how I built it. I leaned harder
into the parts that make intent <em>visible in the design</em>: the session token never
leaves the server, it lives in Vault and arrives as an injected secret, the whole
thing sits behind OAuth, and it polls on a leash instead of hammering. Not because
a classifier made me, but because those are the same choices that make it
obviously a personal dashboard and not a harvesting bot — to a reviewer, to
future-me, and yes, to a model reading over my shoulder.</p>
<p>The widget rides your credential on your desktop. Mine keeps it server-side behind
my own front door. Turns out building it the trustworthy way and building it the
<em>legibly</em> trustworthy way are the same work — and getting flagged is what made me
notice the difference.</p>
]]></content:encoded></item></channel></rss>