Kubernetes: init container crash loop leaves dirty emptyDir

When a pod’s init container crashes, the kubelet restarts only that init container, not the whole pod, so the emptyDir volume survives between retries. If your init container does a git clone into a fixed path, the second attempt fails with “fatal: destination path '/git/repo' already exists and is not an empty directory.”

Fix: rm -rf the target dir before cloning.

rm -rf /git/repo
git clone --depth=10 --branch=main https://... /git/repo
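
A slightly more defensive version of the same init script, as a sketch. It assumes POSIX sh and git in the init image; clone_fresh and the .partial staging directory are my own illustrative names. It clones into a sibling directory and renames it into place, so a crash halfway through a clone only ever leaves the staging dir behind, and the next retry wipes that too.

```shell
#!/bin/sh
# Idempotent clone for an init container that may be retried on the same emptyDir.
# clone_fresh and the ".partial" staging dir are illustrative names (assumptions).
set -eu

clone_fresh() {
  url=$1
  branch=$2
  dest=$3
  staging="$dest.partial"

  # Wipe whatever a previous attempt left behind, finished or not.
  rm -rf "$dest" "$staging"
  mkdir -p "$(dirname "$dest")"

  # Clone into a staging dir, then rename it into place: a crash mid-clone
  # only ever leaves $staging, which the next retry deletes above.
  git clone --quiet --depth=10 --branch="$branch" "$url" "$staging"
  mv "$staging" "$dest"
}

# In the init container (URL elided as in the original):
#   clone_fresh "https://..." main /git/repo
```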

Even after many restarts, no manual cleanup is needed: events expire after about an hour (the API server’s default event TTL), and once the fixed script rolls out, the Deployment controller replaces the old pods on its own. Check recovery with:

kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -10

A “CPU spike” that was actually memory thrashing (adding GA4 to Hugo)

Wanted Google Analytics on this blog. PaperMod already calls a google_analytics.html partial in head.html, but it’s gated behind hugo.IsProduction | or (eq site.Params.env "production"). My blog pod runs hugo server, which defaults to the development environment, so hugo.IsProduction is false and the partial never fires. I “fixed” that by setting env = "production" in the site params.

That was the wrong lever. env = production flips on PaperMod’s whole production path: minification, OpenGraph, Twitter cards, schema JSON across every page. The next full rebuild blew past the pod’s 128Mi memory limit, the container got OOMKilled (exit 137), and load on the node jumped.
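
The 137 is exit-code arithmetic: codes above 128 mean the process died from signal (code - 128), here 9, SIGKILL. A tiny decoder as a sketch, assuming POSIX sh; describe_exit is a hypothetical helper. Note that 137 on its own only says SIGKILL. The pod’s lastState.terminated.reason field is what actually says OOMKilled.

```shell
#!/bin/sh
# Decode a container exit code. Shells report "killed by signal N" as 128 + N,
# so 137 = 128 + 9 (SIGKILL). describe_exit is an illustrative helper name.
describe_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    echo "signal $sig ($(kill -l "$sig"))"
  else
    echo "exited with status $code"
  fi
}

# The exit code cannot tell an OOM kill from any other SIGKILL; the kubelet's
# reason field can (pod name is a placeholder):
#   kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```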

The right way to add GA without touching the build mode: drop the tag in layouts/_partials/extend_head.html. PaperMod includes that partial unconditionally, above the production guard — so it loads under hugo server too.
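
A sketch that writes that partial from the shell. The tag body is Google’s standard gtag.js snippet; G-XXXXXXXXXX is a placeholder Measurement ID and write_ga4_partial is a hypothetical helper. It overwrites any existing extend_head.html, so merge by hand if you already use that hook. The path matches the one above; older Hugo releases look in layouts/partials instead.

```shell
#!/bin/sh
# Write a GA4 tag into PaperMod's extend_head.html hook, which is included
# outside the production guard, so it also renders under `hugo server`.
# write_ga4_partial is a hypothetical helper; G-XXXXXXXXXX is a placeholder ID.
write_ga4_partial() {
  site_root=$1
  measurement_id=$2
  dir="$site_root/layouts/_partials"
  mkdir -p "$dir"
  # Unquoted EOF so $measurement_id expands; the JS contains no other $ or
  # backticks for the shell to interpret. Overwrites any existing file.
  cat > "$dir/extend_head.html" <<EOF
<script async src="https://www.googletagmanager.com/gtag/js?id=$measurement_id"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', '$measurement_id');
</script>
EOF
}

# Usage, from the site root:
#   write_ga4_partial . G-XXXXXXXXXX
```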

But here’s the part that fooled me. After reverting env, load was still climbing — to ~14 on a single node — and ps showed hugo at “500% CPU”. Looked like a runaway compute loop. It wasn’t:

%Cpu(s): 2.1 us, 41.0 sy, 6.9 id, 50.0 wa     <- 50% iowait, 2% userspace
PID ... S  %CPU  COMMAND
... D  333  hugo    <- state D, RES pinned at 127MiB (the 128Mi cgroup limit)

Two lessons:

  1. ps %CPU is a lifetime average, not instantaneous. A process that ran hot for 1s then blocked still shows a big number for a while. Use top for what’s happening now.
  2. High load + high %wa + a D-state process sitting at its cgroup memory limit = memory thrashing, not CPU. Hugo wasn’t computing — it was wedged against the 128Mi ceiling, and every allocation triggered kernel reclaim/swap. A sub-second build dragged out for minutes in uninterruptible I/O sleep, and all those blocked tasks are what inflate load average (Linux counts D-state in load).
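
Both lessons can be checked from a shell. A sketch, assuming procps-ng ps (for the times and etimes columns; busybox ps lacks them); lifetime_pcpu and dstate_only are illustrative names. The first reproduces ps’s lifetime-average arithmetic. With illustrative numbers: a process that burned 5 CPU-seconds in its first second reads 500%, and after 30 seconds blocked the whole time, it still reads about 17%. The second keeps only the D-state rows from ps output.

```shell
#!/bin/sh
# ps-style %CPU: total CPU seconds divided by seconds since the process started.
# It is a lifetime average, which is why a briefly hot process keeps showing
# a large number long after it has gone idle or blocked.
lifetime_pcpu() {
  cpu_seconds=$1
  elapsed_seconds=$2
  awk -v c="$cpu_seconds" -v e="$elapsed_seconds" \
    'BEGIN { printf "%.0f\n", (e > 0 ? 100 * c / e : 0) }'
}

# Filter `ps -eo pid,stat,comm` output (header on line 1) down to processes
# in uninterruptible sleep (state D), which Linux counts in the load average.
dstate_only() {
  awk 'NR > 1 && $2 ~ /^D/'
}

# Live usage (not run automatically):
#   ps -o times=,etimes= -p "$(pgrep -o hugo)"   # CPU seconds, elapsed seconds
#   ps -eo pid,stat,comm | dstate_only
```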

The actual fix was boring: 128Mi was always marginal for hugo-extended + PaperMod. Bumped the limit to 512Mi and the thrash vanished.
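
To confirm that a container really is pinned at its limit (hugo sat at 127MiB against 128Mi), or to check headroom after the bump, the cgroup v2 files say it directly. A sketch, assuming cgroup v2 with the container’s own group visible at /sys/fs/cgroup (typical with cgroup namespaces, but an assumption for your node); mem_headroom is a hypothetical name. memory.events counts OOM kills, and the “full” line of memory.pressure is the share of time all of the group’s tasks were stalled on memory at once.

```shell
#!/bin/sh
# Report how close a cgroup v2 group is to its memory limit.
# The directory defaults to /sys/fs/cgroup (assumed to be the container's own
# group) and can point at any group dir.
mem_headroom() {
  cg=${1:-/sys/fs/cgroup}
  max=$(cat "$cg/memory.max")
  cur=$(cat "$cg/memory.current")

  if [ "$max" = "max" ]; then
    # No limit set on this group.
    echo "current=$cur limit=none"
    return 0
  fi

  pct=$(awk -v c="$cur" -v m="$max" 'BEGIN { printf "%.0f", 100 * c / m }')
  echo "current=$cur limit=$max used=${pct}%"

  # OOM kills so far, and the PSI line for "all tasks stalled on memory".
  # Either file may be missing on some kernels, so failures are ignored.
  grep '^oom_kill ' "$cg/memory.events" 2>/dev/null || true
  grep '^full ' "$cg/memory.pressure" 2>/dev/null || true
}
```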

Takeaway: when load spikes, read %wa and process state before blaming the CPU. And don’t flip env=production on a long-lived hugo server just to ungate one partial — use extend_head.html.

Self-hosting Supabase (lean) on k3s: the gotcha checklist

Ran the community supabase/supabase chart on a 16Gi single node — enabled db, rest, auth, meta, studio, kong + the log pipeline (analytics/Logflare + vector); left realtime, storage, imgproxy, edge-functions off. The deploy is easy; these are the things that actually bit:

  • Studio shows “no tables”. Supabase is single-database by design — Studio, PostgREST and auth all use the database named postgres. App tables in a separate database are invisible to all of it. Put your schema in postgres’s public schema.
  • Studio won’t schedule with edge-functions disabled. Its Deployment mounts the functions PVC unconditionally. Either run functions, or create the PVC yourself and leave functions off.
  • edge-functions crashloops if you keep it: it boots by fetching a Deno module from the internet, which a deny-all egress policy blocks. You usually only want the PVC it leaves behind anyway.
  • vector (log collector) stays silent under a deny-all policy. It discovers pods via the Kubernetes API, so it needs API egress, not just app ports (allowEgressToKubeApi). A log shipper that can’t reach the API collects nothing and doesn’t say why.
  • secretRef must contain every key the chart maps — including non-secret ones like database and openAiApiKey. Miss one and pods sit in CreateContainerConfigError.
  • ESO ExternalSecret shows perpetual OutOfSync in Argo CD unless you spell out the remoteRef defaults (conversionStrategy: Default, decodingStrategy: None, metadataPolicy: None) — ESO writes them back, and the compact form drifts.
  • postgres is not a superuser. CREATE DATABASE … OWNER app fails with must be member of role. Supabase keeps the real superuser (supabase_admin) to itself; GRANT app TO postgres first.
  • Logflare needs no BigQuery. It runs on the self-hosted Postgres backend (the _supabase database, _analytics schema) — logs land in _analytics.log_events_*.
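
For the secretRef item, a pre-flight check before syncing saves a round of CreateContainerConfigError. A sketch, assuming you supply the expected key list yourself from your values file: the two keys below are the ones named above, and the chart maps more, so the list is deliberately incomplete. missing_keys and the secret name in the usage line are hypothetical.

```shell
#!/bin/sh
# Print each expected key (arguments) that is absent from the
# newline-separated key list on stdin. Returns 1 if any are missing.
missing_keys() {
  present=$(cat)
  status=0
  for key in "$@"; do
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Usage against a live Secret (name and namespace are placeholders; extend the
# key list to everything your values map from secretRef):
#   kubectl get secret <secret> -n <namespace> \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#     | missing_keys database openAiApiKey
```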

None of this is in the README. It’s the gap between “I deployed Supabase” and “I run it.”