Vault on hippotion

🧱 How Do You Isolate Two n8n Tenants on Kubernetes — and Prove Each Wall Holds?

Fri, 19 Dec 2025 00:00:00 +0000

The question

“You’re running n8n for multiple customers on the same Kubernetes cluster. What stops Customer A from reading Customer B’s API keys, calling Customer B’s services, or starving Customer B’s workflows by burning the whole node?”

Three different walls, three different mechanisms. Most articles I’ve read on K8s multi-tenancy list the primitives — namespaces, NetworkPolicies, ResourceQuotas, RBAC — without showing what each one actually catches when you try to cross it. This post does the second part. The receipts are the point.

The setup: two namespaces, web-tenant-acme and web-tenant-globex, each running their own n8n instance on the same node. The only thing keeping them apart is the walls we build around each namespace.

The mental model: subtractive isolation

Kubernetes is a flat network with shared everything by default. You don’t add isolation by writing allow rules. You subtract trust by adding default-deny rules, and then carefully allow back only the connections each tenant actually needs.

A tenant cannot reach another tenant because no rule allows it. The absence of an allow rule is the wall.

Three of these absences make up the picture:

Wall	Primitive	Failure mode when crossed
Network	Cilium NetworkPolicy, default-deny egress	Connection times out (silent drop)
Secret	Vault Kubernetes-auth, per-tenant policy	`403 permission denied` from Vault itself
Resource	ResourceQuota + LimitRange	Pod rejected at admission time

Different layers, different error messages. That’s how you can tell what stopped you.

Wall 1 — Network: Cilium NetworkPolicy

n8n in web-tenant-acme can reach whoami.web-tenant-acme.svc.cluster.local (its own service in its own namespace) but not whoami.web-tenant-globex.svc.cluster.local. The same DNS shape, the same cluster, the same node. One succeeds, the other hangs.

The primitive is a default-deny egress policy applied to every pod in the namespace, with two narrow exceptions: intra-namespace traffic (so n8n can still reach its own service) and DNS to kube-system (otherwise nothing resolves anything).

# Effective policy on every pod in web-tenant-acme:
spec:
  podSelector: {}
  policyTypes: [Egress, Ingress]
  ingress:
    - from:                                   # intra-namespace traffic OK
        - podSelector: {}
  egress:
    - to:                                     # intra-namespace traffic OK
        - podSelector: {}
    - to:                                     # DNS to kube-dns OK
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports: [{port: 53, protocol: UDP}]

There is no rule for web-tenant-globex. Cilium’s eBPF datapath drops the SYN packet on the way out.
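To make the "no rule means no path" logic concrete, here is a toy Python model of the egress half of that policy. It is not how Cilium implements anything: the `Flow` type and `egress_allowed` function are made up for illustration, port 80 is assumed from the plain `http://` URLs, and only the two egress allow rules from the YAML are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_ns: str   # namespace of the pod opening the connection
    dst_ns: str   # namespace of the destination pod
    port: int
    proto: str    # "TCP" or "UDP"


def egress_allowed(flow: Flow) -> bool:
    """Default deny: return True only if an allow rule matches."""
    # Rule 1: intra-namespace traffic, any port.
    if flow.dst_ns == flow.src_ns:
        return True
    # Rule 2: DNS to kube-system on UDP/53.
    if flow.dst_ns == "kube-system" and (flow.port, flow.proto) == (53, "UDP"):
        return True
    # Nothing matched. There is no explicit deny rule; the flow is simply dropped.
    return False


ACME = "web-tenant-acme"
own_service = Flow(ACME, ACME, 80, "TCP")
dns_lookup = Flow(ACME, "kube-system", 53, "UDP")
cross_tenant = Flow(ACME, "web-tenant-globex", 80, "TCP")
```

The cross-tenant flow falls through to the final `return False` without ever hitting a rule that names web-tenant-globex, which is the whole point of the subtractive model.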

The receipt — an n8n HTTP node configured to GET http://whoami.web-tenant-globex.svc.cluster.local/. It hangs for the full timeout, then errors with AxiosError: timeout of 5000ms exceeded / code: ECONNABORTED.

The interesting bit: DNS still works. kube-dns is allowed, so the cross-namespace Service name still resolves. The TCP handshake is what gets dropped. That’s a useful signal in real incident response: “DNS resolves but the connection hangs” is a strong hint that a NetworkPolicy (or some other packet filter) is silently dropping traffic.
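That triage can be scripted. The following is a minimal sketch using only the Python standard library; the `probe` function and its return labels are my own naming, not part of any tool used in the demo. It separates "name did not resolve" from "resolved, but the connection timed out" from "resolved and actively refused".

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt into a coarse failure mode."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"      # the name did not resolve: look at DNS first
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except socket.timeout:
            return "timeout"      # resolved, but no answer: suspect a silent drop
        except ConnectionRefusedError:
            return "refused"      # reachable, but nothing is listening
        except OSError:
            return "unreachable"  # routing or other socket-level error
        return "open"


# Example call from inside a tenant pod (not executed here):
#   probe("whoami.web-tenant-globex.svc.cluster.local", 80)
```

Under the policy described in this section, the cross-namespace call would be expected to land in the "timeout" branch rather than "dns-failure" or "refused".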

Wall 2 — Secret: Vault Kubernetes-auth + ESO

Now imagine Acme’s n8n misbehaves: somebody pushes a workflow that tries to read Globex’s API keys via an ExternalSecret. The network isn’t the issue — both tenants need to reach Vault, so they both have an egress rule for sys-vault. The wall has to be at the identity layer.

Each tenant gets three things:

A dedicated ServiceAccount (n8n-acme, n8n-globex).
A Vault Kubernetes-auth role bound to that SA in that namespace, mapped to a Vault policy that grants read on only its own KV path.
A namespaced External Secrets SecretStore that authenticates as the SA via the Kubernetes TokenRequest API.

# Vault policy: tenant-acme can read its own secrets, nothing else.
path "secret/data/web-tenant-acme"     { capabilities = ["read"] }
path "secret/metadata/web-tenant-acme" { capabilities = ["read"] }

vault write auth/kubernetes/role/tenant-acme \
  bound_service_account_names=n8n-acme \
  bound_service_account_namespaces=web-tenant-acme \
  policies=tenant-acme \
  ttl=1h

When Acme’s n8n tries an ExternalSecret pointing at secret/web-tenant-globex/..., ESO authenticates fine (the SA is valid), Vault recognises the caller, looks up the tenant-acme policy, and answers with the most satisfying line in this whole demo:

URL: GET http://sys-vault.sys-vault.svc.cluster.local:8200/v1/secret/data/web-tenant-globex
Code: 403. Errors:
* permission denied

This is the bit that separates “namespace isolation” from real multi-tenant secret isolation. Plain Kubernetes Secrets + RBAC stop a tenant from listing another tenant’s Secret objects, but the moment you go upstream (to Vault, to a cloud KMS, to AWS SSM Parameter Store) the secret store has to enforce identity itself. The network said yes; the secret store still says no.
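The deny is easier to reason about with a toy model of policy evaluation. The sketch below is a deliberate simplification in Python, not Vault’s real matcher: it handles exact paths and a trailing `*` only, and ignores `+` segments, `deny` capabilities and most-specific-match precedence. The `is_allowed` function and the `TENANT_ACME` dict are names invented for illustration.

```python
# Simplified stand-in for the tenant-acme policy shown earlier.
TENANT_ACME = {
    "secret/data/web-tenant-acme": {"read"},
    "secret/metadata/web-tenant-acme": {"read"},
}


def is_allowed(policy: dict[str, set[str]], path: str, capability: str) -> bool:
    """Return True only if some policy path matches and grants the capability."""
    for pattern, capabilities in policy.items():
        if pattern.endswith("*"):
            matches = path.startswith(pattern[:-1])  # trailing glob: prefix match
        else:
            matches = path == pattern                # otherwise: exact match
        if matches and capability in capabilities:
            return True
    return False  # no matching grant means permission denied


own = is_allowed(TENANT_ACME, "secret/data/web-tenant-acme", "read")
other = is_allowed(TENANT_ACME, "secret/data/web-tenant-globex", "read")
```

Here `own` is granted and `other` is not, for the same reason as in the network wall: nothing in the policy mentions the Globex path, so there is nothing to match.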

Wall 3 — Resource: ResourceQuota + LimitRange

The third concern is the noisy neighbour: Acme’s runaway workflow allocating a 4Gi pod and OOM-killing everything else on the node. The network policy doesn’t catch this (no network call), and Vault doesn’t catch this (no secret request). The kernel will, eventually — but you don’t want eventually. You want admission-time rejection.

Two primitives:

apiVersion: v1
kind: ResourceQuota
metadata: { name: tenant-quota, namespace: web-tenant-acme }
spec:
  hard:
    requests.cpu:    "1"
    requests.memory: 1Gi
    limits.cpu:      "2"
    limits.memory:   2Gi
    pods:            "10"
---
apiVersion: v1
kind: LimitRange
metadata: { name: tenant-limits, namespace: web-tenant-acme }
spec:
  limits:
    - type: Container
      default:        { cpu: 500m, memory: 512Mi }
      defaultRequest: { cpu: 50m,  memory: 128Mi }
      max:            { cpu: "2",  memory: 1Gi }

ResourceQuota caps the namespace total. LimitRange bounds any individual container and supplies defaults so pods that don’t declare requests/limits still get reasonable ones. The defaults matter because once a quota covers requests and limits, the API server rejects any pod that leaves them unset, and the max stops a single container from claiming the whole quota in one allocation.

The receipt — a server-side dry-run of a single 4Gi pod, which never gets created:

$ kubectl apply -n web-tenant-acme --dry-run=server -f - < noisy-neighbor.yaml
Error from server (Forbidden): error when creating "STDIN":
pods "noisy-neighbor" is forbidden:
  maximum memory usage per Container is 1Gi, but limit is 4Gi

Not a kernel OOMKill. Not a pod stuck in Pending. A flat refusal from the API server before the scheduler even sees the request.
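It is worth doing the arithmetic on those numbers. The sketch below is plain Python, not anything Kubernetes runs; the quantity parsers only cover the suffixes used in this post, and the helper names are hypothetical. It works out how many single-container pods with the LimitRange defaults fit inside the quota, and checks a memory limit against the per-container max.

```python
MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def mem_bytes(quantity: str) -> int:
    """Parse a memory quantity such as '512Mi' or '1Gi' into bytes."""
    for suffix, factor in MEM_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def cpu_millis(quantity: str) -> int:
    """Parse a CPU quantity such as '500m' or '2' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


QUOTA = {"requests.cpu": "1", "requests.memory": "1Gi",
         "limits.cpu": "2", "limits.memory": "2Gi", "pods": "10"}
DEFAULT_LIMIT = {"cpu": "500m", "memory": "512Mi"}
DEFAULT_REQUEST = {"cpu": "50m", "memory": "128Mi"}
MAX_PER_CONTAINER = {"cpu": "2", "memory": "1Gi"}


def default_pods_that_fit() -> int:
    """How many single-container pods with default sizing the quota admits."""
    return min(
        cpu_millis(QUOTA["requests.cpu"]) // cpu_millis(DEFAULT_REQUEST["cpu"]),
        mem_bytes(QUOTA["requests.memory"]) // mem_bytes(DEFAULT_REQUEST["memory"]),
        cpu_millis(QUOTA["limits.cpu"]) // cpu_millis(DEFAULT_LIMIT["cpu"]),
        mem_bytes(QUOTA["limits.memory"]) // mem_bytes(DEFAULT_LIMIT["memory"]),
        int(QUOTA["pods"]),
    )


def within_container_max(memory_limit: str) -> bool:
    """True if a container memory limit is at or below the LimitRange max."""
    return mem_bytes(memory_limit) <= mem_bytes(MAX_PER_CONTAINER["memory"])
```

With these values `default_pods_that_fit()` comes out at 4: the limits (500m CPU, 512Mi memory per container) run out long before the pod count of 10 does. `within_container_max("4Gi")` is False, which mirrors the per-container refusal in the receipt.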

What this does not prove

A homelab demo on one node with two synthetic tenants is not n8n Cloud. The honest gaps:

Execution sandboxing. A workflow can still run arbitrary code via the Code node or shell-outs. These walls stop infrastructure leakage; they don’t sandbox what n8n itself executes. Real n8n Cloud needs more than namespace walls for that — gVisor / Firecracker / per-tenant worker pools are the usual answers, and n8n’s queue mode lends itself to the last.
Pooled worker queues. Queue mode runs main/webhook/worker as separate deployments backed by Redis + Postgres. Two tenants sharing a worker pool need additional checks at the job-routing layer to keep workflows from accessing the wrong tenant’s binary data. Out of scope for the homelab demo.
Control plane. Both tenants reach the same API server. A cluster-admin-equivalent compromise breaks everything. This is the assumption every shared K8s setup makes.
Node-level. Same kernel. Container escape, CPU side channels, the usual list — all apply. For paranoid tenants the answer is dedicated nodes via taints/tolerations or separate clusters entirely.

The demo proves the namespace-shaped walls hold. It does not prove the whole stack is safe against a determined attacker already running code inside a tenant. That’s a different post.

Part of a Kubernetes-on-the-homelab series — previously: preventing a compromised pod from calling your database, GitOps secrets.

🤫 How Do You Handle Secrets in a GitOps Repository?

Fri, 25 Apr 2025 00:00:00 +0000

The question

“You’re using GitOps — everything goes through Git. How do you handle secrets?”

The wrong answer: base64-encode them and commit them as Kubernetes Secret objects. Base64 is not encryption. Anyone with read access to the repo has your secrets. If the repo is public, everyone does.
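To see why, here is how little it takes to recover a value from a committed Secret manifest. This is a small Python illustration using the standard library; the value `hunter2` is only a placeholder.

```python
import base64

# What ends up in the `data:` field of a Secret manifest.
encoded = base64.b64encode(b"hunter2").decode()

# What anyone with read access to the repo can do with it. No key is involved.
decoded = base64.b64decode(encoded).decode()
```

Encoding and decoding are symmetric and keyless, so the manifest protects nothing.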

The slightly better wrong answer: use a private repo and just not think about it. This works until a deploy key leaks, someone joins and then leaves the company, or you need to rotate one secret and have to find every place it’s referenced.

There are three real answers. They make different tradeoffs.

The constraint

The constraint is actually tighter than “don’t commit secrets”. It’s: your Git repo should be safe to make public at any point, and secrets must be rotatable without touching Git.

If rotating a password requires a new commit, someone has to be awake to merge and deploy it. That’s not how you want to handle a 3am incident.

Option 1: External Secrets Operator + Vault

This is the most robust pattern and the one worth knowing for interviews.

The idea: secrets live in a dedicated secret store (HashiCorp Vault, or a cloud equivalent). The External Secrets Operator (ESO) watches ExternalSecret custom resources in the cluster and syncs each referenced secret into a real Kubernetes Secret. The ExternalSecret manifest is safe to commit: it says where the secret lives, not what it is.

# This lives in Git — safe to commit
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-db-credentials
  namespace: myapp
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: myapp-db-credentials   # the k8s Secret it creates
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: secret/myapp
        property: db-password

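Rotation: you update the secret in Vault. ESO syncs it to the cluster within refreshInterval. No Git commit, no deployment. A pod that consumes the Secret as environment variables only sees the new value after a restart; a pod that mounts it as a volume gets the updated file after the kubelet’s next sync, provided the app re-reads it (on a file watch or a SIGHUP, for example).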

Audit trail: with an audit device enabled, Vault logs every read and write. You know which authenticated identity (here, the ServiceAccount behind the Vault role) read which secret at what time.

The cost: you’re running Vault. For a homelab or small team, that’s an extra thing to operate. For production, it’s worth it.

Self-hosted setup:

# ClusterSecretStore — connects ESO to your Vault instance
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "http://sys-vault.sys-vault.svc.cluster.local:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"

ESO authenticates to Vault with a Kubernetes ServiceAccount token (here, the one bound to the ESO controller pod, since the store names no other ServiceAccount). Vault validates it against the cluster’s TokenReview API. No static credentials anywhere.
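The login exchange itself is a single HTTP call to Vault’s Kubernetes auth endpoint. The sketch below builds that request in Python with the standard library so the shape is visible; it does not send anything. The `build_login_request` helper is a hypothetical name, the JWT is a placeholder, and the address, mount and role are the ones from the ClusterSecretStore in this post.

```python
import json
import urllib.request


def build_login_request(vault_addr: str, role: str, jwt: str,
                        mount: str = "kubernetes") -> urllib.request.Request:
    """Build the POST that exchanges a ServiceAccount JWT for a Vault token."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        url=f"{vault_addr}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_login_request(
    "http://sys-vault.sys-vault.svc.cluster.local:8200",
    role="external-secrets",
    jwt="<service-account-jwt>",  # placeholder; ESO supplies the real token
)
```

On success Vault answers with a short-lived client token scoped to the role’s policies. The only credential that went in was the ServiceAccount JWT, which the cluster issued.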

Option 2: Sealed Secrets

Sealed Secrets uses asymmetric encryption. The cluster holds a private key. You use the kubeseal CLI to encrypt a secret with the cluster’s public key. The resulting SealedSecret object is safe to commit — only the cluster can decrypt it.

# Encrypt a secret for committing to Git
# (kubeseal emits JSON by default, so ask for YAML explicitly)
kubectl create secret generic myapp-db \
  --namespace myapp \
  --from-literal=DB_PASSWORD=hunter2 \
  --dry-run=client -o yaml \
  | kubeseal --format yaml \
  > sealed-secrets/myapp-db.yaml

The resulting YAML looks like:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myapp-db
  namespace: myapp
spec:
  encryptedData:
    DB_PASSWORD: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...

This gets committed. The Sealed Secrets controller in the cluster decrypts it and creates the real Secret automatically.

The tradeoff: rotation means re-sealing. You need the cluster’s public key (which is public) and access to the plaintext secret. You commit a new SealedSecret. That’s a Git commit, which means a review, a merge, and a deploy. For a 3am incident, that’s a lot of friction.

Also: if the cluster’s private key is lost, you can’t decrypt any of your sealed secrets. Back up the private key.

Good fit for: small teams, homelab, situations where secrets change rarely and the GitOps review process is actually desirable.

Option 3: SOPS

SOPS (Secrets OPerationS) encrypts files at rest using age or PGP keys, or a cloud KMS. You commit the encrypted files. CI decrypts them during deployment with a key that is supplied at run time and never stored in Git.

# Encrypt a file for Git
sops --encrypt --age age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q \
  secrets/myapp.yaml > secrets/myapp.enc.yaml

# In CI: decrypt to stdout and pipe straight into kubectl, so no plaintext file touches disk
sops --decrypt secrets/myapp.enc.yaml | kubectl apply -f -

The difference from Sealed Secrets: SOPS encrypts at the file level, not the k8s object level. You can use it outside of Kubernetes (application configs, Terraform variables). The key can live in the CI environment, a cloud KMS, or a personal age key.

The tradeoff: CI needs the decryption key, which puts you back in “secret in CI” territory — just for the encryption key rather than the actual secrets. If you use a cloud KMS, OIDC federation handles that (no stored key). If you use an age key, it lives in CI secrets.

Good fit for: teams already using Helm and Helm Secrets, polyglot environments where not everything is Kubernetes, small teams where Vault feels like overengineering.

Comparison

	ESO + Vault	Sealed Secrets	SOPS
Rotation without Git commit	Yes	No	No
Audit trail	Full (Vault)	None	Depends on KMS
Complexity	High	Low	Medium
Works outside k8s	With effort	No	Yes
Recovery if key lost	Vault backup	Lose all secrets	Key backup
CI needs secret material	No	No	Yes (decrypt key)

What interviewers are actually testing

The interesting follow-up question is: “How do you rotate a secret without downtime?”

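The answer requires you to understand how pods consume Secret objects. Environment variables are fixed when the container starts. Mounted volumes are refreshed by the kubelet after a short delay, but the app has to re-read the file. Either way, updating the Secret in Kubernetes doesn’t restart the pod. Your options are: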

Mount the secret as a volume and have the app watch for file changes (good)
Restart the deployment after rotation (kubectl rollout restart, automatable)
Use the Vault Agent Injector, which adds a sidecar that keeps re-rendering the secret into a shared volume (complex, but no restart as long as the app re-reads the file)

The correct answer depends on the app. An API key that can be rotated gradually is different from a database password where the old one is invalidated immediately.
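For the first option, the app-side change can be small. Here is a minimal Python sketch of a secret that is re-read whenever the mounted file changes. The `MountedSecret` class is hypothetical, and it polls on access rather than using a file-watch API, which is an assumption made to keep the example dependency-free.

```python
import os
from pathlib import Path


class MountedSecret:
    """Return the current contents of a secret file, re-reading it on change."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._stamp: tuple[int, int, int] | None = None
        self._value = ""

    def get(self) -> str:
        # os.stat follows symlinks, so a swapped symlink target shows up
        # as a different inode even if the timestamp looks the same.
        st = os.stat(self._path)
        stamp = (st.st_ino, st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            self._value = self._path.read_text().strip()
            self._stamp = stamp
        return self._value


# Example use inside the app (the mount path is an assumption):
#   db_password = MountedSecret("/etc/secrets/DB_PASSWORD")
#   connect(password=db_password.get())  # call get() per connection attempt
```

Calling `get()` at the point of use, rather than caching the value at startup, is what lets a rotated password take effect without a restart. Note that this only helps for volume mounts; a Secret mounted through subPath is not refreshed.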

This is part of a series on Kubernetes interview questions. Previously: deploying without cluster credentials. Next: zero-downtime deployments.