<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Devops-Interviews on hippotion</title><link>https://blog.hippotion.com/tags/devops-interviews/</link><description>Recent content in Devops-Interviews on hippotion</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 20 Jun 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.hippotion.com/tags/devops-interviews/index.xml" rel="self" type="application/rss+xml"/><item><title>⚡ Your Deployment Causes 30 Seconds of Downtime. What Went Wrong?</title><link>https://blog.hippotion.com/posts/k8s-zero-downtime/</link><pubDate>Fri, 20 Jun 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-zero-downtime/</guid><description>Kubernetes rolling updates don&amp;rsquo;t give you zero-downtime for free. There are four separate things you have to get right, and most clusters get at least one wrong.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you achieve zero-downtime deployments in Kubernetes?&rdquo;</em></p>
<p>The expected answer: rolling updates. That&rsquo;s correct but incomplete. Rolling updates are the mechanism. They don&rsquo;t give you zero downtime automatically — they give you a framework in which zero downtime is achievable, if you configure everything correctly.</p>
<p>Most clusters cause brief downtime on every deployment. Usually 5–30 seconds. Usually blamed on &ldquo;the load balancer&rdquo; or &ldquo;DNS&rdquo;. Almost always caused by one of four missing pieces.</p>
<hr>
<h2 id="the-rolling-update-baseline">The rolling update baseline</h2>
<p>Kubernetes replaces pods in waves. You control the pace:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">strategy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">RollingUpdate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">rollingUpdate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxSurge</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">        </span><span class="c"># how many extra pods can exist during update</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxUnavailable</span><span class="p">:</span><span class="w"> </span><span class="m">0</span><span class="w">  </span><span class="c"># how many pods can be unavailable during update</span><span class="w">
</span></span></span></code></pre></div><p><code>maxUnavailable: 0</code> means Kubernetes never terminates a pod until a replacement is ready. This prevents the obvious failure mode where you have zero running pods mid-deployment.</p>
<p><code>maxSurge: 1</code> means one extra pod beyond the desired count runs during the update. For a deployment with 3 replicas, you&rsquo;ll briefly have 4 pods running.</p>
<p>This alone doesn&rsquo;t prevent downtime.</p>
<hr>
<h2 id="piece-1-the-readiness-probe-the-most-common-missing-piece">Piece 1: The readiness probe (the most common missing piece)</h2>
<p>Kubernetes considers a pod &ldquo;ready&rdquo; when all its containers pass their readiness probes. If you don&rsquo;t define a readiness probe, Kubernetes considers the pod ready as soon as the container starts. Containers start before applications are ready to serve traffic.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Without this, traffic arrives before your app is listening</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">initialDelaySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span></code></pre></div><p>What happens without it: Kubernetes starts the new pod, marks it ready immediately, adds it to the Service endpoints, routes traffic to it — while your app is still initialising (loading config, connecting to the database, warming caches). The first few requests to the new pod fail or time out.</p>
<p>The fix: define a readiness probe that actually checks application readiness. An HTTP endpoint that returns 200 only after the app has finished starting is the minimum. A deeper check that verifies the database connection is better.</p>
<p>Common mistake: using the same endpoint for liveness and readiness with the same thresholds. They serve different purposes:</p>
<ul>
<li><strong>Readiness</strong>: &ldquo;am I ready to accept traffic?&rdquo; — controls whether traffic is sent</li>
<li><strong>Liveness</strong>: &ldquo;am I still alive?&rdquo; — controls whether the pod is restarted</li>
</ul>
<p>A pod can fail its readiness probe (temporarily overloaded, warming up) without failing its liveness probe. If you make liveness too aggressive, Kubernetes restarts pods that would have recovered on their own.</p>
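<p>As a rough sketch of keeping the two probes apart, the fragment below leaves readiness on the deeper <code>/healthz</code> check and points liveness at a cheaper endpoint with looser thresholds. The <code>/livez</code> path is hypothetical (your application would have to expose it), and the numbers are illustrative starting points rather than recommendations.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"># Assumption: the app exposes /livez, a cheap check that does not touch dependencies
</span></span><span class="line"><span class="cl">readinessProbe:
</span></span><span class="line"><span class="cl">  httpGet:
</span></span><span class="line"><span class="cl">    path: /healthz        # returns 200 only when the app can actually serve traffic
</span></span><span class="line"><span class="cl">    port: 8080
</span></span><span class="line"><span class="cl">  periodSeconds: 5
</span></span><span class="line"><span class="cl">  failureThreshold: 3     # taken out of rotation after about 15s of failures
</span></span><span class="line"><span class="cl">livenessProbe:
</span></span><span class="line"><span class="cl">  httpGet:
</span></span><span class="line"><span class="cl">    path: /livez          # hypothetical endpoint: only proves the process responds
</span></span><span class="line"><span class="cl">    port: 8080
</span></span><span class="line"><span class="cl">  periodSeconds: 10
</span></span><span class="line"><span class="cl">  failureThreshold: 6     # restarted only after about 60s of consecutive failures
</span></span></code></pre></div><p>With this split, a database outage takes the pod out of rotation without triggering a restart loop.</p>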
<hr>
<h2 id="piece-2-the-termination-grace-period-the-other-common-missing-piece">Piece 2: The termination grace period (the other common missing piece)</h2>
<p>When Kubernetes wants to terminate a pod, it sends <code>SIGTERM</code>. Your application has <code>terminationGracePeriodSeconds</code> (default: 30) to finish in-flight requests and shut down cleanly. After that, Kubernetes sends <code>SIGKILL</code>.</p>
<p>The problem: there&rsquo;s a race condition. Kubernetes removes the pod from the Service endpoints and sends <code>SIGTERM</code> roughly simultaneously. The endpoint update has to propagate through the control plane, kube-proxy, and the load balancer. During that propagation window — typically 1–10 seconds — traffic can still arrive at a pod that has already started shutting down.</p>
<p>The fix is a <code>preStop</code> hook that adds a short sleep before the termination sequence:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">lifecycle</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">preStop</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">exec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;sleep&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;5&#34;</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div><p>This gives the endpoint removal time to propagate before your app receives <code>SIGTERM</code>. The total shutdown sequence is then:</p>
<ol>
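<li>Kubernetes marks the pod as terminating and starts removing it from the Service endpoints; the <code>terminationGracePeriodSeconds</code> countdown starts here</li>
<li><code>preStop</code> hook runs in parallel with the endpoint removal (sleep 5s, enough for endpoint propagation)</li>
<li><code>SIGTERM</code> is sent once the hook returns</li>
<li>App drains in-flight requests and shuts down</li>
<li>If still running when the grace period expires: <code>SIGKILL</code></li>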
</ol>
<p>Set <code>terminationGracePeriodSeconds</code> to cover the sleep plus your app&rsquo;s actual shutdown time:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">terminationGracePeriodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">60</span><span class="w">  </span><span class="c"># 5s preStop + up to 55s for app shutdown</span><span class="w">
</span></span></span></code></pre></div><p>Without the sleep: requests fail during the propagation window. With it: the window is covered.</p>
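<p>One caveat: the <code>exec</code> form needs a <code>sleep</code> binary inside the container. On distroless or scratch images the hook fails and the delay is skipped. Recent Kubernetes releases offer a built-in sleep action for lifecycle hooks; the fragment below is a minimal sketch that assumes your cluster version supports it, so check before relying on it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"># Assumption: the cluster supports the built-in sleep lifecycle action
</span></span><span class="line"><span class="cl">lifecycle:
</span></span><span class="line"><span class="cl">  preStop:
</span></span><span class="line"><span class="cl">    sleep:
</span></span><span class="line"><span class="cl">      seconds: 5   # handled by the kubelet, no binary needed in the image
</span></span></code></pre></div>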
<hr>
<h2 id="piece-3-poddisruptionbudgets-for-node-maintenance">Piece 3: PodDisruptionBudgets (for node maintenance)</h2>
<p>Rolling updates handle normal deployments. Node drains (<code>kubectl drain</code>, cloud provider maintenance windows, k3s upgrades) are a different code path that bypasses your rolling update strategy entirely.</p>
<p>When a node is drained, Kubernetes evicts all pods on it as fast as it can. Without constraints, it will evict all replicas of your deployment simultaneously if they all happen to land on the same node.</p>
<p>A <code>PodDisruptionBudget</code> sets a floor:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">policy/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">PodDisruptionBudget</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-pdb</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minAvailable</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">   </span><span class="c"># at least 1 replica must stay up during disruption</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
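</span></span></span></code></pre></div><p>Now a node drain goes through the eviction API, which refuses any eviction that would leave fewer than <code>minAvailable</code> ready pods. The drain retries until replacement pods are ready elsewhere, so replicas are moved only as fast as the budget allows rather than all at once. If no replacement can be scheduled (e.g., you&rsquo;re draining the only node), the drain will block rather than cause downtime.</p>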
<p><code>minAvailable: 1</code> is the minimum. For production with 3+ replicas, <code>minAvailable: 2</code> or <code>maxUnavailable: 1</code> is more appropriate.</p>
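<p>For comparison, here is a sketch of the <code>maxUnavailable</code> form, reusing the same illustrative <code>myapp</code> names. A PDB accepts either <code>minAvailable</code> or <code>maxUnavailable</code>, not both. The practical difference is that <code>maxUnavailable</code> keeps working unchanged when the replica count changes.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">apiVersion: policy/v1
</span></span><span class="line"><span class="cl">kind: PodDisruptionBudget
</span></span><span class="line"><span class="cl">metadata:
</span></span><span class="line"><span class="cl">  name: myapp-pdb
</span></span><span class="line"><span class="cl">  namespace: myapp
</span></span><span class="line"><span class="cl">spec:
</span></span><span class="line"><span class="cl">  maxUnavailable: 1   # at most one replica may be down due to a voluntary disruption
</span></span><span class="line"><span class="cl">  selector:
</span></span><span class="line"><span class="cl">    matchLabels:
</span></span><span class="line"><span class="cl">      app: myapp
</span></span></code></pre></div><p>Avoid setting <code>minAvailable</code> equal to the replica count: that leaves no room for any eviction, and every drain blocks.</p>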
<hr>
<h2 id="piece-4-minreadyseconds-the-one-everyone-forgets">Piece 4: minReadySeconds (the one everyone forgets)</h2>
<p>Even with a correct readiness probe, there&rsquo;s a subtle risk: a pod that passes its readiness probe briefly and then fails due to a transient startup issue (flapping). Kubernetes would add it to the endpoint pool, route traffic to it, watch it fail the readiness probe, and remove it. During that window some requests fail, and by then the rollout may already have terminated a healthy old pod on the strength of that brief ready signal.</p>
<p><code>minReadySeconds</code> says: a pod must pass its readiness probe continuously for this many seconds before Kubernetes considers it &ldquo;available&rdquo; and allows the next pod in the rolling update to be terminated:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minReadySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
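</span></span></span></code></pre></div><p>This slows deployments slightly, but it keeps old pods in place until each new pod has stayed ready for the full window, so a flapping pod stalls the rollout instead of replacing healthy capacity. It does not delay traffic: the new pod joins the Service endpoints as soon as it first reports ready.</p>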
<hr>
<h2 id="the-complete-deployment-snippet">The complete deployment snippet</h2>
<p>Putting it together:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minReadySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">strategy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">RollingUpdate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">rollingUpdate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxSurge</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxUnavailable</span><span class="p">:</span><span class="w"> </span><span class="m">0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">terminationGracePeriodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">60</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">lifecycle</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">preStop</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">exec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;sleep&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;5&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">initialDelaySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">livenessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">initialDelaySeconds</span><span class="p">:</span><span class="w"> </span><span class="m">15</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">failureThreshold</span><span class="p">:</span><span class="w"> </span><span class="m">5</span><span class="w">
</span></span></span></code></pre></div><p>And the PDB alongside it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">policy/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">PodDisruptionBudget</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-pdb</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">minAvailable</span><span class="p">:</span><span class="w"> </span><span class="m">2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;What if your new version has a bug that isn&rsquo;t caught immediately — how do you roll back?&rdquo;</em></p>
<p><code>kubectl rollout undo deployment/myapp</code> reverts to the previous ReplicaSet. Kubernetes keeps old ReplicaSets around for exactly this purpose (<code>revisionHistoryLimit</code>, default 10). The rollback uses the same rolling update mechanism, so it&rsquo;s also zero-downtime.</p>
<p>The harder follow-up: <em>&ldquo;What if the bug only shows up after 10 minutes of load?&rdquo;</em> That&rsquo;s where you need a canary deployment: send a small percentage of traffic to the new version, observe, then shift the rest. Argo Rollouts handles this natively. Without it, you&rsquo;re doing it manually: either two Deployments behind one Service, with the split approximated by the replica ratio, or weighted routing at the ingress or service mesh layer. A plain Service has no traffic weights.</p>
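<p>To make the canary idea concrete, here is a minimal sketch of an Argo Rollouts <code>Rollout</code> that pauses at 10% and 50% before promoting fully. The image tag, replica count, weights, and pause durations are illustrative assumptions, and no traffic router is configured, so the weights are only approximated through pod counts.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">apiVersion: argoproj.io/v1alpha1
</span></span><span class="line"><span class="cl">kind: Rollout
</span></span><span class="line"><span class="cl">metadata:
</span></span><span class="line"><span class="cl">  name: myapp
</span></span><span class="line"><span class="cl">  namespace: myapp
</span></span><span class="line"><span class="cl">spec:
</span></span><span class="line"><span class="cl">  replicas: 10              # illustrative: enough pods for 10% to mean one pod
</span></span><span class="line"><span class="cl">  selector:
</span></span><span class="line"><span class="cl">    matchLabels:
</span></span><span class="line"><span class="cl">      app: myapp
</span></span><span class="line"><span class="cl">  template:
</span></span><span class="line"><span class="cl">    metadata:
</span></span><span class="line"><span class="cl">      labels:
</span></span><span class="line"><span class="cl">        app: myapp
</span></span><span class="line"><span class="cl">    spec:
</span></span><span class="line"><span class="cl">      containers:
</span></span><span class="line"><span class="cl">        - name: myapp
</span></span><span class="line"><span class="cl">          image: myapp:1.2.3   # hypothetical tag
</span></span><span class="line"><span class="cl">  strategy:
</span></span><span class="line"><span class="cl">    canary:
</span></span><span class="line"><span class="cl">      steps:
</span></span><span class="line"><span class="cl">        - setWeight: 10            # roughly 10% of pods run the new version
</span></span><span class="line"><span class="cl">        - pause: {duration: 10m}   # observe under load before continuing
</span></span><span class="line"><span class="cl">        - setWeight: 50
</span></span><span class="line"><span class="cl">        - pause: {duration: 10m}   # after the last step the rollout promotes fully
</span></span></code></pre></div><p>The pod template is the same one a Deployment would use, so the probes, <code>preStop</code> hook, and grace period from earlier carry over unchanged.</p>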
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-gitops-secrets/">secrets in a GitOps repo</a>. Next: <a href="/posts/k8s-network-isolation/">network isolation between services</a>.</em></p>
]]></content:encoded></item><item><title>🔄 Someone kubectl apply'd a Hotfix Directly. How Do You Detect and Prevent It?</title><link>https://blog.hippotion.com/posts/k8s-config-drift/</link><pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-config-drift/</guid><description>Manual kubectl in production is the Kubernetes equivalent of SSH&amp;rsquo;ing into a server and editing files. It works until it doesn&amp;rsquo;t, and when it doesn&amp;rsquo;t, nobody knows why.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you prevent configuration drift in a Kubernetes cluster?&rdquo;</em></p>
<p>Configuration drift: the cluster&rsquo;s actual state diverges from what&rsquo;s declared in your source of truth. Someone runs <code>kubectl edit deployment myapp</code> to bump a memory limit during an incident. Someone adds a debug sidecar directly. Someone applies a YAML file from their laptop that was never committed to Git. The fix works. It goes undocumented. Six months later, a new deployment overwrites it. The incident recurs.</p>
<p>There are two distinct problems here that require different solutions:</p>
<ol>
<li><strong>Detection and remediation</strong>: how do you notice drift and revert it?</li>
<li><strong>Prevention</strong>: how do you stop non-compliant resources from being created in the first place?</li>
</ol>
<hr>
<h2 id="detection-and-remediation-argo-cd-selfheal">Detection and remediation: Argo CD selfHeal</h2>
<p>If you&rsquo;re using GitOps with Argo CD, detection and remediation are handled for you:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">syncPolicy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">automated</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">prune</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">selfHeal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p><code>selfHeal: true</code> means Argo CD continuously compares the cluster state to the Git repo and reverts any divergence in the resources it manages. Someone runs <code>kubectl edit deployment myapp</code> and changes the replica count? Argo CD detects the diff on its next reconciliation and reverts it. The commonly quoted default of 3 minutes is the interval for polling Git; changes to live resources are watched and are usually picked up sooner.</p>
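<p><code>prune: true</code> means resources that Argo CD tracks as part of the application, but that are no longer in Git, are deleted. Remove a manifest from the repo and the object disappears on the next sync. A debug pod that someone <code>kubectl apply</code>&rsquo;d directly is a different case: it was never tracked by the application, so pruning alone won&rsquo;t remove it. Stopping that kind of resource is a job for RBAC and admission control.</p>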
<p>This is the audit trail story too. Every legitimate change is a Git commit with an author, a timestamp, and a commit message. Manual changes to managed resources don&rsquo;t survive past the next reconciliation. If you want to know what changed and when, <code>git log</code> is the answer.</p>
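<p>For context, the <code>syncPolicy</code> block sits inside an Argo CD <code>Application</code>. The sketch below shows one plausible shape; the repository URL, path, and branch are placeholders, not values from a real setup.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">apiVersion: argoproj.io/v1alpha1
</span></span><span class="line"><span class="cl">kind: Application
</span></span><span class="line"><span class="cl">metadata:
</span></span><span class="line"><span class="cl">  name: myapp
</span></span><span class="line"><span class="cl">  namespace: argocd
</span></span><span class="line"><span class="cl">spec:
</span></span><span class="line"><span class="cl">  project: default
</span></span><span class="line"><span class="cl">  source:
</span></span><span class="line"><span class="cl">    repoURL: https://git.example.com/platform/deployments.git   # hypothetical repo
</span></span><span class="line"><span class="cl">    targetRevision: main
</span></span><span class="line"><span class="cl">    path: apps/myapp                                            # hypothetical path
</span></span><span class="line"><span class="cl">  destination:
</span></span><span class="line"><span class="cl">    server: https://kubernetes.default.svc                      # the cluster Argo CD runs in
</span></span><span class="line"><span class="cl">    namespace: myapp
</span></span><span class="line"><span class="cl">  syncPolicy:
</span></span><span class="line"><span class="cl">    automated:
</span></span><span class="line"><span class="cl">      prune: true      # delete tracked resources that were removed from Git
</span></span><span class="line"><span class="cl">      selfHeal: true   # revert manual changes to tracked resources
</span></span></code></pre></div>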
<hr>
<h2 id="the-gap-selfheal-doesnt-close">The gap selfHeal doesn&rsquo;t close</h2>
<p><code>selfHeal</code> reverts drift after the fact. There&rsquo;s a window, anywhere from a few seconds to a few minutes depending on how reconciliation is configured, where a drifted resource is serving traffic. For most changes, that&rsquo;s fine. For a bad resource (wrong RBAC, missing network policy, container running as root), even a short window is enough to be a problem.</p>
<p>The other gap: <code>selfHeal</code> doesn&rsquo;t tell you <em>who</em> made the change or generate an alert. It just silently fixes it. To see who made the change you need API server audit logging (<code>kube-apiserver --audit-log-path</code> together with <code>--audit-policy-file</code>); to know that drift happened at all you need an alerting rule on Argo CD&rsquo;s sync status.</p>
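<p>As an illustration of the audit side, the policy below is a minimal sketch that records who wrote to Deployments and drops everything else. It assumes you control the API server flags and pass this file via <code>--audit-policy-file</code>; managed clusters usually expose audit logs through their own settings instead.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl">apiVersion: audit.k8s.io/v1
</span></span><span class="line"><span class="cl">kind: Policy
</span></span><span class="line"><span class="cl">rules:
</span></span><span class="line"><span class="cl">  # Record user, verb, and timestamp for writes to Deployments (no request bodies)
</span></span><span class="line"><span class="cl">  - level: Metadata
</span></span><span class="line"><span class="cl">    verbs: [&#34;create&#34;, &#34;update&#34;, &#34;patch&#34;, &#34;delete&#34;]
</span></span><span class="line"><span class="cl">    resources:
</span></span><span class="line"><span class="cl">      - group: &#34;apps&#34;
</span></span><span class="line"><span class="cl">        resources: [&#34;deployments&#34;]
</span></span><span class="line"><span class="cl">  # Drop everything else to keep the log small (widen this for real use)
</span></span><span class="line"><span class="cl">  - level: None
</span></span></code></pre></div><p>Rules are evaluated in order and the first match wins, so the catch-all <code>None</code> rule has to come last.</p>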
<hr>
<h2 id="prevention-kyverno">Prevention: Kyverno</h2>
<p>Kyverno is a policy engine that runs as a Kubernetes admission webhook. Every resource creation or modification goes through it before being persisted. If the resource violates a policy, Kyverno can reject it outright (enforce mode) or allow it and record the violation in a policy report (audit mode).</p>
<p>The policies are Kubernetes resources themselves — they live in Git, they&rsquo;re applied via GitOps, they&rsquo;re versioned. No separate policy language to learn.</p>
<p>A policy that requires readiness probes on all Deployments:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">require-readiness-probe</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-readiness-probe</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span>- <span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Deployments must define a readiness probe.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">(name)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                    </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">periodSeconds</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&gt;0&#34;</span><span class="w">
</span></span></span></code></pre></div><p>With this policy active, <code>kubectl apply -f deployment-without-probe.yaml</code> is rejected at admission. The error message is the one you defined in <code>message</code>, and the Deployment never reaches etcd.</p>
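<p>For contrast, a minimal sketch of a Deployment that should be admitted, because its only container defines a readiness probe. The name, image, port, and probe path are placeholders, not taken from a real workload:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical Deployment: name, image, port and path are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: web
          image: registry.example.com/myapp:1.0.0
          # The readiness probe the policy is meant to require
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
</code></pre></div>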
<p>A policy that blocks containers running as root:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">disallow-root-containers</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-runAsNonRoot</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment, StatefulSet, DaemonSet]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;Containers must not run as root.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">(name)</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                    </span><span class="nt">securityContext</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">runAsNonRoot</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p>A policy that enforces resource limits (common in multi-tenant clusters):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">require-resource-limits</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">validationFailureAction</span><span class="p">:</span><span class="w"> </span><span class="l">Enforce</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">check-limits</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">validate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;CPU and memory limits are required.&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">pattern</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                  </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                      </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                        </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;?*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                        </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;?*&#34;</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="kyverno-can-also-mutate-and-generate">Kyverno can also mutate and generate</h2>
<p>Policies aren&rsquo;t only for validation. Kyverno can mutate incoming resources (add default labels, inject sidecars, set default resource requests) and generate new resources in response to events (create a NetworkPolicy whenever a new namespace is created).</p>
<p>Auto-add a standard label to every Deployment:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-labels</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-team-label</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Deployment]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">mutate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">patchStrategicMerge</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">managed-by</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno</span><span class="w">
</span></span></span></code></pre></div><p>Auto-create a default NetworkPolicy when a namespace is created:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kyverno.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">add-default-networkpolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">match</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">any</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kinds</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">Namespace]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">generate</span><span class="p">:</span><span class="w">
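</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">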
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny-all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;{{request.object.metadata.name}}&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="l">Egress</span><span class="w">
</span></span></span></code></pre></div><hr>
<h2 id="the-complete-drift-prevention-picture">The complete drift prevention picture</h2>
<pre tabindex="0"><code>Developer runs: kubectl apply -f bad-deployment.yaml
  → API server receives request
  → Kyverno admission webhook intercepts
  → Policy check: no readiness probe → Rejected
  → API server rejects the request with Kyverno&#39;s message
  → Resource never reaches etcd

Developer runs: kubectl edit deployment myapp (valid change, just not via Git)
  → Edit succeeds (no policy violation)
  → Argo CD reconciliation fires (within 3 minutes)
  → Diff detected: cluster state ≠ Git state
  → selfHeal: revert to Git state
  → If audit logging enabled: event recorded with username and timestamp
</code></pre><p>Git is the audit trail for what <em>should</em> be there. kube-apiserver audit logs are the trail for what <em>was attempted</em>. Kyverno is the enforcer at admission time. Argo CD is the continuous reconciler. Four layers, each with a different job.</p>
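<p>The second flow only works if automated sync with self-heal is switched on for the Argo CD Application. A minimal sketch of that setting, assuming a hypothetical repository URL, path, and target namespace:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical Application: repoURL, path and namespaces are placeholders
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/myapp.git
    targetRevision: main
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
</code></pre></div>
<p>Without <code>selfHeal</code>, Argo CD still detects the diff and marks the application OutOfSync, but it leaves the manual change in place.</p>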
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;What&rsquo;s the difference between Kyverno and OPA Gatekeeper?&rdquo;</em></p>
<p>Both are admission webhook policy engines. The practical differences:</p>
<ul>
<li><strong>Kyverno</strong>: policies are k8s-native YAML, no separate language to learn. Generate and mutate policies built in. Easier to get started with.</li>
<li><strong>OPA Gatekeeper</strong>: policies are written in Rego, a purpose-built policy language that&rsquo;s more expressive but has a steeper learning curve. Better if you&rsquo;re already using OPA elsewhere (Terraform, microservice authorization).</li>
</ul>
<p>For a Kubernetes-only environment, Kyverno is the pragmatic choice. For a platform team that uses OPA across the stack, Gatekeeper gives you policy consistency.</p>
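<p>For comparison, a rough Gatekeeper equivalent of the readiness-probe rule. The template and constraint names are made up for illustration, and this is a sketch of the shape rather than a tested policy:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical ConstraintTemplate: defines the rule in Rego
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredprobes
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredProbes
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredprobes

        # One violation per container that has no readinessProbe
        violation[{&#34;msg&#34;: msg}] {
          container := input.review.object.spec.template.spec.containers[_]
          not container.readinessProbe
          msg := sprintf(&#34;container %v has no readiness probe&#34;, [container.name])
        }
---
# Hypothetical Constraint: applies the template to Deployments
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredProbes
metadata:
  name: deployments-need-readiness
spec:
  match:
    kinds:
      - apiGroups: [&#34;apps&#34;]
        kinds: [&#34;Deployment&#34;]
</code></pre></div>
<p>Two resources and a second language for the same rule: that is the trade-off in practice.</p>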
<p>The deeper follow-up: <em>&ldquo;How do you test policies before enforcing them?&rdquo;</em> Use <code>Audit</code> mode first (<code>validationFailureAction: Audit</code>). Violations are recorded in PolicyReport objects but requests aren&rsquo;t rejected. Review the reports, fix the existing violations, then switch to <code>Enforce</code>. Never flip directly to Enforce in production: the next update or rollout of any workload that already violates the policy will be rejected.</p>
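<p>A sketch of that trial configuration, reusing the readiness-probe rule. The policy name is illustrative, and <code>background: true</code> (Kyverno&rsquo;s default) is spelled out only because it is what makes Kyverno scan resources that already exist, not just new requests:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical audit-only variant of the readiness-probe policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-readiness-probe-audit
spec:
  validationFailureAction: Audit   # report violations, do not reject
  background: true                 # also scan existing resources
  rules:
    - name: check-readiness-probe
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: &#34;Deployments must define a readiness probe.&#34;
        pattern:
          spec:
            template:
              spec:
                containers:
                  - readinessProbe:
                      periodSeconds: &#34;&gt;0&#34;
</code></pre></div>
<p>The resulting reports can be listed with <code>kubectl get policyreport -A</code>.</p>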
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-network-isolation/">network isolation between services</a>.</em></p>
]]></content:encoded></item><item><title>🛡️ How Do You Prevent a Compromised Pod From Calling Your Database?</title><link>https://blog.hippotion.com/posts/k8s-network-isolation/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-network-isolation/</guid><description>Default Kubernetes is a flat network. Every pod can reach every other pod. In a cluster with ten services, that&amp;rsquo;s ten potential blast radiuses instead of one.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How do you enforce network isolation between services in a Kubernetes cluster?&rdquo;</em></p>
<p>The default Kubernetes network model is flat. Every pod can reach every other pod, in any namespace, on any port. There are no firewalls, no ACLs, no segmentation. A compromised frontend pod can connect directly to your PostgreSQL port, your Redis port, your internal admin API, and every other service in the cluster.</p>
<p>This is intentional — Kubernetes doesn&rsquo;t assume you want isolation, because not everyone does. But if you do want it, you need to add it.</p>
<hr>
<h2 id="networkpolicy-the-primitive">NetworkPolicy: the primitive</h2>
<p>A <code>NetworkPolicy</code> is a Kubernetes resource that selects a set of pods and defines what traffic is allowed to reach them (ingress) and what traffic they&rsquo;re allowed to send (egress). Pods that no policy selects stay wide open. Once a policy selects a pod for a given direction, traffic in that direction that isn&rsquo;t explicitly allowed is dropped.</p>
<p>The catch: <code>NetworkPolicy</code> resources have no effect unless something in your cluster enforces them. Flannel on its own does not. Calico, Cilium, and Canal do, and k3s bundles a separate network policy controller alongside its default Flannel CNI. If you&rsquo;re running plain Flannel and you apply a NetworkPolicy, it will be silently ignored: no error, no warning.</p>
<hr>
<h2 id="the-default-deny-pattern">The default-deny pattern</h2>
<p>The correct starting point is a default-deny policy that blocks everything, applied to the namespace. You then add explicit allow policies for the traffic you actually need.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Block all ingress and egress in this namespace by default</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default-deny-all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">        </span><span class="c"># matches all pods in the namespace</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span></code></pre></div><p>With this in place, your pods can&rsquo;t receive traffic and can&rsquo;t send traffic. You then add back what you need.</p>
<hr>
<h2 id="allowing-specific-traffic">Allowing specific traffic</h2>
<p>Allow the web frontend to receive traffic from the ingress controller:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-ingress-from-traefik</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">frontend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ingress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">from</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">namespaceSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kubernetes.io/metadata.name</span><span class="p">:</span><span class="w"> </span><span class="l">sys-traefik</span><span class="w">
</span></span></span></code></pre></div><p>Allow the backend to talk to PostgreSQL:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-egress-to-postgres</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">backend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">5432</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>After these two policies, the frontend receives traffic from Traefik and the backend is allowed to open connections to Postgres on port 5432. The frontend cannot reach Postgres, the backend cannot receive traffic from the ingress controller, and neither can call anything else.</p>
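<p>The egress rule is only half of the backend-to-Postgres connection. Because <code>default-deny-all</code> blocks ingress to every pod in the namespace, the Postgres pods need their own ingress rule before they accept it. A minimal sketch, assuming Postgres runs in the same <code>myapp</code> namespace with the label <code>app: postgres</code> (as the egress rule implies); the policy name is made up:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical counterpart to allow-egress-to-postgres
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-postgres
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: postgres        # the pods being protected
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend # only the backend may connect
      ports:
        - port: 5432
          protocol: TCP
</code></pre></div>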
<hr>
<h2 id="the-dns-gotcha">The DNS gotcha</h2>
<p>Once you add a default-deny egress policy, DNS stops working. Your pods can no longer resolve service names because they can&rsquo;t reach <code>kube-dns</code> in the <code>kube-system</code> namespace.</p>
<p>You need to explicitly allow it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-egress-dns</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">namespaceSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">kubernetes.io/metadata.name</span><span class="p">:</span><span class="w"> </span><span class="l">kube-system</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">53</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">UDP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">53</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>Missing this is the most common reason &ldquo;everything broke after I added NetworkPolicies&rdquo;. Add it to every namespace that has a default-deny policy.</p>
<hr>
<h2 id="cilium-the-same-model-with-more-power">Cilium: the same model with more power</h2>
<p>Cilium implements the standard <code>NetworkPolicy</code> API and adds its own <code>CiliumNetworkPolicy</code> CRD with L7 capabilities.</p>
<p>Standard NetworkPolicy works at L3/L4 — IP addresses and ports. Cilium&rsquo;s CRD adds:</p>
<p><strong>L7 HTTP filtering</strong>: allow specific HTTP methods and paths, not just port 8080.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">cilium.io/v2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">CiliumNetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-api-reads</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">endpointSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">api</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ingress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">fromEndpoints</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">frontend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">toPorts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;8080&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">http</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="nt">method</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;GET&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;/api/v1/.*&#34;</span><span class="w">
</span></span></span></code></pre></div><p><strong>DNS-based egress</strong>: allow egress to <code>github.com</code> by hostname rather than IP address. This matters for external services with dynamic IPs.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">toFQDNs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">matchName</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;github.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">toPorts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;443&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p><strong>Identity-based policies</strong>: Cilium assigns each pod a security identity derived from its labels. Policies are enforced by identity, not IP address, so pod restarts (which change IPs) don&rsquo;t break policy enforcement.</p>
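<p>One caveat for the DNS-based egress rule above: Cilium learns which IPs belong to <code>github.com</code> by observing DNS responses, so the same policy also needs a DNS rule that sends lookups through its DNS proxy. A minimal sketch of a complete policy, assuming the workload is labelled <code>app: backend</code> and the cluster DNS pods carry the usual <code>k8s-app: kube-dns</code> label (the policy name is hypothetical):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">cilium.io/v2</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">CiliumNetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-egress-github</span><span class="w">  </span><span class="c"># hypothetical name</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">endpointSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">backend</span><span class="w">  </span><span class="c"># assumed label</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c"># Allow DNS to kube-dns and let the DNS proxy inspect the lookups</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">toEndpoints</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">&#34;k8s:io.kubernetes.pod.namespace&#34;</span><span class="p">:</span><span class="w"> </span><span class="l">kube-system</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">&#34;k8s:k8s-app&#34;</span><span class="p">:</span><span class="w"> </span><span class="l">kube-dns</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">toPorts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;53&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">ANY</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">dns</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span>- <span class="nt">matchPattern</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;*&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c"># The hostname rule itself</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">toFQDNs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">matchName</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;github.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">toPorts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;443&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>The <code>matchPattern: &#34;*&#34;</code> rule permits every lookup. Narrowing it to the names the pod actually needs is a reasonable next step.</p>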
<hr>
<h2 id="what-a-real-namespace-policy-set-looks-like">What a real namespace policy set looks like</h2>
<p>For a typical web app with frontend, backend, and database:</p>
<pre tabindex="0"><code>Namespace: myapp
├── default-deny-all (ingress + egress, all pods)
├── allow-egress-dns (egress, all pods, port 53)
├── allow-ingress-frontend (ingress frontend, from sys-traefik namespace)
├── allow-egress-frontend-to-backend (egress frontend, to backend:8080)
├── allow-ingress-backend (ingress backend, from frontend)
├── allow-egress-backend-to-postgres (egress backend, to postgres:5432)
└── allow-ingress-postgres (ingress postgres, from backend)
</code></pre><p>Seven policies. The database has exactly one inbound path: from the backend. The frontend has no path to the database at all. A compromised frontend pod cannot scan the internal network, because egress to arbitrary destinations is blocked.</p>
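<p>As an illustration, the last entry in that list could be sketched as follows. The <code>app: postgres</code> label matches the egress policy shown earlier; the <code>app: backend</code> label is an assumption about how the backend pods are labelled.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-ingress-postgres</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="c"># Applies to the database pods only</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">postgres</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ingress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c"># The single inbound path: backend pods on the Postgres port</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">from</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">backend</span><span class="w">  </span><span class="c"># assumed label</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">5432</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>Each connection needs both halves: the egress rule on the caller and the ingress rule on the callee. With default-deny in place, either one alone is not enough.</p>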
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The follow-up is usually: <em>&ldquo;How do you manage this at scale? Writing NetworkPolicies for every namespace by hand doesn&rsquo;t scale.&rdquo;</em></p>
<p>The answer: you don&rsquo;t write them by hand. You template them. In a GitOps setup, your namespace configuration declares what network access the service needs in a structured form, and a Helm chart or operator generates the actual NetworkPolicy resources from those declarations.</p>
<p>For example, an <code>applications.yml</code> entry might look like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">networkPolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">denyAll</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">allowIngressFromIngress</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">allowEgressToNamespaces</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;sys-postgres&#34;</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div><p>And a Helm chart translates that into four concrete NetworkPolicy objects. The developer declares intent; the platform enforces it. No one writes raw YAML for each namespace.</p>
<p>The second follow-up: <em>&ldquo;What about east-west traffic between services in the same namespace?&rdquo;</em> Add <code>allowIntraNamespace: true</code> as a flag that generates a policy allowing all pod-to-pod traffic within the namespace, while still blocking cross-namespace traffic.</p>
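<p>A sketch of what such a flag might generate, assuming the chart names the policy <code>allow-intra-namespace</code> (a hypothetical name). An empty <code>podSelector</code> inside <code>from</code> or <code>to</code> matches every pod in the policy&rsquo;s own namespace and nothing outside it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">networking.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">NetworkPolicy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">allow-intra-namespace</span><span class="w">  </span><span class="c"># hypothetical name</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="c"># Applies to every pod in the namespace</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">policyTypes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Ingress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="l">Egress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ingress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">from</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">egress</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">to</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">podSelector</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span></code></pre></div><p>This trades isolation for convenience: a compromised pod can reach every other pod in the same namespace, so it fits best when each namespace holds a single application.</p>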
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-zero-downtime/">zero-downtime deployments</a>. Next: <a href="/posts/k8s-config-drift/">preventing configuration drift</a>.</em></p>
]]></content:encoded></item><item><title>🔑 Deploy to Kubernetes Without Storing Any Cluster Credentials in CI</title><link>https://blog.hippotion.com/posts/k8s-cicd-no-credentials/</link><pubDate>Fri, 09 May 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-cicd-no-credentials/</guid><description>A common interview question in 2026. If your answer is &amp;lsquo;kubeconfig in a CI secret&amp;rsquo;, you&amp;rsquo;re not wrong — but you&amp;rsquo;re also not getting the job.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;How would you design a CI/CD pipeline that deploys to Kubernetes without storing any cluster credentials anywhere?&rdquo;</em></p>
<p>The expected wrong answer: export your kubeconfig, base64-encode it, paste it into a CI secret named <code>KUBE_CONFIG</code>, and call it a day. This works. It also leaves a long-lived, often cluster-admin credential sitting in a third-party system, which is exactly what an attacker hopes to find.</p>
<p>There are two correct answers in 2026, and which one you reach for depends on what you&rsquo;re actually deploying.</p>
<hr>
<h2 id="answer-1-gitops-the-one-your-interviewer-probably-wants">Answer 1: GitOps (the one your interviewer probably wants)</h2>
<p>In a GitOps setup, your CI pipeline never touches the cluster. It can&rsquo;t leak credentials it doesn&rsquo;t have.</p>
<p>The flow:</p>
<pre tabindex="0"><code>Developer pushes code
  → CI builds and tests
  → CI updates the image tag in the Git repo (a commit, not a kubectl command)
  → Argo CD detects the change
  → Argo CD applies it to the cluster
</code></pre><p>The cluster reaches out to Git. CI never reaches into the cluster. The only thing with cluster credentials is Argo CD itself — running inside the cluster, with no credentials to leak externally.</p>
<p>For self-hosted setups on Hetzner or Vultr, this is particularly clean because there&rsquo;s no cloud IAM to configure. You point Argo CD at your GitLab repo, tell it which branch to watch, and you&rsquo;re done.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># The Argo CD Application CRD — the only thing you need</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">argoproj.io/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Application</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">argocd</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
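</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">project</span><span class="p">:</span><span class="w"> </span><span class="l">default</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">source</span><span class="p">:</span><span class="w">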
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">repoURL</span><span class="p">:</span><span class="w"> </span><span class="l">https://gitlab.example.com/myorg/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">targetRevision</span><span class="p">:</span><span class="w"> </span><span class="l">main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">helm-charts/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">destination</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="l">https://kubernetes.default.svc</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">syncPolicy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">automated</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">prune</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">selfHeal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><p><code>selfHeal: true</code> means if someone manually <code>kubectl apply</code>s something, Argo CD reverts it. The Git repo is the only source of truth.</p>
<p>The CI image-tag update step looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># .gitlab-ci.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l">deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">      # Update the image tag in values/myapp.yml and push it back to main.
</span></span></span><span class="line"><span class="cl"><span class="sd">      # GitLab checks out a detached HEAD, so the target branch is named explicitly;
</span></span></span><span class="line"><span class="cl"><span class="sd">      # [skip ci] keeps this commit from triggering another pipeline.
</span></span></span><span class="line"><span class="cl"><span class="sd">      sed -i &#34;s/tag: .*/tag: ${CI_COMMIT_SHORT_SHA}/&#34; values/myapp.yml
</span></span></span><span class="line"><span class="cl"><span class="sd">      git config user.email &#34;ci@example.com&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git config user.name &#34;CI&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git add values/myapp.yml
</span></span></span><span class="line"><span class="cl"><span class="sd">      git commit -m &#34;chore: bump myapp to ${CI_COMMIT_SHORT_SHA} [skip ci]&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      git push origin HEAD:main</span><span class="w">
</span></span></span></code></pre></div><p>CI needs write access to the Git repo — but that&rsquo;s a deploy key, not a cluster credential. If it leaks, someone can push code. You&rsquo;d rotate the deploy key and audit the commits. If a cluster credential leaks, someone owns your cluster.</p>
<hr>
<h2 id="answer-2-oidc-federation-for-when-you-genuinely-need-push-based">Answer 2: OIDC federation (for when you genuinely need push-based)</h2>
<p>Some operations don&rsquo;t fit the GitOps model. Infrastructure provisioning (<code>terraform apply</code>), one-off database migrations, or initial cluster bootstrapping — these need direct cluster access. The correct pattern here is OIDC federation.</p>
<p>The idea: your CI platform (GitLab, GitHub Actions) can issue a signed JWT to each job. The JWT is signed by the CI platform and carries claims such as which repo, which branch, and which pipeline triggered the job. You configure your Kubernetes API server to trust those JWTs, and the CI job authenticates directly with the token it was issued.</p>
<p>No stored credentials. Every job gets a fresh token. The token expires when the job ends.</p>
<p>For a self-hosted GitLab, configure your k8s API server to trust GitLab as an OIDC issuer:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># /etc/rancher/k3s/config.yaml (or kube-apiserver flags)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kube-apiserver-arg</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-issuer-url=https://gitlab.example.com&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-client-id=your_client_id&#34;</span><span class="w">
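</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-username-claim=sub&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="c"># Disable the default issuer-URL prefix on usernames so RBAC subjects can match the sub claim as-is</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-username-prefix=-&#34;</span><span class="w">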
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="s2">&#34;oidc-groups-claim=groups_direct&#34;</span><span class="w">
</span></span></span></code></pre></div><p>Then create a <code>ClusterRoleBinding</code> that maps a specific GitLab identity to a Kubernetes role:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRoleBinding</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">gitlab-ci-deployer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">subjects</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">User</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;project_path:myorg/myapp:ref_type:branch:ref:main&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">apiGroup</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">roleRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRole</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">deploy-role</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">apiGroup</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io</span><span class="w">
</span></span></span></code></pre></div><p>The subject name is the <code>sub</code> claim from the GitLab JWT — it encodes the repo path and branch. Only jobs running on <code>main</code> in <code>myorg/myapp</code> get this binding. A job on a feature branch gets nothing.</p>
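<p>The binding refers to a <code>deploy-role</code> that has to exist. A minimal sketch of what it could contain, assuming the job only needs to restart Deployments and follow the rollout (the rule set is illustrative, not a recommendation for every pipeline):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">rbac.authorization.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterRole</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">deploy-role</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">rules</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="c"># get + patch cover a rollout restart; list + watch let the job follow its progress</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">apiGroups</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;apps&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">resources</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;deployments&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">verbs</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;get&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;list&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;watch&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;patch&#34;</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div><p>A <code>ClusterRoleBinding</code> grants these verbs in every namespace. Referencing the same <code>ClusterRole</code> from a namespaced <code>RoleBinding</code> in <code>myapp</code> would limit the job to that one namespace.</p>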
<p>In the CI job:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l">deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">id_tokens</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">K8S_TOKEN</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">aud</span><span class="p">:</span><span class="w"> </span><span class="l">your_client_id</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="p">|</span><span class="sd">
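</span></span></span><span class="line"><span class="cl"><span class="sd">      # K8S_API_URL and K8S_CA_FILE are assumed CI variables (API endpoint and CA bundle, neither is a secret)
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-cluster mycluster \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --server=&#34;${K8S_API_URL}&#34; \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --certificate-authority=&#34;${K8S_CA_FILE}&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-credentials gitlab-ci \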
</span></span></span><span class="line"><span class="cl"><span class="sd">        --token=&#34;${K8S_TOKEN}&#34;
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config set-context deploy \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --cluster=mycluster \
</span></span></span><span class="line"><span class="cl"><span class="sd">        --user=gitlab-ci
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl config use-context deploy
</span></span></span><span class="line"><span class="cl"><span class="sd">      kubectl rollout restart deployment/myapp -n myapp</span><span class="w">
</span></span></span></code></pre></div><p>The token in <code>K8S_TOKEN</code> is injected by GitLab. It expires with the job. The API server verifies the signature on every request using the public keys GitLab publishes at its JWKS endpoint.</p>
<hr>
<h2 id="which-one-to-use">Which one to use</h2>
<table>
	<thead>
			<tr>
					<th></th>
					<th>GitOps</th>
					<th>OIDC federation</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>CI needs cluster access</td>
					<td>No</td>
					<td>Yes (short-lived token)</td>
			</tr>
			<tr>
					<td>Audit trail</td>
					<td>Git history</td>
					<td>kube-apiserver audit log</td>
			</tr>
			<tr>
					<td>Revocability</td>
					<td>Revert the commit</td>
					<td>Token expires with the job</td>
			</tr>
			<tr>
					<td>Self-hosted setup effort</td>
					<td>Low</td>
					<td>Moderate (OIDC config)</td>
			</tr>
			<tr>
					<td>Works for infra provisioning</td>
					<td>Not really</td>
					<td>Yes</td>
			</tr>
	</tbody>
</table>
<p>For application deployments: GitOps. The cluster reconciles continuously, manual drift is reverted automatically when <code>selfHeal</code> is on, and CI is completely decoupled from cluster state.</p>
<p>For infrastructure provisioning or one-off operations: OIDC federation. Short-lived credentials, branch-scoped permissions, nothing to rotate.</p>
<p>What you should never do: store a kubeconfig or a long-lived ServiceAccount token in CI secrets. It&rsquo;s easy to make work, and that is the problem: the blast radius of a leak is unbounded, the audit log can&rsquo;t tell an attacker from a legitimate job, and there&rsquo;s no short expiry to limit the damage. Everything that goes wrong with static secrets goes wrong eventually.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Next: <a href="/posts/k8s-gitops-secrets/">how to handle secrets in a GitOps repository</a>.</em></p>
]]></content:encoded></item><item><title>🤫 How Do You Handle Secrets in a GitOps Repository?</title><link>https://blog.hippotion.com/posts/k8s-gitops-secrets/</link><pubDate>Fri, 25 Apr 2025 00:00:00 +0000</pubDate><guid>https://blog.hippotion.com/posts/k8s-gitops-secrets/</guid><description>GitOps says Git is the source of truth. Secrets say don&amp;rsquo;t put them in Git. These two things appear to be in direct conflict. They&amp;rsquo;re not.</description><content:encoded><![CDATA[<h2 id="the-question">The question</h2>
<p><em>&ldquo;You&rsquo;re using GitOps — everything goes through Git. How do you handle secrets?&rdquo;</em></p>
<p>The wrong answer: base64-encode them and commit them as Kubernetes <code>Secret</code> objects. Base64 is not encryption. Anyone with read access to the repo has your secrets. If the repo is public, everyone does.</p>
<p>The slightly better wrong answer: use a private repo and just not think about it. This works until a deploy key leaks, someone joins and then leaves the company, or you need to rotate one secret and have to find every place it&rsquo;s referenced.</p>
<p>There are three real answers. They make different tradeoffs.</p>
<hr>
<h2 id="the-constraint">The constraint</h2>
<p>The constraint is actually tighter than &ldquo;don&rsquo;t commit secrets&rdquo;. It&rsquo;s: <strong>your Git repo should be safe to make public at any point</strong>, and <strong>secrets must be rotatable without touching Git</strong>.</p>
<p>If rotating a password requires a new commit, someone has to be awake to merge and deploy it. That&rsquo;s not how you want to handle a 3am incident.</p>
<hr>
<h2 id="option-1-external-secrets-operator--vault">Option 1: External Secrets Operator + Vault</h2>
<p>This is the most robust pattern and the one worth knowing for interviews.</p>
<p>The idea: secrets live in a dedicated secret store (HashiCorp Vault, or a cloud equivalent). The External Secrets Operator (ESO) watches <code>ExternalSecret</code> objects in the cluster and syncs each referenced secret into a real Kubernetes <code>Secret</code>. The <code>ExternalSecret</code> is safe to commit: it says where the secret lives, not what it is.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># This lives in Git — safe to commit</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ExternalSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db-credentials</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">refreshInterval</span><span class="p">:</span><span class="w"> </span><span class="l">1h</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">secretStoreRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db-credentials  </span><span class="w"> </span><span class="c"># the k8s Secret it creates</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">secretKey</span><span class="p">:</span><span class="w"> </span><span class="l">DB_PASSWORD</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">remoteRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l">secret/myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">property</span><span class="p">:</span><span class="w"> </span><span class="l">db-password</span><span class="w">
</span></span></span></code></pre></div><p>Rotation: you update the secret in Vault. ESO syncs it to the cluster within <code>refreshInterval</code>. No Git commit, no deployment. A pod that consumes the <code>Secret</code> through environment variables picks up the new value only on its next restart. A pod that mounts it as a volume sees the file updated after the kubelet&rsquo;s sync delay, provided the app re-reads it (for example on <code>SIGHUP</code>).</p>
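<p>A rotation under this setup might look like the sketch below. It assumes the <code>vault</code> CLI is already authenticated, that the KV v2 mount and path match the <code>ExternalSecret</code> above, and that the database itself has already been changed to accept the new password. The <code>force-sync</code> annotation is optional; without it ESO picks up the change on its next refresh.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">set -euo pipefail

# Generate a new value (hypothetical; use whatever your database requires)
NEW_PASSWORD=&#34;$(openssl rand -base64 32)&#34;

# Update only the db-password field of the KV v2 secret at secret/myapp
vault kv patch secret/myapp db-password=&#34;$NEW_PASSWORD&#34;

# Optional: ask ESO to reconcile now instead of waiting for refreshInterval
kubectl annotate externalsecret myapp-db-credentials \
  --namespace myapp \
  force-sync=&#34;$(date +%s)&#34; --overwrite
</code></pre></div>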
<p>Audit trail: Vault logs every read and write. You know exactly which service account read which secret at what time.</p>
<p>The cost: you&rsquo;re running Vault. For a homelab or small team, that&rsquo;s an extra thing to operate. For production, it&rsquo;s worth it.</p>
<p>Self-hosted setup:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># ClusterSecretStore — connects ESO to your Vault instance</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">external-secrets.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterSecretStore</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">vault</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">provider</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">vault</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">server</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;http://sys-vault.sys-vault.svc.cluster.local:8200&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;secret&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;v2&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">auth</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">kubernetes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">mountPath</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;kubernetes&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;external-secrets&#34;</span><span class="w">
</span></span></span></code></pre></div><p>ESO authenticates to Vault using the pod&rsquo;s Kubernetes ServiceAccount token. Vault validates it against the cluster&rsquo;s token review endpoint. No static credentials anywhere.</p>
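<p>The Vault side of that handshake might be configured roughly as follows. This is a sketch under assumptions, not the exact setup behind this post: the policy name, the ServiceAccount name and namespace (<code>external-secrets</code>), and the token TTL are all placeholders to adapt.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Enable the Kubernetes auth method at the default mount path &#34;kubernetes&#34;
vault auth enable kubernetes

# Tell Vault where the cluster API lives (assumes Vault runs in-cluster)
vault write auth/kubernetes/config \
  kubernetes_host=&#34;https://kubernetes.default.svc:443&#34;

# Read-only policy for the KV v2 path used by the ExternalSecret
vault policy write external-secrets - &lt;&lt;&#39;EOF&#39;
path &#34;secret/data/myapp&#34; {
  capabilities = [&#34;read&#34;]
}
EOF

# Bind the role named in the ClusterSecretStore to ESO&#39;s ServiceAccount
vault write auth/kubernetes/role/external-secrets \
  bound_service_account_names=external-secrets \
  bound_service_account_namespaces=external-secrets \
  policies=external-secrets \
  ttl=1h
</code></pre></div>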
<hr>
<h2 id="option-2-sealed-secrets">Option 2: Sealed Secrets</h2>
<p>Sealed Secrets uses asymmetric encryption. The cluster holds a private key. You use the <code>kubeseal</code> CLI to encrypt a secret with the cluster&rsquo;s public key. The resulting <code>SealedSecret</code> object is safe to commit — only the cluster can decrypt it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Encrypt a secret for committing to Git</span>
</span></span><span class="line"><span class="cl">kubectl create secret generic myapp-db <span class="se">\
</span></span></span><span class="line"><span class="cl">  --namespace myapp <span class="se">\
</span></span></span><span class="line"><span class="cl">  --from-literal<span class="o">=</span><span class="nv">DB_PASSWORD</span><span class="o">=</span>hunter2 <span class="se">\
</span></span></span><span class="line"><span class="cl">  --dry-run<span class="o">=</span>client -o yaml <span class="se">\
</span></span></span><span class="line"><span class="cl">  <span class="p">|</span> kubeseal --format yaml <span class="se">\
</span></span></span><span class="line"><span class="cl">  &gt; sealed-secrets/myapp-db.yaml
</span></span></code></pre></div><p>The resulting YAML looks like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">bitnami.com/v1alpha1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">SealedSecret</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">myapp-db</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">myapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">encryptedData</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l">AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...</span><span class="w">
</span></span></span></code></pre></div><p>This gets committed. The Sealed Secrets controller in the cluster decrypts it and creates the real <code>Secret</code> automatically.</p>
<p>The tradeoff: rotation means re-sealing. You need the cluster&rsquo;s public key (which is public) and access to the plaintext secret. You commit a new <code>SealedSecret</code>. That&rsquo;s a Git commit, which means a review, a merge, and a deploy. For a 3am incident, that&rsquo;s a lot of friction.</p>
<p>Also: if the cluster&rsquo;s private key is lost, you can&rsquo;t decrypt any of your sealed secrets. Back up the private key.</p>
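<p>A backup might be taken as in the sketch below, assuming the controller runs in its default <code>kube-system</code> namespace with the default <code>name=sealed-secrets-controller</code> pod label (Helm installs may differ). The output file contains the private key, so it belongs in an offline store or password manager, never in the repo.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Export the controller&#39;s sealing keys (the label matches current and older keys)
kubectl get secret --namespace kube-system \
  -l sealedsecrets.bitnami.com/sealed-secrets-key \
  -o yaml &gt; sealed-secrets-key.backup.yaml

# Restore on a rebuilt cluster, then recycle the controller pod so it loads the key
kubectl apply -f sealed-secrets-key.backup.yaml
kubectl delete pod --namespace kube-system -l name=sealed-secrets-controller
</code></pre></div>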
<p>Good fit for: small teams, homelab, situations where secrets change rarely and the GitOps review process is actually desirable.</p>
<hr>
<h2 id="option-3-sops">Option 3: SOPS</h2>
<p>SOPS (Secrets OPerationS) encrypts files at rest using age keys or cloud KMS. You commit encrypted files. CI decrypts them during deployment using a key it holds in memory (not stored in Git).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Encrypt a file for Git</span>
</span></span><span class="line"><span class="cl">sops --encrypt --age age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8q <span class="se">\
</span></span></span><span class="line"><span class="cl">  secrets/myapp.yaml &gt; secrets/myapp.enc.yaml
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># In CI: decrypt to stdout and pipe straight into kubectl; no plaintext file is written</span>
</span></span><span class="line"><span class="cl">sops --decrypt secrets/myapp.enc.yaml <span class="p">|</span> kubectl apply -f -
</span></span></code></pre></div><p>The difference from Sealed Secrets: SOPS encrypts at the file level, not the k8s object level. You can use it outside of Kubernetes (application configs, Terraform variables). The key can live in the CI environment, a cloud KMS, or a personal age key.</p>
<p>The tradeoff: CI needs the decryption key, which puts you back in &ldquo;secret in CI&rdquo; territory, just for the encryption key rather than the actual secrets. If you use a cloud KMS, CI can authenticate to it through OIDC federation, so no long-lived key is stored. If you use an age key, it lives in CI secrets.</p>
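<p>For the age-key case, the CI step might look like this sketch. It assumes the CI system injects the private key as a masked environment variable named <code>SOPS_AGE_KEY</code>, which SOPS reads directly from the environment.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">set -euo pipefail   # pipefail: a failed decrypt must fail the whole step

# Fail fast with a clear message if the CI secret was not injected
: &#34;${SOPS_AGE_KEY:?SOPS_AGE_KEY must be provided by the CI secret store}&#34;

# Decrypt to stdout and apply; the plaintext never touches the runner&#39;s disk
sops --decrypt secrets/myapp.enc.yaml | kubectl apply -f -
</code></pre></div>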
<p>Good fit for: teams already using Helm and Helm Secrets, polyglot environments where not everything is Kubernetes, small teams where Vault feels like overengineering.</p>
<hr>
<h2 id="comparison">Comparison</h2>
<table>
	<thead>
			<tr>
					<th></th>
					<th>ESO + Vault</th>
					<th>Sealed Secrets</th>
					<th>SOPS</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Rotation without Git commit</td>
					<td>Yes</td>
					<td>No</td>
					<td>Depends</td>
			</tr>
			<tr>
					<td>Audit trail</td>
					<td>Full (Vault)</td>
					<td>Git history (changes only)</td>
					<td>Depends on KMS</td>
			</tr>
			<tr>
					<td>Complexity</td>
					<td>High</td>
					<td>Low</td>
					<td>Medium</td>
			</tr>
			<tr>
					<td>Works outside k8s</td>
					<td>With effort</td>
					<td>No</td>
					<td>Yes</td>
			</tr>
			<tr>
					<td>Recovery if key lost</td>
					<td>Vault backup</td>
					<td>Lose all secrets</td>
					<td>Key backup</td>
			</tr>
			<tr>
					<td>CI needs secret material</td>
					<td>No</td>
					<td>No</td>
					<td>Yes (decrypt key)</td>
			</tr>
	</tbody>
</table>
<hr>
<h2 id="what-interviewers-are-actually-testing">What interviewers are actually testing</h2>
<p>The interesting follow-up question is: <em>&ldquo;How do you rotate a secret without downtime?&rdquo;</em></p>
<p>The answer requires you to understand that a pod reads <code>Secret</code> values injected as environment variables only at container start, and that updating the <code>Secret</code> in Kubernetes doesn&rsquo;t restart the pod. Your options are:</p>
<ol>
<li>Mount the secret as a volume and have the app watch for file changes (good)</li>
<li>Restart the deployment after rotation (<code>kubectl rollout restart</code>, automatable)</li>
<li>Use a sidecar, such as the one added by Vault Agent Injector, that refreshes the secret alongside the running app (complex but zero-restart)</li>
</ol>
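<p>The second option might be scripted as below and run by whatever performs the rotation. The Deployment name <code>myapp</code> is hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Trigger a rolling restart so new pods read the rotated Secret
kubectl rollout restart deployment/myapp --namespace myapp

# Block until the rollout finishes, or fail after two minutes
kubectl rollout status deployment/myapp --namespace myapp --timeout=120s
</code></pre></div>
<p>Because this is a normal rolling update, it is only zero-downtime if the Deployment&rsquo;s readiness probes and rollout settings already are.</p>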
<p>The correct answer depends on the app. An API key that can be rotated gradually is different from a database password where the old one is invalidated immediately.</p>
<hr>
<p><em>This is part of a series on Kubernetes interview questions. Previously: <a href="/posts/k8s-cicd-no-credentials/">deploying without cluster credentials</a>. Next: <a href="/posts/k8s-zero-downtime/">zero-downtime deployments</a>.</em></p>
]]></content:encoded></item></channel></rss>