
📈 Observing Local LLM Inference: llama.cpp's Built-in Prometheus Metrics
llama.cpp's inference server ships a built-in /metrics endpoint. Enable it with one flag (--metrics), let Prometheus scrape it, and load a Grafana dashboard through a ConfigMap sidecar: AI observability without a proxy layer.