Scaling

kpod-metrics is tested for clusters up to 1,000 nodes / 100,000 pods.

Resource Usage

Component              Value
BPF map entries        10,240 per map (LRU, auto-evicts)
API server load        1 node-scoped watch per node
Batched JNI map reads  Single syscall per map read
Kernel memory          ~15–20 MB per node
Collection cycle       ~500–1,000 ms per node

Default Resource Requests

resources:
  requests:
    cpu: 150m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
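
These defaults suit most nodes in the tested range. For very dense nodes you may want more headroom; the values below are a sketch, not a documented recommendation, and the right numbers depend on pod density and which collectors you enable:

resources:
  requests:
    cpu: 300m        # sketch: roughly 2x the default for dense nodes
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi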

Cardinality Planning

Cluster Size         Profile              Estimated Time Series
< 100 pods           comprehensive        ~7,000
100–1,000 pods       standard             ~39,000
1,000–10,000 pods    standard             ~390,000
10,000–100,000 pods  minimal or standard  ~2M–3.9M

For large clusters, use the standard profile (not comprehensive) to keep Prometheus cardinality under 4M time series.
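
The per-pod rates implied by these estimates are roughly 70 series per pod for comprehensive, 39 for standard, and 20 for minimal (derived from the table above, not documented constants). To size a cluster that falls between the rows, multiply:

  50,000 pods × ~39 series/pod (standard) ≈ 1.95M series
  50,000 pods × ~20 series/pod (minimal)  ≈ 1.0M series

Both stay comfortably under the 4M guideline.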

Performance Tips

  • Use per-collector intervals to reduce the overhead of heavy collectors (syscall, biolatency); see the config sketch after this list
  • Use namespace filtering to limit collection scope
  • Use the minimal profile if you only need CPU and memory metrics
  • BPF maps use LRU eviction — no manual cleanup needed
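
The sketch below combines the first three tips in one configuration fragment. The key names (profile, namespaces, collectors.<name>.interval) and the namespace value are illustrative assumptions, not the documented kpod-metrics schema; check the configuration reference for the actual keys.

# Sketch only -- key names are assumptions, not the documented schema.
profile: minimal          # CPU and memory metrics only
namespaces:
  include:
    - prod                # limit collection to selected namespaces
collectors:
  syscall:
    interval: 60s         # heavy collector: poll less often
  biolatency:
    interval: 60s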