Troubleshooting

BPF Programs Failed to Load

Symptom: `/actuator/health` reports `bpf: DOWN` with a list of failed programs.

Check logs:

kubectl logs ds/kpod-metrics | grep -i "failed to load"

Common causes:

| Cause | Fix |
| --- | --- |
| Kernel too old (< 4.18) | Upgrade to 4.18+ (legacy) or 5.2+ (CO-RE) |
| Missing BTF (kernel 5.2+ without `CONFIG_DEBUG_INFO_BTF`) | Legacy BPF is used automatically; check whether `/sys/kernel/btf/vmlinux` exists |
| Missing capabilities | Ensure the `BPF`, `PERFMON`, `SYS_RESOURCE`, and `NET_ADMIN` capabilities are granted |
| Seccomp blocking BPF syscall | Use the `RuntimeDefault` seccomp profile, or add `bpf` to the allowed syscalls of a custom profile |
| SELinux denying BPF access | Check `ausearch -m avc -ts recent` for BPF denials |
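
For the "Missing capabilities" row, one way to see what a running container actually holds is to decode the `CapEff` mask from `/proc/self/status`. The following is a minimal sketch, not part of kpod-metrics: `has_cap` and `missing_caps` are hypothetical helper names, and the bit numbers are the standard values from `<linux/capability.h>`.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with kpod-metrics).

# Succeed if capability bit <bit> is set in the hex mask <mask>.
has_cap() {
  mask=$1
  bit=$2
  [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# Print the names of the required capabilities that are absent from <mask>.
missing_caps() {
  mask=$1
  missing=""
  # name:bit pairs, bit numbers as defined in <linux/capability.h>
  for entry in NET_ADMIN:12 SYS_RESOURCE:24 PERFMON:38 BPF:39; do
    name=${entry%%:*}
    bit=${entry##*:}
    has_cap "$mask" "$bit" || missing="$missing $name"
  done
  echo "${missing# }"
}

# Inside the container: inspect the effective set of the current process.
if [ -r /proc/self/status ]; then
  mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
  if [ -n "$mask" ]; then
    echo "missing: $(missing_caps "$mask")"
  fi
fi
```

An empty list after `missing:` means all four capabilities are in the effective set. The script would have to run inside the kpod-metrics container (for example through `kubectl exec`) to reflect the pod's security context.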

Tip

kpod-metrics degrades gracefully: if a BPF program fails to load, the other collectors continue working. Cgroup-based collectors never require BPF.

Check namespace filter:

kubectl get configmap -n kpod-metrics kpod-metrics -o yaml | grep -A5 filter

By default, the `kube-system` and `kube-public` namespaces are excluded.

Check pod discovery:

curl http://localhost:9090/actuator/prometheus | grep kpod_discovery_pods_total

If the value is 0, the pod informer may not be watching the correct node.
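
The discovery check can be scripted so that it yields a single number instead of raw metric lines. This is a minimal sketch: `metric_sum` is a hypothetical helper name, and it assumes the samples carry no trailing timestamp, so the value is the last field on each line.

```shell
#!/bin/sh
# Hypothetical helper: sum all samples of one metric read from stdin
# in Prometheus text exposition format.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comments
    {
      n = index($1, "{")               # strip the label set, if any
      metric = (n > 0) ? substr($1, 1, n - 1) : $1
      if (metric == name) total += $NF # assumes no trailing timestamp
    }
    END { printf "%d\n", total }
  '
}

# Example on a node (endpoint and port as used above):
#   pods=$(curl -s http://localhost:9090/actuator/prometheus \
#            | metric_sum kpod_discovery_pods_total)
#   [ "$pods" -gt 0 ] || echo "no pods discovered on this node"
```

Summing across label sets keeps the check independent of how the metric is labelled, which is not specified here.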

Possible causes:

Symptom: `kpod_collection_timeouts_total` is increasing.

Causes:

Fix: Increase `config.collectionTimeout` or use per-collector intervals:

config:
  collectionTimeout: 30000
  collectorIntervals:
    syscall: 60000
    biolatency: 60000
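
Whether `kpod_collection_timeouts_total` is still increasing after such a change can be judged by comparing two scrapes taken some time apart. The sketch below assumes the scrapes were saved to files; `timeouts_in` and `timeouts_delta` are hypothetical helper names, and the 60-second gap in the example is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: total of kpod_collection_timeouts_total in a saved scrape.
timeouts_in() {
  awk '
    /^#/ { next }                      # skip HELP and TYPE comments
    {
      n = index($1, "{")               # strip the label set, if any
      metric = (n > 0) ? substr($1, 1, n - 1) : $1
      if (metric == "kpod_collection_timeouts_total") total += $NF
    }
    END { printf "%d\n", total }
  ' "$1"
}

# Print how much the counter grew between two scrape files.
timeouts_delta() {
  before=$(timeouts_in "$1")
  after=$(timeouts_in "$2")
  echo $((after - before))
}

# Example on a node:
#   curl -s http://localhost:9090/actuator/prometheus > /tmp/scrape1.txt
#   sleep 60
#   curl -s http://localhost:9090/actuator/prometheus > /tmp/scrape2.txt
#   timeouts_delta /tmp/scrape1.txt /tmp/scrape2.txt
```

A delta of 0 over a window that spans several collection cycles suggests the timeouts have stopped.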

| Distro | Kernel | Mode | Status |
| --- | --- | --- | --- |
| Ubuntu 22.04+ | 5.15+ | CO-RE | Fully supported |
| Ubuntu 20.04 | 5.4 | CO-RE | Fully supported |
| RHEL 8.2+ | 4.18 | Legacy | Fully supported |
| RHEL 9+ | 5.14+ | CO-RE | Fully supported |
| Debian 11+ | 5.10+ | CO-RE | Fully supported |
| Amazon Linux 2023 | 6.1+ | CO-RE | Fully supported |
| Bottlerocket | 5.10+ | CO-RE | Fully supported |
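
For a node that is not in the table, the expected mode can be estimated from the thresholds given under "Common causes": below 4.18 is too old, 5.2+ with BTF uses CO-RE, and everything else falls back to legacy BPF. This is a rough sketch under those assumptions; `bpf_mode` is a hypothetical helper, and the mode kpod-metrics actually selects at runtime may depend on factors not covered here.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel by the documented thresholds.
# Usage: bpf_mode <kernel-release> <btf-present: yes|no>
bpf_mode() {
  release=$1
  has_btf=$2
  # Reduce strings such as "5.15.0-91-generic" to major=5, minor=15.
  major=${release%%.*}
  rest=${release#*.}
  minor=${rest%%[!0-9]*}
  if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 18 ]; }; then
    echo "unsupported"
  elif [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 2 ]; }; then
    # 5.2+ uses CO-RE only when the kernel exposes BTF.
    if [ "$has_btf" = "yes" ]; then echo "co-re"; else echo "legacy"; fi
  else
    echo "legacy"
  fi
}

# On the node itself: BTF is present when /sys/kernel/btf/vmlinux exists.
if [ -e /sys/kernel/btf/vmlinux ]; then btf=yes; else btf=no; fi
bpf_mode "$(uname -r)" "$btf"
```

For example, `bpf_mode 4.18.0-513.el8.x86_64 no` prints `legacy`, which matches the RHEL 8.2+ row.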