Observing Kubernetes clusters at scale is difficult. While most companies operate a small number of Kubernetes clusters, Giant Swarm is responsible for many more, across multiple regions. This scale makes it harder to maintain a reasonable level of observability.
Our infrastructure benefits from what we have learned operating at this level. For example, we have built tooling that automatically manages Prometheus for on-demand Kubernetes clusters, as well as new Prometheus exporters that address hard-to-monitor problems.
This talk presents what we have learned about handling observability at scale, with in-depth examples from our infrastructure.
Audience requirements:
Some experience with Kubernetes, observability, and/or operations.
Objective of the talk:
To present what we have learned about handling observability of enterprise Kubernetes clusters at scale.
You can view Joe’s slides below: