google-cloud-observability — Google Cloud observability tooling
View on GitHub: applications/google-cloud-observability
Namespace: google-cloud-observability
Argo CD Project: monitoring
Google provides a managed service for Prometheus.
In Phalanx environments provisioned on GKE, we want to use it wherever possible so that we do not have to run our own metrics and monitoring infrastructure.
Unfortunately, the managed kube-state-metrics package does not provide kube_pod_container_status_last_terminated_reason or kube_pod_container_status_restarts_total, both of which are needed to alert on container OOM kills in the most reliable way.
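As a rough illustration of why both metrics matter, an OOM-kill alert typically combines them: the last-terminated reason identifies containers whose most recent exit was an OOM kill, and a recent increase in the restart counter confirms that the kill just happened. The sketch below uses the generic Prometheus rule-file format. The group name, alert name, 10-minute window, and severity label are hypothetical, and the label names are the ones kube-state-metrics emits natively; a managed collector may relabel them (for example with an exported_ prefix), so verify them before use.

```yaml
groups:
  - name: container-oom            # hypothetical group name
    rules:
      - alert: ContainerOOMKilled  # hypothetical alert name
        # Fires when a container's last termination reason is OOMKilled
        # and its restart counter rose within the (assumed) 10m window.
        expr: |
          (kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1)
          and on (namespace, pod, container)
          (increase(kube_pod_container_status_restarts_total[10m]) > 0)
        labels:
          severity: warning        # assumed label
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOM killed"
```

Neither metric is sufficient alone: the termination reason persists long after the event, and the restart counter does not say why a container restarted.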
This app installs our own kube-state-metrics and configures the Google Cloud managed service for Prometheus to scrape it.
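The scrape configuration can be pictured as a PodMonitoring resource, the custom resource that managed collection uses to select pods to scrape. This is a minimal sketch, not the app's actual manifest: the resource name, the pod label, the port name, and the scrape interval are all assumptions and should be checked against the deployed kube-state-metrics chart.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kube-state-metrics                # hypothetical resource name
  namespace: google-cloud-observability   # namespace of this application
spec:
  selector:
    matchLabels:
      # Assumed pod label; use whatever label the installed chart applies.
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
    - port: http      # assumed name of the metrics port on the pod
      interval: 30s   # assumed scrape interval
```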
Prerequisites
Managed service for Prometheus is installed in the GKE cluster. This is probably configured in the idf_deploy repo.