Skip to main content

Cluster Monitoring with Grafana

Context

K8SaaS provides a dashboarding capability to look at the cluster metrics as well as your applicative metrics.

Design

Several types of metrics are collected:

  • system metrics
  • kubernetes metrics
  • application metrics

Use case

  • Monitor an SLA, a capacity planning
  • Post-mortem diagnosis
  • Real-time monitoring & dashboarding

Access to the Monitoring Portal

First, get the following information:

  • K8SAAS_SUBSCRIPTION_ID
  • K8SAAS_INSTANCE_NAME

Then, access to the URLS: https://grafana.K8SAAS_INSTANCE_NAME_WITHOUT_k8saas.region.k8saas.thalesdigital.io/

For instance: https://grafana.lja-sandbox.eu.k8saas.thalesdigital.io rather than https://grafana.k8saas-lja-sandbox.kaas.thalesdigital.io

info

From Babel Release 2.1 you should see the k8saas homepage:

img

On this dashboard, you have all the common links to access to blog, online documentation or k8saas custom dashboards.

Here are the default dashboards imported:

Overview

Goal: Get the status of the cluster and discover the cloud resources used by AKS

img

Azure Log Analytics (app CL)

info

Previous Babel, Azure Log Analytics (app CL) and Azure Log Analytics (ingress CL) were merged together and named Azure Log Analytics.

Goal: Select your own namespace and look at the stdout and stderr logs of your applications

img

Azure Log Analytics (ingress CL)

info

Previous Babel, Azure Log Analytics (app CL) and Azure Log Analytics (ingress CL) were merged together and named Azure Log Analytics.

Goal: Look at NGINX ingress controller logs

img

Cert Manager

info

Available only on version Babel and more.

Goal: Monitor your certificates

img

Flux Cluster stats

Goal: Monitor the k8saas transversal application deployed by default

img

Flux Control plane

Goal: Monitor flux controllers

img

Kubernetes events

Goal: Monitor Kubernetes events

img

Linkerd Top Line

Goal: Monitor Linkerd service mesh

img

NGINX ingress controller

Goal: Monitor your ingress performance and issues

img

OPA Gatekeeper

info

Available only on version Babel and more.

Goal: Discover the kubernetes policy deployed

img

Velero

Goal: Monitor your backups

img

HOWTO

Make a query

here is the query to help you copy & past:

count by(namespace,reason) (kube_pod_container_status_terminated_reason{reason!="Completed",namespace="$namespace"} !=0 )
count by(namespace,reason) (kube_pod_container_status_waiting_reason{namespace="$namespace"} !=0 )

Then, you should have a look to the collected metrics:

source: kube-state-metrics

Example:

Grafana example 1

Grafana example 2

Expose metrics from your applications

Although some services and applications are already adopting the Prometheus metrics format and provide endpoints for this purpose, many popular server applications like Nginx or PostgreSQL are much older than the Prometheus metrics / OpenMetrics popularization. This complicates getting metrics from them into a single pane of glass, since they usually have their own metrics formats and exposition methods.

If you are trying to unify your metric pipeline across many microservices and hosts using Prometheus metrics, this may be a problem.

To work around this hurdle, the Prometheus community is creating and maintaining a vast collection of Prometheus exporters. An exporter is a “translator” or “adapter” program that is able to collect the server native metrics (or generate its own data observing the server behavior) and re-publish them using the Prometheus metrics format and HTTP protocol transports.

diagram showing how Prometheus exporters work These exporter small binaries can be co-located in the same pod as a sidecar of the main server that is being monitored, or isolated in their own pod or even a different infrastructure.

The exporter exposes the service metrics converted into Prometheus metrics, so you just need to scrape the exporter.

Here you can find an aggregated list of prometheus exporters.

make prometheus scrape your applications metrics

A ServiceMonitor describes the set of targets to be monitored by Prometheus.

ServiceMonitor

The Prometheus resource includes a field called serviceMonitorSelector, which defines a selection of ServiceMonitors to be used. By default and before the version v0.19.0, ServiceMonitors must be installed in the same namespace as the Prometheus instance. With the Prometheus Operator v0.19.0 and above, ServiceMonitors can be selected outside the Prometheus namespace via the serviceMonitorNamespaceSelector field of the Prometheus resource.

First, deploy three instances of a simple example application, which listens and exposes metrics on port 8080.

apiVersion: apps/v1
kind: Deployment
metadata:
name: example-app
namespace: dev
spec:
replicas: 3
selector:
matchLabels:
app: example-app
template:
metadata:
labels:
app: example-app
spec:
containers:
- name: example-app
image: fabxc/instrumented_app
ports:
- name: web
containerPort: 8080

The ServiceMonitor has a label selector to select Services and their underlying Endpoint objects. The Service object for the example application selects the Pods by the app label having the example-app value. The Service object also specifies the port on which the metrics are exposed.

kind: Service
apiVersion: v1
metadata:
name: example-app
namespace: dev
labels:
app: example-app
spec:
selector:
app: example-app
ports:
- name: web
port: 8080

This Service object is discovered by a ServiceMonitor, which selects in the same way. The app label must have the value example-app.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: example-app
namespace: dev
labels:
team: frontend
spec:
selector:
matchLabels:
app: example-app
endpoints:
- port: web

Import your own dashboard as code

caution

It was brought to our attention that we offer persistency on Grafana. Although due to breaking change on Grafana/Prometheus we delete the volume to accelerate the migration. Due to this, we highly suggest to export your dashboard and deploy them using the link below.

Instructions here

Next step

Go back to homepage