Cluster Monitoring with Grafana

Context

K8SaaS provides a dashboarding capability to look at the cluster metrics as well as your applicative metrics.

Design

Several types of metrics are collected:

system metrics
kubernetes metrics
application metrics

Use case

Monitor an SLA, a capacity planning
Post-mortem diagnosis
Real-time monitoring & dashboarding

Access to the Monitoring Portal

First, get the following information:

K8SAAS_SUBSCRIPTION_ID
K8SAAS_INSTANCE_NAME

Standard
With Corporate Addon
Legacy

Then, access to the URLS: https://grafana.K8SAAS_INSTANCE_NAME_WITHOUT_k8saas.region.k8saas.thalesdigital.io/

For instance: https://grafana.lja-sandbox.eu.k8saas.thalesdigital.io rather than https://grafana.k8saas-lja-sandbox.kaas.thalesdigital.io

Then, access to the URLS: https://grafana.K8SAAS_INSTANCE_NAME_WITHOUT_k8saas.ahe.tdp.infra.thales/

Then, access to the URLS: https://grafana.K8SAAS_INSTANCE_NAME.kaas.thalesdigital.io/

On this dashboard, you have all the common links to access to blog, online documentation or k8saas custom dashboards.

Here are the default dashboards imported:

Overview

Goal: Get the status of the cluster and discover the cloud resources used by AKS

Azure Log Analytics (app CL)

info

Azure Log Analytics (app CL) and Azure Log Analytics (ingress CL) were merged together and named Azure Log Analytics.

Goal: Select your own namespace and look at the stdout and stderr logs of your applications

Azure Log Analytics (ingress CL)

info

Azure Log Analytics (app CL) and Azure Log Analytics (ingress CL) were merged together and named Azure Log Analytics.

Goal: Look at NGINX ingress controller logs

Cert Manager

Goal: Monitor your certificates

Flux Cluster stats

Goal: Monitor the k8saas transversal application deployed by default

Flux Control plane

Goal: Monitor flux controllers

Linkerd Top Line

Goal: Monitor Linkerd service mesh

NGINX ingress controller

Goal: Monitor your ingress performance and issues

OPA Gatekeeper

Goal: Discover the kubernetes policy deployed

Velero

Goal: Monitor your backups

HOWTO

Make a query

here is the query to help you copy & past:

count by(namespace,reason) (kube_pod_container_status_terminated_reason{reason!="Completed",namespace="$namespace"} !=0 )
count by(namespace,reason) (kube_pod_container_status_waiting_reason{namespace="$namespace"} !=0 )

Then, you should have a look to the collected metrics:

source: kube-state-metrics

Example:

Grafana example 1

Grafana example 2

Expose metrics from your applications

Although some services and applications are already adopting the Prometheus metrics format and provide endpoints for this purpose, many popular server applications like Nginx or PostgreSQL are much older than the Prometheus metrics / OpenMetrics popularization. This complicates getting metrics from them into a single pane of glass, since they usually have their own metrics formats and exposition methods.

If you are trying to unify your metric pipeline across many microservices and hosts using Prometheus metrics, this may be a problem.

To work around this hurdle, the Prometheus community is creating and maintaining a vast collection of Prometheus exporters. An exporter is a “translator” or “adapter” program that is able to collect the server native metrics (or generate its own data observing the server behavior) and re-publish them using the Prometheus metrics format and HTTP protocol transports.

diagram showing how Prometheus exporters work These exporter small binaries can be co-located in the same pod as a sidecar of the main server that is being monitored, or isolated in their own pod or even a different infrastructure.

The exporter exposes the service metrics converted into Prometheus metrics, so you just need to scrape the exporter.

Here you can find an aggregated list of prometheus exporters.

make prometheus scrape your applications metrics

A ServiceMonitor describes the set of targets to be monitored by Prometheus.

ServiceMonitor

The Prometheus resource includes a field called serviceMonitorSelector, which defines a selection of ServiceMonitors to be used. By default and before the version v0.19.0, ServiceMonitors must be installed in the same namespace as the Prometheus instance. With the Prometheus Operator v0.19.0 and above, ServiceMonitors can be selected outside the Prometheus namespace via the serviceMonitorNamespaceSelector field of the Prometheus resource.

First, deploy three instances of a simple example application, which listens and exposes metrics on port 8080.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: customer-namespaces
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080

The ServiceMonitor has a label selector to select Services and their underlying Endpoint objects. The Service object for the example application selects the Pods by the app label having the example-app value. The Service object also specifies the port on which the metrics are exposed.

kind: Service
apiVersion: v1
metadata:
  name: example-app
  namespace: customer-namespaces
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

This Service object is discovered by a ServiceMonitor, which selects in the same way. The app label must have the value example-app.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: customer-namespaces
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

Import your own dashboard as code

caution

It was brought to our attention that we offer persistency on Grafana. Although due to breaking change on Grafana/Prometheus we delete the volume to accelerate the migration. Due to this, we highly suggest to export your dashboard and deploy them using the link below.

Instructions here

Next step

Go back to homepage

Context​

Use case​

Access to the Monitoring Portal​

Overview​

Azure Log Analytics (app CL)​

Azure Log Analytics (ingress CL)​

Cert Manager​

Flux Cluster stats​

Flux Control plane​

Linkerd Top Line​

NGINX ingress controller​

OPA Gatekeeper​

Velero​

HOWTO​

Make a query​

Expose metrics from your applications​

make prometheus scrape your applications metrics​

Import your own dashboard as code​

Next step​