
Logging and Monitoring with Prometheus and Grafana
Monitoring and logging are essential for managing applications in Kubernetes. Without proper visibility into our cluster's health and performance, troubleshooting issues becomes challenging. This is where Prometheus and Grafana come into play. Prometheus is a powerful monitoring system well suited to dynamic environments like Kubernetes, while Grafana provides rich visualization capabilities on top of it.
In this chapter, we'll explore how to set up Prometheus and Grafana in Kubernetes, collect metrics, visualize data, and configure alerts to ensure proactive monitoring. By the end, we'll have a fully functional monitoring system integrated into our Kubernetes environment.
Why Monitoring Matters
As applications grow in complexity, monitoring helps us −
- Detect Performance Issues − Spot slow responses, memory leaks, or high CPU usage before users experience problems.
- Ensure System Reliability − Maintain uptime by detecting failures early.
- Analyze Trends − Understand how our application behaves over time.
- Alert on Issues − Receive notifications when critical metrics exceed thresholds.
Understanding Prometheus
Prometheus is an open-source monitoring system specifically designed for dynamic cloud environments like Kubernetes. It works by scraping metrics from instrumented applications and storing them in a time-series database. Prometheus provides −
- Multi-dimensional data model with key-value labels
- Powerful query language (PromQL) for data analysis
- Pull-based model where Prometheus scrapes metrics from targets
- Built-in alerting via Alertmanager
- Integration with Kubernetes Service Discovery
Prometheus Key Concepts
- Targets: The services Prometheus scrapes metrics from (e.g., Kubernetes nodes, pods, applications).
- Jobs: Groups of targets defined in the configuration.
- Metrics: Time-series data collected from targets.
- PromQL (Prometheus Query Language): Used to query and analyze stored metrics.
- Alerting rules: Conditions defined to trigger alerts.
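For instance, a single PromQL expression can combine a metric name, label filters, a range function, and an aggregation. The query below is purely illustrative: http_requests_total is a counter commonly exposed by instrumented applications, not a metric this tutorial's stack necessarily provides.
# Per-second rate of HTTP requests over the last 5 minutes,
# filtered to one namespace and summed per pod (illustrative metric)
sum(rate(http_requests_total{namespace="default"}[5m])) by (pod)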
Understanding Grafana
Grafana is a visualization tool that transforms raw metrics into rich dashboards. It connects to Prometheus and provides −
- Interactive charts, graphs, and tables
- Custom dashboards for real-time monitoring
- Alerting and notifications via email, Slack, or other integrations
- Support for multiple data sources (Prometheus, Loki, Elasticsearch, etc.)
Setting Up Prometheus and Grafana in Kubernetes
To get started, we will deploy Prometheus and Grafana in our Kubernetes cluster. The easiest way to install them is by using Helm, a package manager for Kubernetes.
Prerequisites
Before we begin, we'll ensure that:
- A Kubernetes cluster is up and running.
- kubectl is configured to communicate with the cluster.
- Helm is installed on our system.
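These prerequisites can be checked quickly from the command line:
$ kubectl cluster-info    # confirms kubectl can reach the cluster's control plane
$ kubectl get nodes       # lists nodes and their readiness
$ helm version            # confirms Helm is installed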
Install Prometheus and Grafana
We'll start by adding the Prometheus Helm Repository −
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
Output
"prometheus-community" has been added to your repositories
Update the Helm Repository −
$ helm repo update
Output
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. Happy Helming!
Install Prometheus Stack using Helm −
$ helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
Output
NAME: prometheus
LAST DEPLOYED: Mon Mar 10 11:14:18 2025
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus"

Get Grafana 'admin' user password by running:
  kubectl --namespace monitoring get secrets prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo

Access Grafana local instance:
  export POD_NAME=$(kubectl --namespace monitoring get pod -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=prometheus" -oname)
  kubectl --namespace monitoring port-forward $POD_NAME 3000
This installs Prometheus, Grafana, and Alertmanager under the monitoring namespace.
Verify that Prometheus Pods are Running −
$ kubectl get pods -n monitoring
Output
NAME                                                      READY   STATUS              RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0    0/2     Init:0/1            0          76s
prometheus-grafana-54d864bf96-t2qjr                       0/3     ContainerCreating   0          3m30s
prometheus-kube-prometheus-operator-dd4b85cc8-qjjmq       1/1     Running             0          3m30s
prometheus-kube-state-metrics-888b7fd55-lv5mq             1/1     Running             0          3m30s
prometheus-prometheus-kube-prometheus-prometheus-0        0/2     Init:0/1            0          76s
prometheus-prometheus-node-exporter-wvgxw                 1/1     Running             0          3m30s
This confirms that some of the required services are running, while others are still initializing.
Access the Prometheus UI
To access this UI, we'll first port-forward the Prometheus pod −
$ kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 9090 -n monitoring
Output
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
Now, we can visit http://localhost:9090 in our browser to access the Prometheus UI.

Access the Grafana UI
We'll first port-forward the Grafana pod with the following command −
$ kubectl port-forward prometheus-grafana-54d864bf96-t2qjr 3000 -n monitoring
Output
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
Now, we can open http://localhost:3000 in a web browser and log in using −
- Username: admin
- Password: prom-operator by default; it can be retrieved with kubectl --namespace monitoring get secrets prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo

Collecting Metrics in Kubernetes
Prometheus automatically discovers and scrapes Kubernetes components, including:
- Node metrics (CPU, Memory, Disk Usage)
- Pod and container metrics (Resource consumption, restarts, errors)
- Kubernetes API metrics (Event logs, scheduler stats)
To manually expose metrics from an application, we can use an Exporter.
Example: Node Exporter (Collecting System Metrics)
$ helm install node-exporter prometheus-community/prometheus-node-exporter --namespace monitoring
Output
NAME: node-exporter
LAST DEPLOYED: Mon Mar 10 13:42:25 2025
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=node-exporter" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:9100 to use your application"
  kubectl port-forward --namespace monitoring $POD_NAME 9100
Expose Node Exporter on Port 9100:
$ export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=node-exporter" -o jsonpath="{.items[0].metadata.name}")
$ kubectl port-forward --namespace monitoring $POD_NAME 9100
Visit http://127.0.0.1:9100 to see system metrics.

This collects host-level system metrics, including CPU, memory, disk usage, filesystem statistics, and network activity.
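With Node Exporter metrics flowing in, a few PromQL queries can be tried directly in the Prometheus UI. The metric names below are standard Node Exporter metrics, though their availability can vary slightly with the exporter version:
# Approximate CPU utilisation (%) per node over the last 5 minutes
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory currently available on each node, in bytes
node_memory_MemAvailable_bytes

# Free space on the root filesystem
node_filesystem_avail_bytes{mountpoint="/"}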
Monitoring Targets in Prometheus
Prometheus collects metrics from various sources, known as targets. These targets can include the Prometheus server itself, Node Exporter, application instances, and other services exposing metrics in a Prometheus-compatible format.
When Prometheus scrapes a target, it assigns one of the following states:
- Up (Healthy): The target is active and responding to Prometheus scrapes.
- Down (Unhealthy): The target is unreachable, or Prometheus cannot retrieve metrics.
- Unknown: The target has not been scraped yet or is misconfigured.
Key Details Displayed on the Targets Page
- Endpoint − The URL where Prometheus scrapes metrics.
- State − Whether the target is up, down, or unknown.
- Labels − Identifiers used for querying and grouping in Prometheus.
- Last Scrape − The last time Prometheus successfully scraped the target.
- Scrape Duration − The time taken to collect the metrics.
- Error − If Prometheus fails to scrape a target, an error message is displayed.
Configuring Targets in Prometheus
Targets are defined in the Prometheus configuration file (prometheus.yml) under the scrape_configs section.
Here is an example configuration for monitoring both Prometheus itself and Node Exporter:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
Each job_name represents a monitored service, and targets specify the endpoints Prometheus scrapes.
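Note that with the kube-prometheus-stack chart used in this chapter, we normally don't edit prometheus.yml by hand; new targets are instead added declaratively through ServiceMonitor resources, which the Prometheus Operator turns into scrape configuration. The following is only a sketch: it assumes a hypothetical Service named my-app in the default namespace that exposes a port called metrics, and that the chart's default selector (matching the release: prometheus label) is in effect.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus        # lets the default Prometheus instance discover this resource
spec:
  namespaceSelector:
    matchNames:
      - default                # namespace of the hypothetical my-app Service
  selector:
    matchLabels:
      app: my-app              # must match the Service's labels
  endpoints:
    - port: metrics            # named Service port that serves /metrics
      interval: 30s
After applying it with kubectl apply -f, the new target should appear on the Targets page within a scrape interval or two.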
Viewing Target Metrics
Once configured, visit http://localhost:9090/targets to check the status of each target. Clicking on an endpoint will display the raw metrics exposed by the service. These metrics can then be queried in Prometheus and visualized in tools like Grafana for better insights.
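The raw metrics are served in Prometheus' plain-text exposition format; a fragment typically looks like the following (names, labels, and values are only examples):
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 125341.2
node_cpu_seconds_total{cpu="0",mode="user"} 2412.7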

Visualizing Metrics with Grafana
Let's now understand how to visualize metrics with Grafana:
Connecting Prometheus to Grafana
To get started, log in to Grafana by accessing http://localhost:3000 and using the admin credentials.
Next, navigate to the welcome page and go to Data Sources.

Click Add Data Source and select Prometheus as the data source type.

In the URL field, enter the Prometheus server URL. Because Grafana runs inside the cluster, this is normally the in-cluster service address (for this release, typically http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090), rather than http://localhost:9090.

Click Save & Test.

The following output confirms a successful connection:

Creating a Grafana Dashboard with Custom Visualizations
To achieve this, start by clicking Create → Dashboard.

Next, click the Add visualization button.

Select prometheus (default) as the data source.

In the Query section, click the Kick start your query button to choose a query pattern, and select the Rate then sum pattern.

Then enter the metric to query, for instance node_cpu_seconds_total, and select the matching option from the suggestions.

Choose Time series as the visualization type (other options include Gauge, Table, and Heatmap), and complete the query by clicking Run queries.

Now, the raw data can be visualized.

Additional Example Queries for Visualization
- CPU Usage per Pod: sum(rate(container_cpu_usage_seconds_total{namespace="monitoring"}[5m])) by (pod)
- Memory Usage per Node: sum(container_memory_usage_bytes) by (node)
- Disk I/O Reads per Container: rate(container_fs_reads_bytes_total[5m])
- Network Traffic per Pod: sum(rate(container_network_receive_bytes_total[5m])) by (pod)
Importing a Pre-built Kubernetes Dashboard in Grafana
Grafana allows importing dashboards using JSON files, which makes it easy to reuse pre-configured visualizations. To import a dashboard, follow these steps:
Obtain a JSON Dashboard File
Download pre-built Kubernetes monitoring dashboards from Grafana's official website or export an existing dashboard as a JSON file.
Import the JSON File in Grafana
- Go to Dashboards → Import.
- Click Upload JSON file and select your pre-configured dashboard file.
- Alternatively, paste the JSON content into the Import via JSON field.
Select Data Source
Choose the correct Prometheus data source to ensure the imported dashboard retrieves the expected metrics.
Import the Dashboard
Click Import to load a detailed Kubernetes monitoring dashboard.
Once imported, customize the dashboard by adding additional panels or adjusting alert thresholds.
Configuring Alerts in Prometheus
To receive alerts when issues occur, we can configure Alertmanager.
Create an Alerting Rule
Save the following as alerts.yaml.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-cpu-alert
  namespace: monitoring
spec:
  groups:
    - name: example.rules
      rules:
        - alert: HighCPUUsage
          expr: (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage detected"
            description: "CPU usage has exceeded 80% for more than 2 minutes"
Apply the rule:
$ kubectl apply -f alerts.yaml
Output
prometheusrule.monitoring.coreos.com/high-cpu-alert created
This confirms that Prometheus has successfully added the alert rule.
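The rule above only defines when an alert fires; where the notification goes is decided by Alertmanager's routing configuration. Below is a minimal sketch of such a configuration, assuming a hypothetical Slack webhook URL; with kube-prometheus-stack it is typically supplied through the chart's alertmanager.config values rather than edited on the pod directly.
route:
  receiver: slack-notifications      # default receiver for all alerts
  group_by: ['alertname']
  repeat_interval: 4h
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'   # hypothetical webhook URL
        channel: '#alerts'
        send_resolved: true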
Managing Alerts in Grafana
Let's now understand how to manage alerts in Grafana:
Edit the Prometheus alert rules by creating another PrometheusRule manifest, saved as prometheus-rules.yaml:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: instance-down
  namespace: monitoring
spec:
  groups:
    - name: instance-down
      rules:
        - alert: InstanceDown
          expr: up == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Instance {{ $labels.instance }} is down"
Apply the alert configuration:
$ kubectl apply -f prometheus-rules.yaml
Output
prometheusrule.monitoring.coreos.com/instance-down created
Setting Up Alerts in Grafana
Open Grafana:
- Go to Alerting → Contact Points.
- Create a new contact point (e.g., Email, Slack, PagerDuty).
- Configure the recipient details.
Define a New Alert Rule:
- Go to Alerting → New Alert Rule.
- Choose a metric query (e.g., node_cpu_seconds_total).
- Set conditions (e.g., CPU usage > 80%).
- Define notification policies to link alerts with contact points.
Test and Activate the Alert:
- Click Test Rule to validate.
- Save and enable the rule to start monitoring.
Alert status changes from Pending → Firing when conditions are met. In addition, notifications are sent to the configured contact points when an alert triggers. This ensures that both Prometheus and Grafana provide efficient alerting mechanisms for monitoring Kubernetes infrastructure.
Troubleshooting Common Issues
Here are some common issues encountered when monitoring Kubernetes, along with ways to troubleshoot them.
Prometheus Not Scraping Metrics
- Check if the target is correctly discovered: kubectl get servicemonitors -n monitoring
- Inspect Prometheus logs: kubectl logs -l app.kubernetes.io/name=prometheus -n monitoring
Grafana Dashboards Not Displaying Data
- Verify the Prometheus data source connection.
- Check PromQL queries for correct syntax.
- Inspect Grafana logs: kubectl logs -l app.kubernetes.io/name=grafana -n monitoring
Alerts Not Firing
- Confirm alert rules are correctly loaded: kubectl get prometheusrules -n monitoring
- Validate rule files with promtool (adjust the pod name and rules path to your setup): kubectl exec -it prometheus-prometheus-kube-prometheus-prometheus-0 -n monitoring -- promtool check rules /etc/prometheus/rules.yaml
Best Practices for Monitoring in Kubernetes
Listed below are some of the best practices for monitoring in Kubernetes:
- Use Dashboards − Regularly review Grafana dashboards to track key metrics.
- Set Alerts − Configure alerts to notify you of abnormal conditions.
- Optimize Storage − Ensure Prometheus retention settings are configured correctly to avoid excessive data usage (see the example after this list).
- Secure Access − Restrict access to monitoring tools to avoid unauthorized access.
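For example, retention can be adjusted through the chart's values when upgrading the release. The prometheus.prometheusSpec.retention key is the one used by recent versions of the kube-prometheus-stack chart; check the chart's values for your version, and treat 15d as an illustrative number.
$ helm upgrade prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --reuse-values \
    --set prometheus.prometheusSpec.retention=15d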
Conclusion
Monitoring and logging are crucial for maintaining a healthy Kubernetes environment. Prometheus and Grafana provide powerful capabilities for collecting, analyzing, and visualizing metrics. By setting up Prometheus for metric collection and Grafana for visualization, we gain deep insights into our cluster’s health, helping us detect and resolve issues efficiently.
By implementing these tools, we ensure that our Kubernetes workloads run smoothly and any potential issues are addressed before they impact users.