CLARES - Compliance License & Asset Reminder Engine System Overview
Ever missed a certificate renewal and had a production outage? Or forgot to renew a software license until it was too late? I built CLARES to make sure that never happens again.
CLARES stands for Compliance License & Asset Reminder Engine System. It's a full-stack web application that tracks expiry dates of SSL certificates, software licenses, compliance certificates, and any custom asset type your organization manages — and sends email reminders before things expire.
I built it because every team I've worked with has the same problem: critical renewals tracked in spreadsheets, emails, or someone's memory. CLARES replaces all of that with a single, centralized dashboard.
In any enterprise environment, you're juggling dozens (or hundreds) of:
The cost of missing even one renewal can be significant — from service outages to compliance violations. CLARES gives you a single pane of glass with urgency-based grouping and automated email reminders.
Step 1: Login — Users authenticate with username/password. The server validates credentials, checks the account is active, and issues a JWT token (8-hour expiry). Deactivated accounts get a clear error message — no cryptic "session expired" nonsense.
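The token-issuing step can be sketched with nothing but the standard library. This is illustrative Python (CLARES itself is Node.js), hand-rolling HS256 to show the mechanics; `SECRET` is a hypothetical value that a real deployment would load from configuration.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret for illustration only; never hard-code this in a real app.
SECRET = b"demo-secret"

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(username: str, hours: int = 8) -> str:
    """Issue an HS256 JWT that expires `hours` from now."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": username, "exp": int(time.time()) + hours * 3600}
    ).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

token = issue_token("admin")
```

In practice you would reach for a maintained JWT library rather than rolling your own, but the three-segment header.payload.signature structure is the same.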
Step 2: Dashboard — The home page auto-fetches all renewal items and groups them into four urgency buckets: Expired, Critical (≤14 days), Warning (≤30 days), and Upcoming (≤90 days). Summary cards show counts per catalog type.
Step 3: Manage Catalogs — The sidebar lists all catalog types. Click any catalog to view, add, edit, or delete entries. Upload up to 500 rows at once via CSV bulk import. Add custom catalog types beyond the built-in ones.
Step 4: Granular Permissions — Global admins see everything. Other users get per-catalog roles: No Access, View (read-only), or Admin (full CRUD). A user can be a viewer globally but an admin for specific catalogs. Permissions are enforced on both frontend and backend.
Step 5: Email Reminders — Configure SMTP settings from the admin page, test the connection, send a test email, then trigger reminders. Each item has its own reminder config — how many days before expiry and how many times to repeat. The system calculates exact send dates by evenly spacing repeats within the window (e.g. 30 days / 3 repeats → reminders at 30, 20, and 10 days before expiry).
Step 6: Automatic Scheduler — Enable the daily auto-reminder from Admin Settings and pick an hour (server time). A background scheduler checks every 60 seconds and fires once per day. A reminder_logs table tracks which reminder number has been sent per item — no duplicates, and missed reminders are caught up automatically.
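The once-per-day decision inside that 60-second tick can be sketched as follows (illustrative Python, not the actual Node.js implementation; the `>=` comparison is what lets a missed hour be caught up on the next tick):

```python
from datetime import datetime, date
from typing import Optional

def should_fire(now: datetime, scheduled_hour: int,
                last_run: Optional[date]) -> bool:
    """One scheduler tick: fire when the configured hour has been reached
    and the job has not already run today. Using `>=` (not `==`) means a
    tick that arrives late still triggers the day's run."""
    return now.hour >= scheduled_hour and last_run != now.date()
```

A background loop would call this every 60 seconds and record `now.date()` as `last_run` after a successful send.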
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite 5, React Router v6 |
| Backend | Node.js 20, Express 4 |
| Database | PostgreSQL |
| Auth | JWT + bcrypt |
| Email | Nodemailer (configurable SMTP) |
| Deployment | Docker (multi-arch), Helm, Kubernetes |
The frontend is a React SPA built by Vite into static files. Express serves both the static files and the REST API on the same port. Authentication is stateless via JWT — no server-side sessions. The whole thing is packaged into a single Docker image using a multi-stage build.
Items are automatically grouped by urgency. No more scanning through spreadsheets — you instantly see what needs attention. Summary cards give you counts per catalog type at a glance.
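The grouping logic amounts to a days-until-expiry lookup. A minimal sketch (illustrative Python; CLARES is Node.js) using the thresholds described above:

```python
from datetime import date

def urgency_bucket(expiry: date, today: date) -> str:
    """Map an expiry date to one of the dashboard's urgency buckets."""
    days_left = (expiry - today).days
    if days_left < 0:
        return "Expired"
    if days_left <= 14:
        return "Critical"
    if days_left <= 30:
        return "Warning"
    if days_left <= 90:
        return "Upcoming"
    return "Later"  # beyond the 90-day window; label assumed for this sketch
```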
Three built-in catalog types (Certificates, Licenses, SSL Certs) plus unlimited custom types. Each catalog tracks items with name, environment, expiry date, owner, notes, and per-item reminder settings.
Download a CSV template, fill it in, and upload up to 500 rows at once. Perfect for initial data migration or when you inherit a spreadsheet full of renewal dates.
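Server-side, the bulk import boils down to parsing the upload and enforcing the row cap. A sketch under assumed column names (illustrative Python; the real endpoint is Node.js and its exact schema may differ):

```python
import csv
import io

MAX_ROWS = 500  # bulk-import limit described above

def parse_bulk_csv(text: str):
    """Parse an uploaded CSV into row dicts, rejecting oversized files."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the {MAX_ROWS}-row limit")
    return rows

# Hypothetical template columns for illustration:
sample = "name,environment,expiry_date\napi-cert,prod,2025-06-10\n"
```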
This was one of the trickier features. The permission model has two layers:
A user can be a global Viewer but have Admin rights on specific catalogs. This means you can delegate management of SSL certificates to the infra team without giving them access to license data.
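The two-layer resolution order can be sketched in a few lines (illustrative Python with hypothetical field names, not CLARES's actual schema): global admin wins, then a per-catalog override, then the user's global role.

```python
def effective_role(user: dict, catalog_id: str) -> str:
    """Resolve a user's role for one catalog.
    Precedence: global admin > per-catalog override > global role."""
    if user.get("is_global_admin"):
        return "admin"
    overrides = user.get("catalog_roles", {})
    return overrides.get(catalog_id, user.get("global_role", "none"))

# A global viewer who administers only the SSL certificates catalog:
infra_user = {"global_role": "view", "catalog_roles": {"ssl-certs": "admin"}}
```

The same function would run on the backend for enforcement; the frontend uses the result only to hide controls the user cannot use.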
Configure any SMTP server (Exchange, Gmail, etc.) from the admin UI. Test the connection, send a test email, then trigger reminders. Each item can have its own reminder settings — how many days before expiry, and how many times to repeat.
The system calculates exact reminder dates by evenly spacing the repeat count within the reminder window. For example:
SSL cert expires June 10 · Remind 30 days before · Repeat 3 times
Emails include the reminder number (e.g. "reminder 2 of 3") and a color-coded status — red for ≤7 days, amber for ≤14, green for 14+.
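The date-spacing rule above is easy to make concrete (illustrative Python; CLARES is Node.js). The step size is the window divided by the repeat count, so 30 days with 3 repeats yields reminders 30, 20, and 10 days before expiry:

```python
from datetime import date, timedelta

def reminder_dates(expiry: date, window_days: int, repeats: int):
    """Evenly space `repeats` reminder dates within the reminder window,
    ending one step before expiry (30/3 -> 30, 20, 10 days out)."""
    step = window_days // repeats
    return [expiry - timedelta(days=window_days - i * step)
            for i in range(repeats)]
```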
No more relying on someone clicking "Send Reminders" manually. Enable the automatic reminder scheduler from Admin Settings, pick an hour (0–23, server time), and CLARES handles the rest:
A reminder_logs table tracks which reminder number has been sent per item, so there are no duplicate emails.

The app is containerized with a multi-stage Dockerfile and supports multi-architecture builds (amd64/arm64). For Kubernetes deployment, there's a complete Helm chart with:
Run the one-time database setup inside the cluster:

```shell
kubectl exec deployment/clares -n clares -- node server/setup.js
```

The setup script is idempotent — it creates tables only if they don't exist and seeds a default admin user when the users table is empty. Safe to run multiple times.
```shell
# Clone the repo
git clone https://github.com/DevOpsArts/clares.git
cd clares

# Option 1: Local development
npm install
cp .env.example .env     # Edit with your DB credentials
npm run setup-db
npm run dev              # Frontend → :5174, API → :3002

# Option 2: Kubernetes with Helm
helm install clares-postgres bitnami/postgresql \
  --set auth.database=clares --set auth.username=clares \
  --namespace clares --create-namespace
helm install clares ./helm/clares-engine \
  -f ./helm/clares-engine/values-minikube.yaml \
  --namespace clares
kubectl exec deployment/clares -n clares -- node server/setup.js

# Login with admin / admin
```
CLARES is open source on GitHub: github.com/DevOpsArts/clares
Check out the project page: devopsarts.github.io/clares
If you're managing renewal dates in spreadsheets, give CLARES a try. It takes under 5 minutes to deploy and the default admin account is ready out of the box.
Built with React 18, Node.js, Express, PostgreSQL, Docker, and Helm. Deployed on Kubernetes.
The only Kubernetes log agent with intelligent error context capture, rule-based alerting, and 9 pluggable storage backends.
Every SRE knows the pain: an alert fires at 3 AM, and you're digging through gigabytes of logs trying to understand what happened before the error. Traditional log solutions either capture everything (expensive) or miss crucial context (frustrating).
What if your log agent was smart enough to capture only what matters—the error AND the context around it—and alert you instantly?
Logsenta is an open-source Kubernetes log monitoring agent that solves this problem with intelligent error-aware context capture and rule-based alerting. Instead of blindly forwarding all logs, Logsenta:
Logsenta uses regex and string-based pattern matching to detect errors across multiple languages and frameworks:
```yaml
errorPatterns:
  - "ERROR"
  - "Exception"
  - "FATAL"
  - "panic:"
  - "Traceback"
  - "OOMKilled"
  - "CrashLoopBackOff"
```

These patterns are fully customizable.
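The matching step itself is simple to sketch (illustrative Python; Logsenta's own implementation is not shown here). Plain strings are escaped so they match literally, compiled into one alternation for a single pass per log line; the real agent also accepts regex patterns.

```python
import re

# Default patterns from the config above, matched as literal substrings.
PATTERNS = ["ERROR", "Exception", "FATAL", "panic:",
            "Traceback", "OOMKilled", "CrashLoopBackOff"]
error_re = re.compile("|".join(re.escape(p) for p in PATTERNS))

def is_error(line: str) -> bool:
    """True if any error pattern appears anywhere in the line."""
    return error_re.search(line) is not None
```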
Route different error patterns to different teams with customizable thresholds:
```yaml
alerting:
  enabled: true
  rules:
    # Critical errors → On-call team immediately
    - name: "critical-errors"
      patterns: ["CRITICAL", "FATAL", "OOMKilled", "panic:"]
      threshold:
        count: 1          # Alert on FIRST occurrence
        windowSeconds: 60
      email:
        enabled: true
        toAddresses: ["oncall@company.com"]

    # Java exceptions → Backend team (after 2 occurrences)
    - name: "java-exceptions"
      patterns: ["NullPointerException", "OutOfMemoryError"]
      threshold:
        count: 2          # Alert after 2 occurrences
        windowSeconds: 300
      email:
        enabled: true
        toAddresses: ["backend-team@company.com"]

    # Python errors → Data team
    - name: "python-errors"
      patterns: ["Traceback", "TypeError", "ValueError"]
      threshold:
        count: 1
      email:
        enabled: true
        toAddresses: ["data-team@company.com"]
```
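The `count`-within-`windowSeconds` semantics above can be sketched as a sliding window (illustrative Python, not Logsenta's actual implementation):

```python
from collections import deque

class ThresholdRule:
    """Sliding-window threshold: fire once `count` matches are seen
    within the last `window_seconds`."""

    def __init__(self, count: int, window_seconds: int):
        self.count = count
        self.window = window_seconds
        self.hits = deque()

    def record(self, ts: float) -> bool:
        """Record one matching log line at time `ts`; return True when
        the rule should fire an alert."""
        self.hits.append(ts)
        # Drop hits that have aged out of the window.
        while self.hits and ts - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) >= self.count

rule = ThresholdRule(count=2, window_seconds=300)
```

With `count: 1` every match fires immediately; higher counts suppress one-off noise, which is exactly why the Java rule above waits for a second occurrence.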
Why rule-based alerting matters:
One agent, any storage destination:
| Backend | Use Case |
|---|---|
| PostgreSQL | Relational queries, SQL analysis |
| MongoDB | Flexible document storage |
| Elasticsearch | Full-text search, Kibana dashboards |
| Azure Log Analytics | Azure ecosystem, KQL queries |
| AWS CloudWatch | AWS ecosystem, CloudWatch Insights |
| GCP Cloud Logging | Google Cloud ecosystem |
Configure how much context to capture around errors:
```yaml
captureWindow:
  bufferDurationMinutes: 2   # Lines BEFORE error
  captureAfterMinutes: 2     # Lines AFTER error
```
This means when an error occurs, you get the full story—not just the error line.
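The mechanism behind this is a rolling buffer. A sketch of the idea (illustrative Python; sized in line counts rather than the minutes Logsenta actually uses, to keep it simple):

```python
from collections import deque

def capture_context(lines, is_error, before=2, after=1):
    """Keep a rolling buffer of recent lines; when an error appears,
    emit the buffered lines, the error line itself, and the next
    `after` lines."""
    buffer = deque(maxlen=before)  # discards oldest lines automatically
    captured = []
    trailing = 0
    for line in lines:
        if is_error(line):
            captured.extend(buffer)   # the context BEFORE the error
            buffer.clear()
            captured.append(line)     # the error line
            trailing = after          # start capturing context AFTER
        elif trailing > 0:
            captured.append(line)
            trailing -= 1
        else:
            buffer.append(line)
    return captured

logs = ["start", "connecting", "retrying", "ERROR: timeout", "closing", "idle"]
```

Everything outside the capture window is simply dropped, which is where the cost savings over capture-everything pipelines come from.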
Deploy Logsenta in under 2 minutes:
```shell
# Clone the repository
git clone https://github.com/DevOpsArts/logsenta.git
cd logsenta

# Install with Helm (PostgreSQL backend + alerting)
helm install logsenta-engine ./charts/logsenta-engine \
  --namespace logsenta \
  --create-namespace \
  --set storage.type=postgresql \
  --set connections.postgresql.host=your-db-host \
  --set connections.postgresql.username=logsenta \
  --set connections.postgresql.password=YOUR_PASSWORD \
  --set alerting.enabled=true \
  --set alerting.email.smtpHost=smtp.company.com
```
┌─────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Pod A│ │Pod B│ │Pod C│ ← Monitored │
│ └──┬──┘ └──┬──┘ └──┬──┘ │
│ └────────┼────────┘ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ Logsenta-Engine │ │
│ │ • Error Detection │ │
│ │ • Context Capture │ │
│  │ • Rule-Based Alert│ ──► Email          │
│  │ • Rolling Buffer  │ ──► Webhook        │
│ └─────────┬──────────┘ │
└───────────────┼─────────────────────────────┘
▼
┌────────────────┐
│ Storage Backend│
│ (Your Choice) │
└────────────────┘
Logsenta is open-source and free to use. Check out the resources below:
Have questions or feedback? Drop a comment below or open an issue on GitHub!
Tags: Kubernetes, DevOps, SRE, Logging, Monitoring, Alerting, Azure, AWS, GCP, Helm, Open Source
Go to Azure AKS, select "Diagnostic settings" in the side blade, and choose "Add diagnostic setting".
Then, in the new page, select which logs need to be sent to the Event Hub and choose "Stream to an Event Hub". Here, provide the newly created Event Hub namespace and Event Hub.
Step 10: Configure the Grafana Agent to scrape messages from Azure Event Hub
Next, we need to pull the data from Azure Event Hub and push it to Grafana Loki.
In our existing grafana-agent-values.yaml, add the lines below to pull messages from Azure Event Hub, then redeploy the Grafana Agent in AKS.
Here is the reference GitHub URL, and below is the YAML.
https://github.com/DevOpsArts/grafana_loki_agent/blob/main/grafana-agent-values-azure-aks.yaml
```
loki.source.azure_event_hubs "azure_aks" {
  fully_qualified_namespace = "==XXX Eventhub namespace hostname XX===:9093"
  event_hubs = ["aks"]
  forward_to = [loki.write.local.receiver]
  labels = {
    "job" = "azure_aks",
  }
  authentication {
    mechanism         = "connection_string"
    connection_string = "===XXX Eventhub connection String XX==="
  }
}
```
Replace the placeholder values above (the Event Hub namespace hostname and the connection string) with your own. We can add multiple Event Hubs to the Grafana Agent by providing a different job name for each Azure PaaS service.
Note: Make sure network connectivity is established between Azure AKS and Azure Event Hub on port 9093 so messages can be sent.
Redeploy the Grafana Agent in AKS using the command below (use `helm upgrade --install` so an existing release is updated in place rather than failing on a duplicate install):
helm upgrade --install grafana-agent grafana/grafana-agent --values grafana-agent-values-azure-aks.yaml -n observability
Check that all the Grafana Agent pods are up and running using the command below:
kubectl get all -n observability
Now the Grafana Agent will pull messages from Azure Event Hub and push them to Grafana Loki for any Azure AKS cluster configured to stream its logs via Diagnostic Settings.
We can verify the status of message processing from Azure Event Hub, including the status of incoming and outgoing messages.
Step 11: Access Azure AKS logs in Grafana dashboard,
Go to the Grafana dashboard: Home > Explore > select the Loki data source.
In the filter section, select "job" and set the value to the job name given in grafana-agent-values-azure-aks.yaml. In our case, the job name is "azure_aks".
That's all! We have successfully deployed centralized logging with Grafana Loki and the Grafana Agent for Kubernetes, VM applications, and Azure PaaS.
In Part 1, we covered how to set up Grafana Loki and the Grafana Agent to view Kubernetes pod logs.
Requirement:
Next, double-click the downloaded .exe and install it. By default, the Windows installation path is:
C:\Program Files\Grafana Agent
Once the installation is complete, we need to update the configuration based on our needs, such as which application logs to send to Grafana Loki.
In our case, we installed the Grafana dashboard on the Windows VM and configured the Grafana dashboard logs in the Grafana Agent.
Similarly, we can add multiple applications with different job names.
Copy the Grafana Agent config file from the repo below and update it according to your needs.
We can also start the agent manually from the command prompt, which helps surface any issues with the configuration.
In Command Prompt, go to C:\Program Files\Grafana Agent and execute the command below:
grafana-agent-windows-amd64.exe --config.file=agent-config.yaml
Note: The Grafana Loki distributed service endpoint (configured in agent-config.yaml) must be accessible from the Windows VM.
Step 7: Access VM application logs in Grafana Loki
Go to Grafana Dashboard > Home > Explore > Select Loki Datasource
In the filter section, select "job" and set the value to the job name given in agent-config.yaml. In our case, the job name is "devopsart-vm".
Now we are able to view the Grafana dashboard logs in Grafana Loki. You can create dashboards from here based on your preference.
In Part 2, we covered how to export Windows VM application logs to Grafana Loki and how to view them from the Grafana dashboard.
In Part 3, we will cover how to export Azure PaaS service logs to Grafana Loki.
Dealing with multiple tools for capturing application logs from different sources can be a hassle for anyone. In this blog post, we'll dive into the steps required to establish centralized logging with Grafana Loki and Grafana Agent. This solution will allow us to unify the collection of logs from Kubernetes pods, VM services, and Azure PAAS services.
Grafana Loki: a highly scalable log aggregation system designed for cloud-native environments.
Grafana Agent: an observability agent that collects metrics and logs from various applications for visualization and analysis in Grafana.
Requirement:
```yaml
schemaConfig:
  configs:
    - from: "2020-09-07"
      store: boltdb-shipper
      object_store: azure
      schema: v11
      index:
        prefix: index_
        period: 24h
storageConfig:
  boltdb_shipper:
    shared_store: azure
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 1h
  filesystem:
    directory: /var/loki/chunks
  azure:
    account_name: === Azure Storage name ===
    account_key: === Azure Storage access key ===
    container_name: === Container Name ===
    request_timeout: 0
```
In this blog, we will explore a new tool called "Rover", which helps visualize the Terraform plan.
Rover : This open-source tool is designed to visualize Terraform Plan output, offering insights into infrastructure and its dependencies.
We will use the Rover Docker image to do our setup and visualize the infrastructure.
Requirements:
1. Linux/Windows VM
2. Docker
Step 1: Generate the Terraform plan output
I have a sample Azure Terraform block in the devopsart folder; we will generate the Terraform plan output from there and store it locally.
```shell
cd devopsart
terraform plan -out tfplan.out
terraform show -json tfplan.out > tfplan.json
```
Now both the files are generated.
Step 2: Run the Rover tool locally
Execute the Docker command below from the same path as Step 1:
```shell
docker run --rm -it -p 9000:9000 \
  -v $(pwd)/tfplan.json:/src/tfplan.json \
  im2nguyen/rover:latest -planJSONPath=tfplan.json
```
It runs the web UI on port 9000.
Step 3: Access the Rover web UI
Let's access the web UI and check it: open a browser and go to http://localhost:9000
In the UI, the color codes on the left side help you understand the actions that will take place for each resource when running terraform apply.
When a specific resource is selected from the image, it will provide the name and parameter information.
Additionally, the image can be saved locally by clicking the "Save" option.
I hope this is helpful for someone who is genuinely confused by the Terraform plan output, especially when dealing with a large infrastructure.
Thanks for reading!! We have tried the Rover tool and experimented with examples.
Reference:
https://github.com/im2nguyen/rover
Infracost: It provides cloud cost estimates from Terraform. It enables engineers to view a detailed cost breakdown and understand expenses before implementation.
Requirements:
1. One Windows/Linux VM
2. Terraform
3. Terraform examples
Step 1: Infracost installation
For Mac, use the brew command below to do the installation:
# brew install infracost
For other Operating systems, follow below link,
https://www.infracost.io/docs/#quick-start
Step 2: Infracost configuration
We need to set up the Infracost API key by signing up here,
https://dashboard.infracost.io
Once logged in, visit the following URL to obtain the API key,
https://dashboard.infracost.io/org/praboosingh/settings/general
Next, open the terminal and set the key as an environment variable using the following command,
# export INFRACOST_API_KEY=XXXXXXXXXXXXX
Or you can log in to the Infracost UI and grant terminal access using the following command:
# infracost auth login
Note : Infracost will not send any cloud information to their server.
Step 3: Infracost validation
Next, we will do the validation. For validation purposes, I have cloned the GitHub repo below, which contains Terraform examples.
# git clone https://github.com/alfonsof/terraform-azure-examples.git
# cd terraform-azure-examples/code/01-hello-world
Try Infracost using the command below to get the estimated monthly cost:
# infracost breakdown --path .
To save the report in JSON format and upload it to the Infracost server, use the commands below:
# infracost breakdown --path . --format json --out-file infracost-demo.json
# infracost upload --path infracost-demo.json
In case we plan to upgrade the infrastructure and need to understand the new cost, execute the following command to compare it with the previously saved output from the Terraform code path.
# infracost diff --path . --compare-to infracost-demo.json
Thanks for reading!! We have installed Infracost and experimented with examples.
References:
https://github.com/infracost/infracost
https://www.infracost.io/docs/#quick-start
In this blog, we will install and examine a new tool called Trivy, which helps identify vulnerabilities, misconfigurations, licenses, secrets, and software dependencies in the following:
1. Container image
2. Kubernetes cluster
3. Virtual machine image
4. Filesystem
5. Git repo
6. AWS
Requirements:
1. One virtual machine
2. Any one of the targets mentioned above
Step 1 : Install Trivy
Execute the command below based on your OS.
For Mac :
brew install trivy
In this blog post, We will explore a new tool called "KOR" (Kubernetes Orphaned Resources), which assists in identifying unused resources within a Kubernetes(K8S) cluster. This tool will be beneficial for those who are managing Kubernetes clusters.
Requirements:
1. One machine (Linux/Windows/Mac)
2. K8s cluster
Step 1: Install kor on the machine
I'm using a Linux VM for this experiment; for other platforms, download the binaries from the link below:
https://github.com/yonahd/kor/releases
Download the Linux binary for the Linux VM:
wget https://github.com/yonahd/kor/releases/download/v0.1.8/kor_Linux_x86_64.tar.gz
tar -xvzf kor_Linux_x86_64.tar.gz
chmod 777 kor
cp -r kor /usr/bin
kor --help
Step 2: Nginx web server deployment in K8s
I have a K8s cluster; we will deploy the nginx web server in K8s and try out the kor tool.
Create a namespace called "nginxweb":
kubectl create namespace nginxweb
Using Helm, we will deploy the nginx web server with the command below:
helm install nginx bitnami/nginx --namespace nginxweb
kubectl get all -n nginxweb
Step 3: Validate with the kor tool
Let's check for unused resources with the kor tool in the nginx namespace. The command below lists all unused resources in the given namespace:
Syntax: kor all -n <namespace>
kor all -n nginxweb
Now let's delete the nginx deployment from the nginxweb namespace, which leaves its service orphaned, and try again.
kubectl delete deployments nginx -n nginxweb
Now check which resources are still available in the namespace:
kubectl get all -n nginxweb
The result shows one K8s service remaining in the nginxweb namespace.
Now try the kor tool again with the command below:
kor all -n nginxweb
kor now reports that the nginx service is not used anywhere in the namespace.
We can also check individual resource types (configmap, secret, services, serviceaccount, deployments, statefulsets, role, hpa), for example:
kor services -n nginxweb
kor serviceaccount -n nginxweb
kor secret -n nginxweb
That's all. We have installed the KOR tool and validated it by deleting one of the components of the nginx web server deployment.
References:
https://github.com/yonahd/kor
Similarly, we can add multiple application with different Job names.
Copy the grafana agent config file from below repo and update the required changes according on your needs.
We can start manually by below command in command prompt as well.
In command Prompt go to, C:\Program Files\Grafana Agent
Execute below command,
grafana-agent-windows-amd64.exe --config.file=agent-config.yaml
This will help to find any issue with the configuration.
Note : Here the Grafana loki distributed service endpoint(which is configured in the agent-config.yaml) should be accessible from the windows VM
Step 7 : Access VM application logs in Grafana Loki,
Go to Grafana Dashboard > Home > Explore > Select Loki Datasource
In the filter section, select "Job" and value as the job name which is given in the agent-config.yaml. In our case the job name is "devopsart-vm"
Now We are able to view the Grafana Dashboard logs in Grafana Loki. You can create the Dashboard from here based on your preference.
In Part 2, We covered how to export Windows VM application logs to Grafana Loki and how to view them from the Grafana Dashboard.
In Part 3, We will cover how to export Azure PAAS services logs to Grafana Loki
Dealing with multiple tools for capturing application logs from different sources can be a hassle for anyone. In this blog post, we'll dive into the steps required to establish centralized logging with Grafana Loki and Grafana Agent. This solution will allow us to unify the collection of logs from Kubernetes pods, VM services, and Azure PAAS services.
Grafana Loki : It is a highly scalable log aggregation system designed for cloud-native environments
Grafana Agent : It is an observability agent that collects metrics and logs from various application for visualization and analysis in Grafana
Requirement:
schemaConfig:
configs:
- from: "2020-09-07"
store: boltdb-shipper
object_store: azure
schema: v11
index:
prefix: index_
period: 24h
storageConfig:
boltdb_shipper:
shared_store: azure
active_index_directory: /var/loki/index
cache_location: /var/loki/cache
cache_ttl: 1h
filesystem:
directory: /var/loki/chunks
azure:
account_name: === Azure Storage name ===
account_key: === Azure Storage access key ===
container_name: === Container Name ===
request_timeout: 0
In this blog, we will explore a new tool called 'Rover,' which helps to visualize the Terraform plan
Rover : This open-source tool is designed to visualize Terraform Plan output, offering insights into infrastructure and its dependencies.
We will use the "Rover" docker image, to do our setup and visualize the infra.
Requirements:
1.Linux/Windows VM
2. Docker
Steps 1 : Generate terraform plan output
I have a sample Azure terraform block in devopsart folder, will generate terraform plan output from there and store is locally.
cd devopsart
terraform plan -out tfplan.out
terraform show -json tfplan.out > tfplan.json
Now both the files are generated.
Step 2 : Run Rover tool locally,
Execute below docker command to run rover from the same step 1 path,
docker run --rm -it -p 9000:9000 -v $(pwd)/tfplan.json:/src/tfplan.json im2nguyen/rover:latest -planJSONPath=tfplan.json
Its run the webUI in port number 9000.
Step 3 : Accessing Rover WebUI,
Lets access the WebUI and check it,
Go to browser, and enter http://localhost:9000
In the UI, color codes on the left side provide assistance in understanding the actions that will take place for the resources when running terraform apply.
When a specific resource is selected from the image, it will provide the name and parameter information.
Additionally, the image can be saved locally by clicking the 'Save' option
I hope this is helpful for someone who is genuinely confused by the Terraform plan output, especially when dealing with a large infrastructure.
Thanks for reading!! We have tried Rover tool and experimented with examples.
Reference:
https://github.com/im2nguyen/rover
Infracost : It provides cloud cost projections from Terraform. It enables engineers to view a detailed cost breakdown and comprehend expenses before implementions.
Requirement :
1. One window/Linux VM
2.Terraform
3.Terraform examples
Step 1 : infracost installation,
For Mac, use below brew command to do the installation,
# brew install infracost
For other Operating systems, follow below link,
https://www.infracost.io/docs/#quick-start
Step 2 : Infracost configuration,
We need to set up the Infracost API key by signing up here,
https://dashboard.infracost.io
Once logged in, visit the following URL to obtain the API key,
https://dashboard.infracost.io/org/praboosingh/settings/general
Next, open the terminal and set the key as an environment variable using the following command,
# export INFRACOST_API_KEY=XXXXXXXXXXXXX
or You can log in to the Infracost UI and grant terminal access by using the following command,
# infracost auth login
Note : Infracost will not send any cloud information to their server.
Step 3 : Infracost validation
Next, We will do the validation. For validation purpose i have cloned below github repo which contains terraform examples.
# git clone https://github.com/alfonsof/terraform-azure-examples.git
# cd terraform-azure-examples/code/01-hello-world
try infracost by using below command to get the estimated cost for a month,
# infracost breakdown --path .
To save the report in json format and upload to infracost server, use below command,
# infracost breakdown --path . --format json --out-file infracost-demo.json
# infracost upload --path infracost-demo.json
In case we plan to upgrade the infrastructure and need to understand the new cost, execute the following command to compare it with the previously saved output from the Terraform code path.
# infracost diff --path . --compare-to infracost-demo.json
Thanks for reading!! We have installed infracost and experimented with examples.
References:
https://github.com/infracost/infracost
https://www.infracost.io/docs/#quick-start
In this blog, we will install and examine a new tool called Trivy, which helps identify vulnerabilities, misconfigurations, licenses, secrets, and software dependencies in the following,
1.Container image
2.Kubernetes Cluster
3.Virtual machine image
4.FileSystem
5.Git Repo
6.AWS
Requirements,
1.One Virtual Machine
2.Above mentioned tools anyone
Step 1 : Install Trivy
Exceute below command based on your OS,
For Mac :
brew install trivy
In this blog post, We will explore a new tool called "KOR" (Kubernetes Orphaned Resources), which assists in identifying unused resources within a Kubernetes(K8S) cluster. This tool will be beneficial for those who are managing Kubernetes clusters.
Requirements:
1.One machine(Linux/Windows/Mac)
2.K8s cluster
Step 1 : Install kor in the machine.
Am using linux VM to do the experiment and for other flavours download the binaries from below link,
https://github.com/yonahd/kor/releases
Download the linux binary for linux VM,
wget https://github.com/yonahd/kor/releases/download/v0.1.8/kor_Linux_x86_64.tar.gz
tar -xvzf kor_Linux_x86_64.tar.gz
chmod 777 kor
cp -r kor /usr/bin
kor --help
Step 2 : Nginx Webserver deployment in K8s
I have a k8s cluster, We will deploy nginx webserver in K8s and try out "kor" tool
Create a namespace as "nginxweb"
kubectl create namespace nginxweb
Using helm, we will deploy nginx webserver by below command,
helm install nginx bitnami/nginx --namespace nginxweb
kubectl get all -n nginxweb
Step 3 : Validate with kor tool
lets check the unused resources with kor tool in the nginx namespace,
Below command will list all the unused resources available in the given namespace,
Syntax : kor all -n namespace
kor all -n nginxweb
lets delete one service from the nginxweb namespace and try it.
kubectl delete deployments nginx -n nginxweb
Now check what are the resources are available in the namespace,
kubectl get all -n nginxweb
it gives the result of one k8s service is available under the nginxweb namespace
And now try out with kor tool using below command,
kor all -n nginxweb
it gives the same result, that the nginx service is not used anywhere in the namespace.
We can check only configmap/secret/services/serviceaccount/deployments/statefulsets/role/hpa by,
kor services -n nginxweb
kor serviceaccount -n nginxweb
kor secret -n nginxweb
That's all. We have installed the KOR tool and validated it by deleting one of the component in the Nginx web server deployment.
References:
https://github.com/yonahd/kor
Ever missed a certificate renewal and had a production outage? Or forgot to renew a software license until it was too late? I built CLARES to make sure that never happens again.
CLARES stands for Compliance License & Asset Reminder Engine System. It's a full-stack web application that tracks expiry dates of SSL certificates, software licenses, compliance certificates, and any custom asset type your organization manages — and sends email reminders before things expire.
I built it because every team I've worked with has the same problem: critical renewals tracked in spreadsheets, emails, or someone's memory. CLARES replaces all of that with a single, centralized dashboard.
In any enterprise environment, you're juggling dozens (or hundreds) of:
The cost of missing even one renewal can be significant — from service outages to compliance violations. CLARES gives you a single pane of glass with urgency-based grouping and automated email reminders.
Step 1: Login — Users authenticate with username/password. The server validates credentials, checks the account is active, and issues a JWT token (8-hour expiry). Deactivated accounts get a clear error message — no cryptic "session expired" nonsense.
Step 2: Dashboard — The home page auto-fetches all renewal items and groups them into four urgency buckets: Expired, Critical (≤14 days), Warning (≤30 days), and Upcoming (≤90 days). Summary cards show counts per catalog type.
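The grouping logic can be sketched in a few lines (illustrative Python, not CLARES's actual code; the function name and the "OK" bucket for items beyond the 90-day horizon are assumptions based on the thresholds above):

```python
from datetime import date

# Bucket an item by days until expiry, using the dashboard thresholds
# described above: Expired, Critical (<=14d), Warning (<=30d), Upcoming (<=90d).
def urgency_bucket(expiry: date, today: date) -> str:
    days_left = (expiry - today).days
    if days_left < 0:
        return "Expired"
    if days_left <= 14:
        return "Critical"
    if days_left <= 30:
        return "Warning"
    if days_left <= 90:
        return "Upcoming"
    return "OK"  # beyond the dashboard's 90-day horizon (assumption)

today = date(2024, 6, 1)
print(urgency_bucket(date(2024, 5, 20), today))  # Expired
print(urgency_bucket(date(2024, 6, 10), today))  # Critical (9 days left)
print(urgency_bucket(date(2024, 6, 25), today))  # Warning (24 days left)
print(urgency_bucket(date(2024, 8, 15), today))  # Upcoming (75 days left)
```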
Step 3: Manage Catalogs — The sidebar lists all catalog types. Click any catalog to view, add, edit, or delete entries. Upload up to 500 rows at once via CSV bulk import. Add custom catalog types beyond the built-in ones.
Step 4: Granular Permissions — Global admins see everything. Other users get per-catalog roles: No Access, View (read-only), or Admin (full CRUD). A user can be a viewer globally but an admin for specific catalogs. Permissions are enforced on both frontend and backend.
Step 5: Email Reminders — Configure SMTP settings from the admin page, test the connection, send a test email, then trigger reminders. Each item has its own reminder config — how many days before expiry and how many times to repeat. The system calculates exact send dates by evenly spacing repeats within the window (e.g. 30 days / 3 repeats → reminders at 30, 20, and 10 days before expiry).
Step 6: Automatic Scheduler — Enable the daily auto-reminder from Admin Settings and pick an hour (server time). A background scheduler checks every 60 seconds and fires once per day. A reminder_logs table tracks which reminder number has been sent per item — no duplicates, and missed reminders are caught up automatically.
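As a rough sketch of the dedup and catch-up behavior described above (hypothetical Python, not the project's code; a plain set stands in for the reminder_logs table):

```python
# Hypothetical stand-in for the reminder_logs table: (item_id, reminder_number)
sent_log = set()

def due_reminders(item_id, send_offsets, days_left):
    """Return reminder numbers whose send offset has been reached and that
    have not been sent yet. Because sent numbers are skipped, a missed day
    is caught up on the next run instead of being re-sent."""
    due = []
    for n, offset in enumerate(send_offsets, start=1):
        if days_left <= offset and (item_id, n) not in sent_log:
            due.append(n)
    return due

def send(item_id, numbers):
    for n in numbers:
        sent_log.add((item_id, n))  # record in the "reminder_logs" stand-in

# 30-day window, 3 repeats -> send offsets 30, 20, 10 days before expiry
offsets = [30, 20, 10]
print(due_reminders("cert-1", offsets, 25))  # [1]  (30-day reminder is due)
send("cert-1", [1])
print(due_reminders("cert-1", offsets, 25))  # []   (no duplicate emails)
print(due_reminders("cert-1", offsets, 9))   # [2, 3] (missed 20-day reminder caught up)
```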
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite 5, React Router v6 |
| Backend | Node.js 20, Express 4 |
| Database | PostgreSQL |
| Auth | JWT + bcrypt |
| Email | Nodemailer (configurable SMTP) |
| Deployment | Docker (multi-arch), Helm, Kubernetes |
The frontend is a React SPA built by Vite into static files. Express serves both the static files and the REST API on the same port. Authentication is stateless via JWT — no server-side sessions. The whole thing is packaged into a single Docker image using a multi-stage build.
Items are automatically grouped by urgency. No more scanning through spreadsheets — you instantly see what needs attention. Summary cards give you counts per catalog type at a glance.
Three built-in catalog types (Certificates, Licenses, SSL Certs) plus unlimited custom types. Each catalog tracks items with name, environment, expiry date, owner, notes, and per-item reminder settings.
Download a CSV template, fill it in, and upload up to 500 rows at once. Perfect for initial data migration or when you inherit a spreadsheet full of renewal dates.
This was one of the trickier features. The permission model has two layers: a global role and an optional per-catalog role.
A user can be a global Viewer but have Admin rights on specific catalogs. This means you can delegate management of SSL certificates to the infra team without giving them access to license data.
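A minimal sketch of how such two-layer resolution might work (hypothetical names and structures, not the actual CLARES code):

```python
# Role ranks: a per-catalog grant, when present, overrides the global role.
ROLE_RANK = {"no_access": 0, "view": 1, "admin": 2}

def effective_role(user: dict, catalog: str) -> str:
    # Global admins bypass per-catalog checks entirely.
    if user.get("global_role") == "admin":
        return "admin"
    # A catalog-specific grant overrides the global default.
    return user.get("catalog_roles", {}).get(
        catalog, user.get("global_role", "no_access"))

def can_edit(user: dict, catalog: str) -> bool:
    return ROLE_RANK[effective_role(user, catalog)] >= ROLE_RANK["admin"]

# Global viewer who administers only the SSL certificates catalog
infra_user = {"global_role": "view", "catalog_roles": {"ssl_certs": "admin"}}
print(effective_role(infra_user, "licenses"))  # view
print(can_edit(infra_user, "ssl_certs"))       # True
print(can_edit(infra_user, "licenses"))        # False
```

Enforcing this check in the API layer as well as the UI is what makes the permission model trustworthy: hiding a button is cosmetic, rejecting the request is security.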
Configure any SMTP server (Exchange, Gmail, etc.) from the admin UI. Test the connection, send a test email, then trigger reminders. Each item can have its own reminder settings — how many days before expiry, and how many times to repeat.
The system calculates exact reminder dates by evenly spacing the repeat count within the reminder window. For example:
SSL cert expires June 10 · Remind 30 days before · Repeat 3 times
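The even-spacing rule can be illustrated as follows (a sketch assuming step = window / repeats; not the project's actual implementation):

```python
from datetime import date, timedelta

# Evenly space `repeats` reminders inside the reminder window:
# reminders fire at window, window - step, window - 2*step, ... days before expiry.
def reminder_dates(expiry: date, window_days: int, repeats: int):
    step = window_days // repeats
    return [expiry - timedelta(days=window_days - i * step)
            for i in range(repeats)]

# SSL cert expires June 10, remind 30 days before, repeat 3 times
print(reminder_dates(date(2024, 6, 10), 30, 3))
# -> May 11, May 21, May 31 (30, 20, and 10 days before expiry)
```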
Emails include the reminder number (e.g. "reminder 2 of 3") and a color-coded status — red for ≤7 days, amber for ≤14, green for 14+.
No more relying on someone clicking "Send Reminders" manually. Enable the automatic reminder scheduler from Admin Settings, pick an hour (0–23, server time), and CLARES handles the rest:
A reminder_logs table tracks which reminder number has been sent per item — no duplicate emails.

The app is containerized with a multi-stage Dockerfile and supports multi-architecture builds (amd64/arm64). For Kubernetes deployment, there's a complete Helm chart with:
kubectl exec deployment/clares -n clares -- node server/setup.js

The setup script is idempotent — it creates tables only if they don't exist and seeds a default admin user when the users table is empty. Safe to run multiple times.
```shell
# Clone the repo
git clone https://github.com/DevOpsArts/clares.git
cd clares

# Option 1: Local development
npm install
cp .env.example .env   # Edit with your DB credentials
npm run setup-db
npm run dev            # Frontend → :5174, API → :3002

# Option 2: Kubernetes with Helm
helm install clares-postgres bitnami/postgresql \
  --set auth.database=clares --set auth.username=clares \
  --namespace clares --create-namespace
helm install clares ./helm/clares-engine \
  -f ./helm/clares-engine/values-minikube.yaml \
  --namespace clares
kubectl exec deployment/clares -n clares -- node server/setup.js

# Login with admin / admin
```
CLARES is open source on GitHub: github.com/DevOpsArts/clares
Check out the project page: devopsarts.github.io/clares
If you're managing renewal dates in spreadsheets, give CLARES a try. It takes under 5 minutes to deploy and the default admin account is ready out of the box.
Built with React 18, Node.js, Express, PostgreSQL, Docker, and Helm. Deployed on Kubernetes.
The only Kubernetes log agent with intelligent error context capture, rule-based alerting, and 9 pluggable storage backends.
Every SRE knows the pain: an alert fires at 3 AM, and you're digging through gigabytes of logs trying to understand what happened before the error. Traditional log solutions either capture everything (expensive) or miss crucial context (frustrating).
What if your log agent was smart enough to capture only what matters—the error AND the context around it—and alert you instantly?
Logsenta is an open-source Kubernetes log monitoring agent that solves this problem with intelligent error-aware context capture and rule-based alerting. Instead of blindly forwarding all logs, Logsenta:
Logsenta uses regex and string-based pattern matching to detect errors across multiple languages and frameworks:
```yaml
errorPatterns:
  - "ERROR"
  - "Exception"
  - "FATAL"
  - "panic:"
  - "Traceback"
  - "OOMKilled"
  - "CrashLoopBackOff"
```

These patterns are fully customizable.
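As an illustration of how such pattern matching might behave (a sketch, not Logsenta's actual code; the regex-with-substring-fallback detail is an assumption):

```python
import re

# A subset of the configurable error patterns shown above.
PATTERNS = ["ERROR", "Exception", "FATAL", "panic:", "Traceback", "OOMKilled"]

def is_error_line(line: str) -> bool:
    """Try each pattern as a regex; fall back to a plain substring
    match if the pattern is not valid regex (illustrative behavior)."""
    for pat in PATTERNS:
        try:
            if re.search(pat, line):
                return True
        except re.error:
            if pat in line:
                return True
    return False

print(is_error_line("2024-06-01 FATAL disk full"))            # True
print(is_error_line("java.lang.NullPointerException at ...")) # True
print(is_error_line("GET /healthz 200 OK"))                   # False
```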
Route different error patterns to different teams with customizable thresholds:
```yaml
alerting:
  enabled: true
  rules:
    # Critical errors → On-call team immediately
    - name: "critical-errors"
      patterns: ["CRITICAL", "FATAL", "OOMKilled", "panic:"]
      threshold:
        count: 1          # Alert on FIRST occurrence
        windowSeconds: 60
      email:
        enabled: true
        toAddresses: ["oncall@company.com"]
    # Java exceptions → Backend team (after 2 occurrences)
    - name: "java-exceptions"
      patterns: ["NullPointerException", "OutOfMemoryError"]
      threshold:
        count: 2          # Alert after 2 occurrences
        windowSeconds: 300
      email:
        enabled: true
        toAddresses: ["backend-team@company.com"]
    # Python errors → Data team
    - name: "python-errors"
      patterns: ["Traceback", "TypeError", "ValueError"]
      threshold:
        count: 1
      email:
        enabled: true
        toAddresses: ["data-team@company.com"]
```
Why rule-based alerting matters:
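One way to picture a count-within-window threshold like the rules above (a hypothetical sketch, not Logsenta's implementation):

```python
from collections import deque

class ThresholdRule:
    """Fire when `count` matching lines arrive within a sliding window of
    `window_seconds` — mirroring the threshold config shown above."""
    def __init__(self, count: int, window_seconds: int):
        self.count = count
        self.window = window_seconds
        self.hits = deque()  # timestamps of recent matches

    def record(self, ts: float) -> bool:
        self.hits.append(ts)
        # Drop hits that have slid out of the window.
        while self.hits and ts - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) >= self.count

# java-exceptions rule: count=2 within 300 seconds
rule = ThresholdRule(count=2, window_seconds=300)
print(rule.record(0))     # False (1 hit)
print(rule.record(100))   # True  (2 hits within 300s)
print(rule.record(1000))  # False (older hits fell out of the window)
```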
One agent, any storage destination:
| Backend | Use Case |
|---|---|
| PostgreSQL | Relational queries, SQL analysis |
| MongoDB | Flexible document storage |
| Elasticsearch | Full-text search, Kibana dashboards |
| Azure Log Analytics | Azure ecosystem, KQL queries |
| AWS CloudWatch | AWS ecosystem, CloudWatch Insights |
| GCP Cloud Logging | Google Cloud ecosystem |
Configure how much context to capture around errors:
```yaml
captureWindow:
  bufferDurationMinutes: 2   # Lines BEFORE error
  captureAfterMinutes: 2     # Lines AFTER error
```
This means when an error occurs, you get the full story—not just the error line.
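The capture behavior can be sketched as a rolling pre-error buffer plus a post-error countdown (illustrative Python; line counts stand in for the time-based settings above):

```python
from collections import deque

class ContextCapture:
    """Keep a rolling buffer of recent lines; when an error arrives, emit
    the buffered context, the error itself, and the next `after` lines."""
    def __init__(self, before: int, after: int):
        self.before = deque(maxlen=before)  # rolling pre-error buffer
        self.after = after
        self.pending = 0                    # post-error lines still to capture
        self.captured = []

    def feed(self, line: str, is_error: bool):
        if is_error:
            self.captured.extend(self.before)  # flush context before the error
            self.captured.append(line)
            self.pending = self.after
            self.before.clear()
        elif self.pending > 0:
            self.captured.append(line)         # context after the error
            self.pending -= 1
        else:
            self.before.append(line)           # normal line: buffer and move on

cap = ContextCapture(before=2, after=1)
for line in ["a", "b", "c", "ERROR x", "d", "e"]:
    cap.feed(line, line.startswith("ERROR"))
print(cap.captured)  # ['b', 'c', 'ERROR x', 'd']
```

Lines "a" and "e" are never stored: everything outside the error's context window is discarded, which is where the cost savings come from.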
Deploy Logsenta in under 2 minutes:
```shell
# Clone the repository
git clone https://github.com/DevOpsArts/logsenta.git
cd logsenta

# Install with Helm (PostgreSQL backend + alerting)
helm install logsenta-engine ./charts/logsenta-engine \
  --namespace logsenta \
  --create-namespace \
  --set storage.type=postgresql \
  --set connections.postgresql.host=your-db-host \
  --set connections.postgresql.username=logsenta \
  --set connections.postgresql.password=YOUR_PASSWORD \
  --set alerting.enabled=true \
  --set alerting.email.smtpHost=smtp.company.com
```
┌─────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Pod A│ │Pod B│ │Pod C│ ← Monitored │
│ └──┬──┘ └──┬──┘ └──┬──┘ │
│ └────────┼────────┘ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ Logsenta-Engine │ │
│ │ • Error Detection │ │
│ │ • Context Capture │ │
│  │ • Rule-Based Alert│ ──► Email           │
│  │ • Rolling Buffer  │ ──► Webhook         │
│ └─────────┬──────────┘ │
└───────────────┼─────────────────────────────┘
▼
┌────────────────┐
│ Storage Backend│
│ (Your Choice) │
└────────────────┘
Logsenta is open-source and free to use. Check out the resources below:
Have questions or feedback? Drop a comment below or open an issue on GitHub!
Tags: Kubernetes, DevOps, SRE, Logging, Monitoring, Alerting, Azure, AWS, GCP, Helm, Open Source
Go to Azure AKS, in the side blade select "Diagnostic Settings", and choose "Add Diagnostic Setting".
Then, in the new page, select which logs need to be sent to the Event Hub and choose "Stream to an Event Hub". Here, provide the newly created Event Hub namespace and Event Hub.
Step 10: Configure Grafana Agent to scrape the messages from Azure Event Hub
Next, we need to pull the data from Azure Event Hub and push it to Grafana Loki.
In our existing grafana-agent-values.yaml, add the lines below to pull the messages from Azure Event Hub, then redeploy the Grafana Agent in AKS.
Here is the reference GitHub URL, and below is the YAML.
https://github.com/DevOpsArts/grafana_loki_agent/blob/main/grafana-agent-values-azure-aks.yaml
```
loki.source.azure_event_hubs "azure_aks" {
  fully_qualified_namespace = "<Event Hub namespace hostname>:9093"
  event_hubs                = ["aks"]
  forward_to                = [loki.write.local.receiver]
  labels                    = {
    "job" = "azure_aks",
  }
  authentication {
    mechanism         = "connection_string"
    connection_string = "<Event Hub connection string>"
  }
}
```
Replace the placeholder values above with your Event Hub namespace hostname and connection string. We can add multiple Event Hubs in the Grafana Agent by providing a different job name for each Azure PaaS service.
Note : Make sure communication is established between Azure AKS and Azure Event Hub on port 9093 so the messages can be sent.
Redeploy the Grafana Agent in AKS using the command below (using helm upgrade --install, since the release already exists):
helm upgrade --install --values grafana-agent-values-azure-aks.yaml grafana-agent grafana/grafana-agent -n observability
Check that all the Grafana Agent pods are up and running using the command below:
kubectl get all -n observability
Now the Grafana Agent will pull the messages from Azure Event Hub and push them to Grafana Loki for any Azure AKS cluster that is configured to send logs via Diagnostic Settings.
We can verify the status of message processing from Azure Event Hub, including the status of incoming and outgoing messages.
Step 11: Access Azure AKS logs in Grafana dashboard,
Go to the Grafana Dashboard: Home > Explore > select the Loki data source.
In the filter section, select "Job" and set the value to the job name given in grafana-agent-values-azure-aks.yaml. In our case the job name is "azure_aks".
That's all! We have successfully deployed centralized logging with Grafana Loki and Grafana Agent for Kubernetes, VM applications, and Azure PaaS.
In Part 1, we covered how to set up Grafana Loki and Grafana Agent to view Kubernetes pod logs.
Requirement:
Next, double-click the downloaded exe and install it. By default, on Windows the installation path is:
C:\Program Files\Grafana Agent
Once installation is complete, we need to update the configuration based on our needs, such as which application logs to send to Grafana Loki.
In our case, we installed the Grafana Dashboard on the Windows VM and configured its logs in the Grafana Agent.
Similarly, we can add multiple applications with different job names.
Copy the Grafana Agent config file from the repo below and update it according to your needs.
We can also start the agent manually from the command prompt. Go to C:\Program Files\Grafana Agent and execute the command below:
grafana-agent-windows-amd64.exe --config.file=agent-config.yaml
Running it in the foreground like this helps surface any issues with the configuration.
Note : The Grafana Loki distributed service endpoint (configured in agent-config.yaml) must be reachable from the Windows VM.
Step 7 : Access VM application logs in Grafana Loki,
Go to Grafana Dashboard > Home > Explore > Select Loki Datasource
In the filter section, select "Job" and set the value to the job name given in agent-config.yaml. In our case the job name is "devopsart-vm".
Now we are able to view the Grafana Dashboard logs in Grafana Loki. From here, you can create dashboards based on your preference.
In Part 2, We covered how to export Windows VM application logs to Grafana Loki and how to view them from the Grafana Dashboard.
In Part 3, we will cover how to export Azure PaaS service logs to Grafana Loki.
Dealing with multiple tools for capturing application logs from different sources can be a hassle for anyone. In this blog post, we'll dive into the steps required to establish centralized logging with Grafana Loki and Grafana Agent. This solution will allow us to unify the collection of logs from Kubernetes pods, VM services, and Azure PAAS services.
Grafana Loki : a highly scalable log aggregation system designed for cloud-native environments.
Grafana Agent : an observability agent that collects metrics and logs from various applications for visualization and analysis in Grafana.
Requirement:
```yaml
schemaConfig:
  configs:
    - from: "2020-09-07"
      store: boltdb-shipper
      object_store: azure
      schema: v11
      index:
        prefix: index_
        period: 24h
storageConfig:
  boltdb_shipper:
    shared_store: azure
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 1h
  filesystem:
    directory: /var/loki/chunks
  azure:
    account_name: <Azure Storage account name>
    account_key: <Azure Storage access key>
    container_name: <Container name>
    request_timeout: 0
```
In this blog, we will explore a new tool called 'Rover,' which helps to visualize the Terraform plan.
Rover : This open-source tool is designed to visualize Terraform Plan output, offering insights into infrastructure and its dependencies.
We will use the "Rover" Docker image to do our setup and visualize the infra.
Requirements:
1. Linux/Windows VM
2. Docker
Step 1 : Generate terraform plan output
I have a sample Azure Terraform block in the devopsart folder; we will generate the Terraform plan output from there and store it locally.
cd devopsart
terraform plan -out tfplan.out
terraform show -json tfplan.out > tfplan.json
Now both the files are generated.
Step 2 : Run Rover tool locally,
Execute below docker command to run rover from the same step 1 path,
docker run --rm -it -p 9000:9000 -v $(pwd)/tfplan.json:/src/tfplan.json im2nguyen/rover:latest -planJSONPath=tfplan.json
It runs the web UI on port 9000.
Step 3 : Accessing Rover WebUI,
Let's access the web UI and check it.
Go to browser, and enter http://localhost:9000
In the UI, color codes on the left side provide assistance in understanding the actions that will take place for the resources when running terraform apply.
When a specific resource is selected from the image, it will provide the name and parameter information.
Additionally, the image can be saved locally by clicking the 'Save' option.
I hope this is helpful for someone who is genuinely confused by the Terraform plan output, especially when dealing with a large infrastructure.
Thanks for reading!! We have tried the Rover tool and experimented with examples.
Reference:
https://github.com/im2nguyen/rover
Infracost : It provides cloud cost projections from Terraform. It enables engineers to view a detailed cost breakdown and understand expenses before implementing.
Requirement :
1. One Windows/Linux VM
2. Terraform
3. Terraform examples
Step 1 : Infracost installation
For macOS, use the brew command below to do the installation,
# brew install infracost
For other operating systems, follow the link below,
https://www.infracost.io/docs/#quick-start
Step 2 : Infracost configuration,
We need to set up the Infracost API key by signing up here,
https://dashboard.infracost.io
Once logged in, visit the following URL to obtain the API key,
https://dashboard.infracost.io/org/praboosingh/settings/general
Next, open the terminal and set the key as an environment variable using the following command,
# export INFRACOST_API_KEY=XXXXXXXXXXXXX
Alternatively, you can log in to the Infracost UI and grant terminal access using the following command,
# infracost auth login
Note : Infracost will not send any cloud information to their server.
Step 3 : Infracost validation
Next, we will do the validation. For validation purposes, I have cloned the GitHub repo below, which contains Terraform examples.
# git clone https://github.com/alfonsof/terraform-azure-examples.git
# cd terraform-azure-examples/code/01-hello-world
Try Infracost using the command below to get the estimated cost for a month,
# infracost breakdown --path .
To save the report in JSON format and upload it to the Infracost server, use the commands below,
# infracost breakdown --path . --format json --out-file infracost-demo.json
# infracost upload --path infracost-demo.json
In case we plan to upgrade the infrastructure and need to understand the new cost, execute the following command to compare it with the previously saved output from the Terraform code path.
# infracost diff --path . --compare-to infracost-demo.json
Thanks for reading!! We have installed infracost and experimented with examples.
References:
https://github.com/infracost/infracost
https://www.infracost.io/docs/#quick-start
In this blog, we will install and examine a new tool called Trivy, which helps identify vulnerabilities, misconfigurations, licenses, secrets, and software dependencies in the following,
1.Container image
2.Kubernetes Cluster
3.Virtual machine image
4.FileSystem
5.Git Repo
6.AWS
Requirements,
1.One Virtual Machine
2. Any one of the above-mentioned targets to scan
Step 1 : Install Trivy
Execute the command below based on your OS,
For Mac :
brew install trivy
In this blog post, we will explore a new tool called "KOR" (Kubernetes Orphaned Resources), which assists in identifying unused resources within a Kubernetes (K8s) cluster. This tool will be beneficial for those who are managing Kubernetes clusters.
Requirements:
1. One machine (Linux/Windows/Mac)
2. K8s cluster
Step 1 : Install kor in the machine.
I am using a Linux VM for this experiment; for other platforms, download the binaries from the link below,
https://github.com/yonahd/kor/releases
Download the linux binary for linux VM,
wget https://github.com/yonahd/kor/releases/download/v0.1.8/kor_Linux_x86_64.tar.gz
tar -xvzf kor_Linux_x86_64.tar.gz
chmod +x kor
cp kor /usr/bin/
kor --help
Step 2 : Nginx Webserver deployment in K8s
I have a K8s cluster; we will deploy the nginx webserver in K8s and try out the "kor" tool.
Create a namespace as "nginxweb"
kubectl create namespace nginxweb
Using Helm, we will deploy the nginx webserver with the command below,
helm install nginx bitnami/nginx --namespace nginxweb
kubectl get all -n nginxweb
Step 3 : Validate with kor tool
Let's check for unused resources with the kor tool in the nginx namespace. The command below lists all the unused resources in the given namespace,
Syntax : kor all -n <namespace>
kor all -n nginxweb
Let's delete the nginx deployment from the nginxweb namespace and try it again.
kubectl delete deployments nginx -n nginxweb
Now check which resources are available in the namespace,
kubectl get all -n nginxweb
The result shows that one K8s service is still present in the nginxweb namespace.
Now try the kor tool again using the command below,
kor all -n nginxweb
It reports the same: the nginx service is not used anywhere in the namespace.
We can also check only configmaps/secrets/services/serviceaccounts/deployments/statefulsets/roles/hpa, for example:
kor services -n nginxweb
kor serviceaccount -n nginxweb
kor secret -n nginxweb
That's all. We have installed the KOR tool and validated it by deleting one of the components of the nginx web server deployment.
References:
https://github.com/yonahd/kor