ExecSearch observability hub

Quick links

Dashboards, alerts, and Explore (metrics, logs, traces).

Open Grafana

Targets, ad-hoc queries, and scrape health.

Open Prometheus

Log store HTTP API (often used via Grafana Explore).

Open Loki

Distributed tracing backend (HTTP query API; view traces in Grafana).

Open Tempo

Prometheus-format self-metrics from the OpenTelemetry Collector on this port (pipelines, receivers).

View metrics

Per-node container resource usage and health.

Open cAdvisor

Experiment tracking UI for LLM / ML runs from the app.

Open MLflow

Grafana: Primary operator UI. Hosts the ExecSearch dashboards, the datasource wiring to Prometheus, Loki, and Tempo, and ad-hoc analysis.
Prometheus: Time-series metrics database. Scrapes the OpenTelemetry Collector, Postgres exporter, cAdvisor, and Kubernetes targets defined in repo config.
Loki: Log aggregation. Promtail on each node ships container logs here for correlation with metrics and traces in Grafana.
Tempo: Trace backend. The .NET API and worker send OTLP spans via the collector; Tempo stores them for Grafana trace views.
OpenTelemetry Collector: Ingests OTLP from applications (gRPC/HTTP), exports to Tempo and Prometheus, and exposes its own Prometheus metrics for pipeline health.
cAdvisor: Container metrics per node (CPU, memory, filesystem). Prometheus scrapes these for infra dashboards.
MLflow: Tracking server for experiments, parameters, and metrics logged by the platform's optional MLflow client.
Postgres exporter (in-cluster only): Exposes PostgreSQL health and stats to Prometheus at postgres-exporter.observability.svc.cluster.local:9187. No public ingress.
Promtail (in-cluster only): DaemonSet that tails node container logs and pushes to Loki. Health: promtail.observability.svc.cluster.local:9080.
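
The two in-cluster-only services have no public link, so one way to check them is a short probe run from a pod inside the cluster. The sketch below is illustrative only: the hosts and ports are the ones listed above, but the URL paths (/metrics for the Postgres exporter, /ready for Promtail) are assumptions based on those tools' usual defaults, and the helper names probe and probe_all are hypothetical.

```python
from urllib.request import urlopen
from urllib.error import URLError

# Hosts and ports are taken from the component list above.
# The URL paths are ASSUMPTIONS (common defaults), not confirmed by this page.
IN_CLUSTER_ENDPOINTS = {
    "postgres-exporter": "http://postgres-exporter.observability.svc.cluster.local:9187/metrics",
    "promtail": "http://promtail.observability.svc.cluster.local:9080/ready",
}


def probe(url, timeout=3.0):
    """Return True if the endpoint answers with HTTP 2xx, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError, ValueError):
        # Covers HTTP errors, DNS failures, refused connections, timeouts,
        # and malformed URLs.
        return False


def probe_all(endpoints=IN_CLUSTER_ENDPOINTS, prober=probe):
    """Probe every endpoint and return a {name: reachable} mapping."""
    return {name: prober(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in probe_all().items():
        print(f"{name}: {'ok' if ok else 'unreachable'}")
```

Run from outside the cluster, both names are expected to report unreachable, since the cluster.local hostnames only resolve in-cluster.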