Quick links
OpenTelemetry Collector
Prometheus-format self-metrics on this port (pipelines, receivers).
What each component does
- Grafana
- Primary operator UI: ExecSearch dashboards, datasource wiring to Prometheus, Loki, and Tempo, and ad-hoc analysis.
- Prometheus
- Time-series metrics database. Scrapes the OpenTelemetry Collector, Postgres exporter, cAdvisor, and Kubernetes targets defined in repo config.
- Loki
- Log aggregation. Promtail on each node ships container logs here for correlation with metrics and traces in Grafana.
- Tempo
- Trace backend. The .NET API and worker send OTLP spans via the collector; Tempo stores them for Grafana trace views.
- OpenTelemetry Collector
- Ingests OTLP from applications (gRPC/HTTP), exports to Tempo and Prometheus, and exposes its own Prometheus metrics for pipeline health.
- cAdvisor
- Container metrics per node (CPU, memory, filesystem). Prometheus scrapes these for infra dashboards.
- MLflow
- Tracking server for experiments, parameters, and metrics logged by the platform's optional MLflow client integration.
- Postgres exporter (in-cluster only)
- Exposes PostgreSQL health and stats to Prometheus at postgres-exporter.observability.svc.cluster.local:9187. No public ingress.
- Promtail (in-cluster only)
- DaemonSet that tails node container logs and pushes to Loki. Health: promtail.observability.svc.cluster.local:9080.
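Because the Postgres exporter and Promtail are reachable only from inside the cluster, a quick check has to run from a pod in the cluster. The following is a minimal sketch of such a check, not part of the platform itself. It assumes both services serve Prometheus text-format metrics at a `/metrics` path (the page lists only host and port), and the function names `parse_metrics` and `fetch_metrics` are hypothetical helpers introduced for illustration.

```python
from urllib.request import urlopen

# Hosts and ports come from the component list above.
# The /metrics path is an assumption; the page does not state it.
IN_CLUSTER_TARGETS = {
    "postgres-exporter": "http://postgres-exporter.observability.svc.cluster.local:9187/metrics",
    "promtail": "http://promtail.observability.svc.cluster.local:9080/metrics",
}


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}.

    The series key keeps its label set, e.g. 'name{label="x"}'.
    Comment lines (# HELP, # TYPE) and optional timestamps are ignored.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue  # malformed line without a value
            series, rest = parts
        fields = rest.split()
        if not fields:
            continue
        samples[series] = float(fields[0])  # fields[1], if present, is a timestamp
    return samples


def fetch_metrics(url, timeout=5.0):
    """Fetch one target and return its parsed samples (needs in-cluster DNS)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Illustrative sample only; real metric names depend on the exporter version.
    sample = "# TYPE pg_up gauge\npg_up 1\n"
    print(parse_metrics(sample))
```

From a pod in the cluster, `fetch_metrics(IN_CLUSTER_TARGETS["postgres-exporter"])` would return the exporter's current samples. Outside the cluster the hostname does not resolve, which is consistent with the "no public ingress" note above.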