Summary
This postmortem analyzes a recurring issue in OpenTelemetry‑based metric pipelines: the loss of metric metadata (unit, value type, instrument type) when data reaches common FOSS storage backends. Although OTLP defines rich semantic metadata, many backends silently discard it because they inherit the Prometheus data model, which historically does not preserve these attributes. The result is a system that appears to support semantic metrics but behaves like a legacy time‑series store once data is ingested.
Root Cause
The root cause is the mismatch between the OpenTelemetry metric data model and the Prometheus‑style storage model used by most open‑source backends.
Key factors:
- Prometheus TSDB does not store metric metadata such as:
  - instrument type (counter, gauge, histogram)
  - unit
  - description
  - aggregation temporality
- Many “OpenTelemetry‑native” backends internally convert OTLP → Prometheus format, losing metadata in the process.
- Visualization tools (Grafana, etc.) rely on PromQL, which has no concept of units or instrument types.
- Storage engines optimized for cardinality and compression often treat metadata as non‑essential and drop it.
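The mismatch can be made concrete with a small sketch. The types below are hypothetical simplifications, not any backend's actual code: an OTLP-style metric record carries metadata fields that a Prometheus-style sample has no place for, so a straight conversion silently drops them.

```go
package main

import "fmt"

// OTLPMetric models the metadata an OTLP metric record carries (simplified).
type OTLPMetric struct {
	Name        string
	Unit        string
	Description string
	Type        string // "counter", "gauge", "histogram"
	Temporality string // "delta" or "cumulative"
	Value       float64
}

// PromSample models what a Prometheus-style TSDB actually persists:
// a name, a label set, and a value. There are no metadata fields.
type PromSample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// toPromSample converts an OTLP metric to a Prometheus-style sample.
// Unit, Description, Type, and Temporality have nowhere to go and are dropped.
func toPromSample(m OTLPMetric) PromSample {
	return PromSample{Name: m.Name, Labels: map[string]string{}, Value: m.Value}
}

func main() {
	m := OTLPMetric{
		Name: "requests_total", Unit: "requests",
		Description: "Total number of requests",
		Type:        "counter", Temporality: "cumulative", Value: 1,
	}
	s := toPromSample(m)
	// Only the name and the numeric value survive the conversion.
	fmt.Printf("%s %v\n", s.Name, s.Value)
}
```

The point is structural: the loss is not a bug in any one converter, it is the absence of fields in the destination model.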
Why This Happens in Real Systems
Real production systems tend to lose metadata because:
- Historical inertia: Prometheus became the de facto standard long before OTEL metrics existed.
- Ecosystem gravity: Grafana, Alertmanager, Thanos, Mimir, VictoriaMetrics all assume Prometheus semantics.
- Performance trade‑offs: Storing metadata per‑series increases cardinality and storage overhead.
- Query language limitations: PromQL cannot express metadata‑aware queries, so backends don’t bother storing it.
- Vendor priorities: Many “OTEL‑native” products focus on traces first, metrics second.
Real-World Impact
Losing metric metadata causes subtle but serious operational issues:
- Incorrect dashboards due to missing units (e.g., bytes vs. milliseconds)
- Broken aggregations when counters are treated as gauges
- Downsampling errors because temporality (delta vs. cumulative) is lost
- Harder debugging since engineers cannot see the original instrument type
- Inconsistent naming conventions because metadata is encoded into metric names instead of stored structurally
These issues accumulate into operational drag and analysis errors that senior engineers recognize immediately.
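The last point above is visible in the OTLP→Prometheus translation conventions, which bake the unit and a `_total` suffix into the metric name because the destination model has no structural slot for them. A rough sketch of that convention (illustrative only; the real translation rules have many more cases):

```go
package main

import "fmt"

// promName sketches how metadata gets encoded into a Prometheus metric name:
// the unit becomes a name suffix, and monotonic counters gain "_total".
// Illustrative only, not the actual translation implementation.
func promName(name, unit, instrumentType string) string {
	if unit != "" {
		name = name + "_" + unit
	}
	if instrumentType == "counter" {
		name = name + "_total"
	}
	return name
}

func main() {
	// The unit "bytes" now lives only inside the name, as a naming
	// convention, rather than as queryable structured metadata.
	fmt.Println(promName("http_request_size", "bytes", "counter"))
	// http_request_size_bytes_total
}
```

Once metadata is a substring of the name, nothing enforces it: a renamed metric can silently claim the wrong unit.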
Example
Below is a minimal OTLP metric export example showing metadata that is often discarded by backends:
```go
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

meter := otel.Meter("service")
counter, _ := meter.Int64Counter("requests_total",
	metric.WithUnit("requests"),
	metric.WithDescription("Total number of requests"),
)
counter.Add(ctx, 1)
```
This code emits:
- instrument type: counter
- unit: requests
- description: Total number of requests
- temporality: cumulative by default in the Go SDK (delta if the exporter is configured for it)
Most Prometheus‑based backends drop all of this except the numeric value.
How Senior Engineers Fix It
Experienced engineers approach this problem with a combination of architecture choices and tooling discipline:
- Choose a backend that preserves OTEL metadata, such as:
  - ClickHouse‑based OTEL metric schemas (custom schema, not Prometheus‑compat mode)
  - TimescaleDB with OTEL schema extensions
  - OpenTelemetry Collector → Arrow/Parquet → data lake (metadata preserved in schema)
- Avoid Prometheus‑compat ingestion paths when using OTLP metrics.
- Store metadata in a side‑channel (e.g., a metadata table keyed by metric name).
- Use OTEL semantic conventions rigorously to reduce ambiguity.
- Adopt query engines that understand OTEL metadata, such as:
  - Apache Arrow Flight SQL
  - DuckDB with Parquet OTLP exports
- Push vendors to support OTEL’s full metric model instead of Prometheus emulation.
The key insight: you must choose a storage engine that is not secretly Prometheus under the hood.
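The side-channel approach can be sketched as a small registry keyed by metric name. The types here are hypothetical; in practice this would be a database table populated once per metric at ingest time:

```go
package main

import "fmt"

// MetricMeta holds the OTLP metadata that the TSDB itself cannot store.
type MetricMeta struct {
	Unit        string
	Description string
	Type        string
	Temporality string
}

// MetaStore is a side-channel metadata table keyed by metric name.
// In production this would be a real table written to at ingest time.
type MetaStore struct {
	byName map[string]MetricMeta
}

func NewMetaStore() *MetaStore {
	return &MetaStore{byName: make(map[string]MetricMeta)}
}

// Record upserts metadata for a metric name.
func (s *MetaStore) Record(name string, m MetricMeta) {
	s.byName[name] = m
}

// Lookup lets dashboards and query layers recover units and types.
func (s *MetaStore) Lookup(name string) (MetricMeta, bool) {
	m, ok := s.byName[name]
	return m, ok
}

func main() {
	store := NewMetaStore()
	store.Record("requests_total", MetricMeta{
		Unit: "requests", Description: "Total number of requests",
		Type: "counter", Temporality: "cumulative",
	})
	if m, ok := store.Lookup("requests_total"); ok {
		fmt.Println(m.Unit, m.Type)
	}
}
```

The trade-off is a second source of truth that must stay in sync with the TSDB, which is why backends that store metadata natively are preferable when available.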
Why Juniors Miss It
Junior engineers often overlook this issue because:
- They assume that “OTEL support” means full OTEL semantics, which is rarely true.
- They rely on Grafana dashboards, which hide metadata loss.
- They are accustomed to Prometheus conventions, not OTEL’s richer model.
- They do not inspect raw OTLP payloads, so they never notice metadata disappearing.
- They trust backend documentation that says “OTEL‑native” without verifying the ingestion path.
The result is a silent failure mode: everything “works,” but the system is fundamentally lossy.
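One habit that counters this failure mode is inspecting raw OTLP payloads before they reach the backend. With the OpenTelemetry Collector, the debug exporter dumps full metric records, including unit, description, and temporality. A minimal sketch of such a Collector config (pipeline layout is illustrative):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  debug:
    verbosity: detailed   # prints full metric records, metadata included

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```

Comparing this output against what the storage backend actually returns makes the metadata loss visible immediately instead of silently.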