Building a Real-Time LLM Observability Dashboard
Observability for LLM applications differs fundamentally from observability for deterministic services. Latency is higher and more variable. Token usage, not compute time, is the primary cost driver. Output quality is probabilistic and difficult to measure automatically. This post covers the design of an effective LLM observability system that addresses each of these differences.
The Four Observability Dimensions
A complete LLM observability system needs to track four things: token usage (prompt and completion separately), request latency (time to first token and total generation time), cost attribution (by model, team, feature, and tenant), and error rates (by provider, model, and error type).
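One way to make these dimensions concrete is a single per-request record that the gateway emits for every call. The field names below are hypothetical, not from any particular library, but they show how all four dimensions attach to one event:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMRequestRecord:
    """One observability event per LLM request (illustrative field names)."""
    # Token usage: prompt and completion tracked separately
    prompt_tokens: int
    completion_tokens: int
    # Latency: time to first token and total generation time, in seconds
    time_to_first_token_s: float
    total_latency_s: float
    # Cost attribution dimensions
    model: str
    team: str
    feature: str
    tenant: Optional[str] = None
    # Error tracking: empty error_type means the request succeeded
    provider: str = ""
    error_type: Optional[str] = None
```

Keeping all four dimensions on one record, rather than in separate metric streams, makes it possible to slice any metric (latency, errors, tokens) by any attribution tag later.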
Token Tracking Architecture
Token counts need to be captured at the gateway layer, not inferred from the application layer. Provider token counts can differ from client-side estimates by 5-15% depending on the tokenizer. The gateway receives the authoritative token count in the response and should persist it immediately before returning the response to the caller.
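A minimal sketch of this capture step, assuming an OpenAI-style response body with a `usage` object and a caller-supplied `persist` function (both assumptions, not a specific gateway's API):

```python
def record_token_usage(response: dict, persist) -> dict:
    """Extract the provider's authoritative token counts from the response
    and persist them before the response is returned to the caller."""
    usage = response.get("usage", {})
    persist({
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    })
    # Persisting first ensures the count survives even if the caller
    # disconnects while the response is being returned.
    return response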
Cost Attribution
Effective cost attribution requires tagging every request with at minimum three dimensions: the calling team or service, the product feature, and where relevant the end customer tenant. These tags flow through to a cost aggregation service that produces the per-dimension cost views displayed in the dashboard.
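The aggregation itself can be a simple group-by over tagged request records. This sketch assumes each record carries a `tags` dict and a precomputed `cost_usd` field; both names are illustrative:

```python
from collections import defaultdict


def aggregate_costs(requests: list[dict], dimension: str) -> dict[str, float]:
    """Sum request cost per value of one attribution dimension
    (e.g. 'team', 'feature', or 'tenant')."""
    totals: dict[str, float] = defaultdict(float)
    for req in requests:
        # Untagged requests are bucketed under 'unattributed' so they
        # stay visible instead of silently disappearing from the dashboard.
        key = req.get("tags", {}).get(dimension, "unattributed")
        totals[key] += req["cost_usd"]
    return dict(totals)
```

The "unattributed" bucket is worth surfacing prominently: a growing unattributed share usually means a new service is calling the gateway without tags.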
Anomaly Detection
The most practical anomaly detection for LLM costs is a simple rolling 24-hour cost comparison against the same period last week. Spikes above a configurable threshold trigger alerts before month-end surprises become budget conversations.
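The week-over-week check described above fits in a few lines. The threshold ratio here is a made-up default; in practice it would be tuned per team:

```python
def is_cost_anomaly(
    rolling_24h_usd: float,
    same_window_last_week_usd: float,
    threshold_ratio: float = 1.5,
) -> bool:
    """Flag when the rolling 24-hour spend exceeds the same 24-hour
    window one week earlier by more than the configured ratio."""
    if same_window_last_week_usd <= 0:
        # Any spend where there was none last week is itself a spike.
        return rolling_24h_usd > 0
    return rolling_24h_usd / same_window_last_week_usd > threshold_ratio
```

Comparing against the same window last week, rather than yesterday, absorbs weekly seasonality such as low weekend traffic, which is the main source of false positives for a day-over-day baseline.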