Logging Model Calls:
Observability in AI
AI Dev Team
Full-Stack AI Instructors
"If you ship an LLM feature to production without tracing tokens and latency, you aren't flying blindβyou're flying a rocket blindfolded."
Why Standard Logging Fails
In traditional web apps, a successful 200 OK response is usually enough to know the system is working. AI applications are different. A model might return a 200 status code, but the response could be hallucinated, take 10 seconds to generate, or consume $0.05 worth of tokens in a single click.
Relying on simple text logs makes it impossible to query this data. We must adopt Structured Logging (typically JSON) to index these variables.
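As a minimal sketch of what that looks like (the schema below is illustrative, not a standard), emitting one JSON object per line gives your aggregator indexed fields instead of free text:

```python
import json
import logging
import sys

logger = logging.getLogger("llm")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

def log_model_call(event: dict) -> None:
    # One JSON object per line ("JSON Lines") so every field stays queryable.
    logger.info(json.dumps(event))

log_model_call({
    "event": "model_call",
    "model": "gpt-4o-mini",
    "latency_ms": 1840,
    "prompt_tokens": 512,
    "completion_tokens": 128,
    "status": 200,
})
```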
Core Telemetry Metrics
- Latency: Measured in milliseconds. LLMs suffer from high time-to-first-token (TTFT). Tracking latency helps you decide when to switch from a heavy model (GPT-4) to a faster one (GPT-4o-mini).
- Token Usage: Both prompt_tokens and completion_tokens. This is your variable cost. If users find a way to bloat the context window, your AWS/OpenAI bill will skyrocket.
- Model ID: E.g., claude-3-opus or llama-3. Essential for A/B testing which model yields better user engagement.
- Chain ID: A unique identifier if the request is part of a multi-agent or RAG (Retrieval-Augmented Generation) chain.
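Here is a hedged sketch of capturing all four metrics around a single call, assuming the official openai Python SDK (v1+); traced_completion and the telemetry keys are illustrative names, not a standard:

```python
import time
import uuid
from openai import OpenAI  # official openai Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def traced_completion(messages, model="gpt-4o-mini", chain_id=None):
    """Call the model and return (reply_text, telemetry_dict)."""
    start = time.perf_counter()
    response = client.chat.completions.create(model=model, messages=messages)
    telemetry = {
        "chain_id": chain_id or str(uuid.uuid4()),  # propagate across RAG/agent steps
        "model": response.model,  # the exact model version the API actually served
        # Total round-trip latency; measuring TTFT specifically requires streaming.
        "latency_ms": round((time.perf_counter() - start) * 1000),
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
    }
    return response.choices[0].message.content, telemetry
```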
PII and Data Privacy
One massive risk with AI logging is inadvertently writing Personally Identifiable Information (PII) into your log aggregators (like Datadog, New Relic, or AWS CloudWatch). If users paste credit cards or medical data into your chat interface, and you blindly log the prompt, you violate compliance. Use middleware to redact sensitive strings before the log object is serialized.
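A minimal sketch of that middleware follows; the regex patterns are deliberately crude examples, and a production system should use a vetted PII-detection library instead:

```python
import re

# Crude, illustrative patterns only; real deployments need a vetted PII library.
REDACTIONS = [
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED_CARD]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED_EMAIL]"),
]

def redact(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

def safe_log_entry(user_id: str, prompt: str) -> dict:
    # Redact BEFORE the record is serialized, never after it reaches the aggregator.
    return {"user_id": user_id, "prompt": redact(prompt)}
```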
Frequently Asked Questions: AI Observability
What is LLM Observability?
LLM Observability refers to the tools and practices used to monitor, debug, and evaluate large language models in production. It goes beyond standard uptime monitoring by tracking prompt inputs, model outputs, token consumption, latency, and hallucination rates.
How do I track OpenAI API costs per user?
To track per-user costs, pass a unique user identifier string in your API request to OpenAI. More importantly, build a structured log on your server that records the user's ID alongside the usage.total_tokens returned by the API response, allowing you to run database queries aggregating token usage by user.
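A sketch under those assumptions, using the official openai Python SDK (v1+); save_usage and PRICE_PER_1K_TOKENS are illustrative stand-ins, and real pricing differs between prompt and completion tokens:

```python
from openai import OpenAI  # official openai Python SDK (v1+)

client = OpenAI()

# Assumed flat rate for illustration only; check your provider's rate card.
PRICE_PER_1K_TOKENS = 0.002

def save_usage(record: dict) -> None:
    # Stand-in for your persistence layer (e.g. an INSERT into a usage table).
    print(record)

def complete_for_user(user_id: str, messages: list) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        user=user_id,  # the unique identifier string passed through to OpenAI
    )
    total = response.usage.total_tokens
    save_usage({
        "user_id": user_id,
        "total_tokens": total,
        "estimated_cost_usd": total / 1000 * PRICE_PER_1K_TOKENS,
    })
    return response.choices[0].message.content
```

Summing estimated_cost_usd grouped by user_id in your database then yields per-user spend.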
Should I use tools like LangSmith or W&B?
Yes, if you are building complex systems like Agentic workflows or RAG. Custom JSON logging is sufficient for simple chat endpoints, but dedicated AI observability platforms visually trace the entire execution graph, making it much easier to debug which step of a multi-prompt chain failed.
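As one example, a minimal LangSmith sketch, assuming the langsmith Python SDK and a LangSmith API key configured in the environment; the stub functions below are placeholders for your own chain steps:

```python
from langsmith import traceable  # assumes the langsmith SDK is installed and configured

@traceable
def retrieve(question: str) -> list[str]:
    # Stub retrieval step; swap in your vector store lookup.
    return ["doc snippet"]

@traceable
def generate(question: str, docs: list[str]) -> str:
    # Stub generation step; swap in your model call.
    return f"Answer to {question!r} using {len(docs)} docs"

@traceable
def rag_pipeline(question: str) -> str:
    # Nested decorated calls appear as child spans in the trace,
    # so a failure in retrieval vs. generation is visible at a glance.
    return generate(question, retrieve(question))
```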