Agent Observability with OpenTelemetry
This project includes production-ready OpenTelemetry observability that provides consistent behavior across local development and deployed environments. The implementation automatically instruments LLM calls and application logs with minimal configuration while coexisting with ADK's internal telemetry infrastructure.
What's Instrumented
- LLM Operations: Google Generative AI SDK calls with request/response details
- Structured Logging: JSON logs with automatic trace correlation for Google Cloud Logging
- Agent Callbacks: Lifecycle logging for agent start/end, model calls, and tool invocations
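As a rough illustration of what trace-correlated JSON logging can look like, the sketch below formats a standard-library log record as JSON and copies trace identifiers into the structured fields that Cloud Logging recognizes. It assumes records carry otelTraceID and otelSpanID attributes (the default names injected by LoggingInstrumentor). TraceJsonFormatter and its project_id argument are hypothetical names; the project itself exports through CloudLoggingExporter rather than a formatter like this one.

```python
import json
import logging


class TraceJsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log line, with trace fields."""

    def __init__(self, project_id: str) -> None:
        super().__init__()
        self.project_id = project_id

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Assumption: LoggingInstrumentor has set these attributes on the
        # record; "0" is treated as "no active span".
        trace_id = getattr(record, "otelTraceID", "0")
        span_id = getattr(record, "otelSpanID", "0")
        if trace_id != "0":
            payload["logging.googleapis.com/trace"] = (
                f"projects/{self.project_id}/traces/{trace_id}"
            )
            payload["logging.googleapis.com/spanId"] = span_id
        return json.dumps(payload)
```

Attaching such a formatter to a StreamHandler would be one way to get correlated logs on stdout during local experiments.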
Key Features
- Consistent Setup: Single setup_opentelemetry() function used across all environments (local and deployed)
- Instance-Level Tracking: Unique SERVICE_INSTANCE_ID per process (PID + UUID) for collision-free identification
- Environment Grouping: SERVICE_NAMESPACE automatically set to environment name in deployed environments (dev, stage, prod)
- Version Tracking: SERVICE_VERSION set to Cloud Run revision ID for deployment correlation
- Google Cloud Integration: Direct export to Google Cloud Trace (OTLP) and Cloud Logging
- Trace Correlation: Logs automatically include trace context via LoggingInstrumentor
- Service Identification: OpenTelemetry service.name set to AGENT_NAME environment variable
- Authentication: Uses Application Default Credentials (ADC) for Google Cloud APIs
Configuration
Required environment variables:
- AGENT_NAME: OpenTelemetry service identifier (required)
- GOOGLE_CLOUD_PROJECT: GCP project ID for trace and log export (required)
- OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT: Capture LLM message content - TRUE or FALSE (required)
Optional variables:
- GOOGLE_CLOUD_LOCATION: Vertex AI region (default: us-central1)
- LOG_LEVEL: Logging verbosity - DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
- TELEMETRY_NAMESPACE: Service namespace for trace grouping (default: local, auto-set to workspace in deployed environments)
See .env.example for complete configuration reference.
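The variables above could be validated at startup along the following lines. This is a minimal sketch, not the project's actual loader: load_observability_config is a hypothetical helper, and the choice to raise ValueError on missing or malformed values is an assumption.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "AGENT_NAME",
    "GOOGLE_CLOUD_PROJECT",
    "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT",
)
# Defaults as documented in the list above.
DEFAULTS = {
    "GOOGLE_CLOUD_LOCATION": "us-central1",
    "LOG_LEVEL": "INFO",
    "TELEMETRY_NAMESPACE": "local",
}
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def load_observability_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: collect and validate observability settings."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")

    config = {name: env[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name) or default

    capture = config["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"].upper()
    if capture not in {"TRUE", "FALSE"}:
        raise ValueError("Message content capture must be TRUE or FALSE")

    level = config["LOG_LEVEL"].upper()
    if level not in LOG_LEVELS:
        raise ValueError(f"Unsupported LOG_LEVEL: {config['LOG_LEVEL']}")
    config["LOG_LEVEL"] = level
    return config
```

Failing fast here keeps a misconfigured process from starting with partial telemetry.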
Usage
Identical OpenTelemetry setup across local development and deployed environments:
- Traces and logs automatically exported to Google Cloud
- ADK web UI available locally (when web interface enabled)
- Production tip: Use INFO log level to minimize logging costs
Viewing Traces and Logs
Google Cloud Console (Recommended)
Cloud Trace: Filter by agent name, view spans, timing, and generative AI events
Logs Explorer: Query logName="projects/{PROJECT_ID}/logs/{AGENT_NAME}-otel-logs" for correlated logs (replace {AGENT_NAME} with your agent identifier)
gcloud CLI
# Tail logs in real-time
gcloud logging tail "resource.type=cloud_run_revision" --format=json
# Filter by log name
gcloud logging tail "logName:projects/{PROJECT_ID}/logs/{AGENT_NAME}-otel-logs"
# View recent traces
gcloud trace list --limit=10
VS Code GCP Extension
Install the Google Cloud Code extension to view logs and traces directly in your IDE.
Architecture
The observability module uses a two-phase initialization strategy to coexist with ADK's internal telemetry:
- Phase 1 (configure_otel_resource()): Set resource attributes via OTEL_RESOURCE_ATTRIBUTES env var before ADK starts. ADK reads these when constructing its internal TracerProvider.
- Phase 2 (setup_opentelemetry()): Run after get_fast_api_app() returns. Augment ADK's existing TracerProvider with a Cloud Trace span processor rather than replacing it. Configure logging exporters independently.
This approach preserves ADK's web UI traces (in-memory spans for local development) while adding Google Cloud export for all environments.
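Phase 1 amounts to writing a comma-separated list of key=value pairs into OTEL_RESOURCE_ATTRIBUTES before anything builds a TracerProvider. The sketch below shows one way that encoding might look. The function names are hypothetical stand-ins, not the project's configure_otel_resource(), and percent-encoding the values is an assumption based on the OpenTelemetry environment variable specification.

```python
import os
from typing import Mapping, MutableMapping
from urllib.parse import quote


def encode_resource_attributes(attributes: Mapping[str, str]) -> str:
    """Serialize attributes as comma-separated key=value pairs."""
    # Percent-encode values so commas or equals signs inside a value
    # cannot be mistaken for separators.
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in attributes.items()
    )


def configure_otel_resource_sketch(
    attributes: Mapping[str, str],
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Hypothetical Phase 1 stand-in: call before ADK creates its provider."""
    env["OTEL_RESOURCE_ATTRIBUTES"] = encode_resource_attributes(attributes)
```

Passing a plain dict as env makes the helper easy to exercise without touching the real process environment.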
For the full explanation of why this design exists, including ADK's auto-instrumentation behavior and dependency decisions, see OpenTelemetry Architecture.
Implementation Details
Functions: configure_otel_resource() sets resource attributes, setup_opentelemetry() configures exporters
Components: GoogleGenAiSdkInstrumentor (LLM ops), LoggingInstrumentor (trace context), CloudLoggingExporter (logs), OTLPSpanExporter (traces)
Note
ADK auto-detects and calls GoogleGenAiSdkInstrumentor().instrument() when the package is installed. The explicit call in observability.py is redundant but idempotent — it serves as a defensive measure if the project ever stops using ADK's get_fast_api_app().
Resource Attributes
OpenTelemetry resource attributes uniquely identify your service instances in traces and logs:
| Attribute | Source | Example | Description |
|---|---|---|---|
| service.name | AGENT_NAME env var | your-agent-name | Service identifier (set explicitly in .env) |
| service.namespace | TELEMETRY_NAMESPACE env var | dev/stage/prod (deployed) or local (dev) | Environment name grouping for traces |
| service.version | K_REVISION env var | your-agent-name-00042-abc (deployed) or local (dev) | Cloud Run revision or local dev indicator |
| service.instance.id | Generated | worker-1234-a1b2c3d4e5f6 | Unique process instance (PID + UUID) |
| gcp.project_id | GOOGLE_CLOUD_PROJECT env var | my-project-id | GCP project for resource correlation |
Local Development:
- service.namespace: Defaults to "local" (customize via TELEMETRY_NAMESPACE for multi-developer disambiguation)
- service.version: Set to "local"
- service.instance.id: Unique per server restart (includes UUID to prevent collisions)
Deployed Environments:
- service.namespace: Automatically set to environment name (dev, stage, prod)
- service.version: Automatically set to Cloud Run revision ID
- service.instance.id: Unique per container instance
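The attribute sources and defaults described above can be sketched as a single function. build_resource_attributes is a hypothetical name, and the worker- prefix with a 12-character UUID fragment is inferred from the example value in the table rather than taken from the project's code.

```python
import os
import uuid
from typing import Mapping


def build_resource_attributes(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: derive resource attributes from the environment."""
    # PID plus a random UUID fragment, so a restarted process that happens
    # to reuse a PID still gets a distinct instance ID.
    instance_id = f"worker-{os.getpid()}-{uuid.uuid4().hex[:12]}"
    return {
        "service.name": env["AGENT_NAME"],
        "service.namespace": env.get("TELEMETRY_NAMESPACE", "local"),
        # K_REVISION is present on Cloud Run; fall back to "local" elsewhere.
        "service.version": env.get("K_REVISION", "local"),
        "service.instance.id": instance_id,
        "gcp.project_id": env["GOOGLE_CLOUD_PROJECT"],
    }
```

Each call generates a fresh instance ID, so a real setup would compute the attributes once per process and reuse them.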
Callback Logging
LoggingCallbacks (in callbacks.py) logs agent lifecycle events (start/end, model calls, tool invocations) with automatic trace context correlation.
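A lifecycle logger of this kind might be shaped roughly as follows. This is an illustrative sketch only: LoggingCallbacksSketch, its method names, and its arguments are hypothetical and do not reproduce ADK's callback signatures or the contents of callbacks.py. It relies on the standard-library logging module, so trace context would be attached by whatever instrumentation is active on the logger.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("agent.callbacks")


class LoggingCallbacksSketch:
    """Hypothetical lifecycle logger; names and arguments are illustrative."""

    def before_agent(self, agent_name: str) -> None:
        logger.info("agent start", extra={"agent": agent_name})

    def after_agent(self, agent_name: str) -> None:
        logger.info("agent end", extra={"agent": agent_name})

    def before_model(self, agent_name: str, model: str) -> None:
        logger.info("model call", extra={"agent": agent_name, "model": model})

    def before_tool(
        self, agent_name: str, tool_name: str, tool_args: Mapping[str, Any]
    ) -> None:
        # Log argument names only, so argument values (which may be
        # sensitive) stay out of the logs.
        logger.info(
            "tool call",
            extra={
                "agent": agent_name,
                "tool": tool_name,
                "arg_names": sorted(tool_args),
            },
        )
```

Every method returns None, so a logger like this only observes the lifecycle and never alters the agent's behavior.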
Message Content Capture
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT controls LLM content capture:
- TRUE: Full content (debugging, higher costs, sensitive data)
- FALSE: Metadata only (production, lower costs, privacy)
Important
Must be explicitly set to TRUE for ADK to capture conversation content
Resources
- Vertex AI | Agent Engine | Trace an Agent
- Google Cloud Observability | Instrument ADK Applications with OpenTelemetry
- Google Cloud Trace | View Generative AI Events
- OpenTelemetry | Generative AI Instrumentation
- OpenTelemetry | Semantic Conventions for Generative AI
- OpenTelemetry Environment Variables