Monitoring
Effective monitoring is essential for maintaining reliable AI agent systems. Track performance, debug issues, and optimize your agents with comprehensive observability.
Why Monitor Agents?
Performance: Track response times and throughput
Reliability: Detect and respond to failures quickly
Cost Control: Monitor token usage and API costs
Quality: Ensure consistent agent behavior
Key Metrics
Response Time: Time taken for agents to process and respond to requests. Targets:
Simple queries: < 2 seconds
Complex tasks: < 30 seconds
Multi-agent workflows: < 60 seconds
Throughput: Number of requests processed per unit time. Metrics:
Requests per second (RPS)
Tasks completed per hour
Concurrent active tasks
Token Usage: Tokens consumed by LLM calls. Track:
Input tokens
Output tokens
Total cost per request
Success Rate: Percentage of successfully completed tasks. Targets:
Critical tasks: > 99%
Standard tasks: > 95%
Experimental features: > 90%
Agent-Specific Metrics
{
  "agent_id": "research-agent-001",
  "metrics": {
    "total_requests": 1543,
    "successful_requests": 1489,
    "failed_requests": 54,
    "avg_response_time_ms": 2341,
    "total_tokens_used": 2847392,
    "avg_tokens_per_request": 1845,
    "uptime_percentage": 99.7
  }
}
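A quick way to sanity-check these numbers is to derive the headline ratios from the raw counters. The snippet below is a minimal sketch; the field names mirror the payload above.

# Minimal sketch: derive headline ratios from the metrics payload above.
agent_metrics = {
    "total_requests": 1543,
    "successful_requests": 1489,
    "failed_requests": 54,
    "total_tokens_used": 2847392,
}

success_rate = agent_metrics["successful_requests"] / agent_metrics["total_requests"]
error_rate = agent_metrics["failed_requests"] / agent_metrics["total_requests"]
avg_tokens = agent_metrics["total_tokens_used"] / agent_metrics["total_requests"]

print(f"Success rate: {success_rate:.1%}")   # ~96.5%, above the 95% standard-task target
print(f"Error rate:   {error_rate:.1%}")     # ~3.5%
print(f"Avg tokens:   {avg_tokens:.0f} per request")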
Monitoring Stack
Logging
Capture detailed logs for debugging and analysis:
import logging
from bindu.penguin.pebblify import pebblify

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

@pebblify(
    name="Monitored Agent",
    debug_mode=True,
    debug_level=2
)
def monitored_agent(messages: list[str]) -> str:
    logging.info(f"Processing request with {len(messages)} messages")
    try:
        result = process_messages(messages)
        logging.info("Request completed successfully")
        return result
    except Exception as e:
        logging.error(f"Request failed: {str(e)}")
        raise
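If you ship logs into an aggregation stack such as the ELK Stack mentioned below, structured (JSON) logs are easier to search than free-form strings. The sketch below is one way to do this with the standard library only; the field names are illustrative, not part of Bindu.

import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (illustrative field names)."""
    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)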
Metrics Collection
Track key performance indicators:
from prometheus_client import Counter, Histogram, Gauge

from bindu.penguin.pebblify import pebblify

# Define metrics
request_count = Counter('agent_requests_total', 'Total agent requests')
request_duration = Histogram('agent_request_duration_seconds', 'Request duration')
active_tasks = Gauge('agent_active_tasks', 'Number of active tasks')
token_usage = Counter('agent_tokens_used', 'Total tokens used')

@pebblify(name="Instrumented Agent")
def instrumented_agent(messages: list[str]) -> str:
    request_count.inc()
    active_tasks.inc()
    try:
        with request_duration.time():
            result = process_messages(messages)
        token_usage.inc(count_tokens(result))
        return result
    finally:
        # Decrement even when processing fails, so the gauge doesn't drift upward
        active_tasks.dec()
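Prometheus scrapes metrics over HTTP, so the process also needs to expose them somewhere. A minimal sketch using prometheus_client's built-in server is shown below; the port is arbitrary, and whether you run this alongside the Bindu server or rely on Bindu's own metrics endpoint depends on your deployment.

from prometheus_client import start_http_server

# Expose the metrics defined above on http://localhost:9100/metrics
# (port chosen for illustration; pick one that fits your deployment).
start_http_server(9100)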
Tracing
Trace requests across multiple agents:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

tracer = trace.get_tracer(__name__)

@pebblify(name="Traced Agent")
def traced_agent(messages: list[str]) -> str:
    with tracer.start_as_current_span("agent_execution") as span:
        span.set_attribute("message_count", len(messages))
        # Process messages
        result = process_messages(messages)
        span.set_attribute("result_length", len(result))
        return result
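The TracerProvider imported above still needs to be registered with an exporter so that spans actually go somewhere. A minimal sketch using the console exporter (handy for local debugging) follows; in production you would typically swap in an OTLP exporter pointed at a Jaeger collector.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
# Print finished spans to stdout; replace with an OTLP exporter for Jaeger in production
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)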
Built-in Monitoring
Bindu provides built-in monitoring capabilities:
@pebblify(
    name="My Agent",
    monitoring=True,  # Enable monitoring
    telemetry=True    # Enable telemetry
)
def my_agent(messages: list[str]) -> str:
    # Agent implementation
    pass
Pair the built-in capabilities with standard observability tools:
Prometheus: Metrics collection and alerting
Grafana: Visualization and dashboards
Jaeger: Distributed tracing
ELK Stack: Log aggregation and search
Dashboard Example
Key metrics to display on your monitoring dashboard:
Dashboard: Agent Performance
├── Overview
│ ├── Total Requests (24h)
│ ├── Success Rate
│ ├── Average Response Time
│ └── Active Agents
├── Performance
│ ├── Response Time (p50, p95, p99)
│ ├── Throughput (RPS)
│ └── Queue Depth
├── Reliability
│ ├── Error Rate
│ ├── Timeout Rate
│ └── Retry Rate
└── Cost
├── Token Usage
├── API Costs
└── Cost per Request
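The latency percentiles on the dashboard (p50, p95, p99) are derived from the raw durations you record, whether in your dashboarding tool or in an ad-hoc script. A minimal Python sketch, assuming you have a list of per-request durations in seconds:

import statistics

durations = [0.8, 1.2, 0.9, 4.7, 1.1, 2.3, 0.7, 9.8, 1.4, 1.0]  # sample data

# quantiles() with n=100 returns the 1st..99th percentile cut points
percentiles = statistics.quantiles(durations, n=100)
p50, p95, p99 = percentiles[49], percentiles[94], percentiles[98]
print(f"p50={p50:.2f}s  p95={p95:.2f}s  p99={p99:.2f}s")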
Alerting
Set up alerts for critical issues:
Alert Rules
alerts:
  - name: HighErrorRate
    condition: error_rate > 5%
    duration: 5m
    severity: critical
  - name: SlowResponseTime
    condition: p95_response_time > 10s
    duration: 10m
    severity: warning
  - name: HighTokenUsage
    condition: tokens_per_hour > 1000000
    duration: 1h
    severity: warning
  - name: AgentDown
    condition: agent_uptime < 95%
    duration: 5m
    severity: critical
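The rules above are schematic: error_rate, p95_response_time, and the other condition fields map onto whatever your metrics backend exposes. As a rough illustration of what evaluating one of them means, here is a minimal Python sketch of the HighErrorRate check over a sliding window; the window store and threshold are assumptions for illustration, not part of Bindu or Prometheus.

from collections import deque
import time

# Illustrative sliding window of (timestamp, succeeded) records
recent_requests = deque(maxlen=10_000)

def error_rate(window_seconds: float = 300) -> float:
    """Fraction of failed requests over the last `window_seconds` (5m by default)."""
    cutoff = time.time() - window_seconds
    window = [ok for ts, ok in recent_requests if ts >= cutoff]
    if not window:
        return 0.0
    return 1 - sum(window) / len(window)

if error_rate() > 0.05:  # the HighErrorRate rule: error_rate > 5% over 5m
    print("ALERT: HighErrorRate (severity=critical)")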
Notification Channels
Email
Slack
PagerDuty
Discord
Webhook
Debugging
Debug Mode
Enable detailed debugging information:
@pebblify(
    name="Debug Agent",
    debug_mode=True,
    debug_level=2  # 1: basic, 2: detailed
)
def debug_agent(messages: list[str]) -> str:
    # Detailed logs will be generated
    pass
Task History
Track task execution history:
# Get task details
curl -X POST http://localhost:8030/ \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tasks/get",
"params": {
"taskId": "550e8400-e29b-41d4-a716-446655440041",
"historyLength": 10
},
"id": "550e8400-e29b-41d4-a716-446655440025"
}'
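The same tasks/get call can be made from Python, which is convenient when folding task history into your own monitoring scripts. A minimal sketch using the requests library, reusing the endpoint and payload from the curl example above:

import requests

payload = {
    "jsonrpc": "2.0",
    "method": "tasks/get",
    "params": {
        "taskId": "550e8400-e29b-41d4-a716-446655440041",
        "historyLength": 10,
    },
    "id": "550e8400-e29b-41d4-a716-446655440025",
}

response = requests.post("http://localhost:8030/", json=payload, timeout=10)
response.raise_for_status()
print(response.json())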
Error Tracking
Capture and analyze errors:
import sentry_sdk

sentry_sdk.init(
    dsn="your-sentry-dsn",
    traces_sample_rate=1.0
)

@pebblify(name="Error Tracked Agent")
def error_tracked_agent(messages: list[str]) -> str:
    try:
        return process_messages(messages)
    except Exception as e:
        sentry_sdk.capture_exception(e)
        raise
Best Practices
Start with basic monitoring and add more detailed metrics as you understand your system’s behavior.
Monitor What Matters: Focus on metrics that impact user experience
Set Baselines: Establish normal behavior before setting alerts
Alert Wisely: Avoid alert fatigue with meaningful thresholds
Log Contextually: Include relevant context in log messages
Trace End-to-End: Track requests across all system components
Review Regularly: Analyze metrics weekly to identify trends
Automate Responses: Set up automatic remediation for common issues (see the sketch after this list)
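For the last point, a minimal sketch of what automated remediation can look like is below; error_rate() is the illustrative helper from the alerting sketch earlier, and restart_agent() is a hypothetical hook, not a Bindu API.

import time

# Minimal sketch of automated remediation. error_rate() is the illustrative
# helper from the alerting sketch above; restart_agent() is a hypothetical
# remediation hook, not part of Bindu.
def remediation_loop():
    while True:
        if error_rate() > 0.05:  # same threshold as the HighErrorRate alert
            restart_agent("research-agent-001")
        time.sleep(60)  # re-check every minute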
Performance Optimization
Use monitoring data to optimize agent performance:
Identify Bottlenecks
# Analyze slow requests
slow_requests = metrics.query(
    "response_time > 5s",
    timeframe="24h"
)

# Find common patterns
for request in slow_requests:
    print(f"Task: {request.task_id}")
    print(f"Duration: {request.duration}s")
    print(f"Token count: {request.tokens}")
Optimize Token Usage
# Track token efficiency
token_efficiency = total_successful_tasks / total_tokens_used

# Set optimization targets
if token_efficiency < 0.001:  # Less than 1 task per 1000 tokens
    optimize_prompts()
Next Steps