
Monitoring

Effective monitoring is essential for maintaining reliable AI agent systems. Track performance, debug issues, and optimize your agents with comprehensive observability.

Why Monitor Agents?

Performance

Track response times and throughput

Reliability

Detect and respond to failures quickly

Cost Control

Monitor token usage and API costs

Quality

Ensure consistent agent behavior

Key Metrics

Performance Metrics

Response Time

Time taken for agents to process and respond to requests. Targets:
  • Simple queries: < 2 seconds
  • Complex tasks: < 30 seconds
  • Multi-agent workflows: < 60 seconds

Throughput

Number of requests processed per unit time. Metrics:
  • Requests per second (RPS)
  • Tasks completed per hour
  • Concurrent active tasks

Token Usage

Tokens consumed by LLM calls. Track:
  • Input tokens
  • Output tokens
  • Total cost per request

Success Rate

Percentage of successfully completed tasks. Targets:
  • Critical tasks: > 99%
  • Standard tasks: > 95%
  • Experimental features: > 90%
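
If you record timings and outcomes yourself, a small helper can flag requests that miss these targets. This is a sketch; the record fields and thresholds mirror the targets above and are not a Bindu API:

from dataclasses import dataclass

# Response-time targets from the list above, in seconds
RESPONSE_TIME_TARGETS = {"simple": 2.0, "complex": 30.0, "multi_agent": 60.0}

@dataclass
class RequestRecord:
    kind: str           # "simple", "complex", or "multi_agent"
    duration_s: float
    succeeded: bool

def summarize(records: list[RequestRecord]) -> dict:
    """Return the success rate and how many requests missed their latency target."""
    total = len(records)
    return {
        "success_rate": sum(r.succeeded for r in records) / total if total else 1.0,
        "missed_target": sum(r.duration_s > RESPONSE_TIME_TARGETS[r.kind] for r in records),
    }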

Agent-Specific Metrics

{
  "agent_id": "research-agent-001",
  "metrics": {
    "total_requests": 1543,
    "successful_requests": 1489,
    "failed_requests": 54,
    "avg_response_time_ms": 2341,
    "total_tokens_used": 2847392,
    "avg_tokens_per_request": 1845,
    "uptime_percentage": 99.7
  }
}
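
Derived figures such as success rate and cost per request follow directly from a payload like this. A minimal sketch; the per-token price is a placeholder, not a real rate:

metrics = {
    "total_requests": 1543,
    "successful_requests": 1489,
    "total_tokens_used": 2847392,
}  # abridged from the payload above

PRICE_PER_1K_TOKENS = 0.002  # placeholder; substitute your provider's pricing

success_rate = metrics["successful_requests"] / metrics["total_requests"]
avg_tokens = metrics["total_tokens_used"] / metrics["total_requests"]
estimated_cost = metrics["total_tokens_used"] / 1000 * PRICE_PER_1K_TOKENS

print(f"success rate: {success_rate:.1%}, avg tokens/request: {avg_tokens:.0f}, spend: ${estimated_cost:.2f}")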

Monitoring Stack

Logging

Capture detailed logs for debugging and analysis:
import logging
from bindu.penguin.pebblify import pebblify

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

@pebblify(
    name="Monitored Agent",
    debug_mode=True,
    debug_level=2
)
def monitored_agent(messages: list[str]) -> str:
    logging.info(f"Processing request with {len(messages)} messages")
    
    try:
        result = process_messages(messages)
        logging.info("Request completed successfully")
        return result
    except Exception as e:
        logging.error(f"Request failed: {str(e)}")
        raise
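
In production you will usually want logs somewhere more durable than stdout. One option using only the standard library (the file path and size limits are examples):

import logging
from logging.handlers import RotatingFileHandler

# Also write agent logs to a rotating file alongside the console output
file_handler = RotatingFileHandler("agent.log", maxBytes=10_000_000, backupCount=5)
file_handler.setFormatter(
    logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
)
logging.getLogger().addHandler(file_handler)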

Metrics Collection

Track key performance indicators:
from prometheus_client import Counter, Histogram, Gauge

# Define metrics
request_count = Counter('agent_requests_total', 'Total agent requests')
request_duration = Histogram('agent_request_duration_seconds', 'Request duration')
active_tasks = Gauge('agent_active_tasks', 'Number of active tasks')
token_usage = Counter('agent_tokens_used', 'Total tokens used')

@pebblify(name="Instrumented Agent")
def instrumented_agent(messages: list[str]) -> str:
    request_count.inc()
    active_tasks.inc()
    
    with request_duration.time():
        result = process_messages(messages)
        token_usage.inc(count_tokens(result))
    
    active_tasks.dec()
    return result
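
For Prometheus to scrape these counters the process must expose them over HTTP. prometheus_client ships a minimal exporter; the port below is an arbitrary choice:

from prometheus_client import start_http_server

# Serve all registered metrics at http://localhost:9100/metrics
start_http_server(9100)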

Tracing

Trace requests across multiple agents:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Install the SDK tracer provider so spans are actually recorded
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

@pebblify(name="Traced Agent")
def traced_agent(messages: list[str]) -> str:
    with tracer.start_as_current_span("agent_execution") as span:
        span.set_attribute("message_count", len(messages))
        
        # Process messages
        result = process_messages(messages)
        
        span.set_attribute("result_length", len(result))
        return result
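
Spans are only useful once they are exported somewhere. A minimal console setup (swap ConsoleSpanExporter for an OTLP exporter to ship spans to Jaeger or another backend):

from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Attach an exporter to the provider configured above
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)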

Monitoring Tools

Built-in Monitoring

Bindu provides built-in monitoring capabilities:
@pebblify(
    name="My Agent",
    monitoring=True,  # Enable monitoring
    telemetry=True    # Enable telemetry
)
def my_agent(messages: list[str]) -> str:
    # Agent implementation
    pass

External Tools

Prometheus

Metrics collection and alerting

Grafana

Visualization and dashboards

Jaeger

Distributed tracing

ELK Stack

Log aggregation and search

Dashboard Example

Key metrics to display on your monitoring dashboard:
Dashboard: Agent Performance
├── Overview
│   ├── Total Requests (24h)
│   ├── Success Rate
│   ├── Average Response Time
│   └── Active Agents
├── Performance
│   ├── Response Time (p50, p95, p99)
│   ├── Throughput (RPS)
│   └── Queue Depth
├── Reliability
│   ├── Error Rate
│   ├── Timeout Rate
│   └── Retry Rate
└── Cost
    ├── Token Usage
    ├── API Costs
    └── Cost per Request
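
The percentile panels assume you keep raw response-time samples. With a list of durations the standard library can compute p50/p95/p99; a sketch, not built-in Bindu functionality:

from statistics import quantiles

durations = [0.8, 1.2, 0.9, 3.4, 2.1, 0.7, 5.6, 1.1]  # response times in seconds

cuts = quantiles(durations, n=100)           # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.2f}s p95={p95:.2f}s p99={p99:.2f}s")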

Alerting

Set up alerts for critical issues:

Alert Rules

alerts:
  - name: HighErrorRate
    condition: error_rate > 5%
    duration: 5m
    severity: critical
    
  - name: SlowResponseTime
    condition: p95_response_time > 10s
    duration: 10m
    severity: warning
    
  - name: HighTokenUsage
    condition: tokens_per_hour > 1000000
    duration: 1h
    severity: warning
    
  - name: AgentDown
    condition: agent_uptime < 95%
    duration: 5m
    severity: critical

Notification Channels

  • Email
  • Slack
  • PagerDuty
  • Discord
  • Webhook
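
Most alerting backends can also call a generic webhook. As a sketch, posting an alert to a Slack-style incoming webhook with requests (the URL is a placeholder):

import requests

WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify(alert_name: str, severity: str, details: str) -> None:
    """Send a plain-text alert message to the configured webhook."""
    requests.post(
        WEBHOOK_URL,
        json={"text": f"[{severity}] {alert_name}: {details}"},
        timeout=5,
    )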

Debugging

Debug Mode

Enable detailed debugging information:
@pebblify(
    name="Debug Agent",
    debug_mode=True,
    debug_level=2  # 1: basic, 2: detailed
)
def debug_agent(messages: list[str]) -> str:
    # Detailed logs will be generated
    pass

Task History

Track task execution history:
# Get task details
curl -X POST http://localhost:8030/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tasks/get",
    "params": {
      "taskId": "550e8400-e29b-41d4-a716-446655440041",
      "historyLength": 10
    },
    "id": "550e8400-e29b-41d4-a716-446655440025"
  }'
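
The same JSON-RPC call from Python, mirroring the curl request above (endpoint and IDs are the example values from that request):

import requests

response = requests.post(
    "http://localhost:8030/",
    json={
        "jsonrpc": "2.0",
        "method": "tasks/get",
        "params": {
            "taskId": "550e8400-e29b-41d4-a716-446655440041",
            "historyLength": 10,
        },
        "id": "550e8400-e29b-41d4-a716-446655440025",
    },
)
print(response.json())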

Error Tracking

Capture and analyze errors:
import sentry_sdk

sentry_sdk.init(
    dsn="your-sentry-dsn",
    traces_sample_rate=1.0
)

@pebblify(name="Error Tracked Agent")
def error_tracked_agent(messages: list[str]) -> str:
    try:
        return process_messages(messages)
    except Exception as e:
        sentry_sdk.capture_exception(e)
        raise

Best Practices

Start with basic monitoring and add more detailed metrics as you understand your system’s behavior.
  1. Monitor What Matters: Focus on metrics that impact user experience
  2. Set Baselines: Establish normal behavior before setting alerts
  3. Alert Wisely: Avoid alert fatigue with meaningful thresholds
  4. Log Contextually: Include relevant context in log messages
  5. Trace End-to-End: Track requests across all system components
  6. Review Regularly: Analyze metrics weekly to identify trends
  7. Automate Responses: Set up automatic remediation for common issues

Performance Optimization

Use monitoring data to optimize agent performance:

Identify Bottlenecks

# Analyze slow requests (pseudo-code; "metrics" stands in for your metrics store client)
slow_requests = metrics.query(
    "response_time > 5s",
    timeframe="24h"
)

# Find common patterns
for request in slow_requests:
    print(f"Task: {request.task_id}")
    print(f"Duration: {request.duration}s")
    print(f"Token count: {request.tokens}")

Optimize Token Usage

# Track token efficiency
token_efficiency = total_successful_tasks / total_tokens_used

# Set optimization targets
if token_efficiency < 0.001:  # Less than 1 task per 1000 tokens
    optimize_prompts()

Next Steps
