# Resilience

Retry logic and rate limiting for robust streaming operations.

## Overview
Agentic provides universal resilience utilities that work with any async iterator (stream). These utilities wrap LLM calls, tool execution, or any other async operation with automatic retry and rate limiting, making agents production-ready.
## Retry Logic

The `retry_stream` function wraps any async iterator with automatic retry behavior:
```python
from agentic.resilience import retry_stream, RetryConfig
from agentic.events import RetryEvent

async def llm_stream():
    # Your streaming LLM call
    async for chunk in provider.stream(prompt):
        yield chunk

# Wrap with retry logic
config = RetryConfig(
    max_attempts=3,
    backoff="exponential",
    base_delay=1.0,
    max_delay=30.0,
    jitter=True,
    retry_on=(ConnectionError, TimeoutError)
)

async for item in retry_stream(
    llm_stream,
    config,
    operation_name="gpt-4",
    operation_type="llm"
):
    if isinstance(item, RetryEvent):
        print(f"Retry {item.attempt}/{item.max_attempts}, waiting {item.next_delay_seconds}s")
    else:
        # Actual LLM chunk
        print(item, end="")
```
### RetryConfig

Configuration for retry behavior:
```python
import asyncio

from agentic.resilience import RetryConfig

config = RetryConfig(
    max_attempts=3,           # Maximum retry attempts (default: 3)
    backoff="exponential",    # "exponential" | "linear" | "constant"
    base_delay=1.0,           # Base delay in seconds (default: 1.0)
    max_delay=60.0,           # Maximum delay cap (default: 60.0)
    jitter=True,              # Add random jitter ±25% (default: True)
    retry_on=(TimeoutError, ConnectionError, asyncio.TimeoutError)
)
```
## Backoff Strategies

### Exponential Backoff

Delay doubles with each retry (recommended for most cases):

```python
RetryConfig(backoff="exponential", base_delay=1.0, max_delay=60.0)
# Delays: 1s, 2s, 4s, 8s, 16s, 32s, 60s (capped)
```
### Linear Backoff

Delay increases linearly:

```python
RetryConfig(backoff="linear", base_delay=2.0, max_delay=30.0)
# Delays: 2s, 4s, 6s, 8s, 10s, ...
```
### Constant Backoff

Fixed delay between retries:

```python
RetryConfig(backoff="constant", base_delay=5.0)
# Delays: 5s, 5s, 5s, ...
```
### Jitter

Adding jitter (random variation) prevents thundering herd problems, where many clients retry at the same instant:

```python
RetryConfig(jitter=True)  # Adds ±25% random variation to delays
```
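The ±25% variation works out to multiplying each delay by a random factor. A minimal sketch of that arithmetic (not the library's implementation):

```python
import random

def apply_jitter(delay, spread=0.25):
    """Scale a delay by a random factor in [1 - spread, 1 + spread],
    so synchronized clients don't retry in lockstep."""
    return delay * random.uniform(1 - spread, 1 + spread)

# A 4-second delay with ±25% jitter lands somewhere in [3.0, 5.0]
jittered = apply_jitter(4.0)
assert 3.0 <= jittered <= 5.0
```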
### Retryable Exceptions

Specify which exceptions should trigger a retry:

```python
import asyncio

from aiohttp import ClientError, ServerTimeoutError

RetryConfig(
    retry_on=(
        ConnectionError,
        TimeoutError,
        asyncio.TimeoutError,
        ClientError,
        ServerTimeoutError
    )
)
```
## Rate Limiting

Token bucket rate limiting controls request rates.
### RateLimitConfig

```python
from agentic.resilience import RateLimitConfig, RateLimiter

config = RateLimitConfig(
    requests_per_second=10.0,   # Limit to 10 requests/second
    requests_per_minute=None,   # Optional minute-level limit
    requests_per_hour=None,     # Optional hour-level limit
    burst_size=20               # Allow bursts of up to 20 tokens
)

limiter = RateLimiter(config)
```
### Using RateLimiter

#### Manual Token Acquisition

```python
# Blocking acquire - waits until a token is available
await limiter.acquire(tokens=1, operation_name="api_call")
await make_api_call()

# Non-blocking attempt - returns immediately
if await limiter.try_acquire(tokens=1):
    await make_api_call()
else:
    print("Rate limit exceeded")

# Check available tokens
available = limiter.tokens_available()
```
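The `try_acquire` semantics above follow the classic token-bucket algorithm. This minimal synchronous sketch illustrates the idea only; it is not the library's implementation:

```python
import time

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate              # tokens refilled per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start full, allowing an initial burst
        self.last = time.monotonic()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, tokens=1):
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

bucket = TokenBucket(rate=10.0, burst=2)
# Two immediate acquisitions succeed (burst capacity);
# a third is refused until the bucket refills at 10 tokens/sec.
print(bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire())
```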
#### With Streaming

```python
from agentic.resilience import rate_limited_stream
from agentic.events import RateLimitEvent

async def my_stream():
    async for item in data_source:
        yield item

async for item in rate_limited_stream(
    my_stream,
    limiter,
    operation_name="data_fetch"
):
    if isinstance(item, RateLimitEvent):
        print(f"Rate limit acquired, {item.tokens_remaining} tokens left")
    else:
        print(item)
```
## Rate Limit Strategies

### Strict Rate Limiting

A low burst size enforces even distribution:

```python
RateLimitConfig(
    requests_per_second=10.0,
    burst_size=1  # No bursting, strict 10/sec
)
```
### Bursty Rate Limiting

A high burst size allows temporary spikes:

```python
RateLimitConfig(
    requests_per_second=10.0,
    burst_size=100  # Allow bursts, 10/sec on average
)
```
### Multiple Time Windows

Combine limits at different granularities:

```python
RateLimitConfig(
    requests_per_second=10.0,    # Max 10/sec
    requests_per_minute=500.0,   # Max 500/min
    requests_per_hour=20000.0,   # Max 20k/hour
    burst_size=20
)
# The most restrictive rate applies
```
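If each window is normalized to a per-second equivalent, "most restrictive" is just a `min`. A hypothetical sketch of that arithmetic, ignoring burst behavior (this is an assumption about the semantics, not the library's code):

```python
def effective_rate(per_second=None, per_minute=None, per_hour=None):
    """Convert each configured window to requests/second and
    return the most restrictive (smallest) rate."""
    candidates = []
    if per_second is not None:
        candidates.append(per_second)
    if per_minute is not None:
        candidates.append(per_minute / 60.0)
    if per_hour is not None:
        candidates.append(per_hour / 3600.0)
    return min(candidates)

# In the config above, the hour window (20000/3600 ≈ 5.56/sec)
# is tighter than both 10/sec and 500/min (≈ 8.33/sec).
print(effective_rate(per_second=10.0, per_minute=500.0, per_hour=20000.0))
```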
## Combined Resilience

The `resilient_stream` function combines retry and rate limiting:
```python
from agentic.resilience import (
    resilient_stream, RetryConfig, RateLimitConfig, RateLimiter
)
from agentic.events import RetryEvent, RateLimitEvent

# Configure resilience
retry_config = RetryConfig(
    max_attempts=3,
    backoff="exponential",
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError)
)

rate_config = RateLimitConfig(
    requests_per_second=10.0,
    burst_size=20
)
rate_limiter = RateLimiter(rate_config)

# Your streaming operation
async def llm_call():
    async for chunk in provider.stream(prompt):
        yield chunk

# Wrap with full resilience
async for item in resilient_stream(
    llm_call,
    retry_config=retry_config,
    rate_limiter=rate_limiter,
    operation_name="gpt-4",
    operation_type="llm"
):
    if isinstance(item, RetryEvent):
        print(f"Retrying: {item.error}")
    elif isinstance(item, RateLimitEvent):
        print(f"Rate limited, {item.tokens_remaining} tokens left")
    else:
        # Actual data
        print(item, end="")
```
## Real-World Example: Production LLM Provider

```python
import asyncio
import logging

from agentic.resilience import (
    resilient_stream, RetryConfig, RateLimitConfig, RateLimiter
)
from agentic.events import RetryEvent, RateLimitEvent

logger = logging.getLogger("llm_provider")

class ProductionLLMProvider:
    def __init__(self, base_provider):
        self._provider = base_provider
        # Global rate limiter for this provider
        self._rate_limiter = RateLimiter(
            RateLimitConfig(
                requests_per_second=50.0,
                requests_per_minute=2000.0,
                burst_size=100
            )
        )
        # Retry configuration
        self._retry_config = RetryConfig(
            max_attempts=5,
            backoff="exponential",
            base_delay=1.0,
            max_delay=30.0,
            jitter=True,
            retry_on=(
                ConnectionError,
                TimeoutError,
                asyncio.TimeoutError
            )
        )

    async def stream(self, prompt, **kwargs):
        # Define the base streaming function
        async def _stream():
            async for chunk in self._provider.stream(prompt, **kwargs):
                yield chunk

        # Wrap with resilience
        async for item in resilient_stream(
            _stream,
            retry_config=self._retry_config,
            rate_limiter=self._rate_limiter,
            operation_name="llm_stream",
            operation_type="llm"
        ):
            if isinstance(item, RetryEvent):
                logger.warning(
                    f"LLM retry {item.attempt}/{item.max_attempts}: {item.error}"
                )
            elif isinstance(item, RateLimitEvent):
                logger.debug(f"Rate limit acquired: {item.tokens_remaining} tokens")
            else:
                yield item

# Usage
provider = ProductionLLMProvider(base_llm_provider)
async for chunk in provider.stream("Hello, world!"):
    print(chunk, end="")
```
## Integration with Agent System

The framework integrates resilience into the agent system automatically, so agents can use resilient providers transparently:

```python
from agentic import Agent, AgentConfig, AgentRunner
from agentic.events import RetryEvent, LLMChunkEvent

# Create agent with production provider
agent = Agent(
    config=AgentConfig(...),
    context=context,
    patterns=patterns,
    tools=tools,
    llm_provider=ProductionLLMProvider(base_provider)
)

# Agent steps automatically benefit from retry + rate limiting
runner = AgentRunner(agent)
async for event in runner.step_stream(user_input):
    # RetryEvent and RateLimitEvent appear in the stream
    if isinstance(event, RetryEvent):
        print("Retrying LLM call...")
    elif isinstance(event, LLMChunkEvent):
        print(event.chunk, end="")
```
## Tool Resilience

Apply resilience to individual tools:

```python
import aiohttp

from agentic import tool
from agentic.resilience import resilient_stream, RetryConfig
from agentic.events import RetryEvent

class APITool:
    def __init__(self):
        self._retry_config = RetryConfig(
            max_attempts=3,
            backoff="exponential",
            retry_on=(ConnectionError,)
        )

    @tool("Fetch data from external API")
    async def fetch_data(self, url: str):
        async def _fetch():
            async with aiohttp.ClientSession() as session:
                async with session.get(url) as response:
                    data = await response.json()
                    yield data

        async for item in resilient_stream(
            _fetch,
            retry_config=self._retry_config,
            operation_name="api_fetch",
            operation_type="tool"
        ):
            if not isinstance(item, RetryEvent):
                return item
```
## Event Types

### RetryEvent

Emitted when an operation is being retried:

```python
class RetryEvent:
    operation_type: str         # "llm", "tool", "custom"
    operation_name: str         # Name of the operation
    attempt: int                # Current retry attempt (1-indexed)
    max_attempts: int           # Maximum attempts configured
    error: str                  # Error that triggered the retry
    next_delay_seconds: float   # Delay before the next attempt
    step_id: str                # Optional step correlation ID
```

### RateLimitEvent

Emitted when a rate limit token is acquired:

```python
class RateLimitEvent:
    operation_name: str         # Name of the operation
    acquired_at: float          # Timestamp of acquisition
    tokens_remaining: float     # Tokens left in bucket
    step_id: str                # Optional step correlation ID
```
## Best Practices

- **Always use jitter** - Prevents thundering herd problems in distributed systems
- **Set a reasonable `max_delay`** - Cap exponential backoff to avoid excessive waits
- **Prefer exponential backoff** - Most effective for transient failures
- **Retry transient errors only** - Don't retry validation errors or 4xx HTTP responses
- **Set `max_attempts` carefully** - Too many retries amplify outages
- **Monitor retry rates** - A high retry rate indicates upstream issues
- **Combine rate limiting with retry** - Rate limit first, then retry
- **Use `burst_size` wisely** - Match burst capacity to expected traffic patterns
- **Share rate limiters** - Use one `RateLimiter` instance per API/service
When an upstream service is degraded, lower `max_attempts` and use exponential backoff with jitter to spread load.
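The "retry transient errors only" rule can be encoded as a small predicate. This is a hypothetical helper for HTTP-backed operations, not part of the library:

```python
# Hypothetical helper: decide retryability from an HTTP status code.
# 5xx and 429 are usually transient; other 4xx responses indicate a
# caller error that will fail identically on every attempt.
def is_retryable_status(status: int) -> bool:
    if status == 429:           # rate limited: retry after backoff
        return True
    if 500 <= status < 600:     # server-side failure: likely transient
        return True
    return False                # other 4xx, 2xx, 3xx: do not retry

assert is_retryable_status(503)
assert not is_retryable_status(400)  # bad request: fix the request instead
```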
## Monitoring and Observability

Handle resilience events for monitoring:
```python
import logging

from agentic.events import RetryEvent, RateLimitEvent

logger = logging.getLogger("resilience")

class ResilienceMonitor:
    def __init__(self):
        self.retry_count = 0
        self.rate_limit_waits = 0
        self.total_delay = 0.0

    def handle_event(self, event):
        if isinstance(event, RetryEvent):
            self.retry_count += 1
            self.total_delay += event.next_delay_seconds
            logger.warning(
                f"Retry {event.attempt}/{event.max_attempts} "
                f"for {event.operation_name}: {event.error}"
            )
        elif isinstance(event, RateLimitEvent):
            self.rate_limit_waits += 1
            logger.debug(
                f"Rate limit for {event.operation_name}, "
                f"{event.tokens_remaining} tokens remaining"
            )

monitor = ResilienceMonitor()

async for item in resilient_stream(...):
    monitor.handle_event(item)
    # Process item...

print(f"Total retries: {monitor.retry_count}")
print(f"Total delay from retries: {monitor.total_delay}s")
```
## Next Steps
- Agent System - Integrate resilience into agents
- Tools - Add resilience to custom tools
- Events - Handle resilience events
- Logic Flows - Combine with conditional loops