Resilience

Retry logic and rate limiting for robust streaming operations

Overview

Agentic provides universal resilience utilities that work with any async iterator (stream). These utilities wrap LLM calls, tool execution, or any other async operation with automatic retry and rate limiting, making agents production-ready.

Key Concept: All resilience functions work with async iterators (streams), not simple async functions. They yield events and items from the wrapped stream.
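To make the stream-oriented design concrete, here is a minimal sketch of how such a wrapper can be built over any async generator. The names (`simple_retry_stream`, `RetryMarker`) are illustrative stand-ins, not part of the Agentic API:

```python
import asyncio

class RetryMarker:
    """Illustrative stand-in for Agentic's RetryEvent."""
    def __init__(self, attempt: int, error: str):
        self.attempt = attempt
        self.error = error

async def simple_retry_stream(stream_factory, max_attempts: int = 3):
    """Re-invoke stream_factory on failure, yielding a marker before each retry.

    Note: items yielded before a mid-stream failure are re-yielded on retry;
    a real implementation must decide how to handle such partial output.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            async for item in stream_factory():
                yield item
            return  # stream completed without error
        except ConnectionError as exc:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the error
            yield RetryMarker(attempt, str(exc))

async def demo():
    calls = {"n": 0}

    async def flaky():
        calls["n"] += 1
        yield "a"
        if calls["n"] < 2:  # fail on the first attempt only
            raise ConnectionError("transient")
        yield "b"

    return [item async for item in simple_retry_stream(flaky)]

items = asyncio.run(demo())  # ["a", <RetryMarker>, "a", "b"]
```

The key point the sketch shows: the wrapper is itself an async generator, so retry notifications and real data travel through the same stream.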

Retry Logic

The retry_stream function wraps any async iterator with automatic retry behavior:

from agentic.resilience import retry_stream, RetryConfig
from agentic.events import RetryEvent

async def llm_stream():
    # Your streaming LLM call
    async for chunk in provider.stream(prompt):
        yield chunk

# Wrap with retry logic
config = RetryConfig(
    max_attempts=3,
    backoff="exponential",
    base_delay=1.0,
    max_delay=30.0,
    jitter=True,
    retry_on=(ConnectionError, TimeoutError)
)

async for item in retry_stream(
    llm_stream,
    config,
    operation_name="gpt-4",
    operation_type="llm"
):
    if isinstance(item, RetryEvent):
        print(f"Retry {item.attempt}/{item.max_attempts}, waiting {item.next_delay_seconds}s")
    else:
        # Actual LLM chunk
        print(item, end="")

RetryConfig

Configuration for retry behavior:

import asyncio

from agentic.resilience import RetryConfig

config = RetryConfig(
    max_attempts=3,                      # Maximum retry attempts (default: 3)
    backoff="exponential",              # "exponential" | "linear" | "constant"
    base_delay=1.0,                      # Base delay in seconds (default: 1.0)
    max_delay=60.0,                      # Maximum delay cap (default: 60.0)
    jitter=True,                        # Add random jitter ±25% (default: True)
    retry_on=(TimeoutError, ConnectionError, asyncio.TimeoutError)
)

Backoff Strategies

Exponential Backoff

Delay doubles with each retry (recommended for most cases):

RetryConfig(backoff="exponential", base_delay=1.0, max_delay=60.0)
# Delays: 1s, 2s, 4s, 8s, 16s, 32s, 60s (capped)

Linear Backoff

Delay increases linearly:

RetryConfig(backoff="linear", base_delay=2.0, max_delay=30.0)
# Delays: 2s, 4s, 6s, 8s, 10s...

Constant Backoff

Fixed delay between retries:

RetryConfig(backoff="constant", base_delay=5.0)
# Delays: 5s, 5s, 5s...
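The three strategies reduce to a small delay formula. The following is an illustrative sketch of that arithmetic, not the library's internal implementation:

```python
def backoff_delay(strategy: str, attempt: int, base_delay: float, max_delay: float) -> float:
    """Delay before retry number `attempt` (1-indexed), capped at max_delay."""
    if strategy == "exponential":
        delay = base_delay * (2 ** (attempt - 1))  # 1s, 2s, 4s, 8s, ...
    elif strategy == "linear":
        delay = base_delay * attempt               # 2s, 4s, 6s, 8s, ...
    elif strategy == "constant":
        delay = base_delay                         # 5s, 5s, 5s, ...
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return min(delay, max_delay)

# Reproduces the sequences shown above:
exp = [backoff_delay("exponential", n, 1.0, 60.0) for n in range(1, 8)]
# → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]  (7th delay capped from 64s)
lin = [backoff_delay("linear", n, 2.0, 30.0) for n in range(1, 6)]
# → [2.0, 4.0, 6.0, 8.0, 10.0]
```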

Jitter

Adding jitter (random variation) prevents thundering herd problems:

RetryConfig(jitter=True)  # Adds ±25% random variation to delays
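The ±25% variation can be sketched as a random multiplier applied to the computed delay (illustrative, not necessarily the library's exact formula):

```python
import random

def apply_jitter(delay: float, fraction: float = 0.25) -> float:
    """Scale delay by a random factor in [1 - fraction, 1 + fraction]."""
    return delay * random.uniform(1 - fraction, 1 + fraction)

# A 4s delay with ±25% jitter always lands in [3s, 5s]:
jittered = [apply_jitter(4.0) for _ in range(100)]
```

Because each client draws a different factor, retries that would otherwise fire simultaneously are spread across the window.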

Retryable Exceptions

Specify which exceptions should trigger a retry:

import asyncio

from aiohttp import ClientError, ServerTimeoutError

RetryConfig(
    retry_on=(
        ConnectionError,
        TimeoutError,
        asyncio.TimeoutError,
        ClientError,
        ServerTimeoutError
    )
)

Rate Limiting

Token bucket rate limiting for controlling request rates:

RateLimitConfig

from agentic.resilience import RateLimitConfig, RateLimiter

config = RateLimitConfig(
    requests_per_second=10.0,   # Limit to 10 requests/second
    requests_per_minute=None,     # Optional minute-level limit
    requests_per_hour=None,       # Optional hour-level limit
    burst_size=20               # Allow bursts up to 20 tokens
)

limiter = RateLimiter(config)

Using RateLimiter

Manual Token Acquisition

# Blocking acquire - waits until token is available
await limiter.acquire(tokens=1, operation_name="api_call")
await make_api_call()

# Non-blocking try - returns immediately
if await limiter.try_acquire(tokens=1):
    await make_api_call()
else:
    print("Rate limit exceeded")

# Check available tokens
available = limiter.tokens_available()
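Under the hood, a token bucket stores up to `burst_size` tokens and refills continuously at the configured rate. This sketch shows the core accounting; it is illustrative, not Agentic's implementation:

```python
import time

class SimpleTokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `burst_size`."""
    def __init__(self, rate: float, burst_size: float):
        self.rate = rate              # tokens added per second
        self.capacity = burst_size    # maximum stored tokens
        self.tokens = burst_size      # start full (allows an initial burst)
        self.updated = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Non-blocking: take tokens if available, else return False."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

bucket = SimpleTokenBucket(rate=10.0, burst_size=2.0)
# The 2-token burst succeeds immediately; the third call is throttled:
results = [bucket.try_acquire() for _ in range(3)]  # [True, True, False]
```

A blocking `acquire` is the same accounting plus an `asyncio.sleep` for the time until enough tokens accumulate.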

With Streaming

from agentic.resilience import rate_limited_stream
from agentic.events import RateLimitEvent

async def my_stream():
    async for item in data_source:
        yield item

async for item in rate_limited_stream(
    my_stream,
    limiter,
    operation_name="data_fetch"
):
    if isinstance(item, RateLimitEvent):
        print(f"Rate limit acquired, {item.tokens_remaining} tokens left")
    else:
        print(item)

Rate Limit Strategies

Strict Rate Limiting

Low burst size enforces even distribution:

RateLimitConfig(
    requests_per_second=10.0,
    burst_size=1  # No bursting, strict 10/sec
)

Bursty Rate Limiting

High burst size allows temporary spikes:

RateLimitConfig(
    requests_per_second=10.0,
    burst_size=100  # Allow bursts, average 10/sec
)

Multiple Time Windows

Combine limits at different granularities:

RateLimitConfig(
    requests_per_second=10.0,   # Max 10/sec
    requests_per_minute=500.0,  # Max 500/min
    requests_per_hour=20000.0,  # Max 20k/hour
    burst_size=20
)
# Uses most restrictive rate
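Once each window is normalized to per-second terms, the sustained rate is simply the smallest of the three. A sketch of that arithmetic (not the library's internals):

```python
def effective_rate(per_second=None, per_minute=None, per_hour=None) -> float:
    """Most restrictive configured limit, normalized to requests per second."""
    candidates = []
    if per_second is not None:
        candidates.append(per_second)
    if per_minute is not None:
        candidates.append(per_minute / 60.0)
    if per_hour is not None:
        candidates.append(per_hour / 3600.0)
    return min(candidates)

# For the config above: 10/sec vs 500/min (~8.33/sec) vs 20000/hour (~5.56/sec)
rate = effective_rate(per_second=10.0, per_minute=500.0, per_hour=20000.0)
```

Here the hourly cap dominates: sustained throughput is about 5.6 requests/second, even though short bursts up to `burst_size` are still allowed.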

Combined Resilience

The resilient_stream function combines retry and rate limiting:

from agentic.resilience import (
    resilient_stream, RetryConfig, RateLimitConfig, RateLimiter
)
from agentic.events import RetryEvent, RateLimitEvent

# Configure resilience
retry_config = RetryConfig(
    max_attempts=3,
    backoff="exponential",
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError)
)

rate_config = RateLimitConfig(
    requests_per_second=10.0,
    burst_size=20
)
rate_limiter = RateLimiter(rate_config)

# Your streaming operation
async def llm_call():
    async for chunk in provider.stream(prompt):
        yield chunk

# Wrap with full resilience
async for item in resilient_stream(
    llm_call,
    retry_config=retry_config,
    rate_limiter=rate_limiter,
    operation_name="gpt-4",
    operation_type="llm"
):
    if isinstance(item, RetryEvent):
        print(f"Retrying: {item.error}")
    elif isinstance(item, RateLimitEvent):
        print(f"Rate limited, {item.tokens_remaining} tokens left")
    else:
        # Actual data
        print(item, end="")

Real-World Example: Production LLM Provider

import asyncio
import logging

from agentic.events import RetryEvent, RateLimitEvent
from agentic.resilience import (
    resilient_stream, RetryConfig, RateLimitConfig, RateLimiter
)

logger = logging.getLogger("llm_provider")

class ProductionLLMProvider:
    def __init__(self, base_provider):
        self._provider = base_provider

        # Global rate limiter for this provider
        self._rate_limiter = RateLimiter(
            RateLimitConfig(
                requests_per_second=50.0,
                requests_per_minute=2000.0,
                burst_size=100
            )
        )

        # Retry configuration
        self._retry_config = RetryConfig(
            max_attempts=5,
            backoff="exponential",
            base_delay=1.0,
            max_delay=30.0,
            jitter=True,
            retry_on=(
                ConnectionError,
                TimeoutError,
                asyncio.TimeoutError
            )
        )

    async def stream(self, prompt, **kwargs):
        # Define the base streaming function
        async def _stream():
            async for chunk in self._provider.stream(prompt, **kwargs):
                yield chunk

        # Wrap with resilience
        async for item in resilient_stream(
            _stream,
            retry_config=self._retry_config,
            rate_limiter=self._rate_limiter,
            operation_name="llm_stream",
            operation_type="llm"
        ):
            if isinstance(item, RetryEvent):
                logger.warning(
                    f"LLM retry {item.attempt}/{item.max_attempts}: {item.error}"
                )
            elif isinstance(item, RateLimitEvent):
                logger.debug(f"Rate limit acquired: {item.tokens_remaining} tokens")
            else:
                yield item

# Usage
provider = ProductionLLMProvider(base_llm_provider)

async for chunk in provider.stream("Hello, world!"):
    print(chunk, end="")

Integration with Agent System

The framework automatically integrates resilience into the agent system. Agents can use resilient providers transparently:

from agentic import Agent, AgentConfig, AgentRunner
from agentic.events import RetryEvent, LLMChunkEvent

# Create agent with production provider
agent = Agent(
    config=AgentConfig(...),
    context=context,
    patterns=patterns,
    tools=tools,
    llm_provider=ProductionLLMProvider(base_provider)
)

# Agent step automatically benefits from retry + rate limiting
runner = AgentRunner(agent)
async for event in runner.step_stream(user_input):
    # RetryEvent and RateLimitEvent will appear in stream
    if isinstance(event, RetryEvent):
        print("Retrying LLM call...")
    elif isinstance(event, LLMChunkEvent):
        print(event.chunk, end="")

Tool Resilience

Apply resilience to individual tools:

import aiohttp

from agentic import tool
from agentic.events import RetryEvent
from agentic.resilience import resilient_stream, RetryConfig

class APITool:
    def __init__(self):
        self._retry_config = RetryConfig(
            max_attempts=3,
            backoff="exponential",
            retry_on=(ConnectionError,)
        )

    @tool("Fetch data from external API")
    async def fetch_data(self, url: str):
        async def _fetch():
            async with aiohttp.ClientSession() as session:
                async with session.get(url) as response:
                    data = await response.json()
                    yield data

        async for item in resilient_stream(
            _fetch,
            retry_config=self._retry_config,
            operation_name="api_fetch",
            operation_type="tool"
        ):
            if not isinstance(item, RetryEvent):
                return item

Event Types

RetryEvent

Emitted when an operation is being retried:

class RetryEvent:
    operation_type: str     # "llm", "tool", "custom"
    operation_name: str     # Name of operation
    attempt: int            # Current retry attempt (1-indexed)
    max_attempts: int       # Maximum attempts configured
    error: str              # Error that triggered retry
    next_delay_seconds: float  # Delay before next attempt
    step_id: str            # Optional step correlation ID

RateLimitEvent

Emitted when rate limit token is acquired:

class RateLimitEvent:
    operation_name: str     # Name of operation
    acquired_at: float      # Timestamp of acquisition
    tokens_remaining: float # Tokens left in bucket
    step_id: str            # Optional step correlation ID

Best Practices

Warning: Excessive retries during outages can make problems worse. Always set max_attempts and use exponential backoff with jitter to spread load.

Monitoring and Observability

Handle resilience events for monitoring:

import logging

from agentic.events import RetryEvent, RateLimitEvent

logger = logging.getLogger("resilience_monitor")

class ResilienceMonitor:
    def __init__(self):
        self.retry_count = 0
        self.rate_limit_waits = 0
        self.total_delay = 0.0

    def handle_event(self, event):
        if isinstance(event, RetryEvent):
            self.retry_count += 1
            self.total_delay += event.next_delay_seconds
            logger.warning(
                f"Retry {event.attempt}/{event.max_attempts} "
                f"for {event.operation_name}: {event.error}"
            )

        elif isinstance(event, RateLimitEvent):
            self.rate_limit_waits += 1
            logger.debug(
                f"Rate limit for {event.operation_name}, "
                f"{event.tokens_remaining} tokens remaining"
            )

monitor = ResilienceMonitor()

async for item in resilient_stream(...):
    monitor.handle_event(item)
    # Process item...

print(f"Total retries: {monitor.retry_count}")
print(f"Total delay from retries: {monitor.total_delay}s")

Next Steps