Fix clw-llm-limit-exceeded: LLM API Rate Limit Exceeded

Category: LLM Integration | Difficulty: Intermediate | Platforms: Linux, macOS, Windows, Docker

1. Symptoms

When OpenClaw encounters the clw-llm-limit-exceeded error, you will observe the following symptoms:

  • Agent execution halts: The OpenClaw agent stops mid-task and returns an error status.
  • API responses fail: LLM API calls return 429 status codes or equivalent rate-limit errors.
  • Retry exhaustion: Multiple automatic retry attempts fail with the same error.
  • Error message displayed: The CLI or SDK outputs a detailed error message.

Typical error output in the terminal:

[OpenClaw] ERROR: clw-llm-limit-exceeded
Message: LLM API rate limit exceeded for model 'gpt-4' on endpoint 'api.openai.com/v1/chat/completions'
Details: {
  "limit_type": "requests_per_minute",
  "current": 150,
  "limit": 150,
  "retry_after": 45,
  "provider": "openai"
}
Suggestion: Implement exponential backoff or reduce request frequency


In programmatic usage via the OpenClaw SDK, the error manifests as:

```python
from openclaw import Agent, OpenClawError

try:
    agent = Agent(model="gpt-4")
    result = agent.run("Analyze this dataset")
except OpenClawError as e:
    if e.code == "clw-llm-limit-exceeded":
        print(f"Rate limit hit: {e.details}")
        # Handle the rate limit error
```

Additional observable symptoms include:

  • Increased latency: Requests queue up while waiting for rate limit windows to reset.
  • Partial results: Some LLM calls may succeed while others fail, leading to incomplete outputs.
  • Logging entries: Verbose logs show HTTP 429 responses from the upstream LLM provider.

2. Root Cause

The clw-llm-limit-exceeded error occurs when your application exceeds the rate limits imposed by the LLM provider. Understanding the root causes is essential for implementing an effective fix.

Primary Causes

  1. Exceeded requests-per-minute (RPM) limit: Most LLM providers enforce a maximum number of requests you can send within a 60-second window. For example, OpenAI’s GPT-4 has different tiers of RPM limits based on your subscription level.

  2. Exceeded tokens-per-minute (TPM) limit: LLM providers also limit the total number of tokens (input + output) you can process per minute. This limit is separate from the RPM limit and, with large prompts, is often the first one you hit (see the sketch after this list).

  3. Exceeded daily/monthly quota: Cumulative usage limits that reset on a daily or monthly cycle can trigger this error when exhausted.

  4. Burst traffic: Sending a large number of concurrent requests can exceed both instantaneous and rolling window limits.

  5. Misconfigured retry logic: Aggressive retry logic without proper backoff can quickly exhaust rate limits during transient failures.
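
A quick way to see which of these limits a workload will hit first is a rough token estimate. The sketch below uses the common approximation of about four characters per token; real counts depend on the model's tokenizer, so treat the figures as estimates rather than exact provider accounting.

```python
# Rough estimate of which limit a workload hits first.
# Assumes ~4 characters per token; use your provider's tokenizer for exact counts.
def first_limit_hit(requests_per_minute, avg_prompt_chars, avg_output_tokens,
                    rpm_limit, tpm_limit):
    est_input_tokens = avg_prompt_chars / 4            # heuristic, not exact
    est_tpm = requests_per_minute * (est_input_tokens + avg_output_tokens)
    if requests_per_minute > rpm_limit:
        return "RPM"
    if est_tpm > tpm_limit:
        return "TPM"
    return "neither"

# 100 requests/min with ~8,000-character prompts and ~500 output tokens stays
# under a 150 RPM cap but far exceeds a 60,000 TPM cap.
print(first_limit_hit(100, 8_000, 500, rpm_limit=150, tpm_limit=60_000))  # -> TPM
```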

Technical Details

OpenClaw integrates with multiple LLM providers (OpenAI, Anthropic, Azure OpenAI, self-hosted models). Each provider has its own rate limit implementation:

| Provider | RPM Limit (Standard Tier) | TPM Limit (Standard Tier) |
| --- | --- | --- |
| OpenAI GPT-4 | 150 | 60,000 |
| OpenAI GPT-3.5-Turbo | 350 | 90,000 |
| Anthropic Claude | 50 | 40,000 |
| Azure OpenAI | Varies by deployment | Varies by deployment |

The error is thrown by OpenClaw’s LLM client when it receives a 429 HTTP response or equivalent from the provider. OpenClaw attempts automatic retries with exponential backoff, but if all retry attempts fail, the error propagates to your application.
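
For intuition, the following standalone sketch shows the kind of delay schedule an exponential-backoff-with-jitter policy produces for the base_delay: 2 and max_delay: 120 values used later in this guide. It illustrates the general technique only; OpenClaw's internal retry implementation may differ in detail.

```python
import random

# Illustrative exponential backoff with full jitter, capped at max_delay.
# Mirrors the base_delay/max_delay settings shown in Step 2 below.
def backoff_delays(attempts, base_delay=2.0, max_delay=120.0):
    for attempt in range(attempts):
        ceiling = min(base_delay * (2 ** attempt), max_delay)
        yield random.uniform(0, ceiling)  # jitter avoids synchronized retry bursts

print([round(d, 1) for d in backoff_delays(5)])  # e.g. [1.4, 2.9, 0.6, 12.3, 27.8]
```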


3. Step-by-Step Fix

Follow these steps to resolve the clw-llm-limit-exceeded error:

Step 1: Identify Your Current Rate Limit Usage

First, determine how many requests and tokens your application is consuming:

openclaw diagnostic --check-rate-limits --model gpt-4

This command outputs your current usage statistics:

Rate Limit Status for model: gpt-4
====================================
Requests (last 60s): 142 / 150 RPM
Tokens (last 60s): 58,200 / 60,000 TPM
Daily usage: $12.45 / $100.00
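
If you want to capture the same report from a script or a CI job, a thin wrapper around the diagnostic command shown above is enough (sketch; assumes the openclaw CLI is on PATH):

```python
import subprocess

# Thin wrapper around the CLI diagnostic shown above, so the report can be
# captured in CI logs or a cron job.
def print_rate_limit_status(model="gpt-4"):
    result = subprocess.run(
        ["openclaw", "diagnostic", "--check-rate-limits", "--model", model],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)

print_rate_limit_status()
```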

Step 2: Configure Rate Limit Parameters

Update your OpenClaw configuration to respect provider rate limits:

Before:

# openclaw.yaml
llm:
  provider: openai
  model: gpt-4
  max_retries: 5
  timeout: 30

After:

# openclaw.yaml
llm:
  provider: openai
  model: gpt-4
  max_retries: 3
  timeout: 60
  rate_limit:
    enabled: true
    requests_per_minute: 120
    tokens_per_minute: 50000
    strategy: "exponential_backoff"
    base_delay: 2
    max_delay: 120
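
Conceptually, the requests_per_minute setting acts like a client-side token bucket: each request consumes one slot and slots refill over the minute. The standalone sketch below illustrates that idea only; it is not OpenClaw's actual limiter.

```python
import threading
import time

# Conceptual client-side limiter, equivalent in spirit to requests_per_minute: 120.
# This is an illustration of the idea, not OpenClaw's actual implementation.
class MinuteRateLimiter:
    def __init__(self, requests_per_minute):
        self.capacity = requests_per_minute
        self.tokens = float(requests_per_minute)
        self.refill_per_sec = requests_per_minute / 60.0
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.refill_per_sec)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.refill_per_sec
            time.sleep(wait)  # sleep outside the lock, then re-check

limiter = MinuteRateLimiter(requests_per_minute=120)
limiter.acquire()  # call once before each LLM request
```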

Step 3: Implement Token Budgeting

Reduce token consumption by optimizing your prompts and using context compression:

from openclaw import Agent
from openclaw.utils import TokenBudget

# Create a token budget to monitor usage
budget = TokenBudget(max_tokens=50000, window_seconds=60)

agent = Agent(
    model="gpt-4",
    token_budget=budget,
    context_compression=True,
    max_context_tokens=8000
)

# Your agent runs now respect the token budget
result = agent.run("Analyze this dataset")

Step 4: Add Exponential Backoff to Your Code

Implement robust retry logic with exponential backoff for programmatic usage:

Before:

from openclaw import Agent

agent = Agent(model="gpt-4")

# No rate limit handling
result = agent.run("Process this batch")

After:

import time
import random
from openclaw import Agent, OpenClawError

agent = Agent(
    model="gpt-4",
    max_retries=5,
    backoff_base=2,
    backoff_max=120
)

def run_with_backoff(prompt, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return agent.run(prompt)
        except OpenClawError as e:
            if e.code == "clw-llm-limit-exceeded":
                delay = min(2 ** attempt + random.uniform(0, 1), 120)
                retry_after = e.details.get("retry_after", delay)
                print(f"Rate limited. Retrying in {retry_after}s...")
                time.sleep(retry_after)
            else:
                raise
    raise Exception("Max retry attempts exceeded")

result = run_with_backoff("Process this batch")

Step 5: Consider Request Batching and Queueing

For high-volume workloads, implement a request queue:

from openclaw import Agent
from openclaw.queue import RateLimitedQueue
import asyncio

queue = RateLimitedQueue(
    requests_per_minute=100,
    tokens_per_minute=45000
)

async def process_batch(prompts):
    agent = Agent(model="gpt-4")
    results = []
    
    for prompt in prompts:
        queued_agent = await queue.enqueue(agent, prompt)
        result = await queued_agent.run()
        results.append(result)
    
    return results

# Usage
prompts = [f"Analyze item {i}" for i in range(100)]
results = asyncio.run(process_batch(prompts))
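
If your installation does not provide the RateLimitedQueue helper assumed above, a plain asyncio semaphore plus paced submission achieves a similar effect for moderate workloads. The sketch below reuses the synchronous agent.run(...) call from the earlier examples via asyncio.to_thread.

```python
import asyncio
from openclaw import Agent

# Sketch: bound concurrency and pace submissions without openclaw.queue.
# Uses the synchronous agent.run(...) from the earlier examples via a thread.
async def process_batch_simple(prompts, max_concurrent=5, requests_per_minute=100):
    agent = Agent(model="gpt-4")
    sem = asyncio.Semaphore(max_concurrent)
    interval = 60.0 / requests_per_minute   # minimum spacing between submissions

    async def run_one(prompt):
        async with sem:
            return await asyncio.to_thread(agent.run, prompt)

    tasks = []
    for prompt in prompts:
        tasks.append(asyncio.create_task(run_one(prompt)))
        await asyncio.sleep(interval)        # pace new work onto the queue
    return await asyncio.gather(*tasks)

# results = asyncio.run(process_batch_simple(prompts))
```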

4. Verification

After implementing the fixes, verify that the clw-llm-limit-exceeded error is resolved:

Test Script

Create and run a verification script:

#!/usr/bin/env python3
"""Verify rate limit fix for clw-llm-limit-exceeded error."""

from openclaw import Agent
from openclaw.utils import RateLimitMonitor
import time

def verify_fix():
    print("=" * 50)
    print("Rate Limit Fix Verification")
    print("=" * 50)
    
    # Initialize monitor
    monitor = RateLimitMonitor()
    agent = Agent(model="gpt-4")
    
    # Send a burst of requests (should respect limits)
    test_prompts = [
        "What is 2+2?",
        "What is the capital of France?",
        "Define: algorithm",
        "What year is it?",
        "What is H2O?",
    ]
    
    success_count = 0
    rate_limit_errors = 0
    
    for i, prompt in enumerate(test_prompts):
        print(f"\nRequest {i+1}/5: {prompt[:30]}...")
        try:
            result = agent.run(prompt)
            success_count += 1
            print(f"  Status: SUCCESS")
        except Exception as e:
            if "clw-llm-limit-exceeded" in str(e):
                rate_limit_errors += 1
                print(f"  Status: RATE LIMITED ({e})")
            else:
                print(f"  Status: OTHER ERROR ({e})")
    
    # Report results
    print("\n" + "=" * 50)
    print("Verification Results:")
    print("=" * 50)
    print(f"Total requests: {len(test_prompts)}")
    print(f"Successful: {success_count}")
    print(f"Rate limit errors: {rate_limit_errors}")
    print(f"Current RPM usage: {monitor.get_current_rpm()}")
    print(f"Current TPM usage: {monitor.get_current_tpm()}")
    
    if rate_limit_errors == 0:
        print("\n✓ VERIFICATION PASSED: No rate limit errors detected")
    else:
        print(f"\n✗ VERIFICATION FAILED: {rate_limit_errors} rate limit errors")
    
    return rate_limit_errors == 0

if __name__ == "__main__":
    success = verify_fix()
    exit(0 if success else 1)

Expected Output

==================================================
Rate Limit Fix Verification
==================================================

Request 1/5: What is 2+2?...
  Status: SUCCESS
Request 2/5: What is the capital of France?...
  Status: SUCCESS
Request 3/5: Define: algorithm...
  Status: SUCCESS
Request 4/5: What year is it?...
  Status: SUCCESS
Request 5/5: What is H2O?...
  Status: SUCCESS

==================================================
Verification Results:
==================================================
Total requests: 5
Successful: 5
Rate limit errors: 0
Current RPM usage: 5 / 120
Current TPM usage: 847 / 50000

✓ VERIFICATION PASSED: No rate limit errors detected
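
You can also re-run the diagnostic from Step 1 after the test batch and confirm that the reported usage sits comfortably below the limits configured in Step 2 (120 RPM / 50,000 TPM):

openclaw diagnostic --check-rate-limits --model gpt-4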

5. Common Pitfalls

Avoid these common mistakes when handling clw-llm-limit-exceeded:

Pitfall 1: Disabling Retries Entirely

Anti-pattern: Setting max_retries: 0 to “avoid rate limits.”

# WRONG - This will fail immediately on any rate limit
llm:
  max_retries: 0

Correct approach: Keep retries but implement proper backoff.
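
For example, keeping a bounded retry budget together with the backoff settings from Step 2:

# CORRECT - Keep retries, bound them, and back off between attempts
llm:
  max_retries: 3
  rate_limit:
    enabled: true
    strategy: "exponential_backoff"
    base_delay: 2
    max_delay: 120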

Pitfall 2: Ignoring retry_after from Provider

The provider includes a retry_after value indicating when to retry. Some code ignores this:

# WRONG - Always using fixed backoff
time.sleep(5)  # May be too short or too long
# CORRECT - Respect provider's retry_after
sleep_time = e.details.get("retry_after", default_backoff)
time.sleep(sleep_time)

Pitfall 3: Not Monitoring TPM vs RPM

Many developers only track requests but forget tokens:

# WRONG - Only checking RPM
if current_rpm < rpm_limit:
    send_request()
# CORRECT - Checking both limits
if current_rpm < rpm_limit and current_tpm < tpm_limit:
    send_request()

Pitfall 4: Concurrent Requests Without Coordination

Multiple processes or threads can independently exhaust rate limits:

# WRONG - No coordination across processes
import multiprocessing

def worker(prompt):
    agent = Agent(model="gpt-4")
    return agent.run(prompt)

with multiprocessing.Pool(10) as pool:
    results = pool.map(worker, prompts)  # 10x rate limit usage!
# CORRECT - Use a shared queue or rate limiter
from openclaw.queue import DistributedRateLimiter

limiter = DistributedRateLimiter(rpm_limit=120)

def worker(prompt):
    with limiter.acquire():
        agent = Agent(model="gpt-4")
        return agent.run(prompt)

Pitfall 5: Not Handling Partial Failures

When processing batches, some requests may succeed while others hit rate limits:

# WRONG - Failing entire batch on any error
results = []
for prompt in prompts:
    results.append(agent.run(prompt))  # Stops on first error
# CORRECT - Implement partial failure handling
results = []
errors = []
for i, prompt in enumerate(prompts):
    try:
        results.append({"index": i, "result": agent.run(prompt)})
    except OpenClawError as e:
        if e.code == "clw-llm-limit-exceeded":
            errors.append({"index": i, "error": e, "retryable": True})
        else:
            errors.append({"index": i, "error": e, "retryable": False})
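
The retryable entries can then be replayed once the window resets, for example by reusing the run_with_backoff helper from Step 4 (sketch; assumes the batch snippet above ran in the same scope):

```python
import time

# Replay only the items that failed with a retryable rate-limit error,
# reusing run_with_backoff from Step 4. Adjust the pause to your limits.
retryable = [err for err in errors if err["retryable"]]
if retryable:
    time.sleep(60)  # let the per-minute window reset before replaying
    for err in retryable:
        i = err["index"]
        results.append({"index": i, "result": run_with_backoff(prompts[i])})
```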

6. Related Errors

The following errors are commonly related to clw-llm-limit-exceeded:

| Error Code | Description | Relationship |
| --- | --- | --- |
| clw-auth-invalid | Authentication failure with LLM provider | Can trigger rate limit errors if repeated auth attempts exhaust limits |
| clw-context-length | Input exceeds model's context window | Large prompts consume more tokens, increasing TPM usage |
| clw-model-unavailable | Requested model is unavailable | May trigger fallback attempts that hit rate limits |
| clw-timeout-operation | LLM request timed out | Timeout retries can compound rate limit issues |
| clw-rate-limit-global | Global account-level rate limit exceeded | More severe than model-specific limits |

When clw-context-length occurs alongside clw-llm-limit-exceeded:

# Reduce token usage to address both errors
from openclaw import Agent
from openclaw.utils import ContextManager

agent = Agent(
    model="gpt-4",
    context_manager=ContextManager(
        max_tokens=6000,
        compression_threshold=0.7
    )
)

When clw-timeout-operation leads to clw-llm-limit-exceeded:

# Increase timeout to reduce unnecessary retries
agent = Agent(
    model="gpt-4",
    timeout=120,  # Increased from default
    max_retries=2  # Reduced to prevent retry storms
)

Summary

The clw-llm-limit-exceeded error indicates that your OpenClaw application’s LLM API usage has exceeded the rate limits imposed by your provider. To resolve this error:

  1. Monitor your RPM and TPM usage against provider limits
  2. Configure rate limit parameters in your OpenClaw setup
  3. Implement exponential backoff for retry logic
  4. Optimize token usage through context compression and batching
  5. Queue requests when handling high-volume workloads

By following these steps and avoiding common pitfalls, you can build resilient OpenClaw applications that handle LLM rate limits gracefully while maintaining optimal performance.