1. Symptoms
When OpenClaw encounters the clw-llm-limit-exceeded error, you will observe the following symptoms:
- Agent execution halts: The OpenClaw agent stops mid-task and returns an error status.
- API responses fail: LLM API calls return 429 status codes or equivalent rate-limit errors.
- Retry exhaustion: Multiple automatic retry attempts fail with the same error.
- Error message displayed: The CLI or SDK outputs a detailed error message.
Typical error output in the terminal:
```
[OpenClaw] ERROR: clw-llm-limit-exceeded
Message: LLM API rate limit exceeded for model 'gpt-4' on endpoint 'api.openai.com/v1/chat/completions'
Details: {
  "limit_type": "requests_per_minute",
  "current": 150,
  "limit": 150,
  "retry_after": 45,
  "provider": "openai"
}
Suggestion: Implement exponential backoff or reduce request frequency
```
In programmatic usage via the OpenClaw SDK, the error manifests as:
```python
from openclaw import Agent, OpenClawError

try:
    agent = Agent(model="gpt-4")
    result = agent.run("Analyze this dataset")
except OpenClawError as e:
    if e.code == "clw-llm-limit-exceeded":
        print(f"Rate limit hit: {e.details}")
        # Handle the rate limit error
```
Additional observable symptoms include:
- Increased latency: Requests queue up while waiting for rate limit windows to reset.
- Partial results: Some LLM calls may succeed while others fail, leading to incomplete outputs.
- Logging entries: Verbose logs show HTTP 429 responses from the upstream LLM provider.
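To surface those 429 responses in your own application logs, you can catch the SDK error and record its details with Python's standard logging module. This is a minimal sketch: the logger setup is an assumption, while the e.details fields match the error output shown above.
```python
import logging

from openclaw import Agent, OpenClawError

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rate-limit-monitor")  # hypothetical logger name

agent = Agent(model="gpt-4")
try:
    result = agent.run("Analyze this dataset")
except OpenClawError as e:
    if e.code != "clw-llm-limit-exceeded":
        raise
    # Fields taken from the Details payload in the terminal output above
    log.warning(
        "Rate limited by %s; provider suggests retrying after %ss",
        e.details.get("provider"),
        e.details.get("retry_after"),
    )
```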
2. Root Cause
The clw-llm-limit-exceeded error occurs when your application exceeds the rate limits imposed by the LLM provider. Understanding the root causes is essential for implementing an effective fix.
Primary Causes
- Exceeded requests-per-minute (RPM) limit: Most LLM providers enforce a maximum number of requests you can send within a 60-second window. For example, OpenAI's GPT-4 has different RPM limits depending on your subscription tier.
- Exceeded tokens-per-minute (TPM) limit: Providers also cap the total number of tokens (input + output) you can process per minute. This limit is separate from the RPM limit and, with large prompts, is often the first one hit (a worked example follows this list).
- Exceeded daily/monthly quota: Cumulative usage limits that reset on a daily or monthly cycle can trigger this error once exhausted.
- Burst traffic: Sending a large number of concurrent requests can exceed both instantaneous and rolling-window limits.
- Misconfigured retry logic: Aggressive retries without proper backoff can quickly exhaust rate limits during transient failures.
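To see why the TPM ceiling often binds before the RPM ceiling, compare the two using the standard-tier GPT-4 numbers from the table in the next subsection. The 4,000-token average request size here is an illustrative assumption:
```python
# Effective throughput when the token limit, not the request limit, binds
TPM_LIMIT = 60_000   # tokens per minute (GPT-4 standard tier, per table below)
RPM_LIMIT = 150      # requests per minute (GPT-4 standard tier)
AVG_TOKENS = 4_000   # assumed average tokens (input + output) per request

effective_rpm = TPM_LIMIT // AVG_TOKENS
print(f"Token-limited throughput: {effective_rpm} req/min "
      f"(vs. the {RPM_LIMIT} RPM cap)")
# Token-limited throughput: 15 req/min (vs. the 150 RPM cap)
```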
Technical Details
OpenClaw integrates with multiple LLM providers (OpenAI, Anthropic, Azure OpenAI, self-hosted models). Each provider has its own rate limit implementation:
| Provider | RPM Limit (Standard Tier) | TPM Limit (Standard Tier) |
|---|---|---|
| OpenAI GPT-4 | 150 | 60,000 |
| OpenAI GPT-3.5-Turbo | 350 | 90,000 |
| Anthropic Claude | 50 | 40,000 |
| Azure OpenAI | Varies by deployment | Varies by deployment |
The error is thrown by OpenClaw’s LLM client when it receives a 429 HTTP response or equivalent from the provider. OpenClaw attempts automatic retries with exponential backoff, but if all retry attempts fail, the error propagates to your application.
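To make that retry behavior concrete, the sketch below shows the generic exponential-backoff schedule such a client follows. The base_delay and max_delay values mirror the Step 2 configuration later in this guide; OpenClaw's exact internal schedule is an assumption.
```python
def backoff_schedule(base_delay: float = 2, max_delay: float = 120, attempts: int = 5):
    """Delays of base_delay * 2^attempt seconds, capped at max_delay."""
    return [min(base_delay * (2 ** attempt), max_delay) for attempt in range(attempts)]

print(backoff_schedule())  # [2, 4, 8, 16, 32]
```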
3. Step-by-Step Fix
Follow these steps to resolve the clw-llm-limit-exceeded error:
Step 1: Identify Your Current Rate Limit Usage
First, determine how many requests and tokens your application is consuming:
```bash
openclaw diagnostic --check-rate-limits --model gpt-4
```
This command outputs your current usage statistics:
```
Rate Limit Status for model: gpt-4
====================================
Requests (last 60s): 142 / 150 RPM
Tokens (last 60s):   58,200 / 60,000 TPM
Daily usage:         $12.45 / $100.00
```
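In this sample output, both metrics sit within a few percent of their caps. Usage that close to a limit means normal traffic jitter will intermittently trip the error, so treat it as a signal to throttle and continue with the steps below.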
Step 2: Configure Rate Limit Parameters
Update your OpenClaw configuration to respect provider rate limits:
Before:
```yaml
# openclaw.yaml
llm:
  provider: openai
  model: gpt-4
  max_retries: 5
  timeout: 30
```
After:
```yaml
# openclaw.yaml
llm:
  provider: openai
  model: gpt-4
  max_retries: 3
  timeout: 60
  rate_limit:
    enabled: true
    requests_per_minute: 120
    tokens_per_minute: 50000
    strategy: "exponential_backoff"
    base_delay: 2
    max_delay: 120
```
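Note that the client-side ceilings (120 RPM, 50,000 TPM) are deliberately set below the provider limits from the table above (150 RPM, 60,000 TPM); that headroom absorbs short bursts and in-flight requests the local limiter cannot account for.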
Step 3: Implement Token Budgeting
Reduce token consumption by optimizing your prompts and using context compression:
```python
from openclaw import Agent
from openclaw.utils import TokenBudget

# Create a token budget to monitor usage
budget = TokenBudget(max_tokens=50000, window_seconds=60)

agent = Agent(
    model="gpt-4",
    token_budget=budget,
    context_compression=True,
    max_context_tokens=8000
)

# Your agent runs now respect the token budget
result = agent.run("Analyze this dataset")
```
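Context compression trades some fidelity of the accumulated context for lower token consumption, so it is best suited to tasks that do not depend on verbatim recall of long inputs.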
Step 4: Add Exponential Backoff to Your Code
Implement robust retry logic with exponential backoff for programmatic usage:
Before:
```python
from openclaw import Agent

agent = Agent(model="gpt-4")

# No rate limit handling
result = agent.run("Process this batch")
```
After:
```python
import random
import time

from openclaw import Agent, OpenClawError

agent = Agent(
    model="gpt-4",
    max_retries=5,
    backoff_base=2,
    backoff_max=120
)

def run_with_backoff(prompt, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return agent.run(prompt)
        except OpenClawError as e:
            if e.code == "clw-llm-limit-exceeded":
                # Exponential backoff with jitter, capped at 120s; prefer
                # the provider's retry_after hint when it is available
                delay = min(2 ** attempt + random.uniform(0, 1), 120)
                retry_after = e.details.get("retry_after", delay)
                print(f"Rate limited. Retrying in {retry_after}s...")
                time.sleep(retry_after)
            else:
                raise
    raise Exception("Max retry attempts exceeded")

result = run_with_backoff("Process this batch")
```
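The random.uniform(0, 1) jitter term spreads out retries from concurrent workers so they do not all wake and hit the API at the same instant (the thundering-herd problem), and the provider's retry_after hint takes precedence over the computed delay whenever it is present.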
Step 5: Consider Request Batching and Queueing
For high-volume workloads, implement a request queue:
```python
import asyncio

from openclaw import Agent
from openclaw.queue import RateLimitedQueue

queue = RateLimitedQueue(
    requests_per_minute=100,
    tokens_per_minute=45000
)

async def process_batch(prompts):
    agent = Agent(model="gpt-4")
    results = []
    for prompt in prompts:
        queued_agent = await queue.enqueue(agent, prompt)
        result = await queued_agent.run()
        results.append(result)
    return results

# Usage
prompts = [f"Analyze item {i}" for i in range(100)]
results = asyncio.run(process_batch(prompts))
```
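Because every request flows through a single queue, the aggregate rate stays under the configured ceilings regardless of how many prompts are waiting; the trade-off is added latency for requests at the back of the queue.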
4. Verification
After implementing the fixes, verify that the clw-llm-limit-exceeded error is resolved:
Test Script
Create and run a verification script:
```python
#!/usr/bin/env python3
"""Verify rate limit fix for clw-llm-limit-exceeded error."""
import sys

from openclaw import Agent
from openclaw.utils import RateLimitMonitor

def verify_fix():
    print("=" * 50)
    print("Rate Limit Fix Verification")
    print("=" * 50)

    # Initialize monitor
    monitor = RateLimitMonitor()
    agent = Agent(model="gpt-4")

    # Send a burst of requests (should respect limits)
    test_prompts = [
        "What is 2+2?",
        "What is the capital of France?",
        "Define: algorithm",
        "What year is it?",
        "What is H2O?",
    ]

    success_count = 0
    rate_limit_errors = 0

    for i, prompt in enumerate(test_prompts):
        print(f"\nRequest {i+1}/{len(test_prompts)}: {prompt[:30]}...")
        try:
            agent.run(prompt)
            success_count += 1
            print("  Status: SUCCESS")
        except Exception as e:
            if "clw-llm-limit-exceeded" in str(e):
                rate_limit_errors += 1
                print(f"  Status: RATE LIMITED ({e})")
            else:
                print(f"  Status: OTHER ERROR ({e})")

    # Report results
    print("\n" + "=" * 50)
    print("Verification Results:")
    print("=" * 50)
    print(f"Total requests: {len(test_prompts)}")
    print(f"Successful: {success_count}")
    print(f"Rate limit errors: {rate_limit_errors}")
    print(f"Current RPM usage: {monitor.get_current_rpm()}")
    print(f"Current TPM usage: {monitor.get_current_tpm()}")

    if rate_limit_errors == 0:
        print("\n✓ VERIFICATION PASSED: No rate limit errors detected")
    else:
        print(f"\n✗ VERIFICATION FAILED: {rate_limit_errors} rate limit errors")

    return rate_limit_errors == 0

if __name__ == "__main__":
    sys.exit(0 if verify_fix() else 1)
```
Expected Output
```
==================================================
Rate Limit Fix Verification
==================================================

Request 1/5: What is 2+2?...
  Status: SUCCESS

Request 2/5: What is the capital of France?...
  Status: SUCCESS

Request 3/5: Define: algorithm...
  Status: SUCCESS

Request 4/5: What year is it?...
  Status: SUCCESS

Request 5/5: What is H2O?...
  Status: SUCCESS

==================================================
Verification Results:
==================================================
Total requests: 5
Successful: 5
Rate limit errors: 0
Current RPM usage: 5 / 120
Current TPM usage: 847 / 50000

✓ VERIFICATION PASSED: No rate limit errors detected
```
5. Common Pitfalls
Avoid these common mistakes when handling clw-llm-limit-exceeded:
Pitfall 1: Disabling Retries Entirely
Anti-pattern: Setting max_retries: 0 to “avoid rate limits.”
```yaml
# WRONG - This will fail immediately on any rate limit
llm:
  max_retries: 0
```
Correct approach: Keep retries but implement proper backoff.
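For example, keep retries enabled and let the rate_limit block from Step 2 pace them (values repeated from that step):
```yaml
# CORRECT - Keep retries, but pace them with backoff
llm:
  max_retries: 3
  rate_limit:
    enabled: true
    strategy: "exponential_backoff"
    base_delay: 2
    max_delay: 120
```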
Pitfall 2: Ignoring retry_after from Provider
The provider includes a retry_after value indicating when to retry. Some code ignores this:
```python
# WRONG - Always using fixed backoff
time.sleep(5)  # May be too short or too long

# CORRECT - Respect provider's retry_after
sleep_time = e.details.get("retry_after", default_backoff)
time.sleep(sleep_time)
```
Pitfall 3: Not Monitoring TPM vs RPM
Many developers only track requests but forget tokens:
```python
# WRONG - Only checking RPM
if current_rpm < rpm_limit:
    send_request()

# CORRECT - Checking both limits
if current_rpm < rpm_limit and current_tpm < tpm_limit:
    send_request()
```
Pitfall 4: Concurrent Requests Without Coordination
Multiple processes or threads can independently exhaust rate limits:
```python
# WRONG - No coordination across processes
import multiprocessing

def worker(prompt):
    agent = Agent(model="gpt-4")
    return agent.run(prompt)

with multiprocessing.Pool(10) as pool:
    results = pool.map(worker, prompts)  # 10x rate limit usage!

# CORRECT - Use a shared queue or rate limiter
from openclaw.queue import DistributedRateLimiter

limiter = DistributedRateLimiter(rpm_limit=120)

def worker(prompt):
    with limiter.acquire():
        agent = Agent(model="gpt-4")
        return agent.run(prompt)
```
Pitfall 5: Not Handling Partial Failures
When processing batches, some requests may succeed while others hit rate limits:
```python
# WRONG - Failing entire batch on any error
results = []
for prompt in prompts:
    results.append(agent.run(prompt))  # Stops on first error

# CORRECT - Implement partial failure handling
results = []
errors = []
for i, prompt in enumerate(prompts):
    try:
        results.append({"index": i, "result": agent.run(prompt)})
    except OpenClawError as e:
        if e.code == "clw-llm-limit-exceeded":
            errors.append({"index": i, "error": e, "retryable": True})
        else:
            errors.append({"index": i, "error": e, "retryable": False})
```
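The entries marked retryable: True can then be re-run after the rate limit window resets, for example with the run_with_backoff helper from Step 4, while the non-retryable errors are surfaced immediately.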
6. Related Errors
The following errors are commonly related to clw-llm-limit-exceeded:
| Error Code | Description | Relationship |
|---|---|---|
| clw-auth-invalid | Authentication failure with LLM provider | Can trigger rate limit errors if repeated auth attempts exhaust limits |
| clw-context-length | Input exceeds model's context window | Large prompts consume more tokens, increasing TPM usage |
| clw-model-unavailable | Requested model is unavailable | May trigger fallback attempts that hit rate limits |
| clw-timeout-operation | LLM request timed out | Timeout retries can compound rate limit issues |
| clw-rate-limit-global | Global account-level rate limit exceeded | More severe than model-specific limits |
Troubleshooting Related Errors
When clw-context-length occurs alongside clw-llm-limit-exceeded:
```python
# Reduce token usage to address both errors
from openclaw import Agent
from openclaw.utils import ContextManager

agent = Agent(
    model="gpt-4",
    context_manager=ContextManager(
        max_tokens=6000,
        compression_threshold=0.7
    )
)
```
When clw-timeout-operation leads to clw-llm-limit-exceeded:
```python
# Increase timeout to reduce unnecessary retries
agent = Agent(
    model="gpt-4",
    timeout=120,   # Increased from default
    max_retries=2  # Reduced to prevent retry storms
)
```
Summary
The clw-llm-limit-exceeded error indicates that your OpenClaw application’s LLM API usage has exceeded the rate limits imposed by your provider. To resolve this error:
- Monitor your RPM and TPM usage against provider limits
- Configure rate limit parameters in your OpenClaw setup
- Implement exponential backoff for retry logic
- Optimize token usage through context compression and batching
- Queue requests when handling high-volume workloads
By following these steps and avoiding common pitfalls, you can build resilient OpenClaw applications that handle LLM rate limits gracefully while maintaining optimal performance.