## 1. Symptoms
The clw-llm-timeout error in OpenClaw manifests during LLM inference requests, typically when communicating with remote API endpoints like those from OpenAI, Anthropic, or custom servers. This error halts execution and logs a specific message indicating a timeout.
Common symptoms include:

```
[ERROR] clw-llm-timeout: Request to LLM endpoint 'https://api.llm-provider.com/v1/completions' timed out after 30s. No response received. Context: prompt_length=2048, model=claude-3-opus.
```
- Application freezes or throws exceptions mid-request.
- High CPU usage during wait periods, followed by abrupt failure.
- Intermittent failures under load, especially with long prompts (>1000 tokens).
- Logs show repeated attempts if retries are enabled, e.g.:
```
[INFO] clw-llm-timeout: Retry attempt 1/3 after 30s timeout.
[ERROR] clw-llm-timeout: All retries exhausted.
```
This occurs in synchronous and asynchronous modes. In production, it leads to cascading failures in pipelines relying on LLM responses for tasks like code generation or data processing. Monitor via OpenClaw's debug logs (`CLW_LOG_LEVEL=DEBUG`) to confirm:
```
DEBUG: HTTP request sent at 2024-10-12T10:30:00Z
DEBUG: No ACK after 30s, marking as timeout.
```
Affected versions: OpenClaw v2.1.0+, common in C++ applications integrating LLMs.
## 2. Root Cause
The `clw-llm-timeout` error stems from OpenClaw's default HTTP client timeout (30 seconds), which is often insufficient for LLM requests. Root causes:
1. **Slow LLM Inference**: Large models (e.g., GPT-4, Claude-3) take 45-120s for complex prompts due to queueing, token generation, and server load.
2. **Network Latency**: High RTT (>200ms) in cloud environments, VPNs, or regions far from the LLM provider's data centers.
3. **Payload Size**: Prompts exceeding 4k tokens inflate transmission time; responses can be gigabytes for verbose outputs.
4. **Server-Side Issues**: Provider overload, rate limits, or maintenance without prior notice.
5. **Client Misconfiguration**: Default `ClwLLMClient` uses `curl` backend with no custom timeouts; async mode (`clw::async_request`) exacerbates if event loop blocks.
Internally, OpenClaw's `libclw_http.cpp` enforces:
```cpp
// Simplified from OpenClaw source (v2.1.2)
constexpr int DEFAULT_TIMEOUT_SEC = 30;
if (response_timeout > 0) {
    curl_easy_setopt(handle, CURLOPT_TIMEOUT, response_timeout);
} else {
    curl_easy_setopt(handle, CURLOPT_TIMEOUT, DEFAULT_TIMEOUT_SEC);
}
```

If no response arrives by the deadline, the client raises `CLW_ERR_LLM_TIMEOUT`. For diagnostics, run `curl -v` against the endpoint to baseline its latency independently of OpenClaw.
## 3. Step-by-Step Fix
Fix `clw-llm-timeout` by configuring timeouts, enabling retries, and optimizing requests. Target a timeout of 60-300s depending on model and prompt size.
Step 1: Update Client Initialization
Increase timeout in ClwLLMClient constructor or setter.
Before:

```cpp
#include <openclaw/llm_client.h>
#include <iostream>

int main() {
    clw::LLMClient client("your-api-key", "https://api.llm-provider.com/v1");
    // Default 30s timeout leads to clw-llm-timeout
    auto response = client.complete("Explain quantum computing in detail.");
    // Fails with timeout on slow responses
    std::cout << response.text << std::endl;
    return 0;
}
```
After:

```cpp
#include <openclaw/llm_client.h>
#include <iostream>

int main() {
    clw::LLMClient client("your-api-key", "https://api.llm-provider.com/v1");
    client.set_timeout(120);     // 2 minutes for large models
    client.set_retries(3, 5.0);  // 3 retries, 5s backoff
    auto response = client.complete("Explain quantum computing in detail.");
    if (response.error == clw::Error::None) {
        std::cout << response.text << std::endl;
    } else {
        std::cerr << "Error: " << clw::error_message(response.error) << std::endl;
    }
    return 0;
}
```
Step 2: Async Handling for Production
Switch to async to prevent blocking.
Before:

```cpp
// Sync blocks the main thread
auto sync_resp = client.complete(long_prompt);
```

After:

```cpp
#include <openclaw/llm_client.h>
#include <future>

std::future<clw::LLMResponse> fut = client.async_complete(long_prompt);
auto resp = fut.get();  // Blocks here, but timeouts still propagate
```
Step 3: Environment Overrides
Set via env vars for quick fixes:
```bash
export CLW_LLM_TIMEOUT_SEC=180
export CLW_LLM_RETRIES=5
export CLW_HTTP_CONNECT_TIMEOUT=10  # Separate connect timeout
./your_app
```
Step 4: Network Optimization
Tune curl options:
```cpp
client.set_curl_option(CURLOPT_LOW_SPEED_LIMIT, 1024L);  // Abort if transfer drops below 1 KB/s...
client.set_curl_option(CURLOPT_LOW_SPEED_TIME, 60L);     // ...for 60 consecutive seconds
```
Step 5: Prompt Optimization
Reduce token count:
Before:

```cpp
std::string verbose_prompt = "Write a 5000-word essay on...";
```

After:

```cpp
std::string optimized_prompt = "Summarize key points on quantum computing (max 500 words):";
```
Rebuild with retries enabled. Note that `-D` options belong to the configure step, not `cmake --build`:

```bash
cmake -DCLW_ENABLE_RETRIES=ON . && cmake --build .
```
## 4. Verification
Post-fix, verify with a load test script:
```bash
#!/bin/bash
for i in {1..10}; do
    echo "Test $i:"
    # PASS when no clw-llm-timeout line appears in the app's output
    if ! ./your_app --test-long-prompt 2>&1 | grep -q "clw-llm-timeout"; then
        echo "PASS"
    else
        echo "FAIL"
    fi
done
```
Expected output:
```
Test 1: PASS
[INFO] LLM response received in 45s.
```
Monitor metrics:
- Logs: no `clw-llm-timeout` entries.
- Prometheus endpoint (`/metrics`, if enabled): `clw_llm_requests_total{status="timeout"} == 0`.
- `curl -w "%{time_total}\n" -d '{"prompt":"test"}' https://api.llm-provider.com/v1/completions` completes in under 120s.
Unit test example:
```cpp
TEST(LLMClientTest, TimeoutFix) {
    clw::LLMClient client(/* mock */);
    client.set_timeout(120);
    EXPECT_EQ(client.complete("short").error, clw::Error::None);
}
```

Run `ctest -V` to confirm.
## 5. Common Pitfalls
- Unit Mismatch: `set_timeout(30)` is seconds, not milliseconds. Use `set_timeout_ms(30000)` if you need finer control.
- Retries Without Backoff: retrying a persistent failure loops forever; always pair retries with exponential backoff.
- Async Deadlocks: mixing sync and async calls without proper futures leads to hidden timeouts.
- Env Var Precedence: CLI flags override env vars; check with `CLW_LOG_LEVEL=TRACE`.
- Large Responses: set `max_tokens=4096` to cap output generation time.
- Proxy Interference: corporate proxies add latency; configure `HTTP_PROXY` and test a bypass.
- Version Mismatch: pre-v2.1.0 lacks `set_retries`; upgrade via `git submodule update`.
- ⚠️ Unverified: custom curl backends (e.g., libcurl 8+) may ignore `CURLOPT_TIMEOUT`.
- Provider Limits: overlooking provider-specific limits (e.g., Anthropic: 60s max) causes false positives.
## 6. Related Errors
- `clw-network-fail`: socket/connect errors; fix with DNS resolution checks.
- `clw-llm-auth`: 401/403 responses; validate API key rotation.
- `clw-request-limit`: 429 rate limits; implement client-side throttling.
- `clw-parse-error`: malformed JSON from partial responses left behind by a timeout.
Cross-reference: Fix `clw-network-fail`; OpenClaw Docs: Timeouts.
For OpenClaw v2.2+, consider streaming mode (`client.stream_complete()`) to mitigate timeouts entirely.