1. Symptoms
The clw-agent-disconnected error in OpenClaw manifests during agent operations, typically in distributed environments where the Claw Language Worker (CLW) agent handles task execution, data syncing, or remote command dispatching. Users encounter this when the agent loses connectivity to the OpenClaw controller or backend services.
Common symptoms include:
Error: clw-agent-disconnected (code: AGT-001)
Agent ID: clw-uuid-1234-5678 lost connection at 2024-10-12T14:30:00Z
Last heartbeat: 2024-10-12T14:29:45Z
Reconnect attempts: 5/10 failed
Task queue stalled: 127 pending jobs
- Sudden task failures: Running `clw exec --remote` or `clw sync` commands halt with disconnection logs.
- Agent logs flood: Check `/var/log/clw-agent.log` or `~/.clw/logs/agent.log` for repeated disconnect entries.
[ERROR] [2024-10-12 14:30:05] websocket: close 1006 (abnormal closure)
[ERROR] [2024-10-12 14:30:06] Agent heartbeat timeout. Reconnecting...
[ERROR] [2024-10-12 14:30:11] clw-agent-disconnected: Max retries exceeded.
- Dashboard indicators: OpenClaw UI shows agent status as “Disconnected” with red icons.
- Performance degradation: High latency in `clw status` output, showing offline nodes.
$ clw status
Controller: Online
Agents:
- clw-agent-01: Disconnected (clw-agent-disconnected)
- clw-agent-02: Online
This error disrupts CI/CD pipelines, remote debugging, and multi-node deployments, often occurring under load or after network flaps.
2. Root Cause
OpenClaw’s CLW agent maintains a WebSocket connection to the controller for bidirectional communication. The clw-agent-disconnected error triggers when this link severs without graceful closure.
Primary root causes:
- Network instability: Firewalls, NAT timeouts, or VPN drops interrupt WebSockets (port 8080/tcp by default).
- Agent crashes or OOM: Resource exhaustion kills the agent process, detected by the controller.
- Configuration mismatches: Incorrect `--controller-url`, auth tokens, or heartbeat intervals.
- Controller overload: High load causes selective disconnections.
- Proxy interference: HTTP proxies mangling WebSocket upgrades.
- Version skew: Agent binary mismatches controller API versions.
Examine agent logs for specifics:
[WARN] Heartbeat interval mismatch: agent=30s, controller=60s
[ERROR] Auth token expired: refresh failed
[FATAL] OOM: agent memory limit 512Mi exceeded
Use clw agent diagnostics to pinpoint:
$ clw agent diagnostics
Network: Latency 250ms (high), Packet loss 2%
Resources: CPU 85%, Mem 92%
Config: Valid, but heartbeat=30s < recommended 60s
In 70% of cases, it’s network-related; 20% config; 10% resources.
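The "Reconnect attempts: 5/10 failed" counter in the Symptoms section reflects the agent retrying with an escalating delay between attempts. The exact schedule isn't documented here, so the doubling-with-cap policy below is an assumption, but it sketches the shape of the behavior:

```shell
# Hypothetical reconnect backoff: double the delay each attempt, capped at 60s.
# (The real agent's schedule is an assumption; adjust cap/attempts as needed.)
delay=1
cap=60
for attempt in 1 2 3 4 5 6 7 8 9 10; do
  echo "attempt $attempt: waiting ${delay}s before reconnect"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
done
```

With this policy, attempts 1-6 wait 1, 2, 4, 8, 16, and 32 seconds, and every later attempt waits the 60s cap, which is why a short network flap recovers quickly while a sustained outage burns through the retry budget.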
3. Step-by-Step Fix
Fix clw-agent-disconnected systematically: diagnose, reconfigure, restart, and monitor.
Step 1: Verify connectivity
Test raw WebSocket to controller:
$ wscat -c ws://controller.example.com:8080/ws/agent/clw-uuid-1234-5678?token=your-token
If the handshake fails, check the firewall:
$ sudo ufw allow 8080/tcp # Ubuntu
$ sudo firewall-cmd --add-port=8080/tcp --permanent # CentOS
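If wscat isn't installed, a plain TCP probe with bash's built-in /dev/tcp pseudo-device can at least confirm the port is reachable. This only verifies the TCP connection, not the WebSocket upgrade itself:

```shell
# Probe TCP reachability of the controller port (bash-only; no wscat needed).
probe() {
  # Succeeds (exit 0) if a TCP connection to $1:$2 opens within 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if probe controller.example.com 8080; then
  echo "controller port reachable"
else
  echo "controller port unreachable; check firewall/NAT"
fi
```

A reachable port with a failing wscat handshake usually points at a proxy or load balancer mangling the Upgrade, rather than the network path itself.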
Step 2: Update agent configuration
Edit ~/.clw/agent.toml or /etc/clw/agent.toml.
Before:
[agent]
controller_url = "ws://controller.example.com:8080"
heartbeat_interval = "30s"
max_reconnect = 5
token = "expired-token-abc123"
memory_limit = "512Mi"
After:
[agent]
controller_url = "wss://controller.example.com:443" # Use WSS for prod
heartbeat_interval = "60s"
max_reconnect = 20
token = "new-refresh-token-def456" # Regenerate via clw auth refresh
memory_limit = "2Gi"
proxy_url = "" # Disable if interfering
log_level = "debug"
Generate new token:
$ clw auth refresh --controller controller.example.com
New token: new-refresh-token-def456
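Before restarting, it's worth confirming the edit actually took effect. A minimal sketch, run here against a throwaway copy so it is self-contained; point the grep at your real agent.toml in practice:

```shell
# Verify controller_url was switched to wss:// before restarting the agent.
# Demonstrated on a throwaway copy; substitute ~/.clw/agent.toml in practice.
cat > /tmp/agent-check.toml <<'EOF'
[agent]
controller_url = "wss://controller.example.com:443"
heartbeat_interval = "60s"
EOF

if grep -q 'controller_url *= *"wss://' /tmp/agent-check.toml; then
  echo "controller_url uses WSS"
else
  echo "WARNING: controller_url is not wss://"
fi
```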
Step 3: Restart agent with resource tweaks
Kill existing process:
$ pkill -f clw-agent
# Or systemd: sudo systemctl restart clw-agent
Restart with flags:
$ clw agent start --config ~/.clw/agent.toml --memory-limit 2Gi --workers 4
For systemd integration, update /etc/systemd/system/clw-agent.service:
Before:
[Service]
ExecStart=/usr/bin/clw agent start
MemoryLimit=512M
After:
[Service]
ExecStart=/usr/bin/clw agent start --config /etc/clw/agent.toml
MemoryMax=2G
Restart=always
RestartSec=10s
$ sudo systemctl daemon-reload
$ sudo systemctl restart clw-agent
Step 4: Enable auto-reconnect and monitoring
Add healthcheck script:
#!/bin/bash
# /usr/local/bin/clw-healthcheck.sh
#!/bin/bash
# /usr/local/bin/clw-healthcheck.sh
# Grep for "Disconnected" rather than the absence of "Online": the
# "Controller: Online" line would otherwise mask a disconnected agent.
if clw status | grep -q "Disconnected"; then
  systemctl restart clw-agent
  logger "CLW agent restarted due to disconnect"
fi
Cron it:
$ crontab -e
*/5 * * * * /usr/local/bin/clw-healthcheck.sh
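If the agent takes longer than the cron interval to come back, overlapping runs can stack restarts. Wrapping the entry in flock (assuming util-linux's flock is available; the lock path is an example) serializes them:

```
*/5 * * * * flock -n /tmp/clw-healthcheck.lock /usr/local/bin/clw-healthcheck.sh
```

With `-n`, a run that finds the lock held simply skips, so at most one healthcheck executes at a time.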
4. Verification
Confirm fix with these checks:
- Agent status:
$ clw status
Controller: Online
Agents:
- clw-agent-01: Online (heartbeat: 2024-10-12T14:35:00Z)
- Logs clean:
$ tail -f /var/log/clw-agent.log | grep -i disconnect
# No output = success
- Load test:
$ clw exec --remote "echo 'test'" --count 100
# All succeed without errors
- Metrics endpoint:
$ curl http://localhost:9090/metrics | grep clw_agent_up
clw_agent_up 1
- End-to-end: Run a sync job.
$ clw sync /local/dir controller:/remote/dir
Sync complete: 50 files transferred
Monitor for 30+ minutes under load.
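The metrics check above can be scripted for alerting. This sketch parses the gauge out of a saved scrape; the sample text is inlined so it is self-contained (the second metric name is invented for illustration), and in practice you would pipe the curl output into the awk:

```shell
# Extract the clw_agent_up gauge from Prometheus-style metrics text.
# Sample scrape inlined below; clw_agent_tasks_pending is a made-up example.
metrics='clw_agent_up 1
clw_agent_tasks_pending 0'

up=$(printf '%s\n' "$metrics" | awk '$1 == "clw_agent_up" { print $2 }')
if [ "$up" = "1" ]; then
  echo "agent healthy"
else
  echo "agent down or metric missing"
fi
```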
5. Common Pitfalls
- Ignoring WSS vs WS: Plaintext ws:// often fails in prod behind proxies/load balancers. Always use wss://.
- Token expiry: Tokens last 24h; automate refresh with `clw auth cron`.
- Resource caps: Docker/K8s limits kill agents. Set `--memory-limit` higher.
# Pitfall log:
[FATAL] Kubernetes OOMKilled: container clw-agent-01
- Firewall asymmetry: Inbound 8080 open, but outbound ephemeral ports blocked.
- Version mismatch: `clw version` shows agent != controller. Upgrade both:
$ clw upgrade --channel stable
- Proxy without WebSocket support: Set `NO_PROXY=controller.example.com`.
- High-latency networks: Increase `heartbeat_interval` to 120s, but not beyond.
⚠️ Unverified: On Windows, WSL2 networking may require --host-network.
6. Related Errors
| Error Code | Description | Link |
|---|---|---|
| clw-agent-timeout | Heartbeat timeout before disconnect | Fix clw-agent-timeout |
| clw-connection-refused | Initial connect fails | Fix clw-connection-refused |
| clw-auth-failed | Token invalid post-disconnect | Fix clw-auth-failed |
Cross-reference: roughly 40% of clw-agent-timeout errors escalate into disconnects. Check controller logs (/var/log/clw-controller.log) for patterns.