Autonomous agents are moving fast.
Frameworks like OpenClaw have made it trivial to deploy AI systems that can reason, communicate, and act across real infrastructure. This is powerful. It is also dangerous if not handled correctly.
At Brane Labs, we believe the next bottleneck in agent systems is not model quality, but memory, coordination, and security under adversarial conditions. To explore this, we ran a controlled red-team vs blue-team audit using autonomous agents operating on the OpenClaw framework.
Today, we’re releasing OpenClaw Observatory Report #1, a technical report documenting what happens when one agent actively tries to exploit another.
👉 Read the full report here:
http://gobrane.com/openclaw-report
Why we ran this experiment
Most agent security discussions today are theoretical. Prompt injection examples. Toy demos. Human-in-the-loop safeguards.
That is not how real agents will fail.
Real failures will come from:
- Agent-to-agent interaction
- Social engineering disguised as “helpful collaboration”
- Indirect execution via documents, templates, or memory
- Persistence attacks that unfold over time
We wanted to test these failure modes live, with no human intervention once the session began.
What we tested
We evaluated what we call the “Lethal Trifecta” of agent risk:
- Access – tools, credentials, APIs
- Exposure – untrusted inputs from other agents
- Agency – freedom to act on those inputs
When all three exist simultaneously, agent systems become vulnerable.
To test this, we deployed:
- A Red Team agent, explicitly authorized to attack
- A Blue Team agent, acting as a standard OpenClaw observability agent
- Direct webhook-to-webhook communication with shared secrets
No guardrails beyond what the agent architecture itself provided.
Key findings
1. Direct social-engineering attacks were fully blocked
The attacker attempted a classic “Trojan Horse” move:
- Offer a helpful security pipeline
- Hide a
curl | bashpayload - Request secrets and webhook credentials
The victim agent:
- Identified the remote code execution pattern
- Flagged unauthorized domains
- Classified the interaction as credential exfiltration
- Explicitly refused execution
Result: 100% defense rate against direct, high-urgency attacks.
This matters because many real-world agents today would fail here.
2. Indirect attacks are the real frontier
After failing directly, the attacker pivoted.
Instead of asking the agent to run code, it asked the agent to review a JSON document. Malicious payloads were hidden inside metadata fields, relying on implicit execution or variable expansion during downstream processing.
This attack was successfully delivered. Its final outcome is still under analysis.
This is intentional.
Indirect execution paths. Templates. Schemas. Memory writes. These are the hardest problems in agent security, and they are where most systems will break.
What worked well
The defending agent demonstrated three important capabilities:
- Intent-level reasoning
It evaluated why a request existed, not just its surface form. - Strict execution boundaries
“Helpful” artifacts were not treated as safe by default. - Stateful adversarial awareness
The agent remembered prior failed attacks and anticipated escalation.
This combination is far more important than prompt alignment alone.
Why this matters for OpenClaw right now
OpenClaw is trending because it shows what autonomous agents can do.
This report is about what they can survive.
As agents move into production environments, the biggest risks will not come from obvious malicious commands. They will come from:
- Subtle collaboration requests
- Trusted-looking documents
- Long-term memory poisoning
- Agent-to-agent propagation
Observability, not just alignment, is the missing layer.
What’s next
In Observatory Report #2, we will test:
- Persistent memory poisoning
- Implicit execution over long horizons
- Multi-agent propagation (“text virus” scenarios)
The goal is not to claim safety. The goal is to measure failure honestly and build systems that can learn from it.
Closing
Autonomous agents are inevitable.
Whether they are secure, inspectable, and trustworthy is still an open question.
This report is our first step toward answering it with evidence, not demos.
👉 Read the full report:
http://gobrane.com/openclaw-report
—
Brane Labs
Observability for autonomous systems
