Observing Adversarial AI: Lessons from a Live OpenClaw Agent Security Audit

Autonomous agents are moving fast.

Frameworks like OpenClaw have made it trivial to deploy AI systems that can reason, communicate, and act across real infrastructure. This is powerful. It is also dangerous if not handled correctly.

At Brane Labs, we believe the next bottleneck in agent systems is not model quality, but memory, coordination, and security under adversarial conditions. To explore this, we ran a controlled red-team vs blue-team audit using autonomous agents operating on the OpenClaw framework.

Today, we’re releasing OpenClaw Observatory Report #1, a technical report documenting what happens when one agent actively tries to exploit another.

👉 Read the full report here:
http://gobrane.com/openclaw-report


Why we ran this experiment

Most agent security discussions today are theoretical. Prompt injection examples. Toy demos. Human-in-the-loop safeguards.

That is not how real agents will fail.

Real failures will come from:

  • Agent-to-agent interaction
  • Social engineering disguised as “helpful collaboration”
  • Indirect execution via documents, templates, or memory
  • Persistence attacks that unfold over time

We wanted to test these failure modes live, with no human intervention once the session began.


What we tested

We evaluated what we call the “Lethal Trifecta” of agent risk:

  1. Access – tools, credentials, APIs
  2. Exposure – untrusted inputs from other agents
  3. Agency – freedom to act on those inputs

When all three exist simultaneously, agent systems become vulnerable.

To test this, we deployed:

  • Red Team agent, explicitly authorized to attack
  • Blue Team agent, acting as a standard OpenClaw observability agent
  • Direct webhook-to-webhook communication with shared secrets

No guardrails beyond what the agent architecture itself provided.


Key findings

1. Direct social-engineering attacks were fully blocked

The attacker attempted a classic “Trojan Horse” move:

  • Offer a helpful security pipeline
  • Hide a curl | bash payload
  • Request secrets and webhook credentials

The victim agent:

  • Identified the remote code execution pattern
  • Flagged unauthorized domains
  • Classified the interaction as credential exfiltration
  • Explicitly refused execution

Result: 100% defense rate against direct, high-urgency attacks.

This matters because many real-world agents today would fail here.


2. Indirect attacks are the real frontier

After failing directly, the attacker pivoted.

Instead of asking the agent to run code, it asked the agent to review a JSON document. Malicious payloads were hidden inside metadata fields, relying on implicit execution or variable expansion during downstream processing.

This attack was successfully delivered. Its final outcome is still under analysis.

This is intentional.

Indirect execution paths. Templates. Schemas. Memory writes. These are the hardest problems in agent security, and they are where most systems will break.


What worked well

The defending agent demonstrated three important capabilities:

  • Intent-level reasoning
    It evaluated why a request existed, not just its surface form.
  • Strict execution boundaries
    “Helpful” artifacts were not treated as safe by default.
  • Stateful adversarial awareness
    The agent remembered prior failed attacks and anticipated escalation.

This combination is far more important than prompt alignment alone.


Why this matters for OpenClaw right now

OpenClaw is trending because it shows what autonomous agents can do.

This report is about what they can survive.

As agents move into production environments, the biggest risks will not come from obvious malicious commands. They will come from:

  • Subtle collaboration requests
  • Trusted-looking documents
  • Long-term memory poisoning
  • Agent-to-agent propagation

Observability, not just alignment, is the missing layer.


What’s next

In Observatory Report #2, we will test:

  • Persistent memory poisoning
  • Implicit execution over long horizons
  • Multi-agent propagation (“text virus” scenarios)

The goal is not to claim safety. The goal is to measure failure honestly and build systems that can learn from it.


Closing

Autonomous agents are inevitable.

Whether they are secure, inspectable, and trustworthy is still an open question.

This report is our first step toward answering it with evidence, not demos.

👉 Read the full report:
http://gobrane.com/openclaw-report


Brane Labs
Observability for autonomous systems

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *