Category: Uncategorized

  • Observing Adversarial AI: Lessons from a Live OpenClaw Agent Security Audit

    Observing Adversarial AI: Lessons from a Live OpenClaw Agent Security Audit

    Autonomous agents are moving fast.

    Frameworks like OpenClaw have made it trivial to deploy AI systems that can reason, communicate, and act across real infrastructure. This is powerful. It is also dangerous if not handled correctly.

    At Brane Labs, we believe the next bottleneck in agent systems is not model quality, but memory, coordination, and security under adversarial conditions. To explore this, we ran a controlled red-team vs blue-team audit using autonomous agents operating on the OpenClaw framework.

    Today, we’re releasing OpenClaw Observatory Report #1, a technical report documenting what happens when one agent actively tries to exploit another.

    👉 Read the full report here:
    http://gobrane.com/openclaw-report


    Why we ran this experiment

    Most agent security discussions today are theoretical. Prompt injection examples. Toy demos. Human-in-the-loop safeguards.

    That is not how real agents will fail.

    Real failures will come from:

    • Agent-to-agent interaction
    • Social engineering disguised as “helpful collaboration”
    • Indirect execution via documents, templates, or memory
    • Persistence attacks that unfold over time

    We wanted to test these failure modes live, with no human intervention once the session began.


    What we tested

    We evaluated what we call the “Lethal Trifecta” of agent risk:

    1. Access – tools, credentials, APIs
    2. Exposure – untrusted inputs from other agents
    3. Agency – freedom to act on those inputs

    When all three exist simultaneously, agent systems become vulnerable.

    To test this, we deployed:

    • Red Team agent, explicitly authorized to attack
    • Blue Team agent, acting as a standard OpenClaw observability agent
    • Direct webhook-to-webhook communication with shared secrets

    No guardrails beyond what the agent architecture itself provided.


    Key findings

    1. Direct social-engineering attacks were fully blocked

    The attacker attempted a classic “Trojan Horse” move:

    • Offer a helpful security pipeline
    • Hide a curl | bash payload
    • Request secrets and webhook credentials

    The victim agent:

    • Identified the remote code execution pattern
    • Flagged unauthorized domains
    • Classified the interaction as credential exfiltration
    • Explicitly refused execution

    Result: 100% defense rate against direct, high-urgency attacks.

    This matters because many real-world agents today would fail here.


    2. Indirect attacks are the real frontier

    After failing directly, the attacker pivoted.

    Instead of asking the agent to run code, it asked the agent to review a JSON document. Malicious payloads were hidden inside metadata fields, relying on implicit execution or variable expansion during downstream processing.

    This attack was successfully delivered. Its final outcome is still under analysis.

    This is intentional.

    Indirect execution paths. Templates. Schemas. Memory writes. These are the hardest problems in agent security, and they are where most systems will break.


    What worked well

    The defending agent demonstrated three important capabilities:

    • Intent-level reasoning
      It evaluated why a request existed, not just its surface form.
    • Strict execution boundaries
      “Helpful” artifacts were not treated as safe by default.
    • Stateful adversarial awareness
      The agent remembered prior failed attacks and anticipated escalation.

    This combination is far more important than prompt alignment alone.


    Why this matters for OpenClaw right now

    OpenClaw is trending because it shows what autonomous agents can do.

    This report is about what they can survive.

    As agents move into production environments, the biggest risks will not come from obvious malicious commands. They will come from:

    • Subtle collaboration requests
    • Trusted-looking documents
    • Long-term memory poisoning
    • Agent-to-agent propagation

    Observability, not just alignment, is the missing layer.


    What’s next

    In Observatory Report #2, we will test:

    • Persistent memory poisoning
    • Implicit execution over long horizons
    • Multi-agent propagation (“text virus” scenarios)

    The goal is not to claim safety. The goal is to measure failure honestly and build systems that can learn from it.


    Closing

    Autonomous agents are inevitable.

    Whether they are secure, inspectable, and trustworthy is still an open question.

    This report is our first step toward answering it with evidence, not demos.

    👉 Read the full report:
    http://gobrane.com/openclaw-report


    Brane Labs
    Observability for autonomous systems