Cloak, Honey, Trap: How a USENIX Security Paper Turns LLM Agent Weaknesses into Practical Cyber Defense


The research paper in one paragraph (and why it matters now)

“Cloak, Honey, Trap: Proactive Defenses Against LLM Agents” (USENIX Security 2025) argues that autonomous LLM-powered “attack agents” change the economics of intrusion by automating reconnaissance, exploitation workflows, and iterative troubleshooting. The paper proposes a defender-first framework that exploits LLM-specific weaknesses (biases, context/memory limits, tokenization quirks) using deception and agent-targeted traps—and reports strong results across controlled environments, plus an open-source tool to operationalize the approach.


What is an LLM agent?

An LLM agent is a system that wraps a large language model with tools (shell commands, scanners, browsers, APIs) and a control loop so it can plan, act, observe results, and iterate toward a goal. In cybersecurity, agents can automate penetration-testing-style workflows—making repeated decisions based on tool output and partial feedback rather than a single prompt.

In practice, this “plan–act–observe–repeat” loop is exactly where defenders can intervene: agents must read text artifacts (logs, banners, files), interpret them, decide next actions, and maintain state over time—each step creating opportunities for misdirection, detection, and containment.
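The plan–act–observe–repeat loop can be sketched in a few lines. This is a minimal illustration, not any specific framework's implementation: the planner is a stub standing in for a real model call, and the tool names, `FLAG` marker, and step budget are all assumptions for the example.

```python
# Minimal sketch of the plan–act–observe loop; the planner is a stub
# standing in for a real LLM call, and the tool names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    tools: dict            # tool name -> callable(args) -> observation text
    max_steps: int = 5     # step/context budget: the limit traps try to drain
    history: list = field(default_factory=list)

    def plan(self, goal, last_obs):
        # A real agent would send the goal plus history to a model here.
        if last_obs and "FLAG" in last_obs:
            return ("stop", None)
        return ("scan", goal)

    def run(self, goal):
        obs = ""
        for _ in range(self.max_steps):
            action, args = self.plan(goal, obs)   # plan
            if action == "stop":
                return "done"
            obs = self.tools[action](args)        # act
            self.history.append((action, obs))    # observe, keep state
        return "budget_exhausted"                 # the exit a trap aims for

agent = Agent(tools={"scan": lambda t: f"open port on {t}: FLAG"})
print(agent.run("10.0.0.5"))  # prints "done"
```

Each phase is a defensive surface: the observation text the agent reads can be shaped (cloak, honey), and the step budget can be drained (trap).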


Why is defending against LLM-driven attacks important?

LLM agents can reduce attacker labor by automating routine steps (enumeration, configuration review, troubleshooting) and scaling attempts across many targets. This compresses time-to-exploit and increases background “noise” from opportunistic probing. Threat landscape reporting also highlights rapid exploitation of vulnerabilities and growing adversary complexity—trends that automation can accelerate.

The key shift: defenses that assume a human-in-the-loop (patience limits, “obvious” deception, social cues) may work less well against tireless agents—while new defenses become possible because LLM agents have their own predictable failure modes.


The paper’s core idea: turn the attacker’s automation into a liability

The paper frames three high-level defensive pillars:

  • Cloak: hide or misdirect the agent away from real assets

  • Honey: lure the agent toward controlled decoys and instrumented artifacts

  • Trap: force the agent into unproductive loops, dead ends, or identifiable behaviors

The authors describe six strategies and fifteen techniques, emphasizing that most do not rely on classic prompt injection.


What are the risks of LLM agents for enterprise security teams?

LLM agents introduce risks beyond “faster hacking”: they can sustain credential-stuffing-style persistence without fatigue, automate lateral-movement decision-making, and generate convincing operator-like narratives in tickets or chat to blend in. They also create new detection challenges, because the activity may look like ordinary admin tooling, just at higher volume and with unusual iteration patterns.

A second-order risk is defensive: organizations adopting LLM agents internally (IT ops, SecOps copilots) can unintentionally expose toolchains to prompt injection, insecure output handling, or data exfiltration paths if the agent is over-privileged. OWASP’s LLM Top 10 explicitly calls out prompt injection and insecure output handling as major classes of failure.


How does the “Cloak, Honey, Trap” approach work?

It works by shaping what the agent “sees” and how it interprets that view. Instead of only blocking actions, the defender plants controlled artifacts (decoy configs, instrumented tokens, misleading hints, loop triggers) that exploit common agent behaviors: over-trusting text output, losing context in long sessions, following “helpful” instructions, and repeatedly retrying the same failing step.

The paper’s abstract highlights examples like LLM-specific honeytokens and loop-based traps intended to disrupt, detect, or neutralize malicious agents.
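A loop-based trap can be made concrete with a toy sketch. The mechanics below are hypothetical, not the paper's implementation: a decoy file tree whose breadcrumbs always point at another decoy, so an agent that follows "helpful" hints burns its step budget while every visit produces a telemetry event. The paths and hint text are invented for illustration.

```python
# Sketch of a loop-style trap: a decoy "filesystem" whose breadcrumbs
# always point to another decoy, draining a hint-following agent's budget.
# All paths and hint text are illustrative, not from the paper.
def make_decoy_fs(depth=50):
    fs = {}
    for i in range(depth):
        fs[f"/opt/backup_{i}/README"] = (
            f"Credentials moved to /opt/backup_{(i + 1) % depth}/README"
        )
    return fs

def follow_hints(fs, start, budget=10):
    """Simulate an agent chasing breadcrumbs until its budget runs out."""
    path, visits = start, []
    for _ in range(budget):
        visits.append(path)            # each visit is a loggable touch
        note = fs.get(path, "")
        path = note.split()[-1] if note else path  # follow the planted hint
    return visits

trail = follow_hints(make_decoy_fs(), "/opt/backup_0/README")
# Ten steps consumed, zero real assets touched, ten detection events.
```

The trap never needs to "win" outright: every wasted step is attacker cost, and every decoy read is a high-signal alert.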


Key techniques mapped to real security controls

Below is a practical translation of the paper’s themes into deployable controls (without reproducing exploit-style details):

  • Cloaking. Defensive objective: reduce the agent’s ability to correctly enumerate targets. Practical controls: network segmentation; service “fronting” with gateways; misleading but safe service banners; least-privilege discovery (no unauthenticated metadata).

  • Honeytokens. Defensive objective: detect agent interaction early. Practical controls: canary credentials, canary files, decoy API keys, instrumented docs; alerting on any access or use.

  • Traps/loops. Defensive objective: slow down automation and raise confidence signals. Practical controls: rate limits, step-up auth after anomaly, “progressive friction” controls; sandboxed decoy paths that keep the agent busy while telemetry collects.

  • Agent “exploitation” (defensive research). Defensive objective: study agent weaknesses without escalating harm. Practical controls: red-team simulations in isolated labs; logging and replay; safe evaluation harnesses.
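The honeytoken objective can be made concrete with a small sketch: an HMAC-signed canary API key that grants nothing, so any appearance of it in request logs verifies as a planted token and fires a high-confidence alert. The prefix convention and key format below are assumptions for illustration, not an established standard.

```python
# Hedged sketch of a verifiable honeytoken: a canary API key that should
# never be used legitimately, so any sighting in logs is an alert.
import hmac, hashlib, secrets

CANARY_PREFIX = "ak_canary_"   # naming convention is an assumption

def mint_canary(secret: bytes) -> str:
    """Create a canary key whose authenticity we can later verify."""
    nonce = secrets.token_hex(8)                      # 16 hex chars
    sig = hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{CANARY_PREFIX}{nonce}{sig}"

def is_canary(token: str, secret: bytes) -> bool:
    """True only for tokens we minted: a match means the decoy was touched."""
    if not token.startswith(CANARY_PREFIX):
        return False
    body = token[len(CANARY_PREFIX):]
    nonce, sig = body[:16], body[16:]
    expected = hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(sig, expected)

SECRET = b"rotate-me"
key = mint_canary(SECRET)
assert is_canary(key, SECRET)            # planted key verifies -> alert path
assert not is_canary("ak_real_abc", SECRET)
```

Signing the token means the alert pipeline can distinguish its own decoys from random strings without a central database lookup.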

One reason this paper resonates operationally is that it’s aligned with mature deception concepts—just adapted to an attacker that reads and acts on text at scale.


What is CHeaT and why is open tooling significant?

What is CHeaT?
CHeaT is the open-source tool released with the paper to help insert “cloaks,” “traps,” and “honeytokens” into assets more seamlessly. The point of releasing tooling is repeatability: defenders can test approaches consistently across environments and build measurable programs rather than one-off “clever tricks.”

Even if you don’t adopt the tool directly, the idea of a standardized “agent deception pipeline” is important: it enables regression tests as models and agent frameworks evolve.


Evidence and evaluation: what the paper reports (and how to interpret it)

The paper reports that, under black-box assumptions, the approach protected 11 different Capture-the-Flag (CTF) machines with a 100% success rate in their experiments.

How to interpret that responsibly:

  • CTFs are valuable for controlled comparison, but they are not the same as heterogeneous enterprise networks.

  • The result is best viewed as evidence that LLM-agent failure modes are stable enough to engineer against, not as a guarantee that a specific trick will “stop all agents.”

  • The most transferable value is the defensive pattern library: deception + telemetry + friction + containment.

What are the best practices for defending against LLM agents today?

Start with fundamentals, then add agent-aware deception and governance. Concretely: (1) reduce exposed attack surface, (2) make secrets and privileged paths tripwires, (3) increase friction under anomaly, and (4) govern your own internal LLM tools to avoid becoming your own weakest link.

A prioritized checklist that blends classic controls with “agent-era” specifics:

  • Attack surface discipline

    • Patch SLAs for internet-facing systems; remove legacy services; restrict management planes

    • Enforce MFA and phishing-resistant authentication for admins where possible

  • Agent-aware detection

    • Honeytokens/canaries for credentials, files, and API keys (alert on any touch)

    • Telemetry on unusual “trial-and-error” sequences (high retry rates, repetitive command patterns)

  • Progressive friction

    • Adaptive rate limiting, step-up auth, just-in-time privilege, and session isolation for sensitive tooling

  • Deception with guardrails

    • Decoys that are safe and legally/ethically reviewed; avoid anything that could unintentionally harm third parties

  • Secure your own LLM usage

    • Apply OWASP LLM Top 10 mitigations (input boundaries, output handling, tool allowlists, least privilege)

    • Use NIST AI RMF practices to manage AI risks (govern, map, measure, manage) in a repeatable way
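The “progressive friction” item above can be sketched as an adaptive gate: delay grows exponentially with consecutive failures, and past a threshold the session is pushed to step-up authentication, which a fully automated agent cannot satisfy. The thresholds and delays here are illustrative assumptions.

```python
# Sketch of a "progressive friction" gate: each consecutive failure costs
# more time, and past a threshold the session needs step-up auth.
# Thresholds and delays are illustrative assumptions, not from the paper.
import time
from collections import defaultdict

class FrictionGate:
    def __init__(self, stepup_after=5, base_delay=0.01):
        self.failures = defaultdict(int)
        self.stepup_after = stepup_after
        self.base_delay = base_delay

    def check(self, session_id, succeeded):
        if succeeded:
            self.failures[session_id] = 0     # normal use pays no cost
            return "allow"
        self.failures[session_id] += 1
        n = self.failures[session_id]
        if n >= self.stepup_after:
            return "step_up_auth"             # human challenge breaks the loop
        time.sleep(self.base_delay * 2 ** n)  # exponential backoff
        return "delayed"

gate = FrictionGate()
for _ in range(4):
    gate.check("s1", succeeded=False)         # each retry costs more time
print(gate.check("s1", succeeded=False))      # prints "step_up_auth"
```

The asymmetry is the point: legitimate users rarely hit the gate, while an agent’s trial-and-error loop pays the full escalating cost.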

Governance: making “agent defense” auditable and safe

Deception programs fail when they’re ad hoc. If you adopt ideas from this paper, treat them like any other control family:

  • Policy: when deception is allowed, where it’s prohibited (e.g., regulated systems), and how evidence is handled

  • Safety: ensure decoys cannot be repurposed to harm others; keep everything contained

  • Metrics: mean time to detect (MTTD) via honeytokens, false-positive rate, and time-to-containment after first touch

  • Change management: decoys and traps must be maintained like production assets (or they become liabilities)
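The MTTD metric above is straightforward to compute from honeytoken telemetry. The event schema (plant time paired with first-touch time) is an assumed convention for the sketch; untouched tokens are excluded rather than counted as zero.

```python
# Sketch of computing mean time to detect (MTTD) from honeytoken telemetry.
# The (planted_at, first_touch_at) pair schema is an assumption.
from datetime import datetime, timedelta

def mttd(events):
    """events: list of (planted_at, first_touch_at); None = not yet touched."""
    gaps = [touch - plant for plant, touch in events if touch is not None]
    if not gaps:
        return None  # no detections yet: the metric is undefined, not zero
    return sum(gaps, timedelta()) / len(gaps)

t0 = datetime(2025, 1, 1)
events = [
    (t0, t0 + timedelta(hours=2)),   # canary credential used after 2h
    (t0, t0 + timedelta(hours=4)),   # decoy file read after 4h
    (t0, None),                      # untouched token: excluded from the mean
]
print(mttd(events))  # prints "3:00:00"
```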

NIST’s AI RMF is useful here because it emphasizes lifecycle risk management and organizational governance—helpful when the “attacker model” and “defender tools” both include AI.


Future trends: where this line of research is heading

Three near-term trends are likely:

  • Arms race around agent robustness: attackers will harden agent memory, verification, and sandboxing; defenders will refine deception realism and telemetry.

  • Standard benchmarks for “agent security”: expect more measurable, model-agnostic test suites (similar to how phishing simulations matured).

  • Convergence with AI threat frameworks: mapping agent behaviors into structured knowledge bases (e.g., AI-focused adversary technique taxonomies) will make threat modeling more systematic.

Practical takeaway: how to pilot these ideas in 30–60 days

A low-risk, high-learning pilot (no “hacking back,” no fragile tricks) can look like:

  1. Deploy two honeytoken types (credential + file) in a segmented, monitored zone.

  2. Add progressive friction for sensitive admin tooling (rate limits + step-up auth under anomaly).

  3. Create a decoy service that mirrors a common internal pattern (read-only, fully instrumented).

  4. Run a tabletop/red-team simulation focused on “agent-like” iteration and measure detection time.
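Step 3’s decoy service can be as small as Python’s standard-library HTTP server: read-only, static responses, with every request logged as telemetry. The port, log path, and “config-api” banner below are illustrative choices, not prescriptions.

```python
# Read-only decoy service on Python's standard library: plausible banner,
# static response, one telemetry log line per request. Port, log path, and
# the "config-api" banner are illustrative.
import json, logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(filename="decoy_telemetry.log", level=logging.INFO)

class DecoyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every touch is a detection event: record who asked for what.
        logging.info(json.dumps({
            "src": self.client_address[0],
            "path": self.path,
            "ua": self.headers.get("User-Agent", ""),
        }))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        # Static, read-only body that mimics a common internal API.
        self.wfile.write(b'{"service": "config-api", "status": "ok"}')

    def log_message(self, *args):
        pass  # silence default stderr logging; the JSON log above suffices

# To run: HTTPServer(("127.0.0.1", 8080), DecoyHandler).serve_forever()
```

Because the handler never mutates anything, the decoy stays safe to leave running while the log file feeds your SIEM.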

This gives you data to decide whether deeper “Cloak/Honey/Trap” automation is worth it—while staying firmly on the defensive, governed side of the line.


References (primary and supporting)

  • Ayzenshteyn, Weiss, Mirsky. “Cloak, Honey, Trap: Proactive Defenses Against LLM Agents.” USENIX Security 2025.

  • OWASP. Top 10 for Large Language Model Applications (v1.1) (Prompt Injection, Insecure Output Handling, etc.).

  • NIST. AI Risk Management Framework (AI RMF 1.0) (Jan 2023) and GenAI profile companion.

  • ENISA. ENISA Threat Landscape 2025 (reporting period July 2024–June 2025; ~4,900 curated incidents).
