What are the honest limitations of Orchesis with Paperclip?

Orchesis cannot detect semantic injection (natural language attacks disguised as legitimate tasks). Detection rate for realistic attack mixes is 38%. The plugin layer is a complete blind spot at proxy level. Five of twelve Paperclip attack surfaces have zero protection from any tool. We publish these limits because transparency builds better security decisions.

Built on 67 Paperclip GitHub issues analyzed

Paperclip runs your AI company.
Orchesis keeps it from running away.

Orchesis is an open-source HTTP proxy that sits between Paperclip agents and LLM APIs to detect loops at API call #3, scan for injection on every request, and meter real costs independently of what the agent reports.

12 attack surfaces · $0.05 loop catch · 0 Paperclip security docs · 96% injection detected

Install in 30 seconds →★ Star on GitHub

🔍 12 attack surfaces found

⚡ $0.05 loop catch at call #3

🔒 18 injection patterns

⚖ MIT · Free · Self-hosted

What Paperclip's dashboard doesn't show you

Three things that actually happen in production Paperclip fleets.

Each one has an issue number.

Budget tracking · Heartbeat gap

$55–150

unmonitored spend per agent run

Budget checks only fire between heartbeats

Paperclip checks budgets when an agent wakes up and when it finishes. Inside the run: unlimited spend. If the agent times out, the cost vanishes from the dashboard entirely. A 30-minute heartbeat with a 10-minute run means 20 minutes of complete cost blindness per cycle.

Orchesis detects the loop at API call #3 — $0.05 spent. That's 450× to 1,800,000× faster than waiting for the next heartbeat.

Issue #212 · Codex adapter

$0.00

reported cost for every Codex run

Your dashboard says your Codex agents cost nothing

The Codex adapter reports $0.00 for all runs. Millions of tokens processed, zero recorded cost. The dashboard shows a number. The number is wrong. There is no independent verification.

Orchesis meters every API call at the network layer — Real tokens, real cost, independent of what the agent reports.

Cascading Injection · Formal proof

10 min

to infect your entire fleet from one task

One poisoned issue description compromises 13 agents

Issue descriptions have zero input sanitization. An attacker writes a malicious task. The CEO agent reads it, delegates to CTOs, who delegate to engineers. At fan-out 3 and depth 2: 13 agents compromised in 10 minutes. We proved this formally: N(t) = O(f^{t/Δ_h}).

Orchesis at depth 0 saves 92% of the fleet — At depth 1: 69%. Pattern-matched injection is fully contained.

One environment variable. That's it.

No code changes in Paperclip. No code changes in your agents.

Works with Claude Code, Codex, Cursor, and OpenClaw adapters.

# 1. Install Orchesis
pip install orchesis
orchesis init --with-crystal-alert --with-injection-shield

# 2. Add one line to your Paperclip .env
ANTHROPIC_BASE_URL="http://orchesis:8080/v1"
# 3. Verify
orchesis verify

✓ Proxy running on :8080
✓ Crystal Alert: active
✓ Injection Shield: active (33+ patterns + 14 Paperclip-specific)
✓ Cost metering: real-time
⚠ Paperclip heartbeat detected: 30 min · Orchesis covers the gaps

30s

Phase 0: Minimum Viable Integration

One env var. Loop detection + cost metering + injection scanning. Covers 60–70% of total value immediately.

14h

Phase 1: Core Adapter

Paperclip-specific patterns. Agent ID correlation. Detection jumps from 30–40% to 55–65%.

84h

Full Integration

Budget bridge, plugin hooks, dual proxy, workspace monitoring. All 6 phases.

12 attack surfaces. What catches what.

We identified 12 ways to attack a Paperclip fleet.

Orchesis covers 9. Five surfaces have zero protection from any tool.

Attack surface	Severity	Paperclip	Orchesis (proxy)	Orchesis (plugin)
Issue description injection	HIGH	None	Scan + block	—
Goal ancestry cascade	HIGH	None	Chain tracking	—
Shared workspace files	HIGH	None	Partial	Integrity check
Session state leakage	MEDIUM	None	SessionTracer	—
Plugin system (full Node.js)	CRITICAL	None	Blind spot	Blind spot
Clipmart templates	CRITICAL	No review	Blind spot	Blind spot
Approval flow bypass	CRITICAL	Broken (#696)	Detect + alert	—
Cost reporting (self-reported)	HIGH	Trusts agent	Independent metering	—
Auth system (local_trusted)	CRITICAL	PR #1315	Blind spot	Blind spot
Cross-agent API (ctx.*)	HIGH	No RLS	Blind spot	Audit hooks
Heartbeat gap (filesystem)	MEDIUM	None	Blind spot	Snapshot
Stdout credential leaks	MEDIUM	Displayed in UI	API-level only	—

5 surfaces marked "Blind spot" require fixes in Paperclip itself. We report these through responsible disclosure, not marketing.

What Orchesis catches for Paperclip fleets

Every number below is calculated, not estimated.

Methods and assumptions are published.

$0.05

Loop detected at API call #3

vs $55–150 at next heartbeat

96%

Explicit injection patterns detected

0% false positives · calibrated

2ms

Cost anomaly detection latency

vs 15–60 minutes

92%

Fleet saved at cascade depth 0

Cascading Injection Theorem

$52–143K

Annual value for fleet of 5 agents

Calibrated range

57%

Detection rate with dual proxy

+26pp over single proxy

OpenClaw vs Paperclip: different problems, same solution

Same proxy. Different scale.

OpenClaw + Orchesis

Individual developer, one agent

Fleet size1

Session modelContinuous

Integration0 hours

Lead problemInvisible attacks

Collusion riskNone (N=1)

Injection surfaces3–4

Annual value$5–20K

Paperclip + Orchesis

Team, 5–20 agents in production

Fleet size5–20

Session modelHeartbeat

Integration30 min → 84h

Lead problemCost bleeding

Collusion riskYes (Folk theorem)

Injection surfaces12

Annual value$52–143K

Also available: See our OpenClaw integration →

What Orchesis can't do for Paperclip

We publish our limits.

Nobody else in this space does this. You need them to make real security decisions.

✗

Semantic injection: 0% detection.Natural language attacks disguised as legitimate tasks are fundamentally undetectable by any finite set of regex patterns. This is proven, not an engineering gap.

✗

Plugin layer: complete blind spot.Paperclip plugins run as in-process Node.js. HTTP proxy cannot see ctx.* calls.

✗

5 of 12 surfaces: zero protection.Plugin system, auth, cross-agent API, heartbeat filesystem gap, stdout logs. Neither Orchesis nor Paperclip covers these today.

✗

Realistic detection: 38%.Against a mix of 40% explicit and 60% semantic attacks. Not 96%. The 96% is for explicit patterns only.

✗

Cascade containment: bifurcated.Explicit payloads: fully contained. Semantic payloads: not containable at proxy layer.

Frequently asked about Orchesis + Paperclip

Common questions

How does Orchesis integrate with Paperclip?

One environment variable: ANTHROPIC_BASE_URL=http://orchesis:8080/v1. All agent API calls route through Orchesis automatically. Works with Claude Code, Codex, Cursor, and OpenClaw adapters. No code changes in Paperclip or your agents.

What does Orchesis detect that Paperclip doesn't?

Runaway loops at API call #3 ($0.05 spent) instead of next heartbeat ($55–150). Prompt injection in issue descriptions, which have zero sanitization. Real API costs independent of self-reported agent data. Cross-agent behavioral correlation across your fleet.

Is Paperclip secure enough for production?

Paperclip has 30,000+ stars and zero security documentation. We found 12 attack surfaces including unsandboxed plugins with full server access, broken approval flows, and zero input sanitization on issue descriptions. External monitoring is strongly recommended for any production deployment.

What are the honest limitations?

Semantic injection (natural language attacks): 0% detection. Plugin layer: complete blind spot. 5 of 12 surfaces: zero protection. Realistic mixed-attack detection: 38%. We publish these numbers because transparency builds better security decisions.

How much does Orchesis cost?

$0. MIT license. Self-hosted. No telemetry. No account required. The annual value for a Paperclip fleet of 5 agents is $52K–143K in prevented cost overruns and security incidents.

The fleet that's bleeding money right now

won't tell you.

Get started →See OpenClaw integration →

Works whether AI wins or loses.

Paperclip runs your AI company.Orchesis keeps it from running away.