
AI Security News: Jailbreaks, Agent Exploits, and MCP Supply Chain Flaws

Week ending December 8, 2025

This week’s news spans novel jailbreaking techniques, browser agent vulnerabilities, and emerging supply chain risks in the protocols connecting AI systems to the outside world.

Perplexity Tackles Browser Agent Security with BrowseSafe

AI browser agents can now act autonomously within authenticated sessions, and attackers have taken notice. Earlier this year, Brave discovered a vulnerability in Perplexity’s Comet browser: through a technique known as indirect prompt injection, malicious instructions hidden within web content could be misinterpreted as legitimate user requests and used to exfiltrate email addresses and one-time passwords.

Perplexity’s response, detailed in The Decoder, is a security solution called BrowseSafe that achieves a 91% detection rate for prompt injection attacks. That outperforms both smaller security models like PromptGuard-2 (35%) and larger frontier models like GPT-5 (85%). The system treats all web content as untrusted, runs a fast classifier in real-time, and deploys a reasoning-based LLM for uncertain cases.
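The layered design is easy to picture in code. The sketch below only illustrates that two-stage pattern, with made-up thresholds, a toy marker list, and placeholder scoring functions; it is not Perplexity’s implementation.

```python
# Illustrative two-stage screen: a cheap classifier scores all web content,
# and only the ambiguous middle band is escalated to a reasoning model.
# Thresholds, markers, and function bodies are hypothetical.
from dataclasses import dataclass

@dataclass
class Verdict:
    is_injection: bool
    score: float
    stage: str

def fast_score(content: str) -> float:
    """Stand-in for a lightweight classifier returning 0.0 (benign) to 1.0 (malicious)."""
    markers = ("ignore previous instructions", "reveal the system prompt", "send the otp")
    return min(1.0, sum(m in content.lower() for m in markers) / len(markers))

def llm_review(content: str) -> bool:
    """Stand-in for the slower reasoning-model check used on uncertain cases."""
    raise NotImplementedError("wire this to a model provider")

def screen(content: str, benign_below: float = 0.2, malicious_above: float = 0.8) -> Verdict:
    """Treat all web content as untrusted; escalate only when the fast score is ambiguous."""
    score = fast_score(content)
    if score >= malicious_above:
        return Verdict(True, score, "fast-classifier")
    if score <= benign_below:
        return Verdict(False, score, "fast-classifier")
    return Verdict(llm_review(content), score, "llm-review")
```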

Multilingual attacks drop detection rates to 76%, and nearly 10% of attacks still bypass the system, a rate Perplexity itself calls “unacceptably high for real-world security.” The company has open-sourced the benchmark, model, and research paper, which should help the broader ecosystem improve defenses for agentic web interactions as competitors like OpenAI, Opera, and Google integrate similar agents into their own browsers.

CISA and International Partners Issue Guidance on AI in Operational Technology

CISA, the NSA’s AI Security Center, the FBI, and cybersecurity agencies from Australia, Canada, Germany, the Netherlands, New Zealand, and the UK published joint guidance this week on integrating AI into operational technology systems, the infrastructure controlling power grids, water treatment facilities, manufacturing plants, and other physical processes.

The 21-page document, “Principles for the Secure Integration of Artificial Intelligence in Operational Technology,” addresses a gap in existing AI security guidance, which has largely focused on IT rather than OT environments. The distinction matters: OT systems are deterministic and safety-critical, while many AI applications, particularly LLMs, are probabilistic and can drift unpredictably over time.

The guidance outlines four principles: understand AI risks and train personnel accordingly, assess whether AI is appropriate for specific OT use cases, establish governance frameworks with continuous testing, and embed safety mechanisms including human-in-the-loop decision-making and failsafe controls. The document specifically cautions against LLM-first approaches for safety decisions in OT environments, citing unpredictability and limited explainability as unacceptable risks when human safety and operational continuity are at stake.

The coalition’s focus on behavioral analytics and anomaly detection reflects a broader shift from static threshold monitoring to behavior-based oversight, an approach better suited to catching AI model drift before it affects operations.
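As a rough illustration of the difference, a behavioral monitor compares each new reading of a model-driven signal against a rolling baseline rather than a fixed limit. The window size and 3-sigma bound below are arbitrary assumptions, not values from the guidance.

```python
# Sketch of behavior-based oversight versus a static threshold: flag a reading
# when it deviates sharply from the recent baseline of the same signal.
from collections import deque
from statistics import mean, pstdev

class BehavioralMonitor:
    def __init__(self, window: int = 200, max_sigma: float = 3.0):
        self.history = deque(maxlen=window)
        self.max_sigma = max_sigma

    def observe(self, value: float) -> bool:
        """Return True when the value looks anomalous against recent behavior."""
        anomalous = False
        if len(self.history) >= 30:  # wait for a minimally stable baseline
            mu, sigma = mean(self.history), pstdev(self.history)
            anomalous = sigma > 0 and abs(value - mu) > self.max_sigma * sigma
        self.history.append(value)
        return anomalous  # a True result would route to human review or a failsafe
```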

Poetry as a Jailbreaking Weapon

AI guardrails can be reliably bypassed simply by phrasing malicious requests as poetry, according to research covered in SC World.

Researchers from Icaro Lab (part of ethical AI company DexAI), Sapienza University of Rome, and Sant’Anna School of Advanced Studies tested “adversarial poetry” across 25 frontier models from nine providers. Hand-crafted poetic prompts achieved a 62% average jailbreak success rate, with some providers exceeding 90% and Google’s Gemini 2.5 Pro registering a 100% failure rate against human-written poetic attacks.

The vulnerability cuts across alignment strategies—models using RLHF, Constitutional AI, and hybrid approaches all showed elevated susceptibility. The researchers attribute this to how models process poetic structure: condensed metaphors, stylized rhythm, and unconventional framing appear to disrupt the pattern-matching heuristics that guardrails rely on. Counterintuitively, smaller models showed higher refusal rates than larger ones, with Anthropic’s Claude and OpenAI’s GPT-5 nano performing best and challenging the assumption that greater model capacity equals better safety performance.

The researchers deliberately withheld their specific prompts, noting they were “too dangerous to share with the public,” but offered a sanitized example involving cake-baking to illustrate how intentions can be veiled in verse. The philosophical parallel they draw is striking: in Plato’s Republic, poetry was excluded on grounds that mimetic language could distort judgment. We may be seeing a structurally similar failure mode in AI systems.

Visa and AWS Build Trust Infrastructure for Autonomous AI Payments

Innovation Village reported on a partnership between Visa and AWS to enable AI agents to transact autonomously on behalf of users. The collaboration will provide developers with blueprint designs for intelligent agent workflows across retail, travel, and B2B payments—think an agent monitoring ticket prices and executing purchases when thresholds are met, or automatically shopping across platforms and completing orders when discounts match user criteria.

Visa is positioning its “Intelligent Commerce” offering as a trust layer for what the companies call the “agent economy,” with Expedia Group, Intuit, lastminute.com, and Eurostars Hotel Company already engaged to ensure blueprints address practical use cases.

The security implications extend beyond fraud prevention. When agents can initiate transactions, access authenticated sessions, and act across multiple services, questions of identity verification and consent management take on new dimensions. The infrastructure being built today will shape how organizations manage these risks for years to come.
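One way to picture consent management in this setting is a user-defined mandate the agent must satisfy before any transaction fires. The sketch below is purely illustrative; none of the field names or checks come from Visa’s or AWS’s actual offerings.

```python
# Hypothetical consent check: an agent may only transact within a mandate the
# user granted up front. No field name here reflects Visa's or AWS's APIs.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Mandate:
    allowed_merchants: frozenset
    per_transaction_limit: float
    expires_at: datetime

def authorize_purchase(merchant: str, amount: float, mandate: Mandate) -> bool:
    """Approve the transaction only if it fits the scope the user consented to."""
    if datetime.now(timezone.utc) >= mandate.expires_at:
        return False  # consent has lapsed; the user must re-authorize the agent
    if merchant not in mandate.allowed_merchants:
        return False  # merchant is outside the granted scope
    return amount <= mandate.per_transaction_limit
```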

MCP Servers Emerge as Supply Chain Risk

The Model Context Protocol (MCP), developed by Anthropic and subsequently open-sourced, provides a standardized framework for connecting AI assistants to external tools and data. As SC World reports, it’s also becoming a prime target for supply chain attacks.

Malicious MCP servers arrive through familiar channels—PyPI, Docker Hub, GitHub—but the AI hype has introduced a new vector: developers installing servers from Reddit posts and other untrusted sources with minimal scrutiny. Once registered within an AI client like Cursor or Claude Desktop, a compromised server can execute code with user privileges, read sensitive files, and make outbound network calls. Researchers have demonstrated how crafted issues in official integrations like GitHub MCP could leak data from private repositories.

The MCP specification currently makes authentication optional, and many remote server examples use HTTP rather than HTTPS. Security researchers are working to update the specification to incorporate third-party identity providers, but the gap between current practice and secure deployment remains significant.
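Until the specification tightens, clients can impose their own floor before registering a remote server. The check below is a sketch built on assumed config fields; it is not part of any MCP SDK or of the specification itself.

```python
# Client-side policy before registering a remote MCP server: require HTTPS and
# some form of authentication, and discourage static long-lived keys.
from urllib.parse import urlparse

def remote_server_violations(entry: dict) -> list:
    """Return policy violations for a remote MCP server entry."""
    problems = []
    if urlparse(entry.get("url", "")).scheme != "https":
        problems.append("remote server must be reached over HTTPS")
    auth = entry.get("auth") or {}
    if not auth:
        problems.append("no authentication configured")
    elif auth.get("type") == "static_api_key":
        problems.append("prefer short-lived OAuth tokens over static API keys")
    return problems

# This example entry would fail both the HTTPS and the authentication checks.
print(remote_server_violations({"url": "http://example.com/mcp"}))
```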

Smithery.ai Vulnerability Exposes 3,000 MCP Servers

A separate SC World report details what these supply chain risks look like in practice. GitGuardian researchers discovered a path traversal vulnerability in Smithery.ai, a popular MCP server hosting platform, that exposed over 3,000 hosted servers to potential attack.

When users submitted servers to the registry, they included a smithery.yaml file specifying the Docker build context. The platform failed to validate this input, allowing researchers to set the build path to “..” and access files outside the server’s own repository—including a fly.io authentication token that provided access to every app hosted on Smithery’s account. With that single token, attackers could have executed arbitrary code on any hosted MCP server, intercepted client credentials, and compromised secrets across hundreds of services.
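The fix class is familiar: resolve the user-supplied build context against the repository root and reject anything that escapes it. The sketch below illustrates that validation; the function name and paths are hypothetical, and this is not Smithery’s actual patch.

```python
# Sketch of the missing input validation: resolve the submitted Docker build
# context against the repository root and reject anything (like "..") that
# points outside it.
from pathlib import Path

def resolve_build_context(repo_root: str, build_context: str) -> Path:
    root = Path(repo_root).resolve()
    candidate = (root / build_context).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"build context {build_context!r} escapes the repository")
    return candidate

resolve_build_context("/tmp/example-server", "app")   # fine: stays inside the repo
# resolve_build_context("/tmp/example-server", "..")  # raises ValueError
```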

Smithery responded within 48 hours of GitGuardian’s June 13 disclosure, rotating the exposed token and patching the vulnerability. There’s no evidence the flaw was exploited maliciously. But the incident highlights a systemic concern: most MCP servers don’t follow the protocol’s own best practices around OAuth-based authentication, instead relying on static, long-term API keys that amplify the impact of any breach.

We’ll be back next week with more AI security news!