Cloudforce One Research Reveals How Adversarial Deception Is Evolving to Target AI Reasoning Systems

Dubai, UAE, 7th May 2026 – Today, Cloudforce One released a new report based on a large-scale study of seven leading AI models. The team examined both frontier and non-frontier models to see how their reasoning is, and can be, bypassed by threat actors.

The research found that attackers are now using lures – blocks of text designed to emotionally manipulate or confuse AI models – to trick security auditors into white-listing malicious code. This research is a technical reality check. As organizations lean more heavily on autonomous systems and LLMs, the perimeter is changing. The attack surface has expanded beyond the network, and the model’s reasoning itself is now a major target. So what happens if the models that run critical parts of your business are tampered with?

Some High-Level Takeaways:

  • The 1% bypass zone: Subtle deception is most effective. When safety lures (i.e., comments claiming the code is benign) made up less than 1% of a file, AI detection rates plummeted to 53%. At that volume, the lures subtly nudge model reasoning without raising the suspicion that comes from “protesting too much.”
  • The U-curve of deception: Moderate attempts to trick AI often work, but protesting too much (over 1,000 comments) triggers a repetition alarm that causes the AI to flag the code as fraudulent.
  • The context trap: The greatest threat isn’t linguistic, it’s structural. By burying malicious payloads inside large library bundles (like React SDKs), attackers drove detection rates down to just 12%, effectively exhausting the AI’s focus.
  • Linguistic profiling: The study found that AI models have developed stereotypes. For example, some models flagged Russian or Chinese comments as high-risk signals regardless of the code’s actual function, while being more trusting of languages like Estonian.
  • AI reasoning as an attack surface: Threat actors are increasingly focusing on manipulating model cognition rather than breaking traditional security controls.
  • Structural obfuscation is highly effective: Embedding malicious code within large, legitimate-looking software packages significantly reduces detection rates.
  • Bias in model interpretation: Language-based heuristics may introduce unintended bias, affecting security decisions in inconsistent ways.
  • Scaling complexity increases vulnerability: Larger and more complex code contexts reduce the ability of models to accurately identify malicious intent.

Industry Implications

As enterprises accelerate adoption of AI-driven security, automation, and development pipelines, the findings suggest a pressing need to reassess how trust is established in AI-generated decisions.

Cloudforce One emphasizes that organizations must move beyond traditional prompt safety approaches and adopt more robust model evaluation, adversarial testing, and context-aware security frameworks.

Cloudforce One is Cloudflare’s dedicated threat intelligence and research team focused on tracking advanced cyber threats, emerging attacker techniques, and security risks impacting global digital infrastructure.

The full report can be found here.

-Ends-

About Cloudflare 

Cloudflare, Inc. (www.cloudflare.com / @cloudflare) is on a mission to help build a better Internet. Cloudflare’s suite of products protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare have all web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine’s Top Company Cultures 2018 list and ranked among the World’s Most Innovative Companies by Fast Company in 2019. Headquartered in San Francisco, CA, Cloudflare has offices in Austin, TX, Champaign, IL, New York, NY, San Jose, CA, Seattle, WA, Washington, D.C., Toronto, Lisbon, London, Munich, Paris, Beijing, Singapore, Sydney, and Tokyo.