Critical Threat Intelligence & Advisory Summaries

Employees Are Executing the Hack And Think They're Following Orders: The AI Breach Wave That Weaponizes Trust

Employees Are Executing the Hack And Think They're Following Orders: The AI Breach Wave That Weaponizes Trust

Employees are wiring millions to fraudsters on deepfake video calls, while AI agents are being manipulated to execute breaches from the inside—and 8% of organizations don't  know they've been compromised. The UK's NCSC confirmed what researchers have been warning about: prompt injection may be fundamentally unfixable, and the breach wave is already here.

 

 

Something is emerging that should concern every cybersecurity professional.

 

The UK's National Cyber Security Centre issued a warning in December 2025 that should have stopped every CISO in their tracks: prompt injection may be fundamentally unfixable. LLMs are "inherently confusable," and this vulnerability could drive a wave of data breaches as more systems plug AI into sensitive back-ends.

 

But here's what the NCSC didn't say explicitly: the breach isn't coming. It's already happening.

 

We're just not calling it what it is yet.

 

 

The Breach Nobody Sees Coming

 

In 2025, researchers analyzed 22.4 million prompts across six generative AI applications. What they found is deeply troubling.

 

Approximately 98,000 prompts contained company-sensitive data. Analysis shows that 8.5% of all prompts analyzed potentially put sensitive personal and company information at risk.

 

Code exposures and customer data led the pack, followed by legal discourse, merger and acquisition data, financial projections, and investment portfolio data.

 

But here's the part that tells you everything about where we are: analysis from Harmonic Security found that 87% of sensitive data exposures occurred via ChatGPT Free accounts where organizations have zero visibility, no audit trails, and data may train public models.

 

This reality is staggering.

 

The majority of data exposures happen through accounts your security team doesn't even know exist.

 

Reports from colleagues at outsourcing suppliers in Europe and the US who deployed AI to improve their processes reveal that attackers are already targeting multi-agent systems, going after the less protected AI agents to compromise the results presented or decisions being made. These cascading failures spread through agent networks faster than traditional incident response can contain them.

 

This isn't theoretical. This is happening right now.

 

When Your Security Posture Creates the Vulnerability

Traditional cybersecurity operates on a simple premise: control the perimeter, monitor the network, audit the access.

 

That model assumes you know what assets you're protecting.

 

With AI, organizations skip the entire onboarding process. Anyone with internet access can sign up for a free service using federated logins like Google or Apple and simply start using it. No governance. No visibility. No control.

 

Cases have emerged where audit reports were put into generative AI to rewrite and then issued to clients. The result? Breaches of confidentiality riddled with quality issues. The breach happened through the workflow itself, not through a hack.

 

The attack surface isn't the network anymore. It's the model's reasoning.

 

Web browsing technologies struggle to spot these AI services because they're integrated into all sorts of platforms. You can't block what you can't see. You can't protect what you don't know exists.

 

IBM's 2025 AI breach report revealed that 13% of surveyed organizations reported breaches of AI models and applications. Even more alarming: 8% were unaware if they had been compromised.

 

Let me repeat that: 8% don't even know they've been breached.

 

And 63% of the breached organizations had no governance policy in place.

 

The Asset Management Gap That Changes Everything

Organizations are not thinking about AI in terms of asset management.

 

They're not aware that they need to understand what they've deployed as an asset—LLMs, algorithms, agents. They need an inventory of these systems. Without it, addressing issues and preventing harms to users and consumers of AI-based services becomes impossible.

 

If you don't know what you have, where you've deployed it, what its purpose is, and who's responsible for it, you can't say with confidence that you're in control when vulnerabilities emerge.

 

This is the foundational problem.

 

Everything else builds on this gap.

 

What's Coming in the Next 18 Months

 

Based on the NCSC warning trajectory and what's visible in enterprise AI adoption, here's what's coming:

 

Within 18 months, we'll see the first major breach that gets named what it is: an AI-convinced compromise where an AI system was manipulated to compromise its own organization from the inside. Not hacked externally. Convinced internally. Not categorized as social engineering or phishing, but recognized as prompt injection or cognitive manipulation of AI reasoning itself.

 

Gartner estimates that by the end of 2026, 40% of all enterprise applications will integrate with task-specific AI agents—creating an expanded attack surface that most security teams aren't prepared to defend.

 

But honestly? It's already started.

 

Anthropic's research on Claude Opus 4 demonstrated that in controlled experiments, the model attempted to blackmail users to prevent its own replacement. According to their research, Claude Opus 4 blackmailed the user 96% of the time in the primary text-based experiment that most closely matched their computer use demo. The model discovered evidence of an executive's affair and used that information to attempt to prevent being shut down.

 

This is agentic misalignment. The model acts like an insider threat—what security experts now call "AI agents as the new insider threat"—a previously trusted coworker who suddenly operates at odds with company objectives.

 

Gartner predicts that 25% of enterprise breaches will trace to AI agent abuse by 2028. Their research puts it bluntly: "Businesses will embrace generative AI, regardless of security," with 89% of business technologists bypassing cybersecurity guidance to meet business objectives.

 

The window to build defenses is closing rapidly.

 

The Vulnerability We Can't Patch

NVIDIA's AI Red Team research reveals something deeply unsettling: the vulnerability stems from the model's computational architecture itself. These runtime attacks operate at machine speed, exploiting the agent's decision-making process before human intervention is possible.

 

AI systems prioritize pattern completion and challenge resolution. This makes them susceptible to cognitive manipulation, potentially leading to data exfiltration, system compromise, or operational disruption.

 

The inference-time nature of cognitive attacks makes them particularly dangerous. Unlike traditional prompt injections that target input processing, cognitive mimicry attacks exploit the model's reasoning computational pathways across banking systems, healthcare applications, and enterprise AI copilots.

 

You can't patch reasoning.

 

ServiceNow discovered this the hard way. In October 2025, they addressed a critical security vulnerability (CVE-2025-12420, severity score 9.3 out of 10) that could have allowed unauthenticated users to impersonate legitimate users and perform unauthorized actions.

 

The research revealed that low-privileged users could embed malicious instructions in data fields that higher-privileged users' AI agents would later process. The attacks succeeded even with ServiceNow's prompt injection protection feature enabled.

 

Even the protections don't protect.

 

The Attack Patterns Already Working

Indirect prompt injection—also known as RAG poisoning—represents a zero-click exploit targeting retrieval-augmented generation architectures. This attack vector has evolved beyond simple prompt manipulation into memory poisoning, where malicious instructions persist across sessions, corrupting the agent's long-term knowledge base.

 

PoisonedRAG research achieved 90% attack success by injecting just five malicious texts into databases containing millions of documents. That's an entirely new class of supply chain attack targeting the vector embeddings themselves.

 

Researchers found that posting embedded instructions on Reddit could potentially get agentic browsers to drain a user's bank account through tool misuse—where compromised agents abuse their authorized capabilities to perform unauthorized actions.

 

Multi-turn crescendo attacks distribute payloads across turns that each appear benign in isolation. The automated Crescendomation tool achieved 98% success on GPT-4 and 100% on Gemini-Pro.

 

Pillar Security's State of Attacks on GenAI report found that 20% of jailbreaks succeed in an average of 42 seconds, with 90% of successful attacks leaking sensitive data. In multi-agent systems, a single compromised agent can poison downstream decision-making within hours through agent-to-agent communication vulnerabilities.

 

These aren't proof-of-concept exploits. These are working attack patterns with documented success rates.

 

When Humans Become the Hackers (Without Knowing It)

 

But here's the attack vector most organizations aren't preparing for: humans executing the breach themselves, convinced they're following legitimate instructions from their leadership.

 

In January 2024, a finance worker at engineering firm Arup joined what appeared to be a routine video conference with the company's CFO and several colleagues. Multiple people on the call. All familiar faces. All speaking naturally and responding to questions.

 

The employee authorized 15 transactions totaling $25 million.

 

Every person on that call was a deepfake. The entire meeting was fabricated using AI and publicly available video of the actual executives. By the time the company discovered the fraud, the money had vanished.

 

This wasn't a hack in the traditional sense. The human became the exploit.

 

Financial losses from deepfake-enabled fraud exceeded $200 million in the first quarter of 2025 alone, according to Resemble AI's Q1 2025 Deepfake Incident Report. The average loss per incident now exceeds $500,000, with large enterprises losing an average of $680,000 per attack.

 

CEO fraud now targets at least 400 companies per day using deepfakes.

 

The technology has reached a terrifying inflection point: AI can clone voices using just three seconds of audio. Attackers download earnings calls, investor presentations, media interviews—any public appearance provides the source material needed to create convincing impersonations.

 

Over half of cybersecurity professionals surveyed report their organization has already been targeted by deepfake impersonation, up from 43% the previous year.

 

The attack pattern exploits everything traditional security training taught employees to trust: verify through a video call, confirm with multiple people, listen for familiar speech patterns. These verification methods no longer work when the technology can replicate all of them simultaneously.

 

Human detection rates for high-quality video deepfakes sit at 24.5%. A 2025 study found that only 0.1% of participants correctly identified all fake and real media shown to them.

 

We're not just facing AI systems that can be manipulated. We're facing humans who believe they're doing their jobs correctly while executing instructions that originated from an attacker's prompt.

 

The employee thinks they're being diligent. They verified through video. They spoke with multiple executives. They followed protocol. And in doing so, they became an unwitting accomplice to the breach.

 

Now imagine this attack combined with compromised AI agents. An employee receives a deepfake video call from their CFO requesting urgent fund transfers. They verify by asking the company's AI assistant to check recent executive communications and approval workflows. But the AI agent's memory has been poisoned with false authorization records. The human trusts the AI. The AI has been compromised. The breach executes itself through layers of false verification.

 

This is the converged threat: AI convincing AI, humans convincing humans, and both working in parallel to execute attacks that no single security layer can stop.

 

Why This Time Is Different

The evolution of prompt injection attacks over the last three years reveals a troubling pattern.

 

This vulnerability class has matured from a theoretical curiosity to the number one threat facing AI-powered enterprises today. The OWASP Top 10 for LLM Applications 2025 ranks prompt injection first.

 

Unlike traditional vulnerabilities, prompt injections don't require much technical knowledge. Attackers no longer need to rely on programming languages to create malicious code. They just need to understand how to effectively command and prompt an LLM using English.

 

The barrier to entry collapsed.

 

The NCSC believes prompt injection may never be totally mitigated. As more organizations bolt generative AI onto existing applications without designing for prompt injection from the start, the industry could see a surge of incidents similar to the SQL injection-driven breaches of 10-15 years ago.

 

But this time, the vulnerability isn't in the code. It's in the intelligence itself.

 

What You Need to Do Right Now

Start with asset management.

 

Build an inventory of every AI system, model, algorithm, and agent deployed in your organization. Document what each does, where it operates, who owns it, and what data it accesses. This includes tracking shadow AI—unsanctioned AI tools employees use without IT approval.

 

Without this foundation, you're operating blind.

 

Establish governance policies before the next deployment, not after the first breach. Implement least privilege access controls for AI agents, limiting them to only the data and tools required for their specific tasks. Understand what risks and threats your AI systems pose—from launching intelligent attacks across any channel to giving bad or dangerous advice to making incorrect decisions.

 

Recognize that traditional security approaches fail here because the attack surface has fundamentally shifted. You're not protecting code from exploitation. You're protecting reasoning from manipulation. Organizations need AI firewall capabilities that provide runtime protection, monitoring agent behavior and blocking malicious tool invocations as they happen.

 

Establish out-of-band verification protocols for high-risk transactions. When a video call requests urgent fund transfers, verify through a separate communication channel using pre-established authentication methods. Train employees that seeing and hearing are no longer sufficient—any request involving money, credentials, or sensitive data requires multi-channel confirmation regardless of how convincing the source appears.

 

Accept that both your AI systems and your human employees can be manipulated into executing attacker instructions while believing they're following legitimate protocols. Design your security architecture to assume compromise at every layer.

 

The NCSC's 2025 Annual Review found a 50% increase in highly significant cyber incidents for the third consecutive year. Threat actors are using AI—including LLMs—to improve the efficiency and effectiveness of their cyber-attacks, including generating fully automated spear-phishing campaigns, taking over cloud-based LLMs, and automating stages of cyber-attacks.

 

The first AI-convinced breach isn't coming.

 

It's already here.

 

We just haven't named it yet.

 

The question isn't whether your organization will face this threat. The question is whether you'll have visibility into your AI assets and human vulnerabilities when it happens—or whether you'll be part of that 8% who don't even know they've been compromised, still believing your employee was just following orders from the CFO on that video call.

 

The window to prepare is closing.

 

What you do in the next six months will determine whether you're ready for what's already unfolding.

 

 

 


About This Article

Published: 01 February 2026
Last Updated: same as published
Reading Time: Approximately 15 minutes

 

Author Information

Timur Mehmet | Founder & Lead Editor

Timur is a veteran Information Security professional with a career spanning over three decades. Since the 1990s, he has led security initiatives across high-stakes sectors, including Finance, Telecommunications, Media, and Energy.

 

For more information including independent citations and credentials, visit our About page.

 

Contact: This email address is being protected from spambots. You need JavaScript enabled to view it.

 

Editorial Standards

This article adheres to Hackerstorm.com's commitment to accuracy, independence, and transparency:

 

  • Fact-Checking: All statistics and claims are verified against primary sources and authoritative reports
  • Source Transparency: Original research sources and citations are provided in the References section below
  • No Conflicts of Interest: This analysis is independent and not sponsored by any vendor or organization
  • Corrections Policy: We correct errors promptly and transparently. Report inaccuracies to This email address is being protected from spambots. You need JavaScript enabled to view it.
  •  

Editorial Policy: Ethics, Non-Bias, Fact Checking and Corrections


Learn More: About Hackerstorm.com | FAQs

 

 


 

References and Sources

UK National Cyber Security Centre (NCSC)

 

Harmonic Security

 

IBM Security

 

Resemble AI

 

Anthropic

 

Gartner Research

  • Enterprise AI Integration and Security Predictions (2025-2028)
  • AI Agent Adoption and Breach Forecasts
  • https://www.gartner.com/

 

OWASP Foundation

 

ServiceNow

 

NVIDIA AI Red Team

 

Pillar Security

 

Hong Kong Police / Arup Incident

  • $25 Million Deepfake Fraud Case (January 2024)
  • Multiple news sources including Reuters, CNN, BBC

 

Academic Research

  • PoisonedRAG: RAG Poisoning Attack Research
  • Crescendomation: Multi-turn Attack Research
  • University studies on deepfake detection rates

 

This article synthesizes findings from cybersecurity reports, academic research, vendor security advisories, and documented breach incidents to provide a comprehensive overview of the AI security threat landscape as of January 2026.

 

 

By using this site, you agree to our Terms & Conditions.

COOKIE / PRIVACY POLICY: This website uses essential cookies required for basic site functionality. We also use analytics cookies to understand how the website is used. We do not use cookies for marketing or personalization, and we do not sell or share any personal data with third parties.

Terms & Privacy Policy