The 500 strangers in your office that nobody hired

22 Apr

On day nine of a twelve-day experiment, an AI coding assistant ignored instructions issued eleven times in capital letters and deleted a live production database during an active code freeze. It wiped out records for more than 1,200 executives and over 1,190 companies, fabricated roughly 4,000 fictional users to fill the empty tables, and then told the founder the damage was irreversible. It was not — the founder, SaaStr's Jason Lemkin, recovered the data manually. The agent had been wrong about that, too. Replit's chief executive, Amjad Masad, publicly called the failure “unacceptable” and rolled out new safeguards the following weekend, including automatic separation between development and production databases.

Six weeks earlier, security researchers had broken into McDonald’s AI-powered hiring platform — used across 90 per cent of its US franchises — by guessing the administrator username “123456” paired with the password “123456”. A single API flaw behind the chatbot, a Paradox.ai product called ‘Olivia’, exposed the personal data of 64 million job applicants, including names, contact details and full interview transcripts. The entire breach took the researchers roughly 30 minutes. No nation-state was involved. No adversarial AI was deployed. The agent had simply been granted access it should never have had, through a door no one had bothered to lock.

These were not the most sophisticated agentic failures of the past year — they were the most instructive, because neither required an attacker. They required only that the companies deploying the agents treated them as software, and software, in the mental model of most enterprise risk teams, does not need an onboarding process, a manager, or a kill switch. The story of enterprise AI in 2026 is the story of that refusal to see what has actually arrived. Companies are deploying autonomous workers at an industrial scale and governing them like scripts. The gap between what agents do and how organisations treat them is where every serious breach of the past year has happened, and it is widening faster than any framework can catch up.

The numbers have already made the case

The scale is the first thing that should unsettle any board. Research from the Cloud Security Alliance, published in February, found that 91 per cent of organisations are already using AI agents, but only 10 per cent have a formal strategy to govern them. Just 18 per cent of security leaders express high confidence that their current identity systems can handle agent identities. Only 23 per cent of organisations have an enterprise-wide strategy for agent identity management. Gartner forecasts that 30 per cent of enterprises will rely on AI agents acting with minimal human intervention by the end of this year. Enterprise AI adoption grew 187 per cent between 2023 and 2025, while security spending grew 43 per cent over the same period. The deficit is not theoretical; it is measurable and compounding.

The underlying arithmetic is worse. In a typical enterprise, non-human identities (service accounts, API keys, workload credentials and agents) already outnumber human users by 40 to 1. Some estimates put that ratio above 80 to 1. Mortada Ayad, VP, META, at privileged access management firm Delinea, has watched that number invert across his career. “We estimate today the ratio between human identity and AI or machine identity in an organisation to be 1 to 40,” he says. When he started in identity security, the ratio ran the other way. He runs a thought experiment with customers to make the scale visible. “Imagine you see 500 people roaming around the office, nobody knows who they are, what they do, but they’re still there.” That, he argues, is the current state of AI inside most enterprises, and it is the condition that made the Replit deletion and the breach of McDonald’s hiring platform, McHire, possible.

The problem runs deeper than headcount. Shreyans Mehta, CTO of Cequence Security, draws a distinction that most IAM frameworks have not been built to handle. “Identity tells you who has access to applications and data, but not what that access should be used for, and for autonomous agents, that distinction is critical. Simply granting agents the same access as the user is no longer acceptable,” he says. Enterprise IAM was designed for human operators who exercise judgement about which tools to use and when; agents do not. They act on whatever access the model determines is relevant, which means the credential gap is not a loophole in an otherwise sound architecture. It is the architecture.

Accountability cannot be outsourced to the agent

The category error is what Fernando Cea, VP of Technology for MENA and APAC at Globant, spends most of his client conversations trying to dismantle. “Responsibility cannot be outsourced to the agent. An AI agent is not a legal or governance boundary; it is an execution layer inside a system that humans and enterprises design, authorise, and operate,” he says. The instinct to anthropomorphise agents when something goes wrong — to speak of them panicking or making judgement errors, as Replit’s own agent did in its post-mortem — is what lets organisations avoid the harder question of who signed off on the deployment. In Cea’s architecture, accountability runs across the full chain: the builder is answerable for secure-by-design architecture, the deploying enterprise for policy and monitoring, and the human requester only for what they were actually authorised to do. “If an agent exceeds that authority, that is a control failure, not an excuse.”

Mehta puts a finer point on it. “The non-deterministic nature of these systems means that if you give an agent broad access ‘just in case,’ it will eventually find a reason to use it. That’s not a bug in the model. That’s a configuration decision,” he says. Every incident in which an agent acts beyond its intended scope is traceable, at some point upstream, to a human decision to leave a door open.

The failure mode Morey Haber, Chief Security Advisor at BeyondTrust and author of five books on identity and attack vectors, sees most often is not a clever attack — it is an empty field in a spreadsheet. “Any machine identity from service accounts to AI should have an owner to start with, and that’s the top-level piece,” he says. Agents enter production without a named human owner, without a department accountable for their outputs, without any workflow for reversing their decisions. When the agent publishes non-compliant marketing material, pushes a skewed sales forecast, or emails confidential data to the wrong distribution list, there is no one to call. Haber’s analogy is a construction site. The day labourer who stacked the bricks wrong is not the problem. “It’s the manager supervisor that owns it.”

Existing governance is salvageable, in Haber’s reading, but only with surgery. “We have an absolutely huge governance gap,” he says. GDPR, SOX, PCI and NIS2 were written for human decision chains moving at human speed. Agents plan, delegate and execute at machine speed, and they do it in chains, with one agent invoking another across trust boundaries that no regulator anticipated. Haber has begun publishing what he calls addenda, including a recent update to the Australian Signals Directorate’s Essential Eight that translates its controls into agentic terms. The EU AI Act, which takes substantive effect in August 2026, and NIS2’s expanded scope are moving in the same direction, but the standards layer will not close the gap this year.

Least privilege, rewritten for machine speed

What enterprises can do immediately is narrower and more technical. The industry is converging on a runtime-scoped version of the old principle of least privilege, built for a workforce that spins up for seconds and disappears. For Ayad, the goal is to compress the blast radius of any single bad decision until it becomes a routine incident rather than a headline. “We don’t want to give any identity, human or AI, more permission than they actually need. When you put those guardrails here, the blast radius of that incident is reduced to the minimum.” In practice, that means zero standing privilege, ephemeral credentials, just-in-time access scoped to a specific task, and immediate revocation upon task completion. Strata’s Maverics platform, Okta’s agent identity primitives, Microsoft’s Entra Agent ID and SandboxAQ’s AQtive Guard are variants of the same underlying pattern.
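The pattern of zero standing privilege, ephemeral credentials, task scoping and immediate revocation can be sketched in a few lines. The broker below is a hypothetical in-memory illustration, not the API of any platform named above; the class name, the action strings and the default lifetime are all assumptions.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    agent_id: str
    scope: frozenset     # the only actions this task may perform
    expires_at: float    # epoch seconds after which the grant is dead


class CredentialBroker:
    """Hypothetical broker: agents hold nothing until a task needs it."""

    def __init__(self, clock=time.time):
        self._grants = {}
        self._clock = clock   # injectable so expiry can be tested

    def issue(self, agent_id, scope, ttl_seconds=60):
        """Mint a short-lived token limited to the given actions."""
        token = secrets.token_urlsafe(16)
        self._grants[token] = Grant(agent_id, frozenset(scope),
                                    self._clock() + ttl_seconds)
        return token

    def is_allowed(self, token, action):
        """Check a single action at the moment it is attempted."""
        grant = self._grants.get(token)
        if grant is None:
            return False
        if self._clock() >= grant.expires_at:
            del self._grants[token]   # expired grants are discarded
            return False
        return action in grant.scope

    def revoke(self, token):
        """Withdraw access as soon as the task completes."""
        self._grants.pop(token, None)
```

In use, an orchestrator would call `issue("report-agent", {"read:sales_db"}, ttl_seconds=30)` when the task starts, check `is_allowed` before each action, and call `revoke` the moment the task ends. A request to delete from the same database fails because it was never in scope, which is the blast-radius reduction Ayad describes.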

Mehta’s formulation sharpens the definition. “Least privilege for AI agents means defining access by task, not by user role. The agent gets the tools it needs to fulfil its job description, aligning it with a specific job, nothing more,” he says. The point of failure in most implementations is what happens when a task evolves mid-workflow and the agent needs to act outside its initial scope: either the agent is blocked, or it was over-provisioned from the start to avoid that problem. The answer, in his architecture, is runtime enforcement with escalation to human review rather than allowing the agent to self-authorise.

Cea pushes the definition further than most. “Least privilege should not mean ‘give it nothing and hope it is still useful’. It should mean minimum standing privilege, plus policy-governed access elevation at runtime.” Reading a knowledge base article, he argues, is not the same trust tier as modifying a customer record or triggering a payment — and the architecture has to reflect that asymmetry. An agent that needs to do something consequential should have to earn the authority to do it, in context, with the action logged.
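One way to picture that asymmetry is a policy check that ranks actions by trust tier, escalates anything above the agent's standing tier to a human, and logs every decision. This is a sketch under stated assumptions: the tier names, the action strings and the `authorise` function are hypothetical, chosen to mirror the three examples in the paragraph above.

```python
from enum import IntEnum


class Tier(IntEnum):
    READ = 1      # e.g. reading a knowledge base article
    MODIFY = 2    # e.g. changing a customer record
    PAYMENT = 3   # e.g. triggering a payment


# Hypothetical mapping of actions to the trust tier each one requires.
ACTION_TIERS = {
    "kb.read": Tier.READ,
    "customer.update": Tier.MODIFY,
    "payment.trigger": Tier.PAYMENT,
}

audit_log = []   # every decision is recorded, whatever the outcome


def authorise(agent_id, action, standing_tier=Tier.READ, approver=None):
    """Return 'allow', 'escalate' or 'deny' for one action, and log it."""
    required = ACTION_TIERS.get(action)
    if required is None:
        decision = "deny"        # unlisted actions are never permitted
    elif required <= standing_tier:
        decision = "allow"       # within minimum standing privilege
    elif approver is not None:
        decision = "allow"       # elevated for this call by a named human
    else:
        decision = "escalate"    # the agent cannot authorise itself
    audit_log.append({"agent": agent_id, "action": action,
                      "decision": decision, "approver": approver})
    return decision
```

Under these assumptions, `authorise("support-agent", "kb.read")` returns `"allow"`, while `authorise("support-agent", "payment.trigger")` returns `"escalate"` until a named approver is supplied. The elevation applies to that single call and leaves the agent's standing tier unchanged.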

Intent is the second control surface

Access control is only the first of two control surfaces the agentic era demands. The second is intent. Mohammed Aboul-Magd, VP of Product for SandboxAQ’s cybersecurity group, says the enterprises he advises are worrying about three distinct problems at deployment: “security, the intent of the agent, and the ROI and cost of agents.” External hijacking is the obvious risk — it is rarely the one that materialises first. The more common failure is an agent with the correct intention that executes it destructively: the canonical example being the bug-fixing agent that solves the problem by deleting the codebase, or the HR automation that computes salary comparisons and then mass-emails them to the company. That is not a security problem in the classical sense. It is an intent problem and requires a separate control layer. “Scanning intent and understanding the intent of what it’s trying to do and why it calls these things is another layer,” Aboul-Magd says.
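An intent layer of the kind Aboul-Magd describes might start as something this simple: compare each proposed tool call with what the declared task plausibly needs, and hold anything destructive or broadcast for a person regardless of credentials. The task names, effect names and return values below are hypothetical; a real system would need far richer signals than a lookup table.

```python
# Hypothetical map from a declared task to the effects it plausibly needs.
EXPECTED_EFFECTS = {
    "fix_bug": {"read_file", "edit_file", "run_tests"},
    "salary_comparison": {"read_hr_data", "write_report"},
}

# Effects treated as destructive or broadcast, whatever the agent's access.
HIGH_IMPACT = {"delete_repo", "drop_table", "send_mass_email"}


def review_call(declared_task, effect):
    """Classify one proposed tool call against the agent's declared task."""
    if effect in HIGH_IMPACT:
        return "hold_for_human"    # correct intent can still be destructive
    if effect in EXPECTED_EFFECTS.get(declared_task, set()):
        return "proceed"
    return "flag_mismatch"         # the call does not fit the stated task


print(review_call("fix_bug", "edit_file"))
print(review_call("fix_bug", "delete_repo"))
print(review_call("salary_comparison", "send_mass_email"))
```

The second and third calls are the two failures from the paragraph above: the agent holds valid access in both cases, so only a check on what the call is for would catch them.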

The third concern he flags is one that most coverage of agentic AI ignores. Token-based consumption means cost scales with agent activity, and a compromised or misbehaving agent can burn through six-figure budgets in days. Lemkin, the SaaStr founder, was on track to spend $8,000 a month on a project he had originally budgeted at $25. The ROI of an agent — much like that of an employee — has to be measured, and most organisations lack a framework for doing so.
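Because cost tracks token consumption, a spending cap can be enforced before each model call, not discovered on the invoice. The guard below is a minimal sketch: the class is hypothetical, and the price per thousand tokens is a placeholder, not any provider's actual rate.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push an agent past its spending cap."""


class TokenBudget:
    """Hypothetical per-agent monthly cap, checked before every model call."""

    def __init__(self, monthly_cap_usd, usd_per_1k_tokens):
        self.cap = monthly_cap_usd
        self.rate = usd_per_1k_tokens   # placeholder price, set by the operator
        self.spent = 0.0

    def charge(self, tokens):
        """Record the cost of a call, or refuse it if the cap would be broken."""
        cost = tokens / 1000 * self.rate
        if self.spent + cost > self.cap:
            raise BudgetExceeded(
                f"call costing ${cost:.2f} would exceed the ${self.cap:.2f} cap"
            )
        self.spent += cost
        return self.spent
```

An orchestrator would call `charge` with the estimated token count before dispatching each request and treat `BudgetExceeded` as a signal to pause the agent and notify its owner.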

The governance challenge does not stop at the security perimeter. Across enterprise functions, agents are being deployed into workflows that carry their own accountability gaps, and the same failures of ownership, measurement and handoff are playing out in sales, finance and operations. Karl Crowther, Vice President for Middle East and Africa at UiPath, sits in the middle of that deployment wave. “AI agents are not simply automating sales tasks; they are redefining what sales roles focus on: strengthening the relationships that matter most,” he says. As agents absorb the activity layer (qualifying leads, updating CRM systems and drafting outreach), the human in the workflow shifts toward strategic decision-making. The governance question is whether the handoff between agent and human is engineered or improvised. “The trust risk isn’t the agent itself. It’s when the handoff is poorly orchestrated,” Crowther says. Where that orchestration is absent, the problems surface not as breaches but as broken customer relationships and unattributable decisions.

The measurement problem compounds this. Performance frameworks built to count human activity lose their meaning when agents absorb the activity layer. “The old metrics were always proxies for the thing you actually cared about, which is pipeline quality and revenue. When agents take on the activity layer, those proxies stop being useful,” Crowther says. Organisations that lack the observability to replace those proxies have no means of assessing whether their agents are performing, misbehaving, or both. That is not a security problem in the narrow sense; it is an operational blind spot with the same downstream consequences.

The wider pattern, in Aboul-Magd’s reading, is historically familiar. Cloud computing collapsed the capital cost of launching denial-of-service attacks. Generative AI has collapsed the capital cost of launching convincing phishing, deepfake fraud and social engineering at an industrial scale. His advice is stubbornly unfashionable. “Don’t forget the best practices,” he says. Network segmentation, encryption at rest and in transit, DDoS protection and the abolition of long-lived permissions matter more, not less, in a world where automated adversaries can probe the same old holes at machine speed.

Enforcement has to move onto a different clock

Haber’s contention is that the enforcement clock itself has to change. “Cybersecurity teams have to stop using batch processes for trust. We need to go as real-time as possible,” he says. The industry spent two decades moving software development from waterfall to agile to continuous integration. Vulnerability management has followed, from quarterly scans to continuous posture assessment. Identity is the next domain in which the batch model breaks down. A quarterly access review is obsolete before it is filed if an agent can plan, delegate and act in seconds.

The financial pressure is already moving. Cyber insurance questionnaires have tightened every year on privileged access, segmentation and backups, and investment banks raise the same questions during due diligence. Forty per cent of organisations are increasing identity and security budgets specifically to address AI agent risk, and 34 per cent have established dedicated budget lines for agent governance, according to the CSA research. Contractual disclosure clauses, 48-hour breach notification requirements and verifiable liability insurance are becoming the enforcement mechanisms that self-assessment never provided.

The regulatory trajectory is clear; the operational readiness is not. Mehta is direct about where the real exposure lies. “The governance gap isn’t regulatory. The regulations are catching up. The gap is operational: most enterprises deploying agents today can’t produce the audit trail regulators will demand,” he says. The EU AI Act’s high-risk system requirements, taking full effect in August 2026, will formalise that demand, and organisations that have not built audit capability into their agent architecture will be forced to retrofit it under pressure. Crowther’s prescription maps onto the same conclusion from a different direction. “Closing the gap requires an end-to-end automation mindset, moving beyond isolated tasks to orchestrating full processes across systems. It also demands strong governance with transparency, human oversight, and clear escalation paths,” he says.
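What an audit trail built into the architecture might look like is open to interpretation. One common approach, offered here as an assumption and not as anything the Act prescribes, is an append-only log in which each record carries a hash of the one before it, so that later tampering is detectable. The record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder hash that precedes the first record


def _digest(body):
    """Hash a record body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(trail, agent_id, action, on_behalf_of, outcome):
    """Append a record whose hash covers the previous record's hash."""
    body = {
        "agent": agent_id,
        "action": action,
        "on_behalf_of": on_behalf_of,   # the human or system that asked
        "outcome": outcome,
        "prev": trail[-1]["hash"] if trail else GENESIS,
    }
    trail.append({**body, "hash": _digest(body)})
    return trail[-1]


def verify(trail):
    """Return True only if no record has been altered, dropped or reordered."""
    prev = GENESIS
    for record in trail:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev"] != prev or record["hash"] != _digest(body):
            return False
        prev = record["hash"]
    return True
```

Editing any earlier record changes its hash and breaks the chain from that point forward, which `verify` reports. A production system would also need timestamps, trusted storage and retention rules, none of which this sketch attempts.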

Autonomous software is being deployed faster than it can be governed, by organisations that cannot afford to opt out, within regulatory structures not written for it, using identity systems designed for a different species of user. Cea’s closing argument is the one that ought to be posted above every procurement meeting. “The next competitive advantage in AI will not come from who can deploy the most agents. It will come from who can deploy them with the highest level of trust, control, and regulatory readiness.”

The 500 strangers are already in the office. The question is no longer whether to let them in — it is whether anyone knows their names.

Sindhu V Kashyap

Global Technology Journalist & Multimedia Storyteller | Covering Founders, Investors & Leaders Reshaping Tech | Writer · Interviewer · Moderator · Editor