The $67 Billion Problem: How AI Hallucinations Are Costing Enterprises More Than They Know

In 2023, Cambridge Dictionary named 'hallucinate' its word of the year, not in the classical human sense but in a newly minted computer-science definition. The word captured something billions of people were experiencing in real time: AI systems capable of producing fluent, confident, and entirely fabricated outputs. Two years on, the phenomenon has moved from curiosity to crisis.

The numbers are stark. According to a Deloitte survey, 47% of enterprise AI users made at least one major business decision based on hallucinated content. Knowledge workers now spend an average of 4.3 hours per week verifying AI outputs, according to Microsoft's 2025 data, and the financial toll reached an estimated $67.4 billion globally in 2024. And yet adoption continues to climb, driven by competitive pressure, capital already committed, and the growing sense that standing still is the riskiest option of all.

What has emerged is a peculiar paradox. The Stack Overflow 2025 Developer Survey found that 84% of respondents now use or plan to use AI tools, even as trust in their accuracy plummeted to just 29%, down eleven percentage points from the previous year. The more enterprises use these tools, the more they understand what can go wrong.

Stuck in the Pilot Stage

Across the enterprise technology landscape, a pattern has become entrenched. Organisations deploy AI in controlled, internal-facing environments, such as employee chatbots, summarisation tools, and code-generation assistants, while holding back from anything customer-facing or operationally critical. The ambitions are large; the deployments are small.

Sumeet Agrawal, VP of Product Management at Informatica, now part of Salesforce, has spent close to two decades at the intersection of analytics and AI. "If you see AI right now, most of the customers — if you look at enterprise adoption — are mostly in the piloting phase right now, and the things are not moving into production," he said. What deployment does exist, he added, is overwhelmingly internal-facing: tools kept safely away from customers and from anything that could affect the business externally.

The reasoning is straightforward: anything external-facing carries reputational risk. The case that has become something of a shorthand in the industry involves Air Canada, whose AI chatbot incorrectly promised a bereavement fare discount to a grieving customer. When the airline tried to disavow the chatbot's response, a tribunal ruled it was responsible for its AI's outputs. "It hallucinated, and it hallucinated confidently," Agrawal said. "Air Canada's reputation was at stake; there were financial losses. It is no longer about what AI can do, but whether you can trust what AI can do."

The distinction between capability and trustworthiness sits at the heart of the enterprise AI dilemma. The models have become extraordinarily powerful. Whether organisations can depend on them, particularly in high-stakes environments, remains unresolved.

The Anatomy of a Trust Deficit

Kurt Muehmel, Head of AI Strategy at Dataiku, draws a careful distinction between two separate problems that often get conflated under the umbrella of 'hallucinations'. The first is the classical definition: a model that confidently states something false as though it were true. The second is subtler and, he argued, ultimately more significant for enterprise deployment.

"The hallucination problem specifically — meaning the model is very convinced that it's telling you something which is true but it is in fact false — that specific problem is really, really diminishing" for the largest, highest-performing models, Muehmel said. Progress has been real: according to data compiled by AllAboutAI from the Vectara leaderboard, four models now record sub-1% hallucination rates on summarisation benchmarks, with Google's Gemini-2.0-Flash-001 achieving just 0.7% as of April 2025. The rate of improvement is accelerating, with some models recording a 64% drop in hallucination rates over 2025.

Yet resolving the technical hallucination problem does not, on its own, resolve the trust problem. "There remains a very significant trust problem, and there are two reasons for this. The first is the perception of hallucinations, and the fact that older and smaller models continue to hallucinate. So there is a reality of hallucination which is largely being solved for the largest, highest-performing models, but there's a lingering perception of hallucination which hinders trust," Muehmel said.

A deeper issue is structural. Even a model that does not hallucinate in the narrow sense can still arrive at the wrong conclusion for a specific business, lacking the context and institutional knowledge an organisation has spent years developing. The enterprise trust problem, Muehmel argued, cannot be solved by deploying better models alone. It requires organisations to build AI systems that reason in ways specific to how their businesses actually work.

The Governance Imperative

For Agrawal, the path from pilot to production runs through governance. He outlined a 'top-down, bottom-up' approach: top-down covering the policies, risk frameworks, and oversight structures organisations need before deploying AI at scale; bottom-up covering the technical mechanisms of traceability, explainability, and observability that make AI outputs defensible.
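
In practice, the bottom-up layer often begins with something as unglamorous as an append-only audit trail. The Python sketch below assumes a generic text-in, text-out model call; the wrapper, field names, and JSONL log file are illustrative placeholders rather than any specific vendor's tooling. What it shows is the minimum record that makes an AI output defensible after the fact: which model produced what, in response to which prompt, and when.

```python
import json
import time
import uuid
from typing import Callable

def with_audit_trail(llm: Callable[[str], str], model_name: str,
                     log_path: str = "ai_audit.jsonl") -> Callable[[str], str]:
    """Wrap any text-in, text-out model call so that every output can be
    reconstructed later: the traceability half of 'bottom-up' governance."""
    def audited(prompt: str) -> str:
        response = llm(prompt)
        record = {
            "id": str(uuid.uuid4()),      # unique handle for each interaction
            "timestamp": time.time(),
            "model": model_name,          # which model version answered
            "prompt": prompt,
            "response": response,
        }
        with open(log_path, "a") as f:    # append-only JSONL audit log
            f.write(json.dumps(record) + "\n")
        return response
    return audited
```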

The regulatory environment is beginning to provide external scaffolding for these internal decisions. The EU AI Act, ISO 42001, and NIST's AI Risk Management Framework each offer structures for organisations navigating responsible AI deployment. Gartner has introduced its own framework, AI TRiSM (Trust, Risk and Security Management). "As a top-down point of view," Agrawal said, "organisations need to decide which of these AI acts they are following, what is their risk appetite, and then establish a governance committee or some kind of stakeholder group to approve anything before it moves outside."

The bottom-up technical layer is equally critical. Retrieval-Augmented Generation (RAG) is one of the most effective tools available, anchoring AI systems to verified, traceable sources rather than to the statistical patterns of training data alone. When properly integrated, RAG can reduce hallucination rates by up to 71%. A Salesforce study found that 49% of data leaders had reached incorrect conclusions because of hallucinations and missing business context, underscoring that data quality and governance matter as much as the model itself.
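
To make the pattern concrete, here is a minimal sketch of the RAG loop in Python. Everything in it is illustrative: the toy word-overlap retriever stands in for a production vector index, and `llm` is any text-in, text-out callable rather than a specific vendor's API. The structural point is that the model is instructed to answer only from retrieved, citable sources, and that those source IDs travel with the answer, which is what makes the output traceable.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str  # stable identifier, so every answer can cite its sources
    text: str

def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Toy retriever: rank documents by word overlap with the query.
    A production system would use a vector index; this stands in for one."""
    terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(terms & set(d.text.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, sources: list[Document]) -> str:
    """Anchor the model to the retrieved text and tell it to refuse otherwise."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in sources)
    return ("Answer using ONLY the sources below, citing source IDs in brackets. "
            "If the sources do not contain the answer, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

def answer(query: str, corpus: list[Document], llm) -> dict:
    sources = retrieve(query, corpus)
    reply = llm(build_prompt(query, sources))
    # Returning the source IDs alongside the answer is the traceability layer:
    # a reviewer can audit any claim back to the document it came from.
    return {"answer": reply, "sources": [d.doc_id for d in sources]}
```

The design choice worth noting is the refusal instruction: a RAG system that cannot find support for an answer should say so rather than fall back on the model's parametric memory, which is where hallucinations re-enter.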

The Internalisation Imperative

Muehmel's view of the enterprise AI challenge is, if anything, more urgent. Too many organisations are responding to the trust problem by outsourcing it, paying consulting partners, systems integrators, or engineers from the AI labs themselves to build solutions on their behalf, rather than developing the internal capability to understand and govern AI.

"I see a lot of organisations just hoping that they can pay somebody to solve it for them," he said. "Whether that's the new startup with the great pitch, or their consulting partner, tens of millions of dollars to run a transformation programme. Each one of those approaches has its limitations, because you're not internalising these practices. You're becoming dependent on external resources for something which is going to be a critical business function."

If AI is genuinely transformative, as Muehmel believes, then it cannot be treated as a service to be procured. It has to be understood from the inside, with the people who know the business best building and governing AI systems themselves, rather than delegating to external parties who may produce technically sophisticated but institutionally opaque solutions.

The Question of Pace

Neither Agrawal nor Muehmel believes AI adoption will stall. The competitive pressures are too strong; the capital already deployed is too substantial. But the trust data tell a cautionary story: only 27% of organisations currently report trusting fully autonomous AI agents, down from 43% just a year earlier, according to enterprise adoption research compiled in early 2026, and just 2% of enterprises report deploying AI agents at full scale, with more than 80% lacking the mature AI infrastructure required to govern agentic systems.

Muehmel drew a parallel with the early internet era. Capital overinvestment, followed by a correction, is historically normal during major technology transitions. What is less predictable is the eventual shape of the transformation and the consequences for the organisations that get there first versus those that do not. "The real question is at what pace organisations are going to survive this and come out the other side as the winners," he said. "In the same way that when the internet came around, there was a real reshuffling of the most valuable companies in the world, I think the same is absolutely true right now."

Agrawal was equally direct about what he expected to see over the coming years: pilot numbers will increase, but so will production failures for organisations that have not laid the groundwork. "The companies that spend time putting their right data strategy and governance strategy in place are more likely to go into production with AI and get ROI," he said. "A lot of AI projects will fail if they have not done this groundwork."

The path forward lies not in waiting for models to become trustworthy enough to deploy without oversight. That threshold, if it arrives, remains distant. The more actionable imperative is to build the governance, the traceability, and the institutional capability to work with AI as it actually is: powerful, useful, and still imperfect. The organisations that approach it that way are the ones most likely to convert their pilots into something that endures.
