Confluent's Argument Is Simple: AI Doesn't Fail at the Model Layer. It Fails in the Pipeline

19 May

Confluent, the data streaming platform acquired by IBM and now operating as an IBM company, has announced a set of new capabilities designed to remove the security and tooling barriers that prevent most AI projects from reaching production. The releases, timed to coincide with Current London, extend Confluent Intelligence and Confluent Cloud with automated data privacy controls, private cloud connectivity, natural-language infrastructure management, and tighter integration with the tools data engineering teams already use daily.

Sean Falconer, head of AI at Confluent, did not dress it up: "Most AI projects fail before they reach a single customer because the data layer breaks down. Teams have the models and the mandate, but security risks and fragmented data stop them from shipping. We're fixing that by making the streaming layer the foundation for secure, production-ready AI."

That is a pointed diagnosis. The AI production problem is not, in Confluent's reading, a model problem or a compute problem. It is a plumbing problem, one that sits in the infrastructure connecting raw data to the applications that are supposed to use it.

Why AI projects stall before they ship

A McKinsey report cited in the announcement found that eight in ten companies identify data limitations as the primary obstacle to scaling AI agents. The same report found the causes clustered in two places: security teams blocking data from entering AI pipelines due to exposure risk, and developers losing hours switching between tools to inspect and manage the data flows their AI depends on. Both appear to be organisational failures presented as technical ones. The data exists. The models exist. The pipeline in between is where production goes to die.

What Confluent is releasing addresses both sides of that failure directly, and the packaging reflects a deliberate attempt to remove several blockers at once rather than one at a time.

Taking sensitive data out of the risk equation

The most consequential release for regulated industries is a new function that detects and removes personally identifiable information directly inside the data pipeline, before it reaches an AI model. In most current setups, handling sensitive data means pulling it out of the pipeline, running it through a separate system, and then reinserting the cleaned version, a process that creates latency, introduces additional exposure points, and requires security sign-off at each step. The new function, built into Flink SQL, Confluent's stream processing layer, handles that in place. The data never has to leave the pipeline.
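To make the "in place" idea concrete, here is a minimal Python sketch of redaction applied as a step inside a stream rather than as a round trip to a separate system. It is not Confluent's function and does not use Flink SQL; the names `redact_text` and `redact_stream`, the placeholder tokens, and the two regular expressions are assumptions chosen for illustration, and real PII detection would need far broader coverage than two patterns.

```python
import re
from typing import Any, Dict, Iterable, Iterator

# Illustrative patterns only (assumption): an email address and a
# 13-16 digit card-like number with optional space or hyphen separators.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact_text(text: str) -> str:
    """Replace matched PII with placeholder tokens (hypothetical tokens)."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def redact_stream(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Redact string fields event by event, as one stage of the stream.

    Each event is cleaned as it passes through; nothing is exported to
    an external system and reinserted later.
    """
    for event in events:
        yield {
            key: redact_text(value) if isinstance(value, str) else value
            for key, value in event.items()
        }


if __name__ == "__main__":
    sample = [
        {
            "id": 1,
            "note": "Contact jane.doe@example.com about card 4111 1111 1111 1111",
            "amount": 42.5,
        }
    ]
    for cleaned in redact_stream(sample):
        print(cleaned)
```

Because `redact_stream` is a generator, each record is cleaned as it flows past, which is the property the announcement emphasises: no separate export, clean, and reinsert cycle, and so no extra exposure point to sign off.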

For financial services, healthcare, and insurance teams that have watched AI projects stall at the data governance review stage, removing that intermediate step is not a minor convenience. It is often the specific blocker that determines whether a project ships.

Letting AI run its own infrastructure

The MCP server release is the one that says something about where enterprise software development is heading. It allows AI agents to build, configure, and troubleshoot Confluent's streaming infrastructure through natural language. Agent Skills sit on top of that, encoding a company's own standards into how those operations are carried out, so the AI acts within organisational guardrails rather than in spite of them.

The shift that makes this meaningful is not technical. Engineering teams that now build with AI coding assistants as a core part of their workflow have been managing a split: the application logic through AI tools, the underlying infrastructure through a separate set of manual processes. Closing that split means the data pipeline, the layer that Falconer argues is where AI projects actually fail, can now be managed through the same interface developers are already working in.

The dbt question

Of all five releases, the open-source dbt adapter may generate the most immediate adoption. dbt has become the standard framework for data engineering teams managing pipelines, and most teams that run data infrastructure at any serious scale are already inside it. The adapter means Confluent's stream processing jobs can be defined, tested, and deployed using standard dbt commands and project structure, without requiring teams to learn a parallel toolchain.

That matters because tool adoption debt is one of the real, unglamorous reasons organisations stay on slower batch-based data processing longer than their use cases justify. The cost of learning something new, when the existing toolchain largely works, is a persistent brake on infrastructure upgrades. Removing that cost by meeting teams in the tool they already use is as much a distribution strategy as a product one.

Azure Private Link solves a narrower but operationally important problem. Confluent jobs running on Azure can now connect to Azure OpenAI, Azure SQL, and Cosmos DB over Microsoft's private network, keeping AI workloads off the public internet without requiring additional network architecture.

IBM's fingerprints are on the roadmap

The releases build on the integrations Confluent announced at IBM Think earlier this month, in which Confluent Cloud was embedded as a core component of IBM's watsonx.data platform, providing both the data foundation and a real-time context layer for AI running across hybrid cloud and on-premises environments. A separate integration brought mainframe data into the same streaming pipeline, connecting IBM Z infrastructure to the real-time layer.

The IBM relationship gives Confluent reach into enterprise procurement conversations, where compliance-heavy capabilities such as PII redaction, private connectivity, and governance controls matter most. Those are also the conversations that independent infrastructure vendors typically reach only after the architecture decisions have already been made.

Running beneath all five releases is the Real-Time Context Engine, which reached general availability alongside today's announcements. It continuously processes historical and live data and delivers the result to AI applications as current, governed context. Confluent's argument is that this layer, the one that sits between raw data sources and the AI workloads meant to use them, is the part of the stack that separates AI that works in production from AI that works in a demo.


Sindhu V Kashyap

Global Technology Journalist & Multimedia Storyteller | Covering Founders, Investors & Leaders Reshaping Tech | Writer · Interviewer · Moderator · Editor