How Snorkel AI helps enterprises build AI agents they can trust

AI agents are getting better at answering questions and automating tasks. But inside many enterprises, there’s a familiar frustration: systems that look impressive in demos often fall apart when exposed to real workflows, data, and expert expectations. Snorkel AI CEO and cofounder Alexander Ratner explored this in a recent installment of Founder Friday on LinkedIn Live.  

Ratner has seen this pattern repeatedly. Teams build agents that perform well on curated examples, but when experts test them against real scenarios, accuracy drops sharply. According to Ratner, the issue is rarely the model itself. More often, the data and evaluation behind the system don’t reflect how work is done.  

Snorkel AI was founded to help enterprises systematically improve agent reliability using data-centric development, not just model tuning. The company focuses on helping enterprises move AI agents from proofs of concept to systems that achieve expert-level accuracy in high-stakes workflows. But if the data doesn’t match real usage, even strong demos fail to deliver line-of-business value. And without quick proof of value, pilots often stall or shut down.

To address this, Snorkel AI emphasizes what Ratner calls a minimal reliable agent loop—a repeatable system that aligns evaluation, data, and iteration to how work happens. Evaluation reflects how experts judge quality, not synthetic benchmarks. Data mirrors real tasks, including edge cases and compliance needs. Iteration focuses on measurable improvements, with each change tested and refined. This approach came into focus in a collaboration with a global bank building a legal contract extraction agent on Azure. 

The team developed an impressive demo, but when legal and banking experts tested the system against real inquiries, accuracy was only about 25 percent. Rather than tuning prompts or adjusting surface behaviors, the team reworked evaluation criteria to match expert judgment and aligned training data to real-world usage. Each iteration was driven by measurable signals instead of guesswork.

The result was a system that reached 95 percent expert acceptance, reducing manual review and increasing confidence in high-stakes decisions. It moved from a proof of concept to a production-quality solution running in a secure Azure environment. For Ratner, that outcome illustrates the difference between building something that looks good and building something that works.
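
For readers who want a concrete picture of that loop, here is a minimal Python sketch. It is an illustration only, not Snorkel AI's product or API: the names `Case`, `acceptance_rate`, and `iterate` are hypothetical, and a simple normalized string match stands in for expert judgment, which in practice would be a richer rubric or a human review step. The idea it shows is the one described above: score every candidate change against expert-accepted answers on real tasks, and keep a change only when the measured acceptance rate goes up.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Case:
    """One real task plus the answer an expert would accept (hypothetical type)."""
    query: str
    expert_answer: str


# An agent is modeled here as any function from a query to an answer.
Agent = Callable[[str], str]


def acceptance_rate(agent: Agent, cases: Sequence[Case]) -> float:
    """Fraction of cases where the agent's answer matches the expert's.

    Assumption: case-insensitive exact match is a stand-in for expert review.
    """
    if not cases:
        raise ValueError("evaluation set must not be empty")
    accepted = sum(
        1
        for case in cases
        if agent(case.query).strip().lower() == case.expert_answer.strip().lower()
    )
    return accepted / len(cases)


def iterate(
    baseline: Agent, candidates: Sequence[Agent], cases: Sequence[Case]
) -> Tuple[Agent, float]:
    """Try each candidate and keep it only if it measurably beats the current best."""
    best, best_score = baseline, acceptance_rate(baseline, cases)
    for candidate in candidates:
        score = acceptance_rate(candidate, cases)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

The design choice worth noting is that the evaluation set, not the demo, decides which changes survive. Swapping the string match for an expert-defined rubric changes the scoring function but leaves the loop itself unchanged.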

Snorkel AI’s work is closely tied to where enterprises already operate. Many customers rely on Azure for core systems, enabling integration of high-accuracy agents into secure, scalable environments. As one of the first AI startups invited into the Microsoft for Startups Pegasus Program, Snorkel AI also benefited from early access to Azure AI services and high-performance infrastructure that supports experimentation, training, and evaluation at scale.

Ratner points to speed of experimentation as a key advantage. Access to purpose-built AI infrastructure enables teams to test new approaches quickly while maintaining rigor. During a period when enterprise buying cycles were shifting rapidly from predictive to generative to agentic systems, Microsoft’s long-term focus on data and fine-tuning reinforced Snorkel AI’s belief that reliability depends on more than model selection alone.

For founders building enterprise AI today, Ratner’s advice is rooted in experience: stay deeply engaged with real users, even before the product scales. Early in Snorkel AI’s journey, founders embedded directly with customers, learning how systems performed and earning trust through accountability. That proximity helped shape the company’s data-centric approach and ensured that solutions were built around real needs, not assumptions.

As AI agents take on more responsibility, success depends on whether experts trust them when the work is messy and the stakes are real. That requires building with real data, expert evaluation, and disciplined iteration from the start.

In a recent LinkedIn Live conversation, Ratner joined Heena Purohit, director, AI Startups, Microsoft for Startups, to discuss why many enterprise AI pilots fail to reach production, what it takes to build reliable AI agents, and how founders can earn trust with enterprise buyers. Watch the replay on LinkedIn to catch the full conversation. 

Follow Microsoft for Startups on LinkedIn for more Founder Friday highlights, and visit Microsoft for Startups to learn how to get started building with Azure.