2025 is anticipated to be the year AI gets real, bringing specific, tangible benefit to enterprise.
However, according to a new State of AI Development Report from AI development platform Vellum, we’re not quite there yet: Just 25% of enterprises have deployed AI into production, and roughly a quarter of those have yet to see measurable impact.
This seems to indicate that many enterprises have not yet identified viable use cases for AI, keeping them (at least for now) in a pre-build holding pattern.
“This reinforces that it’s still pretty early days, despite all the hype and discussion that’s been happening,” Akash Sharma, Vellum CEO, told VentureBeat. “There’s a lot of noise in the industry, new models and model providers coming out, new RAG techniques; we just wanted to get a lay of the land on how companies are actually deploying AI to production.”
Enterprises must identify specific use cases to see success
Vellum interviewed more than 1,250 AI developers and builders to get a true sense of what’s happening in the AI trenches.
According to the report, the majority of companies not yet in production are at various stages of their AI journeys: building out and evaluating strategies and proofs of concept (PoC) (53%), beta testing (14%) and, at the lowest level, talking to users and gathering requirements (7.9%).
By far, enterprises are focused on building document parsing and analysis tools and customer service chatbots, according to Vellum. But they are also interested in applications incorporating analytics with natural language, content generation, recommendation systems, code generation and automation and research automation.
Developers report competitive advantage (31.6%), cost and time savings (27.1%) and higher user adoption rates (12.6%) as the biggest impacts they’ve seen so far. Interestingly, though, 24.2% have yet to see any meaningful impact from their investments.
Sharma emphasized the importance of prioritizing use cases from the very start. “We’ve anecdotally heard from people that they just want to use AI for the sake of using AI,” he said. “There’s an experimental budget associated with that.”
While this makes Wall Street and investors happy, it doesn’t mean AI is actually contributing anything, he pointed out. “Something generally everyone should be thinking about is, ‘How do we find the right use cases?’ Usually, once companies are able to identify those use cases, get them into production and see a clear ROI, they get more momentum, they get past the hype. That results in more internal expertise, more investment.”
OpenAI still at the top, but a mixture of models will be the future
When it comes to models used, OpenAI maintains the lead (no surprise there), notably with GPT-4o and GPT-4o mini. But Sharma pointed out that 2024 offered more options, either directly from model creators or through platform solutions like Azure or AWS Bedrock. Providers hosting open-source models such as Llama 3.2 70B, including Groq, Fireworks AI and Together AI, are gaining traction, too.
“Open Source models are getting better,” said Sharma. “Closed source competitors to OpenAI are catching up in terms of quality.”
Ultimately, though, enterprises aren’t going to stick with just one model; they will increasingly lean on multi-model systems, he forecast.
“People will choose the best model for each task at hand,” said Sharma. “While building an agent, you might have multiple prompts, and for each individual prompt the developer will want to get the best quality, lowest cost and lowest latency, and that may or may not come from OpenAI.”
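As a rough illustration of that per-prompt trade-off, the sketch below picks the cheapest model that clears a quality floor and a latency ceiling. The model names and every number in it are invented placeholders, not survey data or real pricing, and `pick_model` is a hypothetical helper rather than anything Vellum describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    quality: float      # 0..1 score from your own evaluations (placeholder values)
    cost_per_1k: float  # cost per 1,000 tokens (placeholder values)
    latency_ms: float   # typical response latency (placeholder values)


def pick_model(candidates, min_quality, max_latency_ms):
    """Return the cheapest candidate that meets the quality and latency limits."""
    eligible = [
        m for m in candidates
        if m.quality >= min_quality and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise ValueError("no model meets the constraints")
    # Cheapest first; break cost ties in favor of higher quality.
    return min(eligible, key=lambda m: (m.cost_per_1k, -m.quality))


# Hypothetical catalog: one large closed model, one small closed model,
# and one open model served by a hosting provider.
CANDIDATES = [
    ModelProfile("large-closed-model", quality=0.92, cost_per_1k=5.00, latency_ms=1200),
    ModelProfile("small-closed-model", quality=0.80, cost_per_1k=0.30, latency_ms=400),
    ModelProfile("hosted-open-model", quality=0.85, cost_per_1k=0.60, latency_ms=250),
]

if __name__ == "__main__":
    # A hard reasoning prompt tolerates latency but needs high quality.
    print(pick_model(CANDIDATES, min_quality=0.90, max_latency_ms=2000).name)
    # A quick classification prompt needs speed more than peak quality.
    print(pick_model(CANDIDATES, min_quality=0.75, max_latency_ms=500).name)
```

In practice the quality scores would come from a team's own evaluations for each prompt, which is why the same catalog can yield a different winner for each step of an agent.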
Similarly, the future of AI is undoubtedly multimodal, with Vellum seeing a surge in adoption of tools that can handle a variety of tasks. Text is the undisputed top use case, followed by file creation (PDFs or Word), images, audio and video.
Also, retrieval-augmented generation (RAG) is a go-to when it comes to information retrieval, and more than half of developers are using vector databases to simplify search. Top open-source and proprietary options include Pinecone, MongoDB, Qdrant, Elasticsearch, pgvector, Weaviate and Chroma.
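For readers unfamiliar with the pattern, here is a toy sketch of the retrieval step in RAG. It is an assumption-heavy stand-in: word counts replace a real embedding model, a Python list replaces a vector database, and `embed`, `retrieve` and `build_prompt` are hypothetical names that do not correspond to any product mentioned in the report.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use a model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Place the retrieved passages in front of the question for the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up knowledge base for illustration.
DOCS = [
    "refunds are processed within five business days",
    "the api rate limit is 100 requests per minute",
    "support is available by email on weekdays",
]

if __name__ == "__main__":
    print(build_prompt("what is the api rate limit", DOCS, k=1))
```

A vector database does the job of `retrieve` at scale, typically with approximate nearest-neighbor indexes over learned embeddings instead of a full sort.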
Everyone’s getting involved (not just engineering)
Interestingly, AI is moving beyond just IT and becoming democratized across enterprises (akin to the old ‘it takes a village’). Vellum found that while engineering teams are the most involved in AI projects (82.3%), they are being joined by leadership and executives (60.8%), subject matter experts (57.5%), product teams (55.4%) and design departments (38.2%).
This is largely due to the ease of use of AI (as well as the general excitement around it), Sharma noted.
“This is the first time we’re seeing software being developed in a very, very cross functional way, especially because prompts can be written in natural language,” he said. “Traditional software usually tends to be more deterministic. This is non-deterministic, which brings more people into the development fold.”
Still, enterprises continue to face big challenges — notably around AI hallucinations and prompts; model speed and performance; data access and security; and getting buy-in from important stakeholders.
At the same time, while more non-technical users are getting involved, there is still a lack of pure technical expertise in-house, Sharma pointed out. “The way to connect all the different moving parts is still a skill that not that many developers have today,” he said. “So that’s a common challenge.”
However, many existing challenges can be overcome with tooling: platforms and services that help developers evaluate complex AI systems, Sharma pointed out. Developers can build tooling internally or use third-party platforms or frameworks; however, Vellum found that nearly 18% of developers are defining prompts and orchestration logic without any tooling at all.
Sharma pointed out that “lack of technical expertise becomes easier when you have proper tooling that can guide you through the development journey.” In addition to Vellum, frameworks and platforms used by survey participants include LangChain, LlamaIndex, Langfuse, CrewAI and Voiceflow.
Evaluations and ongoing monitoring are critical
Another way to overcome common issues (including hallucinations) is to perform evaluations, or use specific metrics to test the correctness of a given response. “But despite that, [developers] are not doing evals as consistently as they should be,” said Sharma.
Particularly when it comes to advanced agentic systems, enterprises need solid evaluation processes, he said. AI agents have a high degree of non-determinism, Sharma pointed out, as they call external systems and perform autonomous actions.
“People are trying to build fairly advanced systems, agentic systems, and that requires a large number of test cases and some sort of automated testing framework to make sure it performs reliably in production,” said Sharma.
While some developers are taking advantage of automated evaluation tools, A/B testing and open-source evaluation frameworks, Vellum found that more than three-quarters are still doing manual testing and reviews.
“Manual testing just takes time, right? And the sample size in manual testing is usually much lower than what automated testing can do,” said Sharma. “There might be a challenge in just the awareness of techniques, how to do automated, at-scale evaluations.”
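To make the contrast with manual review concrete, the following is a minimal sketch of an automated evaluation loop. It assumes the simplest possible metric (the expected answer appears in the output) and uses a canned `fake_model` in place of a real model call; both, along with `run_evals`, are hypothetical and not taken from the report.

```python
def run_evals(generate, cases):
    """Run every test case through `generate` and report pass rate and failures.

    `generate` is any callable mapping a prompt string to an output string.
    A case passes if the expected text appears in the output (case-insensitive).
    """
    failures = []
    for prompt, expected in cases:
        output = generate(prompt)
        if expected.lower() not in output.lower():
            failures.append(prompt)
    pass_rate = (len(cases) - len(failures)) / len(cases)
    return pass_rate, failures


def fake_model(prompt):
    """Stand-in for a real model call; returns canned answers, one of them wrong."""
    canned = {
        "capital of France": "The capital is Paris.",
        "2 + 2": "The answer is 4.",
        "capital of Japan": "The capital is Kyoto.",  # deliberately incorrect
    }
    return canned.get(prompt, "")


# Made-up test cases: (prompt, text the answer must contain).
CASES = [
    ("capital of France", "Paris"),
    ("2 + 2", "4"),
    ("capital of Japan", "Tokyo"),
]

if __name__ == "__main__":
    rate, failed = run_evals(fake_model, CASES)
    print(f"pass rate: {rate:.0%}, failed prompts: {failed}")
```

Because the loop is just code, the same suite can be rerun on hundreds of cases whenever a prompt or model changes, which is the sample-size advantage Sharma describes. Real evaluation suites usually swap the substring check for richer metrics.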
Ultimately, he emphasized the importance of embracing a mix of systems that work symbiotically — from cloud to application programming interfaces (APIs). “Consider treating AI as just a tool in the toolkit and not the magical solution for everything,” he said.