IBM is staking its claim at the top of the open-source AI leaderboard with its new Granite 3.1 series out today.
The Granite 3.1 large language models (LLMs) offer enterprise users an extended context length of 128K tokens, new embedding models, integrated hallucination detection and improved performance. According to IBM, the new Granite 8B Instruct model tops open-source rivals of the same size, including Meta Llama 3.1, Qwen 2.5 and Google Gemma 2. IBM ranked its models across a series of academic benchmarks included in the OpenLLM Leaderboard.
The new models are part of an accelerated release cadence for IBM's open-source Granite models. Granite 3.0 was just released in October; at the time, IBM claimed it had a $2 billion book of business related to generative AI. With the Granite 3.1 update, IBM is focusing on packing more capability into smaller models. The basic idea is that smaller models are easier for enterprises to run and more cost efficient to operate.
“We’ve also just boosted all the numbers — all the performance of pretty much everything across the board has improved,” David Cox, VP for AI models at IBM Research, told VentureBeat. “We use Granite for many different use cases, we use it internally at IBM for our products, we use it for consulting, we make it available to our customers and we release it as open source, so we have to be kind of good at everything.”
Why performance and smaller models matter for enterprise AI
There are any number of ways an enterprise can use benchmarks to evaluate the performance of an LLM.

The direction IBM is taking is to run its models through a gamut of academic and real-world tests. Cox emphasized that IBM tested and trained its models to be optimized for enterprise use cases. Performance isn't just an abstract measure of speed, either; it's a more nuanced measure of efficiency.
One aspect of efficiency that IBM is aiming to push forward is helping users spend less time getting the results they want.
“You should spend less time fiddling with prompts,” said Cox. “So, the stronger a model is in an area, the less time you have to spend engineering prompts.”
Efficiency is also about model size. The larger a model, the more compute and GPU resources it typically requires, which also means more cost.
“When people are doing minimum viable prototype kind of work, they often jump to very large models, so you might go to a 70 billion parameter model or a 405 billion parameter model to build your prototype,” said Cox. “But the reality is that many of those are not economical, so the other thing we’ve been trying to do is drive as much capacity as possible into the smallest package possible.”
Context matters for enterprise agentic AI
Aside from the promise of improved performance and efficiency, IBM has dramatically expanded Granite’s context length.
With the initial Granite 3.0 release, the context length was limited to 4K tokens. Granite 3.1 extends that to 128K, allowing the models to process much longer documents. The extended context is a significant upgrade for enterprise AI users, for both retrieval augmented generation (RAG) and agentic AI.
Agentic AI systems and AI agents often need to process and reason over longer sequences of information, such as larger documents, log traces or extended conversations. The increased 128k context length allows these agentic AI systems to have access to more contextual information, enabling them to better understand and respond to complex queries or tasks.
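For developers who want to kick the tires, a minimal sketch of long-context inference with Hugging Face Transformers might look like the following. The model ID and input file here are illustrative assumptions, not details confirmed in IBM's announcement.

```python
# A sketch of long-context inference with Granite 3.1 via Hugging Face Transformers.
# The model ID and input file are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.1-8b-instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# With a 128K-token window, a long log trace or contract can fit in one prompt.
long_document = open("incident_log.txt").read()  # hypothetical input file
messages = [
    {"role": "user", "content": f"Summarize the key events below.\n\n{long_document}"},
]

# Render the chat template and generate a summary over the full document.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```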
IBM is also releasing a series of embedding models to help accelerate the process of converting data into vectors. The Granite-Embedding-30M-English model can achieve a latency of 0.16 seconds per query, which IBM claims is faster than rival options including Snowflake's Arctic.
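In practice, an embedding model like this slots into the retrieval step of a RAG pipeline. Below is a minimal sketch using the sentence-transformers library; the Hugging Face model ID is an assumption based on IBM's naming, and actual latency will depend on hardware.

```python
# A sketch of semantic retrieval with the Granite embedding model.
# The model ID is an assumption based on IBM's naming convention.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-30m-english")  # assumed ID

documents = [
    "Granite 3.1 extends the context window to 128K tokens.",
    "The Granite embedding models convert text into dense vectors for retrieval.",
]
query = "How long is the Granite 3.1 context window?"

# Encode documents and query into vectors, then rank by cosine similarity.
doc_vectors = model.encode(documents)
query_vector = model.encode(query)
scores = util.cos_sim(query_vector, doc_vectors)
print(scores)  # higher score = closer semantic match to the query
```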
How IBM has improved Granite 3.1 to serve enterprise AI needs
So how did IBM manage to improve performance in Granite 3.1? It wasn't any one specific thing, but rather a series of process and technical innovations, Cox explained.
IBM has developed increasingly advanced multi-stage training pipelines, he said. This has allowed the company to extract more performance from models. Also, a critical part of any LLM training is data. Rather than just focusing on increasing the quantity of training data, IBM has put a strong emphasis on improving the quality of data used to train the Granite models.
“It’s not a quantity game,” said Cox. “It’s not like we’re going to go out and get 10 times more data and that’s magically going to make models better.”
Reducing hallucination directly in the model
A common approach to reducing the risk of hallucinations and errant outputs in LLMs is to use guardrails. Those are typically deployed as external features alongside an LLM.
With Granite 3.1, IBM is directly integrating hallucination protection into the model. The Granite Guardian 3.1 8B and 2B models now include a function-calling hallucination detection capability.
“The model can natively do its own guardrailing, which can give different opportunities to developers to catch things,” said Cox.
He explained that performing hallucination detection in the model itself optimizes the overall process. Internal detection means fewer inference calls, making the model more efficient and accurate.
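To make that workflow concrete, the sketch below calls a Guardian model as a checker on a function-calling exchange. The model ID, the guardian_config argument and the risk label are assumptions modeled on the Hugging Face Guardian usage pattern, not confirmed API details.

```python
# A sketch of in-model guardrailing with Granite Guardian. The model ID, the
# guardian_config argument and the "function_call" risk label are assumptions
# modeled on the Hugging Face Guardian usage pattern, not confirmed details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.1-2b"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The exchange to audit: the assistant called a tool unrelated to the request.
messages = [
    {"role": "user", "content": "What is the weather in Austin right now?"},
    {"role": "assistant",
     "content": '{"name": "get_stock_price", "arguments": {"ticker": "IBM"}}'},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "function_call"},  # assumed risk label
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=5)
# Guardian models answer with a short Yes/No risk verdict.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```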
How enterprises can use Granite 3.1 today and what’s next
The new Granite models are all now freely available as open source to enterprise users. The models are also available via IBM’s Watsonx enterprise AI service and will be integrated into IBM’s commercial products.
The company plans to keep an aggressive pace in updating the Granite models. Next up, Granite 3.2 will add multimodal functionality, set to debut in early 2025.
“You’re gonna see us over the next few point releases, adding more of these kinds of different features that are differentiated leading up to the stuff that we’ll announce at the IBM Think conference next year,” said Cox.