DeepSeek-V2.5 wins praise as the new, true open source AI model leader




The open source generative AI movement can be difficult to stay on top of, even for those working in or covering the field, such as us journalists at VentureBeat. By nature, the broad accessibility of new open source AI models and the permissiveness of their licensing mean it is easier for enterprising developers to take them and improve upon them than with proprietary models.

As such, a new open source AI model leader already appears to have emerged, just days after the previous claim to that title.

DeepSeek, the AI offshoot of Chinese quantitative hedge fund High-Flyer Capital Management, has officially launched its latest model, DeepSeek-V2.5, an enhanced version that integrates the capabilities of its predecessors, DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724.

This new release, issued September 6, 2024, combines both general language processing and coding functionalities into one powerful model.

Available now on Hugging Face, the model offers users seamless access via web and API, and it appears to be the most advanced large language model (LLM) currently available in the open-source landscape, according to observations and tests from third-party researchers.

The praise for DeepSeek-V2.5 follows a still-ongoing controversy around HyperWrite’s Reflection 70B, which co-founder and CEO Matt Shumer claimed on September 5 was “the world’s top open-source AI model,” according to his internal benchmarks. Those claims have since been challenged by independent researchers and the wider AI research community, who have so far failed to reproduce the stated results.

Enhanced features and performance

DeepSeek-V2.5 is optimized for several tasks, including writing, instruction-following, and advanced coding. With an emphasis on better alignment with human preferences, it has undergone various refinements to ensure it outperforms its predecessors in nearly all benchmarks.

Notably, the model introduces function calling capabilities, enabling it to interact with external tools more effectively. This feature broadens its applications across fields such as real-time weather reporting, translation services, and computational tasks like writing algorithms or code snippets.
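Function calling in this style typically works by declaring a tool schema to the API, letting the model emit a structured call, and routing that call to a local handler. The sketch below assumes the OpenAI-compatible tools format; the `get_current_weather` tool, its parameters, and the `dispatch` helper are hypothetical illustrations, not part of DeepSeek’s published API.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
# The tool name and parameters are illustrative, not from DeepSeek's docs.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to a local implementation."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "get_current_weather":
        # A real handler would query a weather service here.
        return f"Sunny in {args['city']}"
    raise ValueError(f"unknown tool: {tool_call['name']}")

# Simulate the structured tool call a model response might contain:
simulated_call = {"name": "get_current_weather", "arguments": '{"city": "Paris"}'}
print(dispatch(simulated_call))  # Sunny in Paris
```

In a real integration, the schema would be passed in the `tools` field of a chat-completion request, and the handler’s return value would be sent back to the model as a tool message so it can compose its final answer.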

In a recent post on the social network X by Maziyar Panahi, Principal AI/ML/Data Engineer at CNRS, the model was praised as “the world’s best open-source LLM” according to the DeepSeek team’s published benchmarks.

He expressed his surprise that the model hadn’t garnered more attention, given its groundbreaking performance.

The best DeepSeek model yet

DeepSeek’s parent company High-Flyer is reportedly “one of six Chinese groups with more than 10,000 [Nvidia] A100 processors,” according to the Financial Times, and it is clearly putting them to good use for the benefit of open source AI researchers.

DeepSeek-V2.5 excels in a range of critical benchmarks, demonstrating its superiority in both natural language processing (NLP) and coding tasks.

According to internal testing and external evaluations, the model delivers top-tier results in several key metrics:

  • AlpacaEval 2.0: DeepSeek-V2.5 shows an overall accuracy of 50.5, an improvement over DeepSeek-V2-0628 (46.6) and DeepSeek-Coder-V2-0724 (44.5).
  • ArenaHard: The model reached an accuracy of 76.2, compared to 68.3 and 66.3 in its predecessors.
  • HumanEval Python: DeepSeek-V2.5 scored 89, reflecting its significant advancements in coding abilities.

In terms of language alignment, DeepSeek-V2.5 outperformed GPT-4o mini and ChatGPT-4o-latest in internal Chinese evaluations.

These results were achieved with GPT-4o serving as the judge, showing the model’s cross-lingual and cultural adaptability.

AI observer Shin Megami Boson, a staunch critic of HyperWrite CEO Matt Shumer (whom he accused of fraud over the irreproducible benchmarks Shumer shared for Reflection 70B), posted a message on X stating he’d run a private benchmark imitating the Graduate-Level Google-Proof Q&A Benchmark (GPQA).

According to him, DeepSeek-V2.5 outperformed Meta’s Llama 3-70B Instruct and Llama 3.1-405B Instruct, but fell short of OpenAI’s GPT-4o mini, Claude 3.5 Sonnet, and OpenAI’s GPT-4o.

“DeepSeek V2.5 is the actual best performing open-source model I’ve tested, inclusive of the 405B variants,” he wrote, further underscoring the model’s potential.

Broad accessibility and commercial usage — with the right hardware

DeepSeek-AI has made DeepSeek-V2.5 open source on Hugging Face under a variation of the MIT License, allowing developers and organizations to use it at will, for free.

The DeepSeek model license allows for commercial usage of the technology under specific conditions. The license grants a worldwide, non-exclusive, royalty-free license for both copyright and patent rights, allowing the use, distribution, reproduction, and sublicensing of the model and its derivatives. This means you can use the technology in commercial contexts, including selling services that use the model (e.g., software-as-a-service).

However, it does come with some use-based restrictions prohibiting military use, generating harmful or false information, and exploiting vulnerabilities of specific groups.

The move signals DeepSeek-AI’s commitment to democratizing access to advanced AI capabilities. Businesses can integrate the model into their workflows for various tasks, ranging from automated customer support and content generation to software development and data analysis.

The model’s open-source nature also opens doors for further research and development. AI engineers and data scientists can build on DeepSeek-V2.5, creating specialized models for niche applications, or further optimizing its performance in specific domains.

To run DeepSeek-V2.5 locally, users need the model weights in BF16 format and 80GB GPUs (eight of them for full utilization). The model is highly optimized for both large-scale inference and small-batch local deployment.
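A back-of-the-envelope calculation shows why eight 80GB GPUs are needed. This sketch assumes DeepSeek-V2.5’s roughly 236 billion total parameters (the figure DeepSeek publishes for the V2 family) and the 2 bytes per parameter that BF16 storage requires:

```python
import math

# Assumption: ~236B total parameters, per DeepSeek's published V2 model card.
PARAMS = 236e9
BYTES_PER_PARAM = 2          # BF16 stores each parameter in 2 bytes
GPU_MEM_GB = 80

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_for_weights = math.ceil(weights_gb / GPU_MEM_GB)

print(f"weights: {weights_gb:.0f} GB -> {gpus_for_weights} x 80GB GPUs for weights alone")
# ~472 GB of weights need at least 6 GPUs; eight (640 GB total) leave
# headroom for the KV cache and activations during inference.
```

The weights alone occupy about 472 GB, so six GPUs is the bare minimum and eight provides the working headroom the full-utilization setup implies.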

DeepSeek-V2.5’s architecture includes key innovations, such as Multi-Head Latent Attention (MLA), which significantly reduces the KV cache, thereby improving inference speed without compromising on model performance. This compression allows for more efficient use of computing resources, making the model not only powerful but also highly economical in terms of resource consumption.
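The KV-cache saving from MLA can be sketched with simple arithmetic: standard multi-head attention caches full key and value vectors for every head at every layer, while MLA caches one compressed latent vector per layer. The dimensions below are illustrative assumptions for the sketch, not DeepSeek-V2.5’s exact configuration:

```python
# Illustrative per-token KV-cache comparison: standard multi-head attention
# (MHA) vs. Multi-Head Latent Attention (MLA). Dimensions are assumed values
# for the sketch, not DeepSeek-V2.5's actual config.
N_LAYERS   = 60
N_HEADS    = 128
HEAD_DIM   = 128
LATENT_DIM = 512      # MLA caches one compressed latent vector per layer
BYTES      = 2        # BF16

# MHA stores full K and V vectors for every head at every layer.
mha_bytes_per_token = 2 * N_HEADS * HEAD_DIM * N_LAYERS * BYTES

# MLA stores only the compressed latent, from which K and V are reconstructed.
mla_bytes_per_token = LATENT_DIM * N_LAYERS * BYTES

print(f"MHA: {mha_bytes_per_token / 1e6:.2f} MB/token, "
      f"MLA: {mla_bytes_per_token / 1e6:.3f} MB/token, "
      f"~{mha_bytes_per_token // mla_bytes_per_token}x smaller")
```

Under these assumed dimensions the latent cache is about 64x smaller per token, which is the mechanism behind the faster inference the article describes: a smaller cache means longer contexts and larger batches fit in the same GPU memory.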

DeepSeek-V2.5 sets a new standard for open-source LLMs, combining cutting-edge technical advancements with practical, real-world applications. As businesses and developers seek to leverage AI more efficiently, DeepSeek-AI’s latest release positions itself as a top contender in both general-purpose language tasks and specialized coding functionalities.

By making DeepSeek-V2.5 open-source, DeepSeek-AI continues to advance the accessibility and potential of AI, cementing its role as a leader in the field of large-scale models.


