Amazon Web Services (AWS) is committed to supporting the development of cutting-edge generative artificial intelligence (AI) technologies by companies and organizations across the globe. As part of this commitment, AWS Japan announced the AWS LLM Development Support Program (LLM Program), through which we’ve had the privilege of working alongside some of Japan’s most innovative teams. From startups to global enterprises, these trailblazers are harnessing the power of large language models (LLMs) and foundation models (FMs) to boost productivity, create differentiated customer experiences, and drive meaningful progress across a variety of industries by taking advantage of purpose-built generative AI infrastructure on AWS. Notably, 12 of the 15 organizations who successfully participated in the program used the powerful compute capabilities of AWS Trainium to train their models and are now exploring AWS Inferentia for inference. Earlier this year, at the conclusion of the program, the LLM Program held a media briefing, where several pioneering companies presented their results and stories. In this blog post, we share a recap of those results and cover how the participating organizations used the LLM Program to accelerate their generative AI initiatives.
AWS LLM Development Support Program in Japan
Since its launch, the LLM Program has welcomed 15 diverse companies and organizations, each with a unique vision for how to use LLMs to drive progress in their respective industries. The program provided comprehensive support through guidance on securing high-performance compute infrastructure, technical assistance and troubleshooting for distributed training, cloud credits, and support for go-to-market. The program also facilitated collaborative knowledge-sharing sessions, where the leading LLM engineers came together to discuss the technical complexities and commercial considerations of their work. This holistic approach enabled participating organizations to rapidly advance their generative AI capabilities and bring transformative solutions to market.
Let’s dive in and explore how these organizations are transforming what’s possible with generative AI on AWS.
Ricoh innovates with curriculum learning to train a bilingual LLM
Ricoh recognized that the development of Japanese LLMs was lagging behind English or multilingual LLMs. To address this, the company’s Digital Technology Development Center developed a Japanese-English bilingual LLM through a carefully crafted curriculum learning strategy.
Takeshi Suzuki, Deputy Director of the Digital Technology Development Center, explains Ricoh’s approach:
“Although new model architectures for FMs and LLMs are rapidly emerging, we focused on refining our training methodologies to create a competitive advantage, rather than solely pursuing architectural novelty.”
This led them to adopt a curriculum learning approach that gradually introduced increasingly complex data to their model.
“If a large amount of difficult Japanese data is introduced from the start into the initial English-trained weights of Llama 2 13B Chat, it can lead to a forgetting effect, hindering learning,” Suzuki says. “Therefore, we started with a substantial amount of English data, then gradually incorporated lower-quality English and Japanese data, before finally fine-tuning on high-quality Japanese content.”
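As a rough illustration of the staged approach Suzuki describes, the following Python sketch switches the data mixture as training progresses. It is a minimal sketch under stated assumptions, not Ricoh’s implementation: the stage names, stage boundaries, and sampling weights are all hypothetical, and the source does not state the actual proportions used.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str            # hypothetical label for the stage
    end_fraction: float  # stage is active until this fraction of total steps
    mix: dict            # data source -> sampling weight (hypothetical values)


# Hypothetical three-stage schedule: English first, then an English/Japanese
# mix, then high-quality Japanese only. All numbers are illustrative.
CURRICULUM = [
    Stage("english_first", 0.4, {"english": 1.0}),
    Stage("mixed", 0.8, {"english": 0.5, "japanese": 0.5}),
    Stage("japanese_finetune", 1.0, {"japanese_high_quality": 1.0}),
]


def stage_for_step(step: int, total_steps: int) -> Stage:
    """Return the curriculum stage that is active at a given training step."""
    progress = step / total_steps
    for stage in CURRICULUM:
        if progress < stage.end_fraction:
            return stage
    return CURRICULUM[-1]


def sample_source(step: int, total_steps: int, rng: random.Random) -> str:
    """Pick the data source for the next batch using the active stage's mix."""
    stage = stage_for_step(step, total_steps)
    sources = list(stage.mix)
    weights = [stage.mix[s] for s in sources]
    return rng.choices(sources, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 90):
        print(step, stage_for_step(step, 100).name, sample_source(step, 100, rng))
```

A real schedule would likely ramp the weights gradually within each stage rather than switching abruptly at fixed boundaries. The hard boundaries here only keep the sketch short.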
To bring this innovative curriculum learning methodology to life, Ricoh used Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances, powered by Trainium. By using an on-demand cluster of 64 trn1.32xlarge instances (1,024 Trainium chips) with support from the LLM Program, Ricoh performed large-scale distributed training for their 13-billion-parameter bilingual LLM (based on Llama 2). In benchmarks using the Japanese llm-jp-eval, the model demonstrated strong logical reasoning performance, which is important in industrial applications.
Stockmark mitigates hallucination by pre-training a Japanese LLM
Stockmark wanted to build highly reliable LLMs for industrial applications and decided to pretrain a Japanese LLM to tackle the challenge of hallucination (factually inaccurate output)—a critical concern in many real-world use cases.
“In the industrial world, there is a demand for LLMs where hallucination is suppressed even more than it is in ChatGPT.”
– Kosuke Arima, CTO and co-founder of Stockmark.
Hallucination mitigation depends heavily on the amount of knowledge in LLMs. In multilingual LLMs, which are often used globally, only about 0.1 percent of the training data is Japanese. Stockmark determined that retrieval augmented generation alone was insufficient to meet the needs of enterprise search or application search, because the LLMs used weren’t proficient in Japanese. So, they decided to develop Japanese LLMs in-house.
“To support practical business use cases, we pre-trained a 13-billion-parameter LLM from scratch using a total of 220 billion tokens of Japanese text data, including not only public data but also original web corpus and patent data for business domains.”
– Dr. Takahiro Omi, VP of Research of Stockmark.
Stockmark quickly developed the Stockmark-13b LLM using 16 Trn1 instances powered by Trainium chips in about 30 days. Furthermore, to deploy Stockmark-13b in their own services, they conducted a technical validation of inference using the AWS Inferentia2 chip and published the results in a notebook.
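To put those figures in perspective, the short Python calculation below estimates the average training throughput they imply. It is a back-of-the-envelope sketch, not a reported measurement: it assumes all 220 billion tokens were processed once during the roughly 30 days, with no downtime, evaluation runs, or restarts, none of which the source confirms.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_second(total_tokens: float, days: float) -> float:
    """Average throughput implied by a token count and a wall-clock duration."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_tokens / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Figures quoted in the post: 220 billion tokens, about 30 days, 16 instances.
    cluster_rate = tokens_per_second(220e9, 30)
    per_instance_rate = cluster_rate / 16
    print(f"cluster: ~{cluster_rate:,.0f} tokens/s")
    print(f"per instance: ~{per_instance_rate:,.0f} tokens/s")
```

Under these assumptions, the cluster would have averaged roughly 85,000 tokens per second, or about 5,300 tokens per second per instance.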
NTT builds lightweight, high-performance LLMs for sustainable AI
The NTT group, together with Intel and Sony, has established Innovative Optical and Wireless Network (IOWN) as a new industry forum whose mission is to meet the social and technological needs of society through innovative and sustainable technology. As part of this effort, NTT Human Informatics Laboratories is developing the lightweight, high-performance LLM tsuzumi (named after a traditional Japanese percussion instrument). Instead of increasing the parameter size, tsuzumi enhances the quality and quantity of Japanese training data, enabling high Japanese processing ability with a lightweight model. As described in their press release, tsuzumi demonstrates high Japanese language proficiency, as evaluated by the Rakuda benchmark, and has multi-modal capabilities that are currently under development.
“Tsuzumi’s high Japanese language proficiency and multi-modal capabilities can benefit a variety of industry-specific and customer support use cases. In the healthcare and life sciences domain, tsuzumi can help parse electronic medical records, contributing to personalized medical care and accelerating drug discovery. For contact centers, tsuzumi’s multi-modal capabilities, such as visual understanding of manuals and charts, are expected to enhance both customer experience and employee experience.”
– Dr. Kyosuke Nishida, Senior Distinguished Researcher at NTT Human Informatics Laboratories.
By participating in the LLM Program, NTT was able to quickly launch a cluster of 96 NVIDIA H100 GPUs (12 EC2 P5 instances using AWS ParallelCluster). This enabled highly efficient, distributed training through the Elastic Fabric Adapter’s high-speed 3,200 Gbps inter-node communication. The AWS team also provided technical expertise to help NTT seamlessly migrate and validate its environment on AWS.
Customer innovations in domain-specific, multilingual, and multimodal generative AI
From intelligent chatbots that engage in witty banter to multimodal frameworks for autonomous vehicle systems, the LLM Program participants demonstrated the transformative potential of generative AI by using Trainium.
Domain-specific models: Trainium enabled creation of LLMs tailored to specific domains and tasks, unlocking new frontiers of efficiency and specialization. KARAKURI built an LLM (karakuri-ai/karakuri-lm-70b-chat-v0.1) to create customer support chatbots that not only have Japanese proficiency but also respond with a helpful demeanor. Meanwhile, Watashiha injected a dose of humor into the AI realm, developing OGIRI—a humor-focused foundation model that delivers delightfully funny responses to user queries. Poetics created an LLM adept at deciphering the nuances of online business meetings for their meeting analysis tool Jamroll. The Matsuo Institute pre-trained an LLM based on elyza/ELYZA-japanese-Llama-2-7b to develop an LLM-powered recommendation system that can intelligently curate personalized experiences for retail and travel customers. Aiming to build an LLM that specializes in specific tasks, Lightblue developed a small, lightweight LLM that will also reduce inference costs. To address the scalability challenges posed by a shrinking workforce, Recruit built an LLM through continued pre-training (with C4-ja, Wikipedia-ja, Pile, and in-house corpora) and instruction tuning (with databricks-dolly-15k-ja, ichikara-instruction, and in-house instruction data) on elyza/ELYZA-japanese-Llama-2-7b-fast and meta-llama/Llama-2-13b-hf models.
Multi-modal models: Several participants, such as Sparticle, have ventured into the realm of multimodal AI, weaving together language and visual modalities. Turing, with its innovative multi-modal Heron framework, is enhancing LLMs with the ability to interpret and navigate the visual landscape. Preferred Networks (PFN) has crafted a general-purpose vision FM that can seamlessly integrate and process both textual and visual information. As part of their future work, PFN will continue to develop multi-modal FMs based on PLaMo LLM, using the development method established in the LLM Program.
Linguistically diverse models: The program participants also experimented with the training data, changing the ratio of English to Japanese or using training corpora in other languages. CyberAgent used Trainium to evaluate how LLM performance changes with the ratio of Japanese to English in the training data, and went on to evaluate grouped query attention (GQA) and architectures such as RetNet and Sparse Mixture of Experts (MoE) for their use cases. Using Trainium, Rinna built Nekomata 14B, based on the Qwen model trained on Chinese and English, through continued pre-training on 66 billion tokens of Japanese data in just 6.5 days. Ubitus developed and released Taiwan LLM 13B (Taiwan-LLM-13B-v2.0-base) through joint research with National Taiwan University.
Fueling generative AI innovation in Japan
From startups to enterprises, organizations of all sizes have successfully trained their generative AI foundation models and large language models in the LLM Program. The program’s success was further underscored by the involvement and support of Japan’s Ministry of Economy, Trade, and Industry (METI). Several of the LLM Program participants will continue to develop their FMs and LLMs as part of the Generative AI Accelerator Challenge (GENIAC), for which AWS will provide compute resources, as announced by METI and described in the AWS Japan blog.
AWS will continue to support companies and organizations in their efforts to deploy these transformative models and bring generative AI innovation into real-world applications. We see the immense potential of FMs and LLMs to bolster Japan’s national strengths if implemented widely across various sectors. From a global perspective, AWS is committed to facilitating the development and adoption of these technologies worldwide, driving innovation and progress that will shape the future.
Visit AWS Trainium to learn how you can harness the power of purpose-built AI chips to build innovative foundation models while lowering costs.
This post was contributed by the AWS LLM Development Support Program Executive Committee (Yoshitaka Haribara, Akihiro Tsukada, Daishi Okada, and Shoko Utsunomiya) and the Technical Core Team (Hiroshi Tokoyo, Keita Watanabe, and Masaru Isaka), with the Executive Sponsorship represented by Yukiko Sato.
About the Authors
Yoshitaka Haribara is a Senior Startup ML Solutions Architect at AWS Japan. In this role, Yoshitaka helps startup customers build generative AI foundation models and large language models on AWS, and came up with the idea of the LLM Program. In his spare time, Yoshitaka enjoys playing the drums.
Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt Amazon EC2 accelerated computing infrastructure for their machine learning needs.