Fine-tuning

Fine-tuning Llama 3.1 with Long Sequences

We are excited to announce that Mosaic AI Model Training now supports the full context length of 131K tokens when fine-tuning the Meta Llama 3.1 model family. With this new capability, Databricks customers can build even higher-quality Retrieval Augmented Generation (RAG) or tool use systems by using long-context enterprise data to create specialized models. The size of an LLM’s input prompt is determined by its context length. Our customers are often limited by short context lengths, especially in use cases like RAG and multi-document analysis. Meta Llama 3.1 models have a long context length of 131K tokens. For comparison,…
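
To make the context-length constraint concrete, here is a minimal sketch, assuming the Hugging Face transformers tokenizer for Meta Llama 3.1, that checks whether a long document still fits inside the 131K-token window before it is used for fine-tuning or RAG prompting. The model ID, file name, and reserved output budget are illustrative assumptions, not part of the announcement or a Databricks API.

```python
# Minimal sketch: measure how much of a long document fits in Llama 3.1's
# ~131K-token context window. Assumes the Hugging Face `transformers` tokenizer;
# the model ID below is gated and illustrative, and the file name is hypothetical.
from transformers import AutoTokenizer

CONTEXT_LIMIT = 131_072  # ~131K tokens, the Llama 3.1 context length

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Return True if `text` leaves room for `reserved_for_output` generated tokens."""
    n_tokens = len(tokenizer(text, add_special_tokens=True)["input_ids"])
    return n_tokens + reserved_for_output <= CONTEXT_LIMIT

long_document = open("contract.txt").read()  # hypothetical enterprise document
print(fits_in_context(long_document))
```
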
Pre-training, Fine-tuning, and Transfer Learning: A Real-World Analogy

Pre-training is like the education students receive in school. Just as teachers train students in a broad range of subjects, a model is pre-trained on a vast amount of data, learning general knowledge. This foundational learning requires significant effort and resources, similar to the years of schooling and the dedication of many teachers. Fine-tuning occurs after students graduate from school and choose a specialized field, such as medicine, engineering, or law. In this phase, they receive targeted training in their chosen domain, much like how a pre-trained model is fine-tuned for specific tasks. Before this specialization, the students (or the…
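
As a concrete counterpart to the analogy, the sketch below shows the split in PyTorch: a backbone pre-trained on a broad dataset (the "schooling") is frozen, and only a small task-specific head is trained (the "specialization"). The ResNet-18 backbone, the five-class head, and the dummy batch are illustrative assumptions, not an example from the article.

```python
# Minimal PyTorch sketch of pre-training vs. fine-tuning: reuse a broadly
# pre-trained backbone, freeze it, and train only a small specialized head.
import torch
import torch.nn as nn
from torchvision import models

# "Schooling": weights learned on a broad dataset (ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the general-knowledge layers.
for param in backbone.parameters():
    param.requires_grad = False

# "Specialization": replace the classifier head for a narrow task (e.g. 5 classes).
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a dummy batch.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,))
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```
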
Training Diffusion Models with Reinforcement Learning

Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and hyper-realistic synthetic images, but they have also found success in other applications such as drug design and continuous control. The key idea behind diffusion models is to iteratively transform random noise into a sample, such as an image or protein structure. This is typically motivated as a maximum likelihood estimation problem, where the model is trained to generate samples that match the training data as closely as…
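
The iterative noise-to-sample idea can be summarized in a few lines. The sketch below implements DDPM-style ancestral sampling with a placeholder denoiser standing in for a trained noise-prediction network; the linear beta schedule, step count, and tensor shapes are assumptions for illustration, not the post's training setup.

```python
# Minimal sketch of the core diffusion idea: start from pure Gaussian noise and
# iteratively denoise it into a sample. `denoiser` is a placeholder for a trained
# epsilon-prediction network; the schedule and shapes are illustrative.
import torch

T = 1000                               # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)  # simple linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x_t: torch.Tensor, t: int) -> torch.Tensor:
    """Placeholder for a trained noise-prediction model."""
    return torch.zeros_like(x_t)

x = torch.randn(1, 3, 64, 64)          # start from random noise
for t in reversed(range(T)):
    eps_hat = denoiser(x, t)
    # DDPM-style posterior mean for x_{t-1} given the predicted noise.
    mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps_hat) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    x = mean + torch.sqrt(betas[t]) * noise
# `x` is now a generated sample (an image tensor here, but it could be any modality).
```
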
Rethinking the Role of PPO in RLHF

TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preferences in the form of comparisons, and the RL fine-tuning phase, which optimizes a single, non-comparative reward. What if we performed RL in a comparative way?

Figure 1: This diagram illustrates the difference between reinforcement learning from absolute feedback and relative feedback.

By incorporating a new component, the pairwise policy gradient, we can unify the reward modeling stage and the RL stage, enabling direct updates based on pairwise responses. Large Language Models (LLMs) have powered increasingly capable virtual assistants, such as…
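
A minimal sketch of the comparative idea, assuming dummy tensors in place of a real policy and reward model: rather than scaling each response's log-probability by an absolute reward, two responses to the same prompt are scored against each other and the reward difference drives the update. This illustrates relative feedback in general, not the exact pairwise policy gradient algorithm from the post.

```python
# Simplified pairwise (relative) policy-gradient loss: the update is driven by the
# reward *difference* between two responses to the same prompt, not by absolute
# rewards. All tensors below are dummy placeholders for illustration only.
import torch

def pairwise_policy_gradient_loss(logp_a: torch.Tensor,
                                  logp_b: torch.Tensor,
                                  reward_a: torch.Tensor,
                                  reward_b: torch.Tensor) -> torch.Tensor:
    """Push up the log-prob of the preferred response and push down the other,
    weighted by how strongly the reward model prefers one over the other."""
    advantage = reward_a - reward_b          # comparative signal, not absolute
    return -(advantage * (logp_a - logp_b)).mean()

# Dummy per-response summed log-probs under the current policy, and dummy rewards.
logp_a = torch.tensor([-12.3, -8.7], requires_grad=True)
logp_b = torch.tensor([-10.1, -9.4], requires_grad=True)
reward_a, reward_b = torch.tensor([0.8, 0.2]), torch.tensor([0.3, 0.6])

loss = pairwise_policy_gradient_loss(logp_a, logp_b, reward_a, reward_b)
loss.backward()  # gradients would then be applied to the policy parameters
```
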