I recently needed to classify sentences for a particular use case at work. Remembering Jeremy Howard’s Lesson 4: Getting started with NLP for absolute beginners, I first adapted his notebook to fine-tune DeBERTa.
It worked, but not to my satisfaction, so I was curious what would happen if I used an LLM like Llama 3. The problem? Limited GPU resources. I only had access to an NVIDIA Tesla T4 instance.
Research led me to QLoRA. The tutorial "Fine tuning LLama 3 LLM for Text Classification of Stock Sentiment using QLoRA" was particularly useful. To better understand it, I ported the Lesson 4 example into the tutorial's notebook.
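QLoRA combines two main techniques:
- Quantization: stores the base model's weights at lower precision (4-bit in QLoRA), so the model takes up far less memory.
- LoRA (Low-Rank Adaptation): freezes the base model and trains small low-rank adapter matrices added to it, instead of fine-tuning every weight.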
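To make the two ideas concrete, here is a toy sketch in plain Python. It is not what the notebook runs: real QLoRA uses the NF4 data type through bitsandbytes, while this uses simple absmax rounding as a stand-in, and the 4096 x 4096 layer size and rank of 16 are illustrative assumptions, not values taken from my setup.

```python
# Toy illustration only. QLoRA uses the NF4 4-bit data type; the absmax
# rounding below is a simpler stand-in to show the idea of quantization.

def quantize_absmax_4bit(weights):
    """Map floats to signed integers in [-7, 7] plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 7
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

def lora_trainable_params(d_out, d_in, rank):
    """LoRA trains B (d_out x rank) and A (rank x d_in); W itself stays frozen."""
    return d_out * rank + rank * d_in

# Quantization: four floats become four small integers and one scale.
weights = [0.7, -0.3, 0.1, 0.0]
codes, scale = quantize_absmax_4bit(weights)
approx = dequantize(codes, scale)
print("codes:", codes)
print("dequantized:", approx)

# LoRA: hypothetical 4096 x 4096 weight matrix with rank-16 adapters.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=16)
print(f"{lora:,} trainable adapter weights vs {full:,} frozen ({lora / full:.2%})")
```

The point of the second half is the ratio: for a layer of this size, the adapters hold well under 1% as many weights as the frozen matrix they sit next to, which is why gradients and optimizer state stay small.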
This allowed me to train Llama 3 8B on a T4 with 16 GB of VRAM, using about 12 GB. The results were surprisingly good: overall accuracy was 92%, although the minority class (13 of the 100 evaluated examples) only reached an F1 of 0.69.
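A back-of-the-envelope calculation shows why quantization is what makes this fit. The sketch below counts memory for the weights only, and assumes a round 8 billion parameters; activations, adapter weights, optimizer state, and quantization overhead are not modelled, and presumably account for the rest of the roughly 12 GB I observed.

```python
PARAMS = 8_000_000_000  # assumption: round figure for an "8B" model

def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

fp16_gib = weight_memory_gib(PARAMS, 16)
int4_gib = weight_memory_gib(PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
```

At 16 bits per weight, the weights alone would nearly fill the T4 before training even starts; at 4 bits they take about a quarter of that, leaving headroom for everything else.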
Confusion Matrix:
[[83  4]
 [ 4  9]]

Classification Report:
              precision    recall  f1-score   support

         0.0       0.95      0.95      0.95        87
         1.0       0.69      0.69      0.69        13

    accuracy                           0.92       100
   macro avg       0.82      0.82      0.82       100
weighted avg       0.92      0.92      0.92       100

Balanced Accuracy Score: 0.8231653404067196
Accuracy Score: 0.92
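The gap between the 0.92 accuracy and the 0.82 balanced accuracy comes from class imbalance, and both numbers can be recomputed from the confusion matrix by hand. This sketch assumes rows are true labels and columns are predicted labels (the scikit-learn convention, which the report format suggests but which I am stating as an assumption here).

```python
# Assumed layout: rows = true labels, columns = predicted labels.
cm = [[83, 4],
      [4, 9]]

n_classes = len(cm)
total = sum(sum(row) for row in cm)

# Accuracy: correct predictions (the diagonal) over all examples.
accuracy = sum(cm[i][i] for i in range(n_classes)) / total

# Recall per class: correct predictions over all true members of that class.
recalls = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]

# Precision per class: correct predictions over everything predicted as that class.
precisions = [cm[i][i] / sum(row[i] for row in cm) for i in range(n_classes)]

# Balanced accuracy: the unweighted mean of the per-class recalls.
balanced_accuracy = sum(recalls) / n_classes

print(f"accuracy          = {accuracy:.2f}")
print(f"recall per class  = {[round(r, 2) for r in recalls]}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
```

Because class 0 makes up 87 of the 100 examples, plain accuracy is dominated by it; balanced accuracy weights both classes equally, so the weaker 0.69 recall on class 1 pulls it down.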
Here’s the IPython notebook detailing the process.
This approach shows it’s possible to work with large language models on limited hardware. Working with constraints often leads to creative problem-solving and learning opportunities. In this case, the limitations pushed me to explore and implement more efficient fine-tuning techniques.