“We are delving deeper into the capabilities of MLflow tracing. This functionality will be instrumental in diagnosing performance issues and enhancing the quality of responses from our Customer Call Support chatbot. Additionally, we are working on several exciting projects, including establishing a feedback loop for our wildfire LLM and implementing more agent-based RAG initiatives. Our goal is also to make LLMs more accessible across Xcel, enabling teams to utilize them for tasks such as tagging, sentiment analysis, and any other applications they might need.” – Blake Kleinhans, Senior Data Scientist, Xcel Energy
Introduction
Xcel Energy is a leading electric and natural gas energy company serving 3.4 million electricity customers and 1.9 million natural gas customers across eight states: Colorado, Michigan, Minnesota, New Mexico, North Dakota, South Dakota, Texas and Wisconsin. Xcel Energy wanted to build a chatbot based on a Retrieval-Augmented Generation (RAG) architecture, leveraging Databricks Mosaic AI, to help streamline operations and better serve its customers. Xcel Energy’s data scientists identified several high-value use cases to test, including rate case reviews, legal contract reviews, and analysis of earnings call reports. For example, as the cost of energy fluctuates, Xcel Energy must recalibrate its rates to align with market factors, a process that could take several months. Meanwhile, Xcel Energy’s leadership was eager to gain insights from earnings call reports without searching through hundreds of pages of PDFs, and the legal team wanted quick access to details from customer contracts.
The data team’s goal was to implement a scalable and efficient generative AI system that could retrieve relevant data from a large document corpus and generate accurate, context-aware responses using large language models (LLMs). The Databricks Data Intelligence Platform streamlined every phase of development, from data governance and model integration to monitoring and deployment. Rate case reviews that draw on complex documentation, including energy price reports and government regulations, now take two weeks instead of up to six months.
“Databricks enabled rapid development and deployment of our RAG-based chatbots, significantly improving our time to value. The platform seamlessly integrated with our internal data sources and existing dashboard tools, allowing our team to focus on improving quality rather than setting up infrastructure from scratch. Additionally, Databricks made it easy for us to experiment with different embeddings and language models to achieve the best performance possible.” – Blake Kleinhans, Senior Data Scientist, Xcel Energy
Data Management and Preparation
A critical first step in the project was establishing effective methods for data governance and management. As a utility provider, Xcel Energy had to ensure strict security and governance to avoid any risk of leaking sensitive or proprietary data. Each use case required a variety of documents, some public (earnings reports) and some sensitive (legal contracts). Databricks Unity Catalog enabled centralized data management for both structured and unstructured data, including the document corpus for the chatbot’s knowledge base. It provided fine-grained access controls that ensured that all data remained secure and compliant, a significant advantage for projects involving sensitive or proprietary data.
To keep the generative AI platform up to date, new data had to be made available to the RAG-based chatbot as soon as it was ingested. For data preparation, the team leveraged Databricks Notebooks and Apache Spark™ to process large datasets from diverse sources, including government websites, legal documents, and internal invoices. Spark’s distributed computing capabilities allowed the team to ingest and preprocess documents rapidly into their data lake, moving large document workloads into a vector store in minimal time.
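As an illustration, the sketch below shows what such an ingestion step might look like in PySpark, assuming PDF documents in a Unity Catalog volume and the pypdf library for text extraction; the paths, table names, and chunking parameters are hypothetical, not Xcel Energy’s actual pipeline.

```python
# A minimal ingestion sketch, assuming PDFs in a Unity Catalog volume;
# all paths and table names are hypothetical.
import io

from pypdf import PdfReader
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import types as T

spark = SparkSession.builder.getOrCreate()

def pdf_to_text(content: bytes) -> str:
    """Extract plain text from one PDF file's raw bytes."""
    reader = PdfReader(io.BytesIO(content))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

pdf_to_text_udf = F.udf(pdf_to_text, T.StringType())

chunks = (
    spark.read.format("binaryFile")
    .load("/Volumes/main/rag/source_docs/")              # hypothetical volume
    .withColumn("text", pdf_to_text_udf("content"))
    # Naive fixed-size chunking for illustration; a production pipeline
    # would split on semantic boundaries with overlap.
    .withColumn(
        "chunk",
        F.explode(F.expr(
            "transform(sequence(0, length(text) div 1000), "
            "i -> substring(text, i * 1000 + 1, 1000))"
        )),
    )
    .withColumn("id", F.monotonically_increasing_id())
    .select("id", "path", "chunk")
)

chunks.write.mode("overwrite").saveAsTable("main.rag.document_chunks")
# Vector Search Delta Sync indexes require change data feed on the source.
spark.sql(
    "ALTER TABLE main.rag.document_chunks "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)
```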
Embedding Generation and Storage
Embeddings were critical to the retrieval mechanism of the RAG architecture. The team utilized Databricks Foundation Model APIs to access state-of-the-art embedding models such as databricks-bge-large-en and databricks-gte-large-en, which provided high-quality vector representations of the document corpus. Using these hosted models eliminated the need to deploy or manage embedding infrastructure manually, simplifying embedding generation.
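A minimal sketch of calling one of these hosted embedding endpoints through the MLflow Deployments client; the sample query is illustrative.

```python
# Generate an embedding from a Databricks-hosted endpoint via the
# MLflow Deployments client.
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-bge-large-en",
    inputs={"input": ["What drove the change in fuel costs this quarter?"]},
)
# BGE-large returns a 1024-dimensional vector per input string.
embedding = response["data"][0]["embedding"]
```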
The embeddings were then stored in Databricks Vector Search, a serverless and highly scalable vector database integrated within the Databricks environment. This ensured efficient similarity search, which formed the backbone of the retrieval component of the chatbot. The seamless integration of Vector Search within the Databricks ecosystem significantly reduced infrastructure complexity.
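A sketch of creating a Delta Sync index over the prepared chunk table and running a similarity search, assuming a Vector Search endpoint already exists; the endpoint and index names are illustrative.

```python
# Create a Delta Sync index that Databricks keeps in sync with the
# source Delta table, computing embeddings with a hosted model.
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()

index = vsc.create_delta_sync_index(
    endpoint_name="rag-vs-endpoint",                 # hypothetical endpoint
    index_name="main.rag.document_chunks_index",
    source_table_name="main.rag.document_chunks",
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_source_column="chunk",
    embedding_model_endpoint_name="databricks-bge-large-en",
)

# Similarity search against the synced index.
results = index.similarity_search(
    query_text="Summarize the latest rate case filing",
    columns=["id", "chunk"],
    num_results=5,
)
```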
LLM Integration and RAG Implementation
Xcel was able to test different LLMs using Databricks Foundation Model APIs. These APIs provide access to pretrained, state-of-the-art models without the overhead of managing deployment or compute resources. This ensured that the LLMs could be easily incorporated into the chatbot, providing robust language generation with minimal infrastructure management.
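The sketch below illustrates that model-testing workflow, calling two pay-per-token chat endpoints behind the same client; the endpoint names reflect Databricks’ hosted models at the time and may differ by workspace.

```python
# Compare candidate LLMs behind a single client interface; endpoint
# names are illustrative.
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

for endpoint in ["databricks-mixtral-8x7b-instruct", "databricks-dbrx-instruct"]:
    reply = client.predict(
        endpoint=endpoint,
        inputs={
            "messages": [
                {"role": "user",
                 "content": "Summarize the Q2 earnings call in two sentences."}
            ],
            "max_tokens": 256,
        },
    )
    print(endpoint, reply["choices"][0]["message"]["content"])
```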
After evaluating Llama 2 and DBRX models, the team’s initial deployment used Mixtral 8x7B-Instruct with a 32k-token context window. Mixtral, a sparse mixture of experts (SMoE) model, matched or outperformed Llama 2 70B and GPT-3.5 on most benchmarks while offering roughly four times faster inference than Llama 2 70B. Xcel Energy prioritized output quality and used Mixtral until switching to Anthropic’s Claude 3.5 Sonnet on AWS Bedrock, accessed in Databricks via Mosaic AI Gateway, with Vector Search powering retrieval for RAG.
The RAG pipeline was built using LangChain, a powerful framework that integrates seamlessly with Databricks’ components. By utilizing Databricks Vector Search for similarity search and combining it with LLM query generation, the team built an efficient RAG-based system capable of providing context-aware responses to user queries. The combination of LangChain and Databricks simplified the development process and improved system performance.
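A minimal sketch of that chain wiring, assuming the databricks-langchain integration package; the endpoint and index names are illustrative and the retrieval parameters are placeholders.

```python
# Wire retrieval and generation into one LangChain chain.
from databricks_langchain import ChatDatabricks, DatabricksVectorSearch
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Retriever backed by the Vector Search index built earlier.
retriever = DatabricksVectorSearch(
    endpoint="rag-vs-endpoint",
    index_name="main.rag.document_chunks_index",
    columns=["chunk"],
).as_retriever(search_kwargs={"k": 5})

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatDatabricks(endpoint="databricks-mixtral-8x7b-instruct")

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("How did fuel costs change year over year?"))
```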
Experiment Tracking and Model Management with MLflow
The project fully utilized MLflow, a widely adopted open-source platform for experiment tracking and model management. Using MLflow’s LangChain integration, the team was able to log various configurations and parameters of the RAG model during the development process. This enabled versioning and simplified the deployment of LLM applications, providing a clear path from experimentation to production.
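A sketch of how one such run might be logged, reusing the `chain` object from the earlier LangChain sketch; the experiment name and parameter values are illustrative.

```python
# Track a RAG configuration as an MLflow run and version the chain
# itself with the LangChain flavor.
import mlflow

mlflow.set_experiment("/Shared/rag-chatbot")

with mlflow.start_run(run_name="mixtral-k5"):
    mlflow.log_params({
        "llm_endpoint": "databricks-mixtral-8x7b-instruct",
        "embedding_endpoint": "databricks-bge-large-en",
        "retriever_k": 5,
    })
    # Logging the chain makes each configuration comparable across runs
    # and registrable for deployment later.
    mlflow.langchain.log_model(chain, artifact_path="chain")
```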
Additionally, AI Gateway allowed the team to centrally manage credentials and model access, enabling efficient switching between LLMs and controlling costs through rate limiting and caching.
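As a rough sketch, a rate limit can be attached to a serving endpoint through the AI Gateway REST API along these lines; the endpoint name is hypothetical, and the payload shape should be verified against the current Databricks API reference.

```python
# Apply usage tracking and a per-endpoint rate limit via the AI Gateway
# API; field names follow the documented pattern but should be verified.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.put(
    f"{host}/api/2.0/serving-endpoints/rag-chatbot/ai-gateway",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "usage_tracking_config": {"enabled": True},
        "rate_limits": [
            {"calls": 100, "key": "endpoint", "renewal_period": "minute"}
        ],
    },
)
resp.raise_for_status()
```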
Model Serving and Deployment
The deployment of the chatbot was streamlined using Databricks Model Serving. This serverless compute option provided a scalable and cost-effective solution for hosting the RAG-based chatbot, allowing the model to be exposed as a REST API endpoint with minimal setup. The endpoint could then be easily integrated into front-end applications, streamlining the transition from development to production.
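Querying the deployed endpoint follows the standard Model Serving REST pattern, sketched below; the endpoint name and request schema are illustrative, since the actual payload depends on the logged model’s signature.

```python
# Call the deployed chatbot endpoint as a plain REST API.
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"{host}/serving-endpoints/rag-chatbot/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"inputs": [{"question": "What rate changes were filed in Colorado?"}]},
)
print(resp.json())
```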
Model Serving also enabled GPU-based scaling, reducing latency and operational costs. This scalability was crucial as the project expanded, allowing the chatbot to handle increasing user loads without significant architectural changes.
Monitoring and Continuous Improvement
Post-deployment, Databricks SQL was used to implement monitoring solutions. The team created dashboards that tracked essential metrics such as response times, query volumes, and user satisfaction scores. These insights were crucial for continuously improving the chatbot’s performance and ensuring long-term reliability.
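A sketch of the kind of query behind those dashboards, using the Databricks SQL connector against a hypothetical inference log table; the table name and columns are illustrative.

```python
# Pull hourly query volume and latency from an inference log table.
import os

from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT date_trunc('hour', timestamp) AS hour,
                   count(*)                      AS queries,
                   avg(execution_time_ms)        AS avg_latency_ms
            FROM main.rag.chatbot_inference_log  -- hypothetical table
            GROUP BY 1
            ORDER BY 1 DESC
            LIMIT 24
        """)
        for row in cur.fetchall():
            print(row)
```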
By integrating monitoring into the overall workflow, the team was able to proactively address potential issues and optimize system performance based on real-time feedback.
Conclusion: Benefits of Databricks for GenAI Applications
The Databricks Data Intelligence Platform enabled rapid development and deployment of the RAG-based chatbot, significantly reducing the complexities typically associated with managing large-scale AI projects. The integration of tools like Unity Catalog, Foundation Model APIs, Vector Search, MLflow, and Model Serving provided a cohesive, end-to-end system for building GenAI applications.
By focusing on scalability, infrastructure simplicity, and model governance, the platform allowed the team to concentrate on refining the RAG architecture and easily improve chatbot performance. The platform’s robust capabilities ensured that the project could scale efficiently as user demand increased, making Databricks an ideal choice for developing and deploying advanced GenAI applications. Xcel Energy’s data science team appreciated the freedom to easily upgrade to more advanced LLMs as they become available, without disrupting their entire architecture.
Looking ahead, Xcel Energy anticipates extending the use of GenAI tools across the company, democratizing access to data and insights.