More than 40% of marketing, sales and customer service organizations have adopted generative AI — making it second only to IT and cybersecurity. Of all gen AI technologies, conversational AI will spread rapidly within these sectors, because of its ability to bridge current communication gaps between businesses and customers.
Yet many marketing business leaders I’ve spoken to get stuck at the crossroads of how to begin implementing that technology. They don’t know which of the available large language models (LLMs) to choose, and whether to opt for open source or closed source. They’re worried about spending too much money on a new and uncharted technology.
Companies can certainly buy off-the-shelf conversational AI tools, but if conversational AI is going to be a core part of the business, they can build their own in-house.
To help lower the fear factor for those opting to build, I wanted to share some of the internal research my team and I have done in our own search for the best LLM to build our conversational AI. We spent some time looking at the different LLM providers, and how much you should expect to fork out for each one depending on inherent costs and the type of usage you’re expecting from your target audience.
We chose to compare GPT-4o (OpenAI) and Llama 3 (Meta). These are two of the major LLMs most businesses will be weighing against each other, and we consider them to be the highest quality models out there. They also allow us to compare a closed source (GPT) and an open source (Llama) LLM.
How do you calculate LLM costs for a conversational AI?
The two primary financial considerations when selecting an LLM are the setup cost and the eventual processing cost.
Setup costs cover everything that’s required to get the LLM up and running toward your end goal, including development and operational expenses. The processing cost is the actual cost of each conversation once your tool is live.
When it comes to setup, the cost-to-value ratio will depend on what you’re using the LLM for and how much you’ll be using it. If you need to deploy your product ASAP, then you may be happy paying a premium for a model that comes with little to no setup, like GPT-4o. It may take weeks to get Llama 3 set up, during which time you could already have been fine-tuning a GPT product for the market.
However, if you’re managing a large number of clients, or want more control over your LLM, you may want to swallow the greater setup costs early to get greater benefits down the line.
When it comes to conversation processing costs, we will be looking at token usage, as this allows the most direct comparison. LLMs like GPT-4o and Llama 3 use a basic metric called a “token”: a unit of text that these models can process as input and output. There’s no universal standard for how tokens are defined across different LLMs. Some count tokens per word, others per subword, per character or in other variations.
Because of all these factors, it’s hard to have an apples-to-apples comparison of LLMs, but we approximated this by simplifying the inherent costs of each model as much as possible.
We found that while GPT-4o is cheaper in terms of upfront costs, over time Llama 3 turns out to be substantially more cost-effective, at roughly half the cost per conversation. Let’s get into why, starting with the setup considerations.
What are the foundational costs of each LLM?
Before we can dive into the cost per conversation of each LLM, we need to understand how much it will cost us to get there.
GPT-4o is a closed source model hosted by OpenAI. Because of this, all you need to do is set your tool up to reach the model on OpenAI’s infrastructure through a simple API call. There is minimal setup.
Llama 3, on the other hand, is an open source model that must be hosted on your own private servers or on cloud infrastructure providers. Your business can download the model components at no cost — then it’s up to you to find a host.
The hosting cost is a consideration here. Unless you’re purchasing your own servers, which is relatively uncommon to start, you have to pay a cloud provider a fee for using their infrastructure, and each provider may structure its pricing differently.
Most hosting providers will “rent” an instance to you and charge for the compute capacity by the hour or second. AWS’s ml.g5.12xlarge instance, for example, is billed for the time the server is running. Others might bundle usage in different packages and charge yearly or monthly flat fees based on factors such as your storage needs.
The provider Amazon Bedrock, however, calculates costs based on the number of tokens processed, which means it could prove to be a cost-effective solution for the business even if your usage volumes are low. Bedrock is a managed, serverless platform by AWS that also simplifies the deployment of the LLM by handling the underlying infrastructure.
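To make the difference between the two billing models concrete, here is a minimal Python sketch. It assumes a single always-on instance billed by the hour and a flat per-1,000-token price. The function names, the 730-hour month and every number passed in the test values are illustrative assumptions, not real provider rates, and the sketch ignores instance capacity limits and separate input and output prices.

```python
def monthly_instance_cost(hourly_rate: float, hours_per_month: float = 730.0) -> float:
    """Cost of renting one instance that runs for the whole month (assumed ~730 hours)."""
    return hourly_rate * hours_per_month


def monthly_token_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Cost of a pay-per-token service at a flat price per 1,000 tokens."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def cheaper_option(tokens_per_month: float, hourly_rate: float,
                   price_per_1k_tokens: float) -> str:
    """Return which billing model is cheaper for the given monthly volume."""
    instance = monthly_instance_cost(hourly_rate)
    per_token = monthly_token_cost(tokens_per_month, price_per_1k_tokens)
    return "per-token" if per_token < instance else "instance"
```

With low token volumes the per-token bill stays small while the rented instance costs the same whether it is busy or idle, which is the effect described above.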
Beyond the direct costs, to get your conversational AI operating on Llama 3 you also need to allocate far more time and money to operations, including selecting and setting up a server or serverless option and then maintaining it. You also need to spend more on the development of, for example, error logging tools and system alerts for any issues that may arise with the LLM servers.
The main factors to consider when calculating the foundational cost-to-value ratio include the time to deployment; the level of product usage (if you’re powering millions of conversations per month, the setup costs will rapidly be outweighed by your ultimate savings); and the level of control you need over your product and data (open source models work best here).
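The usage-level point can be turned into a simple break-even calculation. The sketch below is illustrative only: `breakeven_conversations` is a hypothetical helper, the $50,000 extra setup cost is a made-up placeholder, and the per-conversation costs are the rounded figures worked out later in this article.

```python
import math


def breakeven_conversations(extra_setup_cost: float,
                            cost_per_conversation_hosted: float,
                            cost_per_conversation_self_managed: float) -> int:
    """Number of conversations after which the per-conversation saving
    has paid back the extra setup cost of the self-managed option."""
    saving = cost_per_conversation_hosted - cost_per_conversation_self_managed
    if saving <= 0:
        raise ValueError("The self-managed option must be cheaper per conversation.")
    return math.ceil(extra_setup_cost / saving)


# Placeholder setup cost (an assumption) with rounded per-conversation costs.
print(breakeven_conversations(50_000, 0.16, 0.08))
```

The higher your monthly conversation volume, the sooner that break-even point arrives.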
What are the costs per conversation for major LLMs?
Now we can explore the basic cost of every unit of conversation.
For our modeling, we used the heuristic: 1,000 words = 7,515 characters = 1,870 tokens.
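As a rough illustration, the heuristic can be written as a pair of conversion helpers. The function names are hypothetical, and the ratios are only the approximation stated above, not the output of either model's real tokenizer.

```python
# Ratios from the heuristic: 1,000 words = 7,515 characters = 1,870 tokens.
WORDS, CHARS, TOKENS = 1_000, 7_515, 1_870


def tokens_from_words(words: float) -> float:
    """Estimate the token count for a given number of words."""
    return words * TOKENS / WORDS


def tokens_from_chars(chars: float) -> float:
    """Estimate the token count for a given number of characters."""
    return chars * TOKENS / CHARS


def words_from_tokens(tokens: float) -> float:
    """Estimate the word count behind a given number of tokens."""
    return tokens * WORDS / TOKENS
```

For precise budgeting you would count tokens with each model's own tokenizer rather than rely on a fixed ratio.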
We assumed the average consumer conversation to total 16 messages between the AI and the human. This was equal to an input of 29,920 tokens and an output of 470 tokens, so 30,390 tokens in all. (The input is a lot higher because of prompt rules and logic.)
On GPT-4o, the price per 1,000 input tokens is $0.005, and per 1,000 output tokens $0.015, which results in the “benchmark” conversation costing approximately $0.16.
GPT-4o input / output | Number of tokens | Price per 1,000 tokens | Cost |
Input tokens | 29,920 | $0.00500 | $0.14960 |
Output tokens | 470 | $0.01500 | $0.00705 |
Total cost per conversation | | | $0.15665 |
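The GPT-4o figures above can be reproduced with a few lines of Python. This is a minimal sketch: `conversation_cost` is a hypothetical helper, and prices are passed as strings so that `Decimal` keeps the arithmetic exact.

```python
from decimal import Decimal


def conversation_cost(input_tokens: int, output_tokens: int,
                      input_price_per_1k: str, output_price_per_1k: str) -> Decimal:
    """Cost in dollars of one conversation priced per 1,000 tokens."""
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_1k) / 1000
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_1k) / 1000
    return input_cost + output_cost


# Benchmark conversation and GPT-4o prices as quoted in this article.
gpt4o_cost = conversation_cost(29_920, 470, "0.005", "0.015")
print(gpt4o_cost)
```

The same function works for any model priced per 1,000 tokens; only the two price arguments change.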
For Llama 3-70B on Amazon Bedrock, the price per 1,000 input tokens is $0.00265, and per 1,000 output tokens $0.00350, which results in the “benchmark” conversation costing approximately $0.08.
Llama 3-70B input / output | Number of tokens | Price per 1,000 tokens | Cost |
Input tokens | 29,920 | $0.00265 | $0.07929 |
Output tokens | 470 | $0.00350 | $0.00165 |
Total cost per conversation | | | $0.08093 |
In summary, once the two models have been fully set up, a conversation run on Llama 3 would cost almost 50% less than an equivalent conversation run on GPT-4o. However, any server costs would have to be added to the Llama 3 calculation.
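To see where "almost 50%" comes from, here is a short sketch that takes the two per-conversation totals from the tables above as constants. The helper names are hypothetical, and the calculation leaves out any server or setup costs.

```python
GPT4O_COST_PER_CONVERSATION = 0.15665   # total from the GPT-4o table
LLAMA3_COST_PER_CONVERSATION = 0.08093  # total from the Llama 3-70B table


def relative_saving(baseline: float, alternative: float) -> float:
    """Fraction saved by switching from the baseline to the alternative."""
    return (baseline - alternative) / baseline


def monthly_difference(conversations_per_month: int) -> float:
    """Dollar gap between the two models at a given monthly volume."""
    gap = GPT4O_COST_PER_CONVERSATION - LLAMA3_COST_PER_CONVERSATION
    return conversations_per_month * gap


saving = relative_saving(GPT4O_COST_PER_CONVERSATION, LLAMA3_COST_PER_CONVERSATION)
print(f"Saving per conversation: {saving:.1%}")
```

The saving comes out at roughly 48%, and `monthly_difference` shows how that per-conversation gap scales with volume.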
Keep in mind that this is only a snapshot of the full cost of each LLM. Many other variables come into play as you build out the product for your unique needs, such as whether you’re using a multi-prompt approach or single-prompt approach.
For companies that plan to leverage conversational AI as a core service, but not a fundamental element of their brand, it may well be that the investment of building the AI in-house simply isn’t worth the time and effort compared to the quality you can get from off-the-shelf products.
Whatever path you choose, integrating a conversational AI can be incredibly useful. Just make sure you’re always guided by what makes sense for your company’s context, and the needs of your customers.
Sam Oliver is a Scottish tech entrepreneur and serial startup founder.