LLMs: the hype is real… but where’s the ROI?

Updated: Oct 1, 2024

Exploring why some use cases still miss the mark—and what it’ll take to turn LLM hype into business value.


The Promise of LLMs

Ten years ago, during my PhD, I helped a startup develop an expert Q&A system. The goal was to create smart recommendations that could understand a user’s intent and learning objectives, then suggest the next meaningful question or topic. With no transformers available back then, I combined traditional NLP techniques, ontologies, and Hierarchical Hidden Markov Models (HHMMs) into a complex solution. It delivered somewhat acceptable results and validated a few hypotheses, but it was far from production-ready.

Today, I could probably build a proof of concept that solves this problem in just 30 minutes using LLMs. Transformers have revolutionized text and speech processing, not only improving performance but solving problems that once seemed unsolvable.


The Current Landscape

Despite the rapid advancement of LLMs and their adoption across industries, most projects don’t make it past the proof-of-concept (PoC) phase. In fact, Gartner reported back in 2018 that 85% of AI projects fail, and earlier this year it estimated that 30% of Generative AI projects will be abandoned after the PoC stage by the end of 2025 [1].

Why? There are many reasons: poor data quality, escalating costs, unclear business value, the complexity of making solutions production-ready (e.g., dealing with hallucinations, addressing security and privacy concerns), and a lack of engineering capability to deploy and manage such solutions at scale.

But most importantly, many use cases simply don’t deliver meaningful ROI.


The Use Case Everyone is Trying

Let’s consider a typical use case many organizations are adopting today: employees spend a significant amount of time searching through internal documents, guidelines, and regulations to perform their daily tasks. To solve this, organizations implement a Retrieval-Augmented Generation (RAG) solution that lets employees chat with the knowledge base and ask questions directly.
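
To make the pattern concrete, here’s a minimal sketch of that RAG loop. Everything in it is an illustrative assumption: a toy in-memory document store, naive keyword-overlap retrieval, and a stubbed generation step. A production system would use embeddings with a vector index (e.g., Azure AI Search) and a real LLM call.

```python
import re

# Toy knowledge base standing in for internal documents and guidelines.
DOCUMENTS = [
    "Travel expenses must be submitted within 30 days of the trip.",
    "Remote work requires written approval from your line manager.",
    "Security incidents must be reported to the IT service desk immediately.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy scoring)."""
    q = tokens(query)
    return sorted(DOCUMENTS, key=lambda d: -len(q & tokens(d)))[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # return llm.generate(prompt)  # swap in your LLM call (e.g., Azure OpenAI)
    return prompt                  # returned as-is so the sketch runs offline

print(answer("When do I need to submit travel expenses?"))
```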


A Rudimentary ROI Calculation

On paper, implementing an LLM-based solution like a RAG system seems like a great way to boost efficiency. Imagine an employee spends two hours a day searching for relevant information to complete their tasks. If the LLM-powered solution cuts that to just 10 minutes, it saves roughly 37 hours per month (110 minutes a day over about 20 working days), almost an entire workweek. It is tempting for organizations to conclude that they will save on payroll, since fewer employees should be able to handle the same amount of work.
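
The arithmetic is simple enough to sanity-check; a minimal sketch, assuming 20 working days per month:

```python
# Back-of-the-envelope time savings for the scenario above.
baseline_min_per_day = 120   # two hours spent searching today
with_llm_min_per_day = 10    # assumed search time with the RAG assistant
working_days_per_month = 20  # assumption behind the ~37-hour figure

saved_hours = (baseline_min_per_day - with_llm_min_per_day) * working_days_per_month / 60
print(f"~{saved_hours:.0f} hours saved per employee per month")  # ~37
```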

However, the reality is often more complex, and the payroll savings rarely materialize as expected. Reducing the time it takes to find information doesn’t necessarily mean companies will reduce staff. For many organizations, headcount remains stable because the tasks employees are freed from tend to be low-level, administrative ones, not critical or high-value work whose elimination leads to direct cost reductions. Employees may now spend that time on other tasks, but the added value of this shift in focus isn’t always easy to quantify.

Moreover, LLM solutions bring significant ongoing costs. For example, let’s assume you implement an enterprise chat solution in a RAG setup using Azure services (e.g., Azure OpenAI, Azure AI Search, Azure Machine Learning, Azure App Services). You have 2,000 employees using the solution daily. If each of them performs five queries per day, with an average of 100 input tokens and 1,000 output tokens per query, the costs add up quickly. Using GPT-4 via the Azure OpenAI API, for example, this could amount to around $5,000 per month, or $60,000 per year. Now compare that cost to the potential savings from reduced manual search time: even if each employee’s productivity improves, the financial gain may not offset the expense of running the LLM solution.
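
Here’s a back-of-the-envelope version of that cost estimate. The per-token rates are explicit assumptions (roughly GPT-4o-class pricing at the time of writing); actual Azure OpenAI pricing varies by model, region, and date.

```python
# Rough monthly API cost for the usage pattern described above.
employees = 2000
queries_per_day = 5
input_tokens, output_tokens = 100, 1000  # average per query
days_per_month = 30                      # assumes the solution is used daily
rate_in, rate_out = 5.00, 15.00          # USD per million tokens (assumed rates)

queries = employees * queries_per_day * days_per_month
cost = queries * (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
print(f"~${cost:,.0f} per month")        # ~$4,650, i.e. roughly $5,000/month
```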

There are additional hidden costs too. Implementing and maintaining such a system requires significant engineering resources: a dedicated team to keep the solution running smoothly, troubleshoot issues, and manage updates, security concerns, and potential downtime. Furthermore, LLMs are prone to issues like hallucinations (providing incorrect or misleading information), which can lead to costly errors if not handled correctly. In many cases, organizations find themselves investing in extra layers of validation and quality control to catch these mistakes, further driving up the total cost of ownership.

In this scenario, while the goal is to reduce manual work, the cost-benefit equation can become lopsided. The anticipated payroll savings don’t appear, and although productivity may increase, the gain doesn’t translate into a measurable financial return large enough to justify the LLM’s operational costs. As it stands today, the ROI often doesn’t stack up in favor of such implementations, because they tend to target low-level, administrative tasks rather than the critical, high-value work that shapes the core value chain.


LLM Use Cases That Drive Revenue: Impacting the Core Business

While many LLM implementations focus on improving internal productivity, the real value lies in use cases that directly impact a company’s core value chain and generate new revenue streams.

Take a retail business, for instance. Instead of just using LLMs internally to optimize tasks like document retrieval, the company integrates them into its customer-facing applications. By analyzing vast amounts of data on customer preferences, purchase history, and real-time behavior, an LLM can craft highly personalized product recommendations, offer dynamic pricing based on customer profiles, and provide natural-language support that is both context-aware and highly responsive.
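
As a rough illustration (not a reference design), the sketch below folds hypothetical customer signals into a recommendation prompt; the field names, the mini-catalog, and the stubbed generation call are all assumptions.

```python
# Hedged sketch: assemble customer signals into a personalization prompt.
customer = {
    "recent_purchases": ["trail running shoes", "hydration vest"],
    "recently_viewed": ["GPS watches", "merino socks"],
}
catalog_snippet = ["GPS running watch", "merino running socks", "rain shell jacket"]

prompt = (
    "You are a retail assistant. Recommend up to 3 products, each with a one-line reason.\n"
    f"Recent purchases: {', '.join(customer['recent_purchases'])}\n"
    f"Recently viewed: {', '.join(customer['recently_viewed'])}\n"
    f"Available products: {', '.join(catalog_snippet)}"
)
# recommendations = llm.generate(prompt)  # swap in your LLM call of choice
print(prompt)                             # printed so the sketch runs offline
```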

In this scenario, the LLM isn't just making employees more efficient—it's directly driving more revenue by enhancing the customer’s shopping experience. A well-designed LLM solution can lead to higher conversion rates, larger average order values, and improved customer retention. This kind of personalization, powered by LLMs, goes far beyond traditional recommendation engines and creates a competitive edge for the business.

Moreover, by offering 24/7, highly intelligent customer service through AI-driven chatbots, companies can drastically reduce support costs while increasing customer satisfaction and retention. These chatbots can handle nuanced customer inquiries, make relevant product suggestions, and even assist with troubleshooting—all of which drive the customer closer to completing a purchase or staying loyal to the brand.

The ROI here is clear: LLMs directly generate new revenue streams by driving customer engagement and increasing sales, impacting the core business rather than merely reducing internal operational costs.


The Gap to Bridge

As organizations embark on their LLM journey, it's crucial to approach implementation with a discerning eye. Not all use cases will drive tangible value, making it imperative to rigorously validate ROI before committing resources to a production solution.

The deployment and maintenance of LLM solutions in production environments present unique challenges, as Generative AI introduces new engineering concepts with little standardization and few established best practices. Nevertheless, the field is new enough that everyone starts on an equal footing: we’re all learning at the same time.

It’s important to recognize that while LLM-based chatbots offer powerful capabilities, they’re not a universal solution. Our ideas about how LLMs should reshape human-software interaction are often limited by our imagination, which leads to misapplications. Not every document-processing challenge, for example, calls for a chatbot on top of a RAG implementation. Just because we have a hammer doesn’t mean every problem is a nail.


What’s Needed to Overcome These Challenges

To make LLMs more viable for value-generating production use cases, several things need to happen. First, LLM API calls must become cheaper, as the current costs often outweigh the value generated by many use cases. Additionally, LLMs themselves need to get smaller and more powerful, improving efficiency without sacrificing capability. The reliability and determinism of outputs also need significant enhancement to ensure consistency, especially in business-critical environments.

On the technical side, we need better tools for development, deployment, and production management. This includes mature MLOps (machine learning operations) and FinOps (financial operations) practices, as well as more options for self-hosting LLMs, which would allow companies to have greater control over their models and infrastructure. Migration between different LLM models needs to become more seamless to accommodate evolving business needs, and there must be a stronger focus on monitoring security and privacy concerns to protect sensitive data in LLM implementations.


Final Thoughts

LLMs are like new toys in a kindergarten. Right now, we're still poking at them, turning them over, figuring out what they can do. But just as children grow into master builders, we too will learn how to harness the full potential of these models. The journey to maturity in LLM applications may take another few years, but with every experiment, every lesson, we’re getting closer to use cases that generate real value.

With Generative AI, we’re on the cusp of defining new user experiences, engineering best practices and shaping the policies that will ensure AI’s safe and responsible integration into our lives. The future isn’t just about solving today’s problems but unlocking possibilities we can barely imagine. Those who keep pushing forward, testing, failing, and learning will not only ride the wave of this revolution but shape its course. The next chapter of AI isn’t about waiting—it’s about building the future now.

 
 
 
