Inflection AI helps address RLHF uniformity issues with unique models for enterprise, agentic AI


A recent exchange on X (formerly Twitter) between Wharton professor Ethan Mollick and Andrej Karpathy, the former Director of AI at Tesla and co-founder of OpenAI, touches on something both fascinating and foundational: many of today’s top generative AI models, including those from OpenAI, Anthropic and Google, exhibit a striking similarity in tone. That prompts the question: why are large language models (LLMs) converging not just in technical proficiency but also in personality?

The follow-up commentary pointed to a common feature that could be driving this convergence of outputs: Reinforcement Learning from Human Feedback (RLHF), a technique in which AI models are fine-tuned based on evaluations provided by human trainers.

Building on this discussion of RLHF’s role in output similarity, Inflection AI’s recent announcements of Inflection 3.0 and a commercial API may offer a promising way to address these challenges. The company has introduced a novel approach to RLHF, aimed at making generative models not only consistent but also distinctively empathetic.

With its entry into the enterprise space, the creator of the Pi collection of models applies RLHF in a more nuanced way, from deliberate efforts to improve how its models are fine-tuned to a proprietary platform that incorporates employee feedback to tailor gen AI outputs to organizational culture. The strategy aims to make Inflection AI’s models true cultural allies rather than just generic chatbots, providing enterprises with a more human and aligned AI system that stands out from the crowd.

Inflection AI wants your work chatbots to care

Against this backdrop of convergence, Inflection AI, the creators of the Pi model, are carving out a different path. With the recent launch of Inflection for Enterprise, Inflection AI aims to make emotional intelligence — dubbed “EQ” — a core feature for its enterprise customers.

The company says its unique approach to RLHF sets it apart. Instead of relying on anonymous data-labeling, the company sought feedback from 26,000 school teachers and university professors to aid in the fine-tuning process through a proprietary feedback platform. Furthermore, the platform enables enterprise customers to run reinforcement learning with employee feedback. This enables subsequent tuning of the model to the unique voice and style of the customer’s company.

Inflection AI’s approach promises that companies will “own” their intelligence, meaning an on-premises model fine-tuned with proprietary data that is securely managed on their own systems. This is a notable move away from the cloud-centric AI models many enterprises are familiar with, a setup Inflection believes will enhance security and foster greater alignment between AI outputs and the ways people use AI at work.

What RLHF is and isn’t

RLHF has become the centerpiece of gen AI development, largely because it allows companies to shape responses to be more helpful, coherent, and less prone to dangerous errors. OpenAI’s use of RLHF was foundational to making tools like ChatGPT engaging and generally trustworthy for users. RLHF helps align model behavior with human expectations, making it more engaging and reducing undesirable outputs.

However, RLHF is not without its drawbacks. In the exchange on X, it was quickly offered as a contributing reason for the convergence of model outputs, which could lead to a loss of unique characteristics and make models increasingly similar. Alignment offers consistency, but it also creates a challenge for differentiation.

Previously, Karpathy himself pointed out some of the limitations inherent in RLHF. He likened it to a game of vibe checks, and stressed that it does not provide an “actual reward” akin to the win-or-lose outcome of competitive games like Go, which AlphaGo was trained to play. Instead, RLHF optimizes for an emotional resonance that’s ultimately subjective and may miss the mark for practical or complex tasks.
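To make that contrast concrete, here is a minimal sketch, not Inflection AI’s or any vendor’s actual implementation. It assumes the pairwise preference loss commonly used to train RLHF reward models, in which a labeler picks the better of two responses and the reward model is pushed to score the chosen one higher. The function names `preference_loss` and `game_reward` are hypothetical and exist only for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss: -log(sigmoid(chosen - rejected)).

    The inputs are scores from a learned reward model (assumed here).
    The loss only says which answer a human preferred, not whether
    either answer was objectively correct.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def game_reward(won: bool) -> float:
    """A verifiable reward, as in a board game: the outcome is unambiguous."""
    return 1.0 if won else -1.0


# The loss shrinks as the model scores the preferred answer higher.
agrees_with_labeler = preference_loss(2.0, 0.0)
disagrees_with_labeler = preference_loss(0.0, 2.0)
print(agrees_with_labeler < disagrees_with_labeler)
```

The game reward comes from an outcome anyone can check, while the preference loss depends entirely on which response a labeler happened to like. If labelers across companies share similar tastes, models optimized against those preferences could plausibly drift toward a similar tone.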

From EQ to AQ

To mitigate some of these RLHF limitations, Inflection AI has embarked on a more nuanced training strategy. Beyond implementing improved RLHF, it has also taken steps toward agentic AI capabilities, which it abbreviates as AQ (Action Quotient). As CEO Sean White described in a recent interview, Inflection AI’s enterprise aims involve enabling models not only to understand and empathize but also to take meaningful actions on behalf of users, ranging from sending follow-up emails to assisting in real-time problem-solving.

While Inflection AI’s approach is certainly innovative, there are potential shortcomings to consider. The 8K-token context window it uses for inference is smaller than what many high-end models offer, and the performance of its newest models has not been benchmarked. Despite ambitious plans, Inflection AI’s models may not achieve the desired level of performance in real-world applications.

Nonetheless, the shift from EQ to AQ could mark a critical evolution in gen AI development, especially for enterprise clients looking to leverage automation for both cognitive and operational tasks. It’s not just about talking empathetically with customers or employees; Inflection AI hopes that Inflection 3.0 will also execute tasks that translate empathy into action. Inflection’s partnership with automation platforms like UiPath to provide this “agentic AI” further bolsters their strategy to stand out in an increasingly crowded market.

Navigating a post-Suleyman world

Inflection AI has undergone significant internal changes over the past year. The departure of CEO Mustafa Suleyman in Microsoft’s “acqui-hire,” along with a sizable portion of the team, cast doubt on the company’s trajectory. However, the appointment of White as CEO and a refreshed management team has set a new course for the organization.

After an initial licensing agreement with the Redmond tech giant, Inflection AI’s model development was forked by the two companies. Microsoft continues to build on a version of the model focused on integration with its existing ecosystem. Meanwhile, Inflection AI continued to independently evolve Inflection 2.5 into today’s 3.0 version, distinct from Microsoft’s.

Pi’s… actually pretty popular

Inflection AI’s unique approach with Pi is gaining traction beyond the enterprise space, particularly among users on platforms like Reddit. The Pi community has been vocal about their experiences, sharing positive anecdotes and discussions regarding Pi’s thoughtful and empathetic responses.

This grassroots popularity demonstrates that Inflection AI might be on to something significant. By leaning into emotional intelligence and empathy, Inflection is not only creating AI that assists but also AI that resonates with people, whether in enterprise settings or as personal assistants. This level of user engagement suggests that their focus on EQ could be the key to distinguishing themselves in a landscape where other LLMs risk blending into one another.

What’s next for Inflection AI

Moving forward, Inflection AI’s focus on post-training features like Retrieval-Augmented Generation (RAG) and agentic workflows aims to keep their technology at the cutting edge of enterprise needs. Inflection AI says the ultimate goal is to usher in a post-GUI era, where AI isn’t just responding to commands but actively assisting with seamless integrations across various business systems.

The jury’s still out on whether Inflection AI’s novel approach will significantly reduce output similarity. However, if White and his team’s innovative ideas bear fruit, EQ could emerge as a pivotal metric for evaluating the effectiveness of your company’s generative technology.
