Understanding LLMs Through the Analysis of Search Engines

Feb 28, 2024

TLDR:

While LLMs and search engines are different products, analyzing search engines can provide insights into the LLM business.
Like search engines, LLM companies may achieve monopoly not through a single technology, but through a vast user base. A vast user base combined with an efficient engineering team creates the moat.
Open source solutions offer a quick starting point for newcomers, but do not rival the capabilities of leading LLM players, due to limitations in learning from and serving a vast user base.

Introduction

This article represents a rare exploration into the alignment of technology, product, and market for generative AI. We are particularly focused on examining the products for people's information needs through a historical lens.

Over the last twenty years, Google has evolved into an AI behemoth, surpassing a $1 trillion valuation. The question now arises: Could OpenAI—or another large language model (LLM) provider—achieve the same financial milestone?

Some speculate that LLMs could replace search engines. However, evidence suggests that LLMs and search engines serve distinct purposes. LLMs excel in creativity and generating expert-like advice, albeit with a tendency for hallucination. Search engines, on the other hand, thrive on aggregating information and offering real-time updates, drawing on collective crowd knowledge.

Why a search engine monopoly exists

Over the past two decades, Google has amassed billions of users. Its rise to prominence is largely attributed to its ability to fulfill a growing need among internet users: efficiently navigating the ever-expanding sea of online content. Crucially, Google pioneered an ingenious business model centered around search advertising. As users input queries, Google not only presents organic results but also integrates sponsored content. This approach harmoniously aligns the interests of users and advertisers: the higher the search quality, the more valuable the advertising space becomes, driving up Google’s revenue.

However, this doesn’t fully explain Google's dominance in the search engine market. Numerous competitors, ranging from established corporations to startups, have attempted to dethrone Google. This includes early contenders like Yahoo!, Inktomi, and AltaVista, as well as recent challengers like DuckDuckGo, Neeva, You.com, and Bing. Yet, as of 2023, the consensus is that no other company poses a significant threat to Google's leadership.

Why does Google hold a monopolistic position? Several factors contribute to this phenomenon, with one of the most significant being the superior quality of Google's data compared to that of its competitors. Owing to its vast user base, Google garners the most robust user feedback signals. These signals, reflecting collective user behavior—such as the most frequented websites and the trending pages over a given period—are best interpreted by Google. Additionally, Google enjoys a competitive edge in minimizing operational costs. It leverages various strategies, including but not limited to data centers, caching/batching requests, and accelerators, to manage scalable requests efficiently. Notably, strategies that might seem inconsequential at a smaller scale can lead to considerable cost savings at Google's scale, benefiting them significantly over smaller rivals.
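To make the scale argument concrete, here is a minimal back-of-the-envelope sketch. The function names and every number in it (request volumes, per-request cost, the 1% saving, the engineering cost) are hypothetical assumptions chosen for illustration; none of them are Google's actual figures. It shows only that the same percentage saving can be negligible for a small service and decisive for a very large one.

```python
# Illustrative only: every number used with these functions is hypothetical.

def annual_savings(requests_per_day, cost_per_request, saving_fraction):
    """Yearly savings from an optimization that trims per-request cost."""
    return requests_per_day * cost_per_request * saving_fraction * 365


def worth_building(requests_per_day, cost_per_request, saving_fraction, engineering_cost):
    """True if one year of savings exceeds the one-off cost of building the optimization."""
    return annual_savings(requests_per_day, cost_per_request, saving_fraction) > engineering_cost


if __name__ == "__main__":
    # Same 1% saving and same engineering cost, very different request volumes.
    for name, requests_per_day in [("small rival", 1_000_000), ("large incumbent", 1_000_000_000)]:
        saved = annual_savings(requests_per_day, 0.001, 0.01)
        print(name, round(saved), worth_building(requests_per_day, 0.001, 0.01, 500_000))
```

Under these assumed inputs, the optimization does not pay for itself at the smaller volume but does at the larger one, which is the asymmetry the paragraph above describes.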

Will the new technology change the search monopoly?

More recently, technologies such as retrieval-augmented generation (RAG) and search generative experience (SGE) have garnered significant interest. A number of new startups (including Perplexity AI and You.com) have been working to develop chat-based or RAG-based interfaces as a new search experience. These startups are still at an early stage, but we can draw the following conclusions:

Chat-based LLMs have outperformed search engines in niche areas like coding help and recipe suggestions, especially when the knowledge is stable and the user wants expert knowledge.
However, the new RAG or chat-based techniques are far from impacting search advertising. Most of these startups have not earned much revenue from search ads. Instead, they will very likely consider other business models, such as subscriptions or enterprise customers.

We still believe that the search engine monopoly will persist for the foreseeable future. Google’s dominant position in search ads will likely not change as long as it can keep its enormous user base.

There is a possibility that search engines will decline. However, new technology alone may not be enough to disrupt the search engine industry: Google can also adopt SGE technology and improve its own product. After all, the LLM-based approach is fundamentally different from the search engine experience. For venture capitalists, the pertinent question isn't whether search engines will decline, but how LLMs will ascend.

The moat for LLM companies: a vast user base combined with an efficient engineering team

Search engines and LLMs are two different products. However, both AI products can benefit greatly from a vast user base. As with search engines, an LLM company with a vast user base can improve its quality and reduce its serving cost significantly faster than competitors with fewer users. Thanks to its substantial user base, OpenAI holds several advantages over other LLM companies. We list two below:

OpenAI has addressed a serving challenge that others have not. Most LLM companies have not reached millions of users or tested their infrastructure at scale. OpenAI has, and it is likely the only company to have successfully deployed mixture-of-experts (MoE) LLMs on a large scale, a task more complex than serving dense LLMs.
OpenAI has also gained significant insights from its user base. While the broader LLM community focuses on public benchmarks, OpenAI leverages real user feedback to refine the metrics that matter most to its users. This approach has widened its lead over competitors.

Furthermore, once an LLM company dominates in user base, it can find many ways to reinforce its monopoly. Recall that after its initial success, Google consolidated its dominance by investing in Chrome, Android, and other technologies. OpenAI can do similar things, for example, build new ecosystems or even design new chips. Competitors may not have the same opportunity, since they serve fewer customers and lack the resources for such strategic investments.

What about the open-source solution?

Hugging Face's community is productive, and its model hub is rapidly advancing. Yet most models focus on public benchmarks rather than real-world applications, making it difficult to adapt them to the needs of a million users. The evaluation of LLMs is complex, and leading companies cannot share user data for legal and business reasons, which widens the gap between open-source solutions and top LLM companies.

Open-source solutions offer a starting point for customized or in-app search functions. Standalone applications like TikTok, Instagram, Pinterest, and Slack may develop their own LLMs or multimodal GPTs to maintain user privacy and control recommendations, without relying on external crawlers such as those of Google or OpenAI.

The slow growth in enterprise search and the absence of dominant vertical search engines in areas like finance and medicine suggest potential challenges for universally accessible LLMs like OpenAI's GPT-4 in these domains. It remains to be seen whether open-source LLMs will find success here, highlighting a space where Google and others have not rapidly expanded.

A good example may be Lucene, a popular open-source search engine solution that serves many use cases but cannot match Google's scale. Similarly, we expect open-source LLMs to emerge but not rival the capabilities of leading market players, due to limitations in learning and serving at scale.

The biggest uncertainty for LLM monopoly

The key question is whether LLM companies can sustain their growth and meet customer expectations. Unlike traditional search engine users, who don't expect perfect answers from Google and are willing to explore links for useful information, LLM users have higher standards. For instance, users who pay $20 a month for OpenAI services expect a highly capable AI assistant. Consequently, while some users are impressed with LLMs' abilities, others struggle to use them effectively because of their limitations. The LLM user base is still in its infancy and therefore unstable, even though it is critically important to LLM companies.

Unlike search engines, which generate income through ads, LLM companies rely on subscription fees. This model diverges from Google's approach of supporting free search access through advertising revenue. It proposes a new vision in which every individual pays $20 a month to hire an AI assistant.

We believe OpenAI will keep growing. But the biggest uncertainty is whether OpenAI or another LLM company will achieve a monopoly and become the next Google, or even surpass it.

We envision two possible futures: (1) A single LLM company emerges as a monopoly through continuous innovation and builds its moat on a vast user base. (2) Innovation among LLM companies stagnates, making it difficult to distinguish one from another (e.g., comparing GPT-4 to Bard becomes challenging), resulting in no clear market leader. In this scenario, LLM providers face intense competition, and a large company could prevail by integrating LLMs into its existing product lineup.

Acknowledgment: the author is very thankful for the many discussions and feedback from friends, including Roger Luo, Yantao Zheng, Yiwen Rong, Zicheng Liu, and many others. The views and opinions expressed in this article may change quickly and do not necessarily reflect the views or positions of any entities they represent.

Zhi Ouyang

Mar 4, 2024

Good writing. One extra thought: the slow growth in enterprise search and the absence of dominant vertical search engines may not carry over when LLMs are applied to enterprises and verticals. A more capable LLM can improve a vertical LLM a lot without much customization (other than some general approach to data and alignment). However, a more capable search engine doesn't always carry its advantages over when restricted to a private data corpus.

However, the biggest uncertainty is still the commercialization model. Subscription plus enterprise API feels like just the beginning; I cannot wait to see how it evolves into a real money-making machine.

Embedding VC’s Substack
