Modern AI R&D Differs from Classic Research
(image credit: made by FLUX.1)
The recent debate between Yann LeCun, Meta's Chief AI Scientist, and Elon Musk, CEO of Tesla and founder of xAI, has drawn significant attention due to their differing views on AI development and its risks.
Yann values "scientific novelty" and emphasizes the importance of "research freedom." In contrast, Elon believes in reasoning from "first principles" and prioritizes engineering and product development over research. Their differing views on R&D reflect the evolution of AI development.
Yann, a long-standing researcher with an excellent reputation, began his career in the late 1980s at AT&T Bell Labs. Bell Labs was renowned for its spirit of scientific discovery, contributing to ten Nobel Prizes. In those days, they believed great work was driven by individuals with liberal, courageous, and dedicated minds, not by management. Bell Labs is the birthplace of radio astronomy, the transistor, the laser, the photovoltaic cell, the charge-coupled device, information theory, and the Unix operating system. Most of these inventions were made by one or a few individuals.
But the world has changed. Now, at Tesla, hundreds of engineers work on the Full Self-Driving (FSD) system. Hundreds of researchers are credited on the Llama 3 paper, much like recent GPT and Gemini papers.
Why has this change occurred? One reason is the increasing complexity of modern AI research. For instance, training a state-of-the-art LLM requires multiple teams for:
Data collection and annotation
Pretraining and post-training
Human evaluation
Safety
Various downstream tasks
Another reason is that generative AI research and development now requires significantly more resources than before. A top-tier researcher might utilize hundreds or even thousands of GPUs, which translates to investments worth hundreds of millions of dollars. Additionally, the costs associated with data collection and model evaluation contribute to the dramatically increasing expenses.
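To put that scale in perspective, here is a rough back-of-envelope sketch of the hardware investment behind such a cluster. The per-GPU price, the overhead multiplier for networking and facilities, and the GPU counts are illustrative assumptions, not figures from any specific vendor or lab.

```python
# Back-of-envelope estimate of the hardware investment behind a large training cluster.
# All numbers are illustrative assumptions, not actual vendor pricing.

GPU_UNIT_PRICE = 30_000   # assumed price per data-center GPU, in USD
CLUSTER_OVERHEAD = 1.5    # assumed multiplier for networking, storage, power, and facilities

def estimated_cluster_cost(num_gpus: int) -> float:
    """Rough capital cost of a GPU cluster under the assumptions above."""
    return num_gpus * GPU_UNIT_PRICE * CLUSTER_OVERHEAD

for num_gpus in (1_000, 10_000):
    cost = estimated_cluster_cost(num_gpus)
    print(f"{num_gpus:>6,} GPUs -> roughly ${cost / 1e6:,.0f}M")

# Under these assumptions:
#  1,000 GPUs -> roughly $45M
# 10,000 GPUs -> roughly $450M
```

Even with generous uncertainty on each assumption, the estimate lands in the tens to hundreds of millions of dollars, consistent with the scale described above.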
A third reason is that much of AI development is becoming homogenized. As transformer models dominate the generative AI landscape, we see many state-of-the-art models, like Grok and Llama, exhibiting very similar architectures. This suggests that modern AI may have less differentiation, forcing many companies to compete intensely on customization, cost, and other factors.
Challenges of leading R&D in genAI
Leading research and development in generative AI presents unique challenges compared to traditional research institutions. In the past, organizations like AT&T or Microsoft Research adopted a strategy of giving researchers the freedom to explore diverse ideas. This approach resembled academic research, characterized by minimal top-down management and a focus on assembling smart individuals to independently discover new avenues of inquiry.
But nowadays, companies can no longer afford such free-form research. Instead, leaders like Elon Musk must proactively secure resources, including data, computing, and infrastructure, to assemble a strong genAI team. For example, Elon recently spent billions of dollars to build the Colossus GPU supercomputer within a year, an execution speed that probably exceeds that of most traditional research groups.
When acquiring substantial resources, modern generative AI leaders must balance these assets effectively. For example, if a company has a large number of GPU chips but lacks the infrastructure to support them, GPU utilization will be low. Conversely, a company with sufficient computing support but too many engineers competing for resources may find the outcomes of its generative AI efforts difficult to control. Moreover, most generative AI models today are data-hungry, meaning companies must develop robust data strategies to continuously improve their AI. Without such strategies, the potential of computing power and talent may not be fully realized. Resource acquisition and allocation are daunting tasks, requiring leaders to make tough decisions to integrate data, computing, and talent effectively.
Company leaders also need to carefully consider the balance between investment and returns. Given the enormous investments involved and the often unclear business models, generative AI leaders face significant pressure. While we anticipate new applications for generative AI in the long term, these developments take time to mature. In the near term, companies may update existing applications with new generative AI models to demonstrate the value of their investments. Not every company needs to train a foundation model; some startups may use open-source models and enhance user experience by fine-tuning these public models. However, it remains uncertain how much investment is required to maintain a competitive edge.
A crucial responsibility for company leaders is to build a culture that fosters generative AI development. Given the need to allocate resources and handle increasing and often unclear product requirements, there may be a temptation to create multiple teams for various functions and roles. However, this approach risks introducing bureaucracy. According to Parkinson's Law, work expands to fill the time and resources available for its completion, leading to inefficiencies as more staff are hired and more work is created to keep everyone busy.
As researchers consume more resources than ever, it’s essential to increase talent density and enhance productivity. Excessive management or control can harm the team before issues even become apparent. This suggests that a relatively flat hierarchy is beneficial in generative AI development. Core AI development should be driven by key talent rather than an abundance of managers. Leaders should focus on:
Reducing overhead in communication and engineering
Balancing the requirements between research and product development
Growing functional heads in data, infrastructure, and service, without expanding teams too quickly or excessively
The ideal culture provides the team with autonomy while also offering feedback for continuous improvement. Some researchers should rapidly embrace new technologies and aim for breakthroughs, while engineers focus on building high-quality products. Ultimately, the integration of research and product development should be smooth and efficient.
What does this mean for investors?
Modern AI research requires substantial investments, and strong leaders like Elon Musk and Mark Zuckerberg are often able to secure resources more quickly than others. For teams aiming to train a large base LLM, insufficient computing resources can lead to failure. It is relatively easy for investors to identify who possesses more computing power and better data, which can be seen as advantages in the AI race.
However, I would argue that culture is an equally important consideration. Over time, the value of computing resources will diminish as hardware efficiency continues to improve dramatically. Latecomers may be able to catch up at a lower cost. In contrast, a team with a strong culture can be more efficient and effective, attract top talent, and sustain healthy growth. Even a team with limited computing resources but a good culture can still excel in areas like post-training or other projects. Conversely, a team lacking a positive culture may struggle to function effectively or even collapse.
There are different types of strong cultures. For example, Yann LeCun advocates for a culture that promotes scientific novelty and open research, while Elon Musk seems to prefer a culture focused on engineering efficiency with minimal bureaucracy. Different startups may hold different beliefs, such as prioritizing scaling laws or user experience innovation, leading to various types of strong cultures. In all cases, building a strong culture requires significant effort, which is often overlooked.
Acknowledgment: I am very thankful for the many discussions and feedback from my friends, especially Roger Luo and Yi Wang. The views and opinions expressed in this article may change quickly and do not necessarily reflect the views or positions of any entities we are affiliated with.