(2023-05-05) Google: We Have No Moat And Neither Does OpenAI

Google leaker: "We Have No Moat, And Neither Does OpenAI". The text below is a very recent leaked document, shared on a public Discord server by an anonymous individual who has granted permission for its republication. It originates from a researcher within Google.

While we’ve been squabbling, a third faction has been quietly eating our lunch.

I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today.

LLMs on a Phone

Scalable Personal AI:

Responsible Release: This one isn’t “solved” so much as “obviated”.

Multimodality

While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable.

At the beginning of March the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public. It had no instruction or conversation tuning, and no RLHF. Nonetheless, the community immediately understood the significance of what they had been given.

In many ways, this shouldn’t be a surprise to anyone

The current renaissance in open source LLMs comes hot on the heels of a renaissance in image generation.

In both cases, low-cost public involvement was enabled by a vastly cheaper mechanism for fine-tuning called low rank adaptation, or LoRA, combined with a significant breakthrough in scale (latent diffusion for image synthesis, Chinchilla for LLMs).

LoRA is an incredibly powerful technique we should probably be paying more attention to

LoRA works by representing model updates as low-rank factorizations, which reduces the size of the update matrices by a factor of up to several thousand. This allows model fine-tuning at a fraction of the cost and time.
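For illustration, here is a minimal PyTorch-style sketch of the low-rank idea; the LoRALinear wrapper, the rank r, and the alpha scaling are assumptions for the example, not details from the memo.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: the dense
    d_out x d_in weight delta is factored as B (d_out x r) @ A (r x d_in)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # pretrained weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: training starts exactly at the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# A 4096x4096 layer has ~16.8M weights; with r=8 only r*(d_in+d_out) = 65,536
# of them are trained, and with r=1 the update is roughly 2,000x smaller.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65536
```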

Part of what makes LoRA so effective is that - like other forms of fine-tuning - it’s stackable
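Continuing the sketch above, and purely illustratively: "stackable" here means that independently trained low-rank deltas all target the same frozen base weights, so they can be layered or merged without retraining the base model (for example, an instruction-tuning adapter plus a domain adapter). The merge_loras helper and its (B, A, scale) tuples are hypothetical names, not anything from the memo.

```python
def merge_loras(base_weight: torch.Tensor, adapters) -> torch.Tensor:
    """adapters: iterable of (B, A, scale) low-rank factors; returns a dense merged weight."""
    merged = base_weight.clone()
    for B, A, scale in adapters:
        merged = merged + scale * (B @ A)  # each delta is just added to the same base weight
    return merged
```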

Large models aren’t more capable in the long run if we can iterate faster on small models

Data quality scales better than data size

Directly Competing With Open Source Is a Losing Proposition

Individuals are not constrained by licenses to the same degree as corporations

Being your own customer means you understand the use case

There is a vast outpouring of creativity, from anime generators to HDR landscapes. These models are used and created by people who are deeply immersed in their particular subgenre, lending a depth of knowledge and empathy we cannot hope to match.

Owning the Ecosystem: Letting Open Source Work for Us

Paradoxically, the one clear winner in all of this is Meta (Facebook). Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

Google itself has successfully used this paradigm in its open source offerings, like Chrome and Android.

Epilogue: What about OpenAI?

All this talk of open source can feel unfair given OpenAI’s current closed policy. Why do we have to share, if they won’t? But the fact of the matter is, we are already sharing everything with them in the form of the steady flow of poached senior researchers. Until we stem that tide, secrecy is a moot point.

