Monday, September 9, 2024

The Invention of Large Language Models

Most people think that LLMs were invented by Google in 2017, or by OpenAI several years later.

But AI expert Andrej Karpathy had already written on his blog:

The Unreasonable Effectiveness of Recurrent Neural Networks

May 21, 2015

The concept of attention is the most interesting recent architectural innovation in neural networks.

He constructs some small character-level language models, and his results seem pitiful compared to what is done today, but I would say he has a proof of concept.
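For the curious, here is a minimal sketch of the kind of character-level RNN language model his post describes: it reads a string and learns to predict each next character. This is my own illustration in PyTorch, not Karpathy's code (his original char-rnn was written in Lua Torch), and the tiny training text is just for demonstration.

import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        # x: (batch, seq) of character indices
        out, h = self.rnn(self.embed(x), h)
        return self.head(out), h  # logits over the next character

# Train by predicting each next character from the ones before it.
text = "hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text]).unsqueeze(0)

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    logits, _ = model(data[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

Scale that same next-token idea up by many orders of magnitude and you have, roughly, a modern LLM.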

Google introduced the transformer in "Attention Is All You Need," a 2017 paper. As you can see, attention was already a hot idea at the time.
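The core of that paper is scaled dot-product attention. Here is a minimal sketch of the mechanism, again in PyTorch; the q, k, v names follow the paper, but the code is my own illustration, not the paper's reference implementation.

import torch

def attention(q, k, v):
    # q, k, v: (seq, d) tensors. Each output row is a weighted average
    # of the rows of v; the weights come from how well each query row
    # matches each key row, scaled by sqrt(d) as in the paper.
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(5, 16)  # self-attention over a 5-token sequence
out = attention(q, k, v)        # shape (5, 16)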

I am not sure who should get credit for inventing the LLM. The basic ideas of neural nets go back decades. They got a whole lot smarter when gaming GPU chips became fast and widely available, and AI researchers figured out how to use them efficiently.
