From the course: Generative AI: Working with Large Language Models

What are large language models?

- [Narrator] Have you seen the terms BERT or GPT-3 in articles online? These are examples of large language models, and their underlying architecture is based on transformers. Transformers were proposed by a team of researchers from Google in 2017 in a paper called "Attention Is All You Need." This paper was a turning point in NLP. Now, parameters are values in a model that are updated during training. Large language models have millions, and often billions, of such parameters and are trained on enormous amounts of data. Most of what we look at focuses on the two-year period since the release of GPT-3, so that's from May 2020 to July 2022. We'll cover models released by Google Research, including GLaM and PaLM; Gopher and Chinchilla, which were released by DeepMind; and the Megatron-Turing NLG from Microsoft and NVIDIA. And finally, we'll wrap up with the work done by both Meta AI and Hugging Face to make large language models available to researchers outside of big tech. Meta AI released the Open Pretrained Transformer (OPT), and Hugging Face coordinated a research effort with over 1,000 researchers to create the BLOOM model. As you can see, there's been a lot of activity with large language models since the release of GPT-3. Before we get into the architecture details of these models, let's look at a couple of examples of where they're used in production.
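To make the idea of parameter counts concrete, here is a minimal sketch, not part of the course itself, that loads a pretrained BERT model and counts its parameters. It assumes the Hugging Face transformers library and PyTorch are installed; the model name and variable names are just illustrative choices.

    # A minimal sketch: counting the parameters of a pretrained model.
    # Assumes the Hugging Face `transformers` library and PyTorch are installed.
    from transformers import AutoModel

    # Load the base BERT model (roughly 110 million parameters).
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Each parameter is a tensor of values updated during training;
    # numel() returns how many values each tensor holds.
    total_params = sum(p.numel() for p in model.parameters())
    print(f"bert-base-uncased has {total_params:,} parameters")

Running the same count on the models covered later in the course would show figures in the billions rather than the millions.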
