From the course: Generative AI: Working with Large Language Models


Going further with Transformers


- [Jonathan] We've covered a ton of material in this course. We've looked at many of the large language models released since GPT-3. Let's review them quickly. We saw how Google reduced training and inference costs with GLaM by using a sparse mixture-of-experts architecture. A month later, Microsoft teamed up with Nvidia to create the Megatron-Turing NLG model, which at 530 billion parameters was three times larger than GPT-3. In the same month, the DeepMind team released Gopher, whose largest 280 billion parameter version was its best performing model. A few months later, the DeepMind team introduced Chinchilla, which turned much of our understanding of large language models on its head. The main takeaway was that large language models up to that point had been undertrained. Google released the 540 billion parameter model PaLM in April, training it on their Pathways infrastructure, and this has been the best performing…
