From the course: Generative AI: Working with Large Language Models

Megatron-Turing NLG Model

- [Instructor] A lot of the research after GPT-3 was released seemed to indicate that scaling up models improves performance. So Microsoft and Nvidia partnered to create the Megatron-Turing NLG model, with roughly three times as many parameters as GPT-3. Architecturally, the model uses the transformer decoder just like GPT-3, but it has more layers and more attention heads. For example, GPT-3 has 96 layers whereas Megatron-Turing NLG has 105. GPT-3 has 96 attention heads while Megatron-Turing NLG has 128. And finally, Megatron-Turing NLG has 530 billion parameters versus GPT-3's 175 billion. Now, the researchers identified a couple of challenges with working with large language models. It's hard to train big models: they no longer fit in the memory of a single GPU, and even if they did, the sheer number of compute operations required would lead to unrealistically long training times. Efficient parallel…
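To make the scale difference concrete, here is a minimal sketch that estimates each model's parameter count from its layer count and hidden dimension using the rough rule of thumb params ≈ 12 × layers × hidden². The hidden sizes (12,288 for GPT-3 and 20,480 for Megatron-Turing NLG) come from the published papers; everything else in the snippet is illustrative, not either team's actual code.

```python
# Back-of-the-envelope parameter estimate for decoder-only transformers.
# Assumes params ~ 12 * n_layers * d_model^2 (attention + MLP weights only;
# embeddings and biases are ignored). This is a rough sketch for intuition.

def approx_params(n_layers: int, d_model: int) -> float:
    """Approximate parameter count in billions."""
    return 12 * n_layers * d_model**2 / 1e9

models = {
    "GPT-3":               {"layers": 96,  "heads": 96,  "d_model": 12288},
    "Megatron-Turing NLG": {"layers": 105, "heads": 128, "d_model": 20480},
}

for name, cfg in models.items():
    est = approx_params(cfg["layers"], cfg["d_model"])
    print(f"{name}: {cfg['layers']} layers, {cfg['heads']} heads, "
          f"~{est:.0f}B parameters")

# Prints roughly 174B for GPT-3 and 528B for Megatron-Turing NLG,
# close to the reported 175B and 530B figures.
```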