From the course: Generative AI: Working with Large Language Models
Scaling laws
- [Instructor] Up to this point, we've looked at a couple of models, but now is a good time to try to understand why we have models with so many parameters. Around the time of the release of GPT-3, the OpenAI team published some results on what they called the scaling laws for large models. They suggested that the performance of large models is a function of the number of model parameters, the size of the dataset, and the total amount of compute available for training. They performed several experiments on language models. Let's take a look at some of the results. On the y-axis is the test loss. The test loss converges for each of the models, so the lower the test loss, the better the model performs. Across the x-axis is the number of parameters in the model. You can increase the size of these models by making them wider or by increasing the number of layers. So as we go across, we're looking at models with a hundred thousand to…
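The relationship described here is a power law: test loss falls smoothly as the parameter count grows. Below is a minimal Python sketch of the parameter scaling law reported by OpenAI (Kaplan et al., 2020), L(N) ≈ (N_c / N)^α_N. The constants are approximate values from that paper and are included purely for illustration, not as a definitive implementation of the authors' fits.

```python
# Illustrative sketch of the OpenAI scaling law for model size
# (Kaplan et al., 2020): test loss falls as a power law in the number
# of non-embedding parameters N, i.e. L(N) ~ (N_c / N) ** alpha_N.
# ALPHA_N and N_C are approximate values from the paper, used only
# for illustration.

ALPHA_N = 0.076   # approximate power-law exponent for parameter count
N_C = 8.8e13      # approximate critical parameter count

def test_loss_from_params(n_params: float) -> float:
    """Predicted test loss (in nats per token) for a model with n_params
    non-embedding parameters, assuming data and compute are not the
    bottleneck."""
    return (N_C / n_params) ** ALPHA_N

# Loss keeps dropping as models grow from 1e5 to 1e11 parameters.
for n in [1e5, 1e7, 1e9, 1e11]:
    print(f"{n:.0e} parameters -> predicted loss {test_loss_from_params(n):.2f}")
```

Plotted on a log scale for the parameter count, this produces the kind of steadily decreasing test-loss curve the instructor describes on the slide.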
Contents
- GPT-3 (4m 32s)
- GPT-3 use cases (5m 27s)
- Challenges and shortcomings of GPT-3 (4m 17s)
- GLaM (3m 6s)
- Megatron-Turing NLG Model (1m 59s)
- Gopher (5m 23s)
- Scaling laws (3m 14s)
- Chinchilla (7m 53s)
- BIG-bench (4m 24s)
- PaLM (5m 49s)
- OPT and BLOOM (2m 51s)