From the course: Generative AI: Working with Large Language Models


Scaling laws

- [Instructor] Up to this point, we've looked at a couple of models, but now is a good time to try to understand why we have such large parameter models. Around the time of the release of GPT-3, the OpenAI team released some results around what they called the scaling laws for large models. They suggested that the performance of large models is a function of the number of model parameters, the size of the dataset, and the total amount of compute available for training. They performed several experiments on language models. Let's take a look at some of the results. On the y-axis is the test loss, which will converge for each of the models, so the lower the test loss, the better performing the model. Across the x-axis is the number of parameters of the model. You can increase the size of these models by making them wider or by increasing the number of layers. So as we go across, we're looking at models with a hundred thousand to…
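The loss-versus-parameters curve described here follows a power law in the parameter count. As a minimal sketch (not the course's own material), that relationship can be written as L(N) = (N_c / N)^alpha_N; the constant N_c and exponent ALPHA_N below are illustrative placeholders, not the fitted values reported by OpenAI.

# Minimal sketch of a power-law scaling curve: test loss falls as a
# power of the parameter count N. Constants are illustrative only.
ALPHA_N = 0.076      # assumed exponent, placeholder value
N_C = 8.8e13         # assumed critical parameter count, placeholder value

def test_loss(n_params: float) -> float:
    """Test loss as a function of the model's parameter count N."""
    return (N_C / n_params) ** ALPHA_N

# Sweeping from 1e5 to 1e11 parameters shows the steady drop in loss,
# mirroring the left-to-right trend across the x-axis of the plot.
for n in [1e5, 1e7, 1e9, 1e11]:
    print(f"N = {n:.0e}  ->  loss ~ {test_loss(n):.2f}")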
