From the course: Generative AI: Working with Large Language Models

GLaM

- [Instructor] The Google research team noted that training large dense models requires a significant amount of compute resources, so they proposed a family of language models called GLaM, or Generalist Language Models. These models use a sparsely activated mixture-of-experts architecture to scale, and because the model is sparse, training costs are significantly lower than for an equivalent dense model. GLaM used only about a third of the energy needed to train GPT-3 and still achieves better overall zero-shot and one-shot performance across the board. The largest GLaM model has 1.2 trillion parameters, approximately seven times more than GPT-3. Now the GLaM model architecture is made up of two components. The upper block is a transformer layer, so you can see the multi-head attention and the feed-forward network. And in the bottom block you have the mixture-of-experts layer. Again, you have a multi-head…
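To make the routing idea concrete, here is a minimal sketch of a sparsely activated mixture-of-experts layer in PyTorch. This is illustrative only, not GLaM's actual implementation: the class name MoELayer and the sizes (d_model, num_experts, top_k, and so on) are assumptions chosen for readability, and the per-expert loop trades efficiency for clarity.

```python
# A minimal sketch of a sparsely activated mixture-of-experts (MoE) layer,
# in the spirit of GLaM. All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        # Each expert is an ordinary feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The gating network scores every expert for each token.
        self.gate = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> flatten to one row per token.
        tokens = x.reshape(-1, x.size(-1))
        scores = F.softmax(self.gate(tokens), dim=-1)
        # Keep only the top-k experts per token; the rest stay inactive,
        # which is what makes the layer "sparsely activated".
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        topk_scores = topk_scores / topk_scores.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, k] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += topk_scores[mask, k:k + 1] * expert(tokens[mask])
        return out.reshape_as(x)

# Usage: each token is processed by only 2 of the 8 experts per forward pass.
layer = MoELayer()
y = layer(torch.randn(4, 16, 512))  # output shape matches input: (4, 16, 512)
```

The point of the sketch is the trade-off the instructor describes: total parameters grow with the number of experts, but because each token activates only top_k of them, the compute per token stays close to that of a single dense feed-forward network.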
