From the course: Generative AI: Working with Large Language Models

Self-attention

- [Instructor] One of the key ingredients of transformers is self-attention. Take this example text: "The monkey ate that banana because it was too hungry." How is the model able to determine that "it" refers to the monkey and not the banana? It does this using a mechanism called self-attention, which incorporates the embeddings of all the other words in the sentence. So when processing the word "it," self-attention takes a weighted average of the embeddings of the context words. The darker the shade, the more weight that word is given, and every word is given some weight. You can see that both banana and monkey come up as likely candidates for the word "it," but monkey is given the higher weight. So what's happening under the hood? As part of the self-attention mechanism, the authors of the original transformer take the word embeddings and project them into three vector spaces, which they called query, key, and…
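To make the weighted-average idea concrete, here is a minimal sketch of single-head, scaled dot-product self-attention in NumPy. The sentence length, embedding size, and random projection matrices are illustrative assumptions, not values from the course; a trained model would learn the projections.

```python
import numpy as np

# Toy setup: 4 tokens, each with an 8-dimensional embedding.
# Embeddings and projection matrices are random placeholders,
# used only to show the mechanics of self-attention.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8

X = rng.normal(size=(seq_len, d_model))   # token embeddings

# Project each embedding into query, key, and value spaces.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Each token's query is scored against every token's key,
# the scores are normalized with softmax into attention weights,
# and the output is a weighted average of the value vectors.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)

output = weights @ V          # one context-aware vector per token
print(weights.round(2))       # each row sums to 1: how much that token attends to the others
```

In the "it" example from the transcript, the row of `weights` for the token "it" would put most of its mass on "monkey," a little on "banana," and smaller amounts on the remaining words.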
