From the course: Generative AI: Working with Large Language Models
Self-attention
- [Instructor] One of the key ingredients of transformers is self-attention. Consider this example text: "The monkey ate that banana because it was too hungry." How is the model able to determine that "it" corresponds to the monkey and not the banana? It does this using a mechanism called self-attention, which incorporates the embeddings of all the other words in the sentence. So when processing the word "it," self-attention takes a weighted average of the embeddings of the other context words. The darker the shade, the more weight that word is given, and every word receives some weight. You can see that both banana and monkey come up as likely referents for "it," but monkey has the higher weighted average. So what's happening under the hood? As part of the self-attention mechanism, the authors of the original transformer take the word embeddings and project them into three vector spaces, which they called query, key, and…
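The weighted-average idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the embeddings and the query, key, and value projection matrices are random placeholders, and the dimensions are toy-sized, so the attention weights it prints are meaningless except as a demonstration of the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = ["The", "monkey", "ate", "that", "banana",
          "because", "it", "was", "too", "hungry"]
d_model, d_k = 8, 8

# Toy word embeddings, one row per token (random stand-ins for real ones).
X = rng.normal(size=(len(tokens), d_model))

# Learned projections into the query, key, and value spaces.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product attention: every word scores every other word.
scores = Q @ K.T / np.sqrt(d_k)

# Softmax over each row turns scores into weights that sum to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each output row is a weighted average of the value vectors.
output = weights @ V

# The attention row for "it" shows how much weight each context word gets.
it_idx = tokens.index("it")
for tok, w in zip(tokens, weights[it_idx]):
    print(f"{tok:8s} {w:.3f}")
```

In a trained transformer, the projection matrices are learned so that the row of weights for "it" places most of its mass on "monkey", which is exactly the shading pattern the instructor describes.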