From the course: AI Trends

Unlock this course with a free trial

Join today to access over 23,100 courses taught by industry experts.

Multimodal prompting

Multimodal prompting

From the course: AI Trends

Multimodal prompting

- Currently, a lot of the prompt engineering focuses on single input and single output. So when we say modal, it means a mode. It could be text, it could be images, it could be videos, it could be voice. And perhaps in the future, there might be other modes. Towards the end of 2023, now we have some multimodal models that's capable of being prompted. One example would be GPT Vision where users can input a image and then an instruction or a prompt to ask the model certain questions. So imagine you can input a chart from a slide deck and ask a question, what is in the chart? Then the model is going to describe what's happening in the chart, perhaps mentioning about some numbers, some trends, et cetera, et cetera. Prior to this, what we used to do is we will have to do OCR to extract information from images into text, and then we ask questions to those texts. Now we are able to ask questions directly against an image,…

Contents