From the course: Generative AI Skills for Creative Content: Opportunities, Issues, and Ethics

Text-to-image generative AI

- Let's now talk about image-based generative AI, which opens up whole new worlds of visual possibility. There are a few categories of this type of tech that I'd like to cover. One is text-to-image AI, which we'll explore in this video, and the other is image-to-image AI, which we'll explore in the next video. Text-to-image AI works by writing text-based prompts to create visual imagery. Content creators can generate artwork or create images for marketing, product, social media, and more.

There are quite a few text-to-image AI solutions, with more coming onto the market all the time, but I'm going to begin by talking a bit about one of the biggest ones, DALL-E. Like ChatGPT, it was developed by OpenAI and is built on GPT technology, where neural networks learn the relationships between words and images. DALL-E can create images with a wide variety of concepts, tones, and styles. As with text-based AI, it's often necessary to hone your prompt through trial and error to get the desired result.

Take a look at this image created with the generic prompt of a beautiful sunrise. Again, these are photos of sunrises that don't actually exist on Earth; they're simply generated by teaching the AI model what a sunrise should look like. Now we'll refine the prompt with more descriptive language regarding content and style: a beautiful sunrise in a watercolor style, featuring mostly cool colors with one dark tree silhouette in the foreground. From there, I'm able to choose any of the images and create multiple variations, and I can edit the image using techniques like inpainting and outpainting. Inpainting means that I can erase certain elements within the image and then regenerate the prompt with new information, and it will fill in the erased portion with new elements.
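At its core, inpainting is a masking operation: the user marks a region to erase, the model generates fresh content, and only the masked pixels are replaced. Here's a toy sketch of that idea using plain NumPy arrays as stand-ins; the `inpaint_fill` helper and the "generated" array are illustrative inventions, not part of any real DALL-E API.

```python
import numpy as np

def inpaint_fill(image, mask, new_content):
    """Replace only the masked (erased) region of an image.

    image:       H x W array of pixel values (the original)
    mask:        H x W boolean array; True marks erased pixels
    new_content: H x W array standing in for model-generated pixels
    """
    result = image.copy()          # untouched pixels are preserved
    result[mask] = new_content[mask]  # erased pixels get new content
    return result

# A 4x4 grayscale "image" with the top-left 2x2 block erased
image = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
generated = np.full((4, 4), 99.0)  # stand-in for the model's output

patched = inpaint_fill(image, mask, generated)
```

Only the erased block takes on new values; everything outside the mask is untouched, which is why inpainting can change one element of an image without disturbing the rest.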
Outpainting means I can extend the frame of the image in any direction, and it will match the environment; conveniently, it provides multiple variations of the outpainted frame that I can choose from. Let's move on from our sunrise, and I'll show you three other DALL-E prompts and their corresponding image results, which we'll use for comparison purposes throughout the rest of this video.

Stable Diffusion is another popular text-to-image AI model. It works a bit differently under the hood, but rather than explain the tech, I'd simply like to show you some side-by-side results. Let's input our same sunrise prompt to see how Stable Diffusion handles it. Stable Diffusion has various fine-tuning controls too, such as the guidance scale, which tells the model how closely it should follow your text prompt: the stronger the guidance, the closer it sticks to your prompt. And here are the three additional comparison prompts and images, so you can see how Stable Diffusion results compare to DALL-E. Keep in mind, these are just images created for me during one particular image generation session. You will always get a unique result, but I wanted to give you a general sense of how the tools compare.

Let's now take a look at another model called Midjourney, which is very effective at generating highly artistic images and environments, often with dramatic lighting and fantasy vibes. As of version five, Midjourney has become really good at generating extremely realistic images. In fact, some of the more controversial AI images that have gone viral on the internet were created in Midjourney. That takes us into potential deepfake territory, which we'll talk about later in the course. Let's input the same sunrise prompt to see the types of images that Midjourney generates. Midjourney allows for additional fine-tuning as well, with options like upscaling and creating different variations of the original generated images.
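For the curious, the guidance scale mentioned a moment ago corresponds to a technique called classifier-free guidance: at each step, the model predicts noise both with and without the prompt, and the scale amplifies the difference between the two. Here is a simplified numeric sketch of just that one formula, using made-up numbers rather than a real diffusion model:

```python
import numpy as np

def apply_guidance(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: start from the unconditioned noise
    prediction and push it toward the prompt-conditioned one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

uncond = np.array([0.1, 0.2])  # toy noise prediction without the prompt
cond = np.array([0.3, 0.1])    # toy noise prediction with the prompt

# A scale of 1.0 simply reproduces the conditioned prediction, while a
# higher scale (7.5 is a common default in Stable Diffusion tooling)
# overshoots past it. That overshoot is why stronger guidance makes
# the output stick more closely to the text prompt.
weak = apply_guidance(uncond, cond, 1.0)
strong = apply_guidance(uncond, cond, 7.5)
```

Very high scales can over-sharpen or distort results, which is why the control is exposed as a dial rather than fixed at a single value.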
And again, here are our three comparison images so you can see how Midjourney compares to the first two. Again, this is just a moment in time, but I think it's useful to compare and contrast.

And then there's Adobe with its recently introduced Firefly AI generator, currently in beta, which lets you create images including photos, art, graphics, text effects, and vector art, with impressive control of style, color, tone, lighting, and more. Firefly is integrated within the Adobe Creative Cloud apps, which means you can use it directly with applications like Photoshop, Illustrator, and After Effects. All right, so one more time: here is our beautiful sunrise prompt and our three comparison images to show you how Adobe Firefly stacks up to the others.

As of the time of this recording, these are some of the most popular text-to-image generative AI solutions, but there are certainly others, with more being added all the time. Here are some others that you might like to try out. There are a couple of main areas that differentiate these tools. One is the difference in the end result: some tools just do a better job with certain types of images. Another is the way you interact with the tool. For example, Firefly has a more graphical interface, while the others are more text-based. Between the image quality and the tooling, you may find yourself preferring one tool over the others. The good news is you can try almost every tool for free. And again, while some people do use these tools to generate final results, many others like to use them for research, ideation, storyboarding, and other early-stage creative tasks.