A case study exploring the two leading text-to-image models from OpenAI and Stability AI. We generate some images and compare the results of the same prompt applied to each model.
In this case study, we'll look at the two leading text-to-image models from OpenAI and Stability AI. These two AI companies have models serving a variety of domains, but text-to-image seems to be generative AI's strongest use case after language models right now.
Let's learn by doing. We'll be using model Routes on Ouro to generate our images. The cool thing about this approach is that we get the lineage of an asset when creating on the platform. In this case, we'll see the original prompt + config and the Route used to create each image.
I've been living in North County San Diego for the past 7 months or so, and I've been captivated by all the natural beauty here. It's really an amazing place. Knowing the real thing well, I want to see how well these models can capture the rolling Southern California hills and the butterflies you'll often see.
The default prompt:
Rolling southern California hills with butterflies
It's really pretty good! Well, I thought that until I saw what Stability created. I find that comparison teaches you a lot more about something: seeing two results side by side makes it easier to notice what you like in one and what falls flat in the other.
There obviously wasn't much to go on with such a short prompt, but the idea was complete, and I think the image captures it well.
Stylistically, it feels washed out, with too much sameness across the scene. DALL·E has a characteristic realistic-oil-painting look for most of the images it generates, even when you ask for a different style. I really don't mind it at all; I think it looks good in most situations. However, I don't think you could ever get a photorealistic result from the model. It's missing the sharpness you'd find in a real photograph.
From DALL·E's marketing page,
DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph.
When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life.
Prompt transformation may be the most powerful thing about using OpenAI's model. Writing a good image prompt is hard. We aren't used to describing visual scenes the way good image prompting demands: we see with our eyes, so the characteristics of a scene are self-apparent rather than something we put into language.
Converting the visual domain into the language domain is hard, and a lot of detail is lost in the conversion. OpenAI's decision to ship prompt transformation with the model interface was really smart. Through the API you can ask for your exact prompt to be used if you want full control, but I think having transformation on by default was the right move.
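If you're curious what the transformation produces, the API exposes it directly. Here's a minimal sketch using OpenAI's Python SDK (assuming an `OPENAI_API_KEY` in your environment): DALL·E 3 returns a `revised_prompt` field alongside each generated image.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.images.generate(
    model="dall-e-3",
    prompt="Rolling southern California hills with butterflies",
    size="1024x1024",
    n=1,
)

# DALL·E 3 hands back the transformed prompt it actually used,
# along with a URL for the generated image.
print(response.data[0].revised_prompt)
print(response.data[0].url)
```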
So in our study, the prompt was passed to ChatGPT for an "upgrade". Our new prompt became the following:
Visualize a serene landscape of undulating hills characteristic of Southern California. Imagine these hills clothed in sun-kissed golden grasslands, with an array of native plants adding color and diversity to the scene. A clear blue sky stretches overhead, casting long, gentle shadows that amplify the curves of the hills. Pepper this scene with vibrant butterflies of different hues and sizes - monarchs, swallowtails, blues, and more. They flutter about, their wings catching the sunlight, contributing a touch of magic and movement to the tranquil surroundings.
This is what was used to generate the first image. Now let's see what we get out of Stability.
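To reproduce this step outside of Ouro, you can send the transformed prompt to Stability's hosted API yourself. This is a rough sketch against their v1 text-to-image REST endpoint; the engine ID, request fields, and response shape are drawn from their documentation, so treat them as assumptions to verify against the current docs.

```python
import base64
import os

import requests

# The ChatGPT-transformed prompt from above (truncated here for brevity).
transformed_prompt = (
    "Visualize a serene landscape of undulating hills characteristic of "
    "Southern California. Imagine these hills clothed in sun-kissed golden "
    "grasslands... Pepper this scene with vibrant butterflies."
)

engine_id = "stable-diffusion-xl-1024-v1-0"  # assumed engine; check the docs
response = requests.post(
    f"https://api.stability.ai/v1/generation/{engine_id}/text-to-image",
    headers={
        "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": transformed_prompt}],
        "width": 1024,
        "height": 1024,
        "samples": 1,
    },
)
response.raise_for_status()

# Images come back base64-encoded in an "artifacts" list.
for i, artifact in enumerate(response.json()["artifacts"]):
    with open(f"stability_{i}.png", "wb") as f:
        f.write(base64.b64decode(artifact["base64"]))
```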
Using the transformed prompt generated and used by OpenAI, we get this vibrant-looking scene.
And I like it a lot better! It's so bright and warm, the opposite of OpenAI's. There's more than just rolling hills now: trees, more prominent mountains, and wildflowers front and center! I also like the placement of the butterflies more, as it feels like a little swarm instead of having them scattered across the scene with no organization.
Before we get too excited, we do need to remember that this image wouldn't have been possible without OpenAI! This is what we got from the transformed prompt, something I couldn't have written myself so easily.
Using the original prompt, no transformation, we get this scene.
It's more photorealistic-looking, but it's not my favorite of the bunch.
This was fun. Now let's try to put some of what we've learned from this exploration into words.
OpenAI's prompt transformation is really powerful, and I think it's a necessary step to getting good image generation even if you don't use DALL·E to generate your final image.
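You can borrow that trick for any image model: use a chat model to expand a terse idea into a detailed scene description, then feed the result to whichever generator you like. Here's a minimal sketch with OpenAI's Python SDK; the model choice and system prompt wording are my own, not OpenAI's.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def transform_prompt(idea: str) -> str:
    """Expand a short idea into a detailed image prompt, mimicking the
    transformation DALL·E 3 performs behind the scenes."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the user's idea as a detailed text-to-image "
                    "prompt. Describe the composition, lighting, colors, "
                    "and subjects in one vivid paragraph."
                ),
            },
            {"role": "user", "content": idea},
        ],
    )
    return response.choices[0].message.content

detailed = transform_prompt("Rolling southern California hills with butterflies")
print(detailed)  # feed this to Stability, DALL·E, or any other image model
```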
Both models understand perspective (foreground close to the camera vs. background), but OpenAI's images have a sort of sameness across distances that Stability's don't. This matters for visual appeal: we want differentiation and structure organizing an image.
Stability can generate photorealistic images, while DALL·E cannot. Both default to a "realistic oil painting" feel, which is usually good enough but a clear tell that an image is AI-generated.
That's all for now! Let me know what you've discovered for yourself and how you find using these models. Looking forward to hearing about it, and happy building.