Using Stability's Image Control Sketch route to turn a simple pencil sketch into something nearing real life.
The goal of this study is to explore Stability's Control Sketch route, now available in the Stability API.
This route calls on an image-to-image model that takes an image and a text prompt as input and returns a new image combining aspects of both the reference image and the prompt.
It's been really cool to try out. In this post I'll share some of the things I've tried and what I've learned so far.
Each run of this model costs 3 credits, charged to your Stability account. That comes out to $0.03 (3 cents) per run at the current price of 1,000 credits for $10. Pretty reasonable given how much you can do with it. For comparison, the purely generative models from Stability come in somewhere between 3 and 8 credits per run depending on the model and quality chosen, which puts them in the same ballpark as OpenAI's DALL·E models.
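For reference, here's roughly what a call to this route looks like in Python. Treat it as a minimal sketch: the endpoint path, headers, and form fields (image, prompt, control_strength, output_format) follow Stability's v2beta docs as I understand them, and the file name and prompt below are placeholders, not my exact inputs.

```python
import requests

API_HOST = "https://api.stability.ai"
API_KEY = "sk-..."  # your Stability API key

def control_sketch(sketch_path, prompt, control_strength=0.7, out_path="result.png"):
    """Send a reference sketch plus a text prompt to the Control Sketch route
    and save the returned image to disk."""
    with open(sketch_path, "rb") as sketch:
        response = requests.post(
            f"{API_HOST}/v2beta/stable-image/control/sketch",
            headers={
                "authorization": f"Bearer {API_KEY}",
                "accept": "image/*",  # ask for the raw image bytes back
            },
            files={"image": sketch},
            data={
                "prompt": prompt,
                "control_strength": control_strength,  # how closely to follow the sketch
                "output_format": "png",
            },
        )
    if response.status_code != 200:
        raise RuntimeError(response.text)
    with open(out_path, "wb") as out:
        out.write(response.content)

# Placeholder prompt in the spirit of the ones used in this post
control_sketch("reference_sketch.png", "a photorealistic statue of a woman carved from white marble")
```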
For more info about getting started with the Stability API service, see the Getting started section of my announcement post:
So let's check it out.
Shout-out to my sister for the reference sketch. She didn't give me any background about who this is or what it's supposed to be, so I'm just going to run with the ideas that stick out most to me.
The sketch looks like a woman meant to be sculpted out of marble. There's something about the stoic look on her face and the striking pose that makes it feel that way.
I don't think it was intentional but she actually reminds me of my grandmother and the pictures I saw of her when she was younger.
Now let's see what we can do with the Control Sketch route. The title of each image that follows is the prompt I used, along with the reference image above.
Pretty amazing. It captures the framing and positioning of the original extremely well.
Something to note with that:
Ensure that everything in your reference sketch is something you expect to have included in the transformed version.
This might mean doing some cropping or editing beforehand, because the model will try its best to include every detail it's given, even ones you included unintentionally.
You can also try decreasing the control_strength parameter, which allows the model more creative flexibility away from the reference image, as in the sweep sketched below.
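If you want a feel for how much that changes things, sweep the parameter over a few values and compare the outputs side by side. A small sketch, reusing the control_sketch helper from earlier (placeholder prompt again):

```python
# Lower values give the model more freedom to drift from the sketch;
# higher values hug the reference more tightly.
for strength in (0.3, 0.5, 0.7, 0.9):
    control_sketch(
        "reference_sketch.png",
        "a photorealistic statue of a woman carved from white marble",
        control_strength=strength,
        out_path=f"marble_strength_{strength:.1f}.png",
    )
```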
Now, let's look at another variation. In our first transformation, we asked for a photorealistic version of the image we already had but didn't ask for a significant change of material. The sketch already looked like white marble - that was the whole reason I used that description in my prompts from the beginning.
So for the next one I asked for the woman in black marble, and the model had no problem with it.
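Only the prompt text changes between runs. With the helper from earlier, that variation looks something like this (again, my exact prompt wording isn't reproduced here):

```python
# Same sketch, same settings; only the material in the prompt changes.
control_sketch(
    "reference_sketch.png",
    "a photorealistic statue of a woman carved from black marble",
    out_path="black_marble.png",
)
```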
Through experience, you start to get an intuition as to what the model is doing behind the scenes.
In this case, it's not just an upscaling (of realism, not of resolution) of the original image. The model understands the form of the sketch. Stability's own documentation puts it this way:
it allows detailed manipulation of the final appearance by leveraging the contour lines and edges within the image
How grounded in reality this understanding of form is remains something I'm exploring, but at the very least the model knows the boundaries of the objects and can use those boundaries as structure for a completely remixed version of the original image.
Remixed like the creation below, our stone lady now brought into the 1920s. She maintains the same pose and facial structure as our original sketch.
Amazing again. It even captures the piece of clothing that hangs down the center of her body like in the original.
While I'm really impressed and can't wait to keep using the model, one thing I did struggle with was getting her gaze to match the reference. From our perspective, the original is looking up and to the right. Every single one of the generated images is looking down and to the left. Why? I don't know.
Even when I explicitly prompted the model to direct the gaze in that way, I had no luck. Changing the other parameters didn't have much effect either.
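For the curious, this is the kind of attempt I mean; a hypothetical sketch rather than my exact prompts, spelling out the gaze direction and loosening control_strength:

```python
# Hypothetical attempt: describe the gaze explicitly and relax
# control_strength, hoping the model follows the prompt over the sketch.
control_sketch(
    "reference_sketch.png",
    "a photorealistic statue of a woman carved from white marble, "
    "her gaze directed up and to the right",
    control_strength=0.5,
    out_path="gaze_attempt.png",
)
```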
I'm going to keep trying, but for now it's one of the limitations I've run into.
That's all for this study. I'll keep doing these as I explore these models some more and share what I'm finding. Looking forward to seeing how y'all use them as well!