Have you ever found yourself daydreaming, your mind bursting with colors and scenes so vivid they could be real? It’s like there’s an entire world inside you, just waiting to be shared, but when you put pencil to paper, the realization sets in that you can’t even get close to what you want to create.
This was me. Then along came Stable Diffusion, Midjourney, and DALL-E, opening up a side of myself I wasn’t previously able to touch.
Join me here as we explore the world of diffusion models, in particular Stable Diffusion: revolutionary software capable of turning dreamers into artists with entire worlds to share.
The Rise of Generative AI in Creative Spaces
Stable Diffusion came into play in August 2022. Since then the explosion in creativity, creation, and dedication by AI artists, coders, and enthusiasts has been enormous. This open-source project grew into a huge community that keeps improving a tool capable of generating high-quality images and videos using generative AI.
The best part: it’s free, and you can access and run it with relative ease. There are hardware requirements, but if you are a gamer or otherwise own a powerful GPU, you may already have everything you need to get started. You can also explore Stable Diffusion online for free at websites such as Clipdrop.co.
However, Stable Diffusion was not the first time generative AI entered the creative space. Earlier than that, many artists were using various forms of generative AI to enhance, guide, or expand their creative expression. Here are a few popular examples:
1. Scott Eaton is an amazing sculptor and early generative AI art pioneer who combines generative models with 3D printing and metal casting to produce fascinating sculptures. Here is a video of Scott sharing his process back in 2019: https://www.youtube.com/watch?v=TN7Ydx9ygPo&t
2. Alexander Reben is an MIT-trained artist and roboticist, exploring the fusion of humanity and technology. Using generative AI, he crafts art that challenges our relationship with machines, and has garnered global recognition for his groundbreaking installations and innovations.
3. Sofia Crespo merges biology and machine learning, highlighting the symbiosis between organic life and technology. Her standout piece, ‘Neural Zoo’, challenges our understanding of reality, blending natural textures with the depth of AI computation.
All of these artists (and many more) incorporated machine learning in art before it was cool. They helped pioneer the technology, investing the time, energy, and funds that made today’s applications possible.
Fortunately, we don’t have to repeat their process. We can dive straight into creation.
How does Stable Diffusion work?
Stable Diffusion operates as a diffusion-based model adept at transforming noisy inputs into clear, cohesive images. During training, these models are introduced to noisy versions of dataset images, and tasked with restoring them to their original clarity. As a result, they become proficient in reproducing and uniquely combining images. With the aid of prompts, area selection, and other interactive tools, you can guide the model’s outputs in ways that are intuitive and straightforward.
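To make the noising idea a little more concrete, here is a minimal sketch of how a clean image can be blended with random noise at different strengths. It is not Stable Diffusion’s actual code: the function names, the four-value toy "image", and the schedule values are my own assumptions, borrowed from the standard textbook formulation of diffusion models, and Stable Diffusion applies the idea to a compressed (latent) version of the image rather than to raw pixels.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise amounts, rising linearly (assumed textbook values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fractions(betas):
    """Running product of (1 - beta): how much of the original survives at each step."""
    fractions, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        fractions.append(running)
    return fractions

def add_noise(pixels, alpha_bar, rng):
    """Blend clean values with Gaussian noise; return the result and the noise used."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise

rng = random.Random(0)                   # fixed seed so the run is repeatable
alpha_bars = signal_fractions(linear_beta_schedule(1000))
clean = [0.8, -0.2, 0.5, 0.1]            # a toy "image" of four pixel values

slightly_noisy, _ = add_noise(clean, alpha_bars[10], rng)    # early step: mostly image
mostly_noise, target = add_noise(clean, alpha_bars[-1], rng)  # last step: almost pure noise

print("early step:", [round(v, 2) for v in slightly_noisy])
print("final step:", [round(v, 2) for v in mostly_noise])
```

During training, a network would be shown the noisy values along with the step number and scored on how well it predicts the noise that was added (`target` here). Generation runs the other way: start from pure noise and remove a little of the predicted noise at each step.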
The best way to learn is to get hands-on experience, run generations, and compare results. So let’s skip talking about samplers, models, CFG scale, denoising strength, seeds, and other parameters, and get our hands dirty.
My personal experience
My personal experience with generative AI started with Midjourney, which was a revolutionary application of the technology. However, when Stable Diffusion was released, I was struck by its rapidly growing capabilities. It gave me the ability to guide the diffusion models in a way that makes sense, enabling me to create images as I want them to be, rather than settling for whatever my prompt happened to produce. It featured inpainting and eventually something called ControlNet, which further increased the ability to guide the models.
One of my most recent projects was a party poster for an event by a Portuguese DJ group, Villager and Friends. We wanted to combine generative AI with scenery from the party location to commemorate the party. We decided on a composition, generated hundreds of styles for it, and cherry-picked the four best ones, which the community then voted on. The winning style was upscaled to a massive format and will be made available in print for the partygoers. Let me show you the transformation:
The main composition
The Four Selected Styles –
The Winning Style by Community Vote –
A few details to point out about this project:
1. Notice the number 6 present in the background of every image. This is only possible thanks to the ControlNet extension for Stable Diffusion.
2. Notice the increased level of detail in the final chosen image. This is a result of an ‘upscaling’ process. The final image is a whopping 8192 x 12288 px!
3. Due to the size of the image, a single generation of the final version took about four hours. We had to generate several times due to crashes or ugly results.
4. The final version is unedited. It is raw output directly from the Stable Diffusion Model.
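To put that image size in perspective, here is a quick back-of-the-envelope calculation. The poster dimensions come from the project above. The 512 x 512 comparison (the resolution Stable Diffusion 1.5 models were trained at) and the 300 DPI print setting are my own assumptions for illustration, not details of this particular project.

```python
# Final poster dimensions, taken from the project notes above
width_px, height_px = 8192, 12288

total_pixels = width_px * height_px
print(f"{total_pixels / 1_000_000:.1f} megapixels")  # 100.7 megapixels

# Assumption: compare against a 512 x 512 image, the SD 1.5 training resolution
base_pixels = 512 * 512
ratio = total_pixels // base_pixels
print(f"{ratio}x the pixels of a 512 x 512 image")  # 384x the pixels of a 512 x 512 image

# Assumption: printed at 300 DPI, a common target for high-quality prints
dpi = 300
print(f"{width_px / dpi:.1f} x {height_px / dpi:.1f} inches at {dpi} DPI")  # 27.3 x 41.0 inches at 300 DPI
```

Seen this way, a multi-hour generation time is less surprising: the final image holds hundreds of times more pixels than a typical single generation.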
How can you get started with Stable Diffusion?
Running Stable Diffusion locally is the way to go. To do that, however, you will need good hardware. The main resource Stable Diffusion uses is VRAM, which is provided by the GPU. The minimum starting point would be a GPU with 4GB of VRAM. Unfortunately, the best optimization (xformers) is available only for NVIDIA GPUs.
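If you are not sure how much VRAM your GPU has, a short script can tell you. This is only a sketch: it assumes an NVIDIA GPU and an existing PyTorch installation, the helper names are my own, and the 4GB threshold is simply the starting point mentioned above. Without PyTorch or a CUDA-capable GPU it reports that nothing was detected.

```python
MIN_VRAM_GB = 4  # minimum starting point suggested above

def meets_minimum(total_vram_bytes, min_gb=MIN_VRAM_GB):
    """True if the card has at least min_gb of VRAM.

    Uses decimal gigabytes so a nominal 4GB card that reports
    slightly under 4 GiB still passes.
    """
    return total_vram_bytes >= min_gb * 1000**3

def detect_vram_bytes():
    """Total memory of the first CUDA GPU in bytes, or None if it cannot be read."""
    try:
        import torch  # optional dependency; only needed for detection
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory

vram = detect_vram_bytes()
if vram is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
else:
    verdict = "meets" if meets_minimum(vram) else "is below"
    print(f"{vram / 1024**3:.1f} GiB of VRAM {verdict} the {MIN_VRAM_GB}GB minimum.")
```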
To run Stable Diffusion locally, you will need to install some software:
1. A user interface
a. Automatic1111 (balance between simple and flexible)
b. ComfyUI (extremely flexible and difficult to use, resource efficient)
2. Stable Diffusion Model (choose one you prefer on https://civitai.com/)
3. Python (a programming language)
4. PyTorch (a machine learning framework based on Python)
Start with the user interface; it will help you download everything you need to run it. I use Automatic1111; it’s simple enough and flexible enough for me. ComfyUI is faster and uses resources more effectively, but it is also more complicated and takes a lot more learning to use well.
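Once Automatic1111 is running, you are not limited to clicking through the browser page. When the web UI is launched with the --api option, it also accepts generation requests from scripts. Below is a minimal sketch of such a request. The address is the usual local default and may differ on your machine, the helper names and the example prompt are my own, and the parameter values are just reasonable starting points.

```python
import base64
import json
import urllib.request

# Assumed default local address of the web UI; adjust if yours differs
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

def build_payload(prompt, seed=-1, steps=20, cfg_scale=7.0, width=512, height=512):
    """Collect the generation settings into the JSON body the endpoint expects."""
    return {
        "prompt": prompt,
        "seed": seed,            # -1 asks for a random seed
        "steps": steps,          # number of denoising steps
        "cfg_scale": cfg_scale,  # how strongly the prompt steers the image
        "width": width,
        "height": height,
    }

def generate(prompt, out_path="output.png", **settings):
    """Send one txt2img request and save the first returned image."""
    body = json.dumps(build_payload(prompt, **settings)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Images come back as base64-encoded strings
    with open(out_path, "wb") as image_file:
        image_file.write(base64.b64decode(result["images"][0]))
    return out_path

payload = build_payload("a watercolor lighthouse at dusk", seed=42)
print(json.dumps(payload, indent=2))

# With the web UI running and --api enabled, uncomment to generate an image:
# generate("a watercolor lighthouse at dusk", seed=42)
```

Fixing the seed, as in the example, makes a generation repeatable, which is handy when you want to change one setting at a time and compare results.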
The error messages from both are verbose, so if anything goes wrong, you can copy the error message and search the web for a solution. Pretty much every issue you can run into in your first month of using Stable Diffusion has been solved by someone somewhere on the web.
CivitAI is a great resource for finding new and interesting models. Stable Diffusion 1.5 has the most developed (that is, most extensively trained) models. If you’re looking for a particular style, you can likely find it there, and if you’re not looking for a particular style, you’ll likely discover something new. That said, most models are flexible and receptive to your prompts, and you can increase the weights of your prompts to guide the generation where you want it to go.
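In Automatic1111, prompt weights are written by wrapping a term in parentheses with a number, such as (dramatic lighting:1.3). Values above 1 strengthen a term and values below 1 weaken it. Here is a small sketch of a helper that builds such prompts. The helper names and the example terms are my own, and other interfaces may use a different syntax.

```python
def weighted(term, weight=1.0):
    """Wrap a term in (term:weight) emphasis syntax; leave it bare at weight 1.0."""
    if weight == 1.0:
        return term
    return f"({term}:{weight:g})"

def build_prompt(parts):
    """Join (term, weight) pairs into one comma-separated prompt string."""
    return ", ".join(weighted(term, weight) for term, weight in parts)

prompt = build_prompt([
    ("portrait of an old fisherman", 1.0),
    ("dramatic lighting", 1.3),   # emphasized
    ("oil painting", 1.1),        # slightly emphasized
    ("blurry background", 0.8),   # de-emphasized
])
print(prompt)
# portrait of an old fisherman, (dramatic lighting:1.3), (oil painting:1.1), (blurry background:0.8)
```

Small steps tend to be enough. I would start around 1.1 to 1.3 and only go higher if the model keeps ignoring the term.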
Sometimes Stable Diffusion is stubborn. Getting the result you want can be difficult, and this is where ControlNet and other guidance methods come in, helping you create the compositions you want.
This is just the beginning of your journey, but I’m glad you took the steps to learn how to get started. I’m looking forward to seeing your creations and explorations of latent space.
Is AI art, art?
Stable Diffusion enables people to create interesting art that they would otherwise never make. If you have imagination and some basic skills, you don’t need to be constrained by technique. You can guide Stable Diffusion to put your imagination onto the page.
Countless NFT artworks are being sold online, often coming from people that don’t necessarily have the skills to do everything on their own, but have learned to guide the diffusion models to produce their desired outcome. Some people simply have the talent for picking winners from a big batch of generated images.
Don’t get me wrong. There is joy in working with traditional art. Mastering brushes and paints, whether watercolor or oil, the strokes needed to create form on a blank canvas, human proportions, and composition techniques are all beautiful pursuits, and one can definitely still follow them alongside AI art.
But they also involve a significant time investment, some pain and suffering, and a dedication most creatives are not willing to give.
AI art is also difficult; it’s just a different kind of difficulty. Getting exactly what you want is challenging, similar to how it is with traditional art. The thing is, learning to do good AI art is learning art theory and applying it as guidance. So in a way AI art can bring you closer to real art than real art ever could. Something to think about.
In the end it’s up to you to decide if this is art or not. If you are finding ways to express your views, emotions, and ideas through AI, who really cares what others think about it?
Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter.