Testing Stable Diffusion XL on NightCafe

Art, by far, is one of the biggest up-front expenses for a game. For me, it’s an unbearable expense until the game hits 500 copies or so. My strategy so far has centered on becoming an artist, so that I can do the work myself (which I enjoy). The process has taken over 2 years so far, and I anticipate another 2-5 before growing my skills to the level that I desire.

That said, AI offers an opportunity to greatly reduce the work of creating art. I’ll comment on the ethics and legality of the issue in my next blog post.

For the moment, though, I want to focus on errors in AI-generated images, which have limited their usefulness for illustrating games so far. I wrote about the ugly fingers, misshapen faces, and unacceptable composition of DALL-E and Midjourney in August 2022. But I also explained, as a former computer science professor, that the AI creators would easily overcome those issues. Just this week, in fact, an AI-generated image of the Pope fooled many people, despite some issues with his fingers.

So when NightCafe came out with an “Extra Large” model for rendering with Stable Diffusion, I decided to take it for a spin. Let’s see how it does on some sample game-related prompts.

3 tests of SDXL

For each prompt, I’ll present the Stable Diffusion 2.1 (SD21) result, then the Stable Diffusion XL (SDXL) result. I generated each using the “Epic” preset.

Prompt: “hard-working but slightly goofy dwarf digging with a magical machine, with smoke and mountains in the background”

Content: SD21 typically renders the dwarf too far in the distance to see much detail. None of its images involved any digging. SDXL also failed to include any sense of digging, but 3 of its images included some sort of content besides a dwarf and mountains (i.e., a city or a machine).

Aesthetics: SDXL made much better use of hue and value contrasts than SD21. The compositions make more sense. Where fingers appear with enough detail to evaluate them, SDXL did a decent job, comparable to the Pope image mentioned above.

Judgment: Clear win to SDXL.

Prompt: “happy gnome tilling the healthy soil with the assistance of his enchanted piggy”

Content: SD21 included a piggy in 1 picture, a horribly deformed hen in 1 picture, and a gnome that looks like a pig in 1 picture. SDXL entirely failed to include the piggy. On the other hand, SDXL’s backgrounds include some very cool landscape and architecture.

Aesthetics: The hen in SD21 distracts from numerous other, more subtle problems with SD21. These include really wooden poses, poor positioning of the gnomes (cut off by the image’s side), the hunch on 1 gnome’s back, and the really plastic-looking textures on the gnomes overall. SDXL’s poses look far more natural, and its images are much better composed and free from any horrific hunches and Franken-hens. The colors are incredible. The faces and fingers still need work.

Judgment: With some prompt-tweaking to get the darn piggy to appear, SDXL looks like the winner again.

Prompt: “smiling valkyrie with golden wings watching over a shining medieval city”

Content: Where’s the valkyrie in SD21? I barely see some wings peeking into 1 image. SDXL puts the valkyrie front and center. In all 4 images, SDXL does a fine job of setting an interesting city in the background, though it’s not particularly medieval.

Aesthetics: If I wanted just a city, 2 of the SD21 images would provide a good start, with a lot of work; at least they have some interesting architecture. But as with the other tests above, SDXL really does a great job with color and light. The composition of all 4 images is great. The faces and hands still need work.

Conclusions

SDXL is a huge improvement over SD21 in terms of color, contrast and composition. It still makes noteworthy mistakes on faces and hands. Even from a purely technical standpoint, I wouldn’t use it for a game. However, these improvements clearly demonstrate that AI creators have made huge strides in less than a year, and I have absolutely 0.0000 doubt that AI-generated art will become as technically suitable for games as any human-generated art.

Notes to self

Remember to post about legality and ethics next.
SDXL costs more than SD21 on NightCafe.
SDXL still mangles hands and faces.
Composition, color and contrast have markedly improved.
Test this again in a few months.