There’s a new version of Stable Diffusion, and it’s truly amazing

 

By Jesus Diaz

We just got the year’s biggest AI news so far, this side of GPT-4: Stable Diffusion XL is out, and it is simply incredible. It’s so good that it has accelerated this projection of the next 10 years of generative AI. Here’s the scoop in a new AI roundup of the most interesting developments and creative uses of generative AI in the past few days.

 

Stable Diffusion XL offers jaw-dropping realism and legible text

[Image: Stability AI]

SDXL is more awesome than Baby Yoda in his new Terminator suit. Its abilities, from all that I’ve seen, make Midjourney V5 feel like Photoshop 5.0. Its photorealism goes way past the uncanny valley and sits firmly on the cusp of an Everest of believability. A bonus: Instead of producing garbage AI-generated text characters, the new SDXL generates truly legible text.

[Image: Stability AI]

The technical reason behind this improvement is the model, which is better at handling controllability, factuality, and consistency—the three big problems currently facing generative AI models. SDXL uses a much more complex model, Scott Draves, VP of engineering at Stability AI, tells me via email: “SDXL was trained using 2.3 billion parameters, whereas prior models were in the range of 900 million.”

[Image: Stability AI]

According to Stability, the new version—which anyone can access now in DreamStudio—offers “enhanced image composition, face generation, rich visuals and jaw-dropping aesthetics.” And they are not exaggerating.

 

[Image: Stability AI]

Draves says one of the XL model’s big advances is that it requires less “prompt engineering”: “The model responds better to shorter, more natural language commands. It also has a set of style defaults that make its capabilities even easier to access.” In other words, you won’t have to learn arcane spells to make it produce cool stuff. This is a big leap forward in usability for everyone.

Make your creative juices flow with Create-a-tron

Developed by Matt Reed, the creative technologist at marketing agency Redpepper who made the “Zuckerberg’s dead eyes” HoriZuck Snapchat filter, Create-a-tron is a funky web app that harnesses GPT-4 and DALL-E to start creative brainstorming for all kinds of things, from ideas for ad campaigns to viral stunts. “Since I work at a marketing agency, we’re always ideating/brainstorming ideas for clients, so I’m building this thing to make our brains go even further out there,” Reed says. “It uses AI to generate lots more ideas that we can pick from.”

The clunky Robotron: 2084-inspired interface can be a bit overwhelming (Reed promises that the UX will be improved soon), but once you start playing around, it’s actually pretty simple to use and totally worth trying. After signing up, simply choose from a pull-down menu the topic you want the AI to brainstorm. Then enter any keywords in the text field next to it, and hit create. For example: I just tried “stories” for “the search for extraterrestrial intelligence.”

 

[Screenshot: courtesy of the author]

After a pause, eight ChatGPT-generated ideas will appear in the stream. If the titles are not enough, double-click on any of them, and DALL-E will create four images to further inspire you. Mouse over any of those images and you will see there’s a “Detail” link that will take you to the idea page. Click on it and you will see a button that says, “Tell me more.” Hit it, and ChatGPT will generate a short rationale that explains how the idea works and why it makes sense. Remember: This is not a production tool, but a way to fire your creativity in unexpected directions.

The musical muse of Staccato

If music is your thing, there’s a bunch of generative AIs dedicated to automatically composing complete songs and scores, like Aiva. But an obscure new site I stumbled upon a couple of days ago called Staccato feels more like a personal musical muse than an automated generator of content.

Think of Staccato as the Macca to your Lenny. It’s designed to help you come up with possibilities for the next few bars in your songs, maybe when you hit a creative roadblock. For musical help, you just need to upload any MIDI track you have composed and the AI will make suggestions for continuing it.

 

It does the same thing with lyrics, as the demo video shows. Staccato also has a very clear interface, and it feels accessible to anyone who has a musical ear or some limited GarageBand experience.

Using Midjourney to create an isometric game

This is not a tool, but this week I fell in love with this isometric architectural work in Midjourney, prompted by Javier López, an AI prompt wizard, who can make the AI spit out wonders like this:

Dive into that thread because the level of detail and quality of these images make my eyeballs roll like a slot machine. It’s a joy to zoom in and get lost in all the intricacy that the AI can achieve out of whatever dimension it steals this stuff from. I imagine that, somewhere in the world, there’s already an indie game developer with an awesome idea using this capability to create a new isometric perspective. These are so much fun that they make me want to start developing an Amiga game myself.

 

Balenciaga any face with SadTalker

Did you see the Balenciaga/Harry Potter video? If not, start there:

That video was made with the same core technology as SadTalker, a new web app hosted on Hugging Face (a site that offers tools for developers to build and host applications that use machine learning). SadTalker is the product of a scientific paper on the latest state-of-the-art one-image facial animation, driven by real voice tracks.

What does this all mean? Don’t worry. Just head here to try it out. The UI is spartan but self-explanatory. Just drop the still image you want to animate, the voice track in MP3 or WAV format, and the text. Click generate and see how it works (it’s free, although you may want to pay credits to accelerate the processing and not have to wait in the queue).

 

See, it’s easy. Like a magic spell. Yes, you are Harry Potter now, Balenciaga.

P.S. Have hot tips on creative generative AI projects that we should feature in our future roundups? Send them here.

 

Fast Company
