‘Paint Me a Picture’: NVIDIA Research Shows GauGAN AI Art Demo Now Responds to Words

GauGAN2 uses a deep learning model that turns a simple written phrase, or sentence, into a photorealistic masterpiece.
by Isha Salian

A picture worth a thousand words now takes just three or four words to create, thanks to GauGAN2, the latest version of NVIDIA Research’s wildly popular AI painting demo.

The deep learning model behind GauGAN allows anyone to channel their imagination into photorealistic masterpieces — and it’s easier than ever. Simply type a phrase like “sunset at a beach” and the AI generates the scene in real time. Add an adjective, as in “sunset at a rocky beach,” or swap “sunset” for “afternoon” or “rainy day,” and the model, based on generative adversarial networks, instantly modifies the picture.

With the press of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene. From there, they can switch to drawing, tweaking the scene with rough sketches using labels like sky, tree, rock and river, allowing the smart paintbrush to incorporate these doodles into stunning images.
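
For readers curious what such a segmentation map looks like under the hood, here is a minimal sketch: a 2D grid of class IDs that a rough doodle could fill in. The label names, IDs and helper functions below are hypothetical illustrations, not the demo’s actual format or API.

```python
import numpy as np

# Hypothetical label-to-ID mapping for a few high-level scene classes.
LABELS = {"sky": 0, "tree": 1, "rock": 2, "river": 3}

def blank_map(height: int, width: int, fill: str = "sky") -> np.ndarray:
    """Start with a segmentation map filled with a single label."""
    return np.full((height, width), LABELS[fill], dtype=np.uint8)

def paint_rect(seg_map: np.ndarray, label: str,
               top: int, left: int, bottom: int, right: int) -> None:
    """Roughly 'doodle' a rectangular region with a given label."""
    seg_map[top:bottom, left:right] = LABELS[label]

# Example: sky everywhere, a river along the bottom, a rock in the middle.
seg = blank_map(256, 256)
paint_rect(seg, "river", 200, 0, 256, 256)
paint_rect(seg, "rock", 120, 100, 200, 160)
```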

The new GauGAN2 text-to-image feature is now available on NVIDIA AI Demos, where visitors can try the latest demos from NVIDIA Research. With the versatility of text prompts and sketches, GauGAN2 lets users create and customize scenes more quickly and with finer control.

An AI of Few Words

GauGAN2 combines segmentation mapping, inpainting and text-to-image generation in a single model, making it a powerful tool to create photorealistic art with a mix of words and drawings.

The demo is one of the first to combine multiple modalities — text, semantic segmentation, sketch and style — within a single GAN framework. This makes it faster and easier to turn an artist’s vision into a high-quality AI-generated image.
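
To make the idea of combining modalities concrete, here is a sketch of how conditioning inputs might be bundled and passed to a generator. The field names and the generate() stub are hypothetical and stand in for GauGAN2’s real, unpublished interface.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Conditioning:
    """Any mix of modalities can be provided; missing ones are left as None."""
    text_prompt: Optional[str] = None          # e.g. "sunset at a rocky beach"
    segmentation: Optional[np.ndarray] = None  # 2D array of class IDs
    sketch: Optional[np.ndarray] = None        # rough outline drawing
    style_image: Optional[np.ndarray] = None   # reference image for style

def generate(cond: Conditioning) -> np.ndarray:
    """Placeholder for a multimodal GAN generator.

    A real model would encode each provided modality, fuse the embeddings,
    and decode them into an RGB image; this stub just returns noise.
    """
    return np.random.rand(256, 256, 3)

# Start from text alone, then refine with a sketch or segmentation map later.
image = generate(Conditioning(text_prompt="snow-capped mountain range"))
```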

Rather than needing to draw out every element of an imagined scene, users can enter a brief phrase to quickly generate the key features and theme of an image, such as a snow-capped mountain range. This starting point can then be customized with sketches to make a specific mountain taller or add a couple of trees in the foreground, or clouds in the sky.

It doesn’t just create realistic images — artists can also use the demo to depict otherworldly landscapes.

Imagine, for instance, recreating a landscape from the iconic planet of Tatooine in the Star Wars franchise, which has two suns. All that’s needed is the text “desert hills sun” to create a starting point, after which users can quickly sketch in a second sun.

It’s an iterative process, where every word the user types into the text box adds more to the AI-created image.

The AI model behind GauGAN2 was trained on 10 million high-quality landscape images using the NVIDIA Selene supercomputer, an NVIDIA DGX SuperPOD system that’s among the world’s 10 most powerful supercomputers. The researchers used a neural network that learns the connection between words and the visuals they correspond to, such as “winter,” “foggy” or “rainbow.”
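
One common way a network can learn which visuals correspond to words like “winter” or “foggy” is to align text and image embeddings with a contrastive objective. The sketch below shows that generic technique; it is an illustration under that assumption, not NVIDIA’s published training recipe, and the embeddings here are random stand-ins.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb: torch.Tensor, image_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """text_emb, image_emb: (batch, dim) embeddings of matching caption/image pairs."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature   # pairwise similarities
    targets = torch.arange(len(logits))               # i-th caption matches i-th image
    # Symmetric cross-entropy: text-to-image and image-to-text directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random embeddings for a batch of 8 caption/image pairs.
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```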

Compared to state-of-the-art models specifically for text-to-image or segmentation map-to-image applications, the neural network behind GauGAN2 produces a greater variety and higher quality of images.

The GauGAN2 research demo illustrates the future possibilities for powerful image-generation tools for artists. One example is the NVIDIA Canvas app, which is based on GauGAN technology and available for anyone with an NVIDIA RTX GPU to download.

NVIDIA Research has more than 200 scientists around the globe, focused on areas including AI, computer vision, self-driving cars, robotics and graphics. Learn more about their work.