Imagine your Labrador’s smile on a lion or your feline’s finicky smirk on a tiger. Such a leap is easy for humans to perform, with our memories full of images. But the same task has been a tough challenge for computers — until the GANimal.
A team of NVIDIA researchers has developed new AI techniques that give computers enough smarts to see a picture of one animal and recreate its expression and pose on the face of any other creature. The work is powered in part by generative adversarial networks (GANs), an emerging AI technique that pits one neural network against another.
You can try it for yourself with the GANimal app. Input an image of your dog or cat and see its expression and pose reflected on dozens of breeds and species, from an African hunting dog and an Egyptian cat to a Shih Tzu, snow leopard or sloth bear.
I tried it, using a picture of my son’s dog, Duke, a mixed-breed mutt who resembles a Golden Lab. My fave — a dark-eyed lynx wearing Duke’s dorky smile.
There’s potential for serious applications, too. Someday filmmakers may shoot dogs performing stunts and use AI to map their movements onto, say, less tractable tigers.
The team reports its work this week in a paper at the International Conference on Computer Vision (ICCV) in Seoul. The event is one of the three premier conferences for researchers in the field of computer vision.
Their paper describes what the researchers call FUNIT, “a Few-shot, UNsupervised Image-to-image Translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images.”
“Most GAN-based image translation networks are trained to solve a single task. For example, translate horses to zebras,” said Ming-Yu Liu, a lead computer-vision researcher on the NVIDIA team behind FUNIT.
“In this case, we train a network to jointly solve many translation tasks where each task is about translating a random source animal to a random target animal by leveraging a few example images of the target animal,” Liu explained. “Through practicing solving different translation tasks, eventually the network learns to generalize to translate known animals to previously unseen animals.”
Before this work, image translation models had to be trained on many images of the target animal. Now, one picture of Rover does the trick, thanks in part to a training procedure that folds many different image translation tasks into the GAN process.
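The multi-task idea Liu describes can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not NVIDIA's FUNIT code: the toy dataset layout, the `TranslationTask` type and the `sample_translation_task` helper are hypothetical, and the generator and discriminator that would consume each task are left out. Each draw pairs a random source animal with a random target animal plus a few target examples, while classes held out of training stand in for the "previously unseen" animals supplied at test time.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class TranslationTask:
    source_class: str
    target_class: str
    content_image: str       # supplies the pose and expression to keep
    class_images: list[str]  # the few examples that define the target animal


def sample_translation_task(dataset: dict[str, list[str]], k: int,
                            rng: random.Random) -> TranslationTask:
    """Draw one random source-to-target translation task (hypothetical helper).

    `dataset` maps an animal class name to a list of image ids.
    """
    # Pick two distinct classes; sorting keeps draws reproducible for a seeded rng.
    source, target = rng.sample(sorted(dataset), 2)
    content = rng.choice(dataset[source])
    examples = rng.sample(dataset[target], k)  # k distinct target images
    return TranslationTask(source, target, content, examples)


# Toy data: string ids stand in for real photos.
animals = {
    "golden_lab": ["lab_1", "lab_2", "lab_3"],
    "shih_tzu": ["shih_1", "shih_2", "shih_3"],
    "snow_leopard": ["leo_1", "leo_2", "leo_3"],
    "lynx": ["lynx_1", "lynx_2"],
}
unseen = {"lynx"}  # held out: never used as a training class
train_classes = {c: imgs for c, imgs in animals.items() if c not in unseen}

rng = random.Random(0)
training_tasks = [sample_translation_task(train_classes, k=2, rng=rng)
                  for _ in range(5)]

# At test time, a previously unseen target is specified by just one image.
test_task = TranslationTask("golden_lab", "lynx", "lab_1", animals["lynx"][:1])
```

In a real training loop each sampled task would drive one generator and discriminator update; varying the source and target on every draw is what pushes the network to generalize rather than memorize a single pairing such as horses to zebras.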
The work is the next step in Liu’s overarching goal of finding ways to code human-like imagination into neural networks. “This is how we make progress in technology and society by solving new kinds of problems,” said Liu.
The team — which includes seven of NVIDIA’s more than 200 researchers — wants to expand the new FUNIT tool to include more kinds of images at higher resolutions. They are already testing it with images of flowers and food.
Liu’s work in GANs hit the spotlight earlier this year with GauGAN, an AI tool that turns anyone’s doodles into photorealistic works of art.
At the ICCV event, Liu will present a total of four papers in three talks and one poster session. He’ll also chair a paper session and present at a tutorial on how to program the Tensor Cores in NVIDIA’s latest GPUs.