A Blog by Jonathan Low

 

Dec 10, 2018

Telling the Difference Between Real and AI-Generated Faces Is Getting Harder

There are lots of tells, if you know what to look for. JL

Kyle McDonald reports in Medium:

It can be difficult for a GAN to manage long-distance dependencies in images. While paired accessories like earrings usually match in the dataset, they don’t in the generated images. Eyes tend to point in the same direction and are usually the same color, but the generated images are often cross-eyed and heterochromatic. Asymmetry is also visible in ears at mismatched heights or sizes. A GAN will stretch or shrink each tooth in unusual ways. Hair styles have a lot of variability, but also a lot of detail, making hair one of the most difficult things for a GAN to capture.

In 2014 machine learning researcher Ian Goodfellow introduced the idea of generative adversarial networks or GANs. “Generative” because they output things like images rather than predictions about input (like “hotdog or not”); “adversarial networks” because they use two neural networks competing with each other in a “cat-and-mouse game”, like a cashier and a counterfeiter: one trying to fool the other into thinking it can generate real examples, the other trying to distinguish real from fake.
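
To make the cat-and-mouse game concrete, here is a minimal training-loop sketch in PyTorch (a toy illustration on made-up 2-D data, not the code behind any of the papers discussed here): the discriminator is trained to label real samples 1 and generated samples 0, while the generator is trained to make the discriminator output 1 on its fakes.

import torch
import torch.nn as nn

# Toy sketch of the adversarial game: G maps noise to fake samples,
# D scores samples as real (1) or fake (0); each is trained against the other.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in "real" data: points on a circle (a face dataset in a real setup).
    angle = torch.rand(n, 1) * 6.2832
    return torch.cat([angle.cos(), angle.sin()], dim=1)

for step in range(2000):
    # Discriminator step ("cashier"): learn to tell real from counterfeit.
    fake = G(torch.randn(64, 16)).detach()
    d_loss = bce(D(real_batch()), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step ("counterfeiter"): learn to make fakes that D calls real.
    g_loss = bce(D(G(torch.randn(64, 16))), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
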
The first GAN images were easy for humans to identify. Consider these faces from 2014.
“Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” (2014) by Radford et al, also known as DCGAN.
But the latest examples of GAN-generated faces, published in October 2017, are more difficult to identify.
“Progressive Growing of GANs for Improved Quality, Stability, and Variation” (2017) by Karras et al, also known as PGAN or ProGAN.
Here are some things you can look for when trying to recognize an image produced by a GAN. We’ll focus on faces because they are a common testing ground for researchers, and many of the artifacts most visible in faces also appear in other kinds of images.

Straight hair looks like paint

It’s common for long hair to take on this hyper-straight look, where a small patch seems fine but a long strand looks like someone smudged a mass of acrylic paint with a palette knife or a huge brush.

Text is indecipherable

GANs trained on faces have a hard time capturing rare things with lots of structure in the background. Also, GANs are shown both original and mirrored versions of the training data, so writing, which in real photos appears in only one orientation, gets seen both ways round and is especially hard to model.
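
For illustration, that mirroring is usually just a random horizontal flip in the data pipeline, along the lines of this torchvision sketch (an assumed, typical preprocessing step, not the exact pipeline of any paper cited above):

from PIL import Image
from torchvision import transforms

# Typical GAN preprocessing: each training face is randomly mirrored,
# so any background writing is seen both ways round and never learned cleanly.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # original or mirrored, at random
    transforms.ToTensor(),
])

face = Image.new("RGB", (128, 128))  # placeholder for a real training image
x = augment(face)                    # tensor that may be flipped left-right
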

Background is surreal

One reason the faces from a GAN look believable is that all the training data has been centered. This means there is less variability for the GAN to model when it comes to, for example, the placement and rendering of eyes and ears. The background, on the other hand, can contain anything. This is too much for the GAN to model, and it ends up replicating general background-like textures rather than “real” background scenes.
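
As a rough sketch of what that centering involves (illustrative only; the eye coordinates would normally come from a face landmark detector and are hypothetical inputs here), each image is rotated, scaled, and shifted so the eyes land in the same spot in every crop:

import numpy as np
import cv2

def center_face(image, left_eye, right_eye, out_size=128):
    # left_eye / right_eye are (x, y) pixel coordinates as seen in the image.
    # Rotate, scale, and shift so the eyes sit at fixed positions in the crop;
    # the face becomes predictable, while the background stays arbitrary.
    left_eye, right_eye = np.float32(left_eye), np.float32(right_eye)
    mid = (left_eye + right_eye) / 2
    dx, dy = right_eye - left_eye
    angle = np.degrees(np.arctan2(dy, dx))        # rotate the eye line level
    scale = 0.3 * out_size / np.hypot(dx, dy)     # normalize eye-to-eye distance
    matrix = cv2.getRotationMatrix2D((float(mid[0]), float(mid[1])), angle, scale)
    matrix[0, 2] += out_size / 2 - mid[0]         # move eye midpoint to a fixed spot
    matrix[1, 2] += 0.4 * out_size - mid[1]
    return cv2.warpAffine(image, matrix, (out_size, out_size))
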

Asymmetry

It can be difficult for a GAN to manage long-distance dependencies in images. While paired accessories like earrings usually match in the dataset, they don’t in the generated images. Or: eyes tend to point in the same direction and are usually the same color, but the generated images are very frequently cross-eyed and heterochromatic. Asymmetry is also commonly visible in ears at very mismatched heights or sizes.

Weird teeth

GANs can assemble a general scene, but currently have difficulty with semi-regular repeating details like teeth. Sometimes a GAN will generate misaligned teeth, or it will stretch or shrink individual teeth in unusual ways. Historically this problem has shown up in other domains, like texture synthesis with brick textures.

Messy hair

This is one of the quickest ways to identify a GAN-generated image. Typically a GAN will bunch hair in clumps, create random wisps around the shoulders, and throw thick stray hairs on foreheads. Hair styles have a lot of variability, but also a lot of detail, making it one of the most difficult things for a GAN to capture. Things that aren’t hair can sometimes turn into hair-like textures, too.

Non-stereotypical gender presentation

This GAN was trained on a subset of CelebA, which contains 200k images of 10k celebrity faces. In this dataset, I haven’t seen an example of someone with facial hair, earrings, and makeup; but the GAN regularly mixes different attributes from stereotypical gender presentations. More generally, I think this is because GANs don’t always learn the same categories or binaries that humans socially reinforce (in this case “male vs female”).
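
For reference, CelebA ships with per-image attribute labels, so a claim like that can be checked directly. Here is a small sketch, assuming the standard list_attr_celeba.txt layout where +1 marks an attribute as present and -1 as absent:

import pandas as pd

# Count CelebA images labeled with facial hair, earrings, and makeup at once.
attrs = pd.read_csv("list_attr_celeba.txt", sep=r"\s+", skiprows=1)
combo = attrs[(attrs["No_Beard"] == -1)           # -1 on "No_Beard" means facial hair
              & (attrs["Wearing_Earrings"] == 1)
              & (attrs["Heavy_Makeup"] == 1)]
print(len(combo), "images combine facial hair, earrings, and makeup")
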

Semi-regular noise

Some areas that are otherwise monochrome may exhibit semi-regular noise with horizontal or vertical banding. In the cases above, this is probably the network trying to imitate the texture of cloth. Older GANs have a much more prominent noise pattern that is usually described as checkerboard artifacts.
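
The checkerboard pattern itself is easy to reproduce. Here is a minimal demo (my own sketch, using the kind of strided transposed convolution older GAN generators upsampled with) showing how a perfectly flat input comes out with a periodic grid:

import torch
import torch.nn as nn

# Kernel size 3 with stride 2 doesn't divide evenly, so output pixels receive
# contributions from different numbers of input pixels ("uneven overlap").
upsample = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, bias=False)
nn.init.constant_(upsample.weight, 1.0)   # constant filter
flat = torch.ones(1, 1, 8, 8)             # perfectly flat input image

with torch.no_grad():
    out = upsample(flat)[0, 0]

print(out[2:8, 2:8])  # interior values alternate 4 / 2 / 1 in a checkerboard grid
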

Iridescent color bleed

Some areas with lighter solid colors have a multi-hued cast, including collars, necks, and eye whites (not shown).

Examples of real images

Check out the clear background text, the matching earrings, the equally sized teeth, and the detailed hairstyles.
