Noel Duan reports in Ars Technica:
Instead of revolutionizing the beauty industry with data analysis and the topic’s holy grail—defining beauty in a universal, objective manner—Beauty.AI and its competitions only reinforced the limited capabilities of algorithms when it comes to labeling human beings. Bots (like beauty at large) often involve a degree of subjectivity. And what the robots saw were institutional prejudices—not universal beauty.
For over a year, I worked as a beauty editor, writing and researching about the products, trends, and people that make us want to look a certain way. And as research for many of the stories I wrote, I consulted with dermatologists, plastic surgeons, makeup artists, aestheticians, and more trying to answer a simple question—how can I make myself more conventionally attractive?
“Beauty is confidence,” they’d always say, prefacing the real answer. Inevitably, these experts would tell me that you feel more confident, and thus more beautiful, when you look blemish- and wrinkle-free. (Depending on the product they were promoting, this could also include being tanner, or more contoured, or thinner, or paler, or less made up, or curvier, etc.) Regardless of respondents’ different aesthetic tastes, everyone seemed to agree—younger is more beautiful. Beauty was about anti-aging.
Naturally, the problem here is the premise. What is beauty beyond someone else defining it? For as long as humanity’s obsession with the term has existed, we’ve equally known about its subjective nature. After all, “beauty is in the eye of the beholder” is merely a cliché that posits that exact subjectivity of attractiveness.
But what if the beholder can eliminate subjectivity—what if the beholder wasn’t a person, but an algorithm? Using machine learning to define beauty could, theoretically, make beauty pageants and rankings like People’s annual Most Beautiful in the World list more objective and less prone to human error. Of course, teaching an algorithm to do anything may involve some bias from whoever does the programming, but that hasn’t stopped this automated approach from defining equally subjective things like listening preferences or news value (we see you, Facebook et al).
“We don’t want human opinion,” says biotechnologist Dr. Alex Zhavoronkov, one of the founders behind a pageant-holding, beauty-quantifying initiative called Beauty.AI. “At the end of the day, there are lots of disagreements. We’re looking at ways to evaluate beauty, and some ways may be more relevant or less relevant to human perception. But the entire purpose of Beauty.AI is to get rid of human opinion, to transcend it.”
Beauty.AI was merely one of the latest attempts to have technology objectively evaluate beauty. But as an online competition that crowdsourced headshots and allowed bot-driven algorithms to determine rankings, perhaps it represents the fever pitch of this exercise. If so, the initiative’s outcome made one thing definitively clear: artificial intelligence will never determine a universal face of beauty. Even today, it only highlights how precisely narrow one’s definition of beauty can be.
Before Hot Or Not: A brief history of quantifying beauty
Long before anyone knew what an algorithm was, humanity was attempting to quantify and measure beauty. Leonardo da Vinci’s pen-and-ink drawing of the Vitruvian Man, whose head was one-eighth of its body, was based on Roman architect Vitruvius’ writing on the subject from his treatise, De Architectura. Plato believed that beauty resided in parts that harmoniously fit into the whole. St. Augustine believed that the more geometrically equal something was, the more beautiful it was. The theories went on and on.
And for as long as people have made these landmark statements on beauty, they’ve also revealed obvious cultural bias about their standards of beauty. Northern Renaissance painter Albrecht Dürer used his own fingers, known for being longer than average, to construct a canon of the human body. Or, for a recent example, morning show Good Day DC anchors Wisdom Martin and Maureen Umeh went viral last year for giving the side eye to a 2014 cosmetic surgery study stating that Kate Middleton had the “most desirable face.” Naturally, the study was based on a test group of “normal-appearing white women aged 18 to 25 years.”
In the past few decades, scholars have at least come to accept that universal beauty is a complicated, perhaps impossible thing. One of the more popular works furthering that idea comes from author Naomi Wolf and her 1991 bestselling book, The Beauty Myth. “Beauty is a currency system like the gold standard,” she wrote. “Like any economy, it is determined by politics, and in the modern age in the West it is the last, best belief system that keeps male dominance intact.” Wolf believed that beauty is a construction of capitalism meant to preserve the status quo in the ever-expanding West—essentially arguing that modern, more diverse supermodels like Naomi Campbell and Tyra Banks still had to fit into a rigid definition of beauty that entails things like “tall,” “thin,” and “youthful.”
These cultural complications haven’t stopped modern researchers from looking to tech for a better solution, however. Case in point: University of California, Irvine researchers Natalie A. Popenko and Dr. Brian J. Wong. (Wong, a plastic surgeon and professor, was one of the experts behind that controversial, Kate Middleton-face study.) In their most famous paper—2008’s “Evolving Attractive Faces Using Morphing Technology and a Genetic Algorithm: A New Approach to Determining Ideal Facial Aesthetics”—the duo employed digital morphing software to “evolve” and “breed” more attractive faces over time based on data gathered from varied, human sources including Facebook surveys, plastic surgeons, student study participants, and professionals from an eyebrow cosmetics company favored by Kylie Jenner. In the most basic sense, their work tried to deploy predictive computing in a similar way to how scientists generate climate models… except they were hoping to see whether an average evolved over time into an ideal face.
Ultimately, Wong and Popenko determined that an “average” face didn’t make for a beautiful face. In fact, nasal width, eyebrow arch height, and lip fullness correlated significantly with the study’s scores of attractiveness. In other words, Jenner was onto something with her Kylie Lip Kit (designed to give you full, pouty lips) and heavily arched eyebrows (brought to you by Anastasia Brows). It turns out beauty, at least the kind that makes you want to shop at Sephora, isn’t determined by evolution—it’s determined by celebrity idols.
As this type of research has continued, businesses have sought to get in on the premise of technologically-defined beauty. The venture-backed Naked 3D Fitness Tracker is a $400 smart mirror (available for pre-order) that scans your body in 3D and uses a heat map to tell you where you’re growing muscle or gaining fat, and it claims results within 2.5 percent accuracy. It comes with a mirror that “looks” at you—a literal “Mirror, Mirror, On the Wall” situation—and encourages you to face the facts: Are you actually losing weight? This scale claims it won’t let you cheat.
Or, launched last April, an app called Map My Beauty claims to use facial zone recognition algorithms to objectively assess beauty. Users upload selfies, and the app spits out how and where to put on makeup. So far in its short existence, the app has proven particularly useful for advanced techniques like contouring, the old school method made viral by Instagram and the Kardashians. (Contouring requires a solid understanding of your own facial structure in order to manipulate appearance using light and shading.)
“Using this active appearance model and applying it to selfies we have never seen before, we can extract a handful of parameters which also—among others—describe implicitly facial attractiveness,” says Dr. Kristina Scherbaum, the computer scientist behind the app. What those parameters are, however, remains a secret. Map My Beauty has business aspirations beyond aiding at-home makeup artists. The team has previously worked with international beauty retailer Sephora, and now Map My Beauty has its own team of professional makeup artists. These professionals act as a focus group for labeling and categorizing the database, and Map My Beauty says its judgment criteria are proprietary.
This means an app may spit out the answers, but a team of humans is again behind the scenes making decisions (with varying degrees of subjectivity and objectivity). So from da Vinci to Wong and Popenko to this, that undeniable human element ultimately permeates results no matter how many layers of technology are added.
Beauty.AI, teaching algorithms about beauty
This was the underlying pitfall lurking when Beauty.AI launched in 2015. Developed as a collaboration between Russian scientists and Hong Kong-based Youth Laboratories and supported by US companies like Microsoft and Nvidia, Beauty.AI offered an online-only beauty contest for competitors from across the globe (India, China, Africa, the US, etc.). Willing entrants sent in selfies, and this ratings website promised to pit participants against each other as bot judges determined who was the most attractive.
Unfortunately, participants only had vague explanations of how this data-driven rankings system worked.
The 1.0 version ran from November 2015 through January 2016, and it attracted 5,000 participants. Male and female entries were judged by three algorithms. Russian scientists Anastasia Georgievskaya, Jane Schastnaya, Poly Mamoshina, and Zhavoronkov developed a trio of beauty-seeking bots that focused on facial symmetry (aka proportion), number of wrinkles (aka youthfulness), and comparisons of facial symmetry and number of wrinkles on contestants versus fashion models (aka professionally beautiful individuals). These traits were selected because the team felt they could be mathematically determined. Supermodel Cindy Crawford’s famous beauty mark might be her career-making trait, but an algorithm would simply mark it as a deviation from a “perfect” spot-free face. “Eventually, we might add weight to parameters later to make some traits matter more than others,” Zhavoronkov noted.
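As a rough illustration of the weighting Zhavoronkov describes, the sketch below combines three per-judge scores into one number, first with equal weights and then with one trait weighted more heavily. The score names, the 0-to-1 scale, and the weights are all hypothetical; the article does not say how Beauty.AI actually combined its judges' outputs.

```python
def combined_score(scores, weights=None):
    """Weighted average of per-judge scores (hypothetical 0-to-1 scale)."""
    if weights is None:
        # With no weights given, every judge counts equally.
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Made-up scores for one entrant; none of these values come from Beauty.AI.
entry = {"symmetry": 0.8, "wrinkles": 0.6, "model_similarity": 0.4}

equal = combined_score(entry)
symmetry_heavy = combined_score(
    entry, {"symmetry": 2.0, "wrinkles": 1.0, "model_similarity": 1.0}
)
```

Changing the weights changes the ranking without changing any photo, which is one place where a human choice enters an otherwise automated pipeline.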
For a moment, let’s accept the premise that wrinkles are a determinant of beauty. How do you get an algorithm to accurately and efficiently gauge and track these things?
When devising their wrinkle-judging algorithm—dubbed RYNKL—the Beauty.AI team chose to power it through deep learning. RYNKL was trained by Moscow-based bioengineer Anastasia Georgievskaya using about 20,000 photos from a variety of sources, including the IMDB database of actors’ headshots. “We get a standardized photo—a high-resolution, frontal facial photo,” she said. The image must be at least 1024 pixels with a forehead minimum width of 250 pixels, “like Harrison Ford. He has really good wrinkles and our programmers love him.” Actors were chosen specifically because their biometric parameters—like approximate height, approximate weight, and approximate age—are public knowledge. Ultimately, an image database of non-celebrities plus the 5,000 entries from the Beauty.AI 1.0 competition were used in training.
The first step in training the algorithm was supervised learning, or learning with a teacher like Georgievskaya. (This is essentially how tech companies like Facebook have been training machine-learning algorithms, too.) She started by taking some of the training data (headshots) and manually extracting the pertinent features of that data. How many forehead wrinkles does Harrison Ford have compared to other men in his age group?
“Our gold standard images are from Beauty.AI because we monitor the resolution, the light conditions, and the facial attributes,” Georgievskaya notes. One of the rules for submitting to Beauty.AI is that neither men nor women can have makeup, eyewear, or facial hair in the images—the Beauty.AI team manually combs through the entries to disqualify selfies that don’t adhere to these standards. “It's difficult to detect makeup,” admits Georgievskaya, especially when the popular concept “no-makeup makeup” is all about wearing makeup to look barefaced.
After manual training, RYNKL learns to process wrinkles as white lines. Georgievskaya showed me an image of RYNKL’s thought process: a photograph of a wrinkled forehead is now a flat black-and-white landscape, a black map with white lines that looks just like a river current. It is, essentially, topography of your face. RYNKL counts these lines and gives a score based on this number. The fewer white spaces you have, the lower your score, and the higher (more beautiful) you rank. Later, Georgievskaya would use a similar method to guide a new algorithm trained to recognize pimples (called PIMPL, which identified pimples and dark spots but not freckles). This algorithm learned to detect circle-shaped “blobs” on the face and counted said blobs in its analysis.
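The counting step can be sketched in a few lines, assuming the deep-learning stage has already produced a black-and-white map in which 1 marks a white (wrinkle or blob) pixel and 0 marks background. This is not RYNKL's or PIMPL's actual code, which the article does not show; the function name, the toy map, and the choice to treat diagonally touching pixels as one line are all assumptions made for illustration.

```python
from collections import deque


def count_white_regions(mask):
    """Count groups of touching 1-pixels in a 2D list of 0/1 values.

    Pixels that touch horizontally, vertically, or diagonally are
    treated as part of the same line or blob (an assumption).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Found an unvisited white pixel: flood-fill its whole region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


# Toy "forehead": two separate white lines on a black background.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
wrinkle_count = count_white_regions(toy_mask)
```

On the toy map the count is 2. Under the scoring rule described above, a lower count would mean a higher rank.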
Third time’s a charm?
The results of Beauty.AI’s 1.0 competition were controversial, though ultimately the company appeared pleased. Feedback started with vehement disagreement from participants. One contestant even suggested that the contest would benefit from a human judge overseeing the outcomes made by the algorithms, essentially reversing this experiment’s central premise. But the experiment garnered enough notice that Beauty.AI was invited to transfer its algorithms to real-life judging down the line.
The Miss Moscow State University Pageant, a beauty pageant sponsored by Russian conglomerate Basic Element, looked to partner with Beauty.AI later in 2016. Inspired by the 1.0 feedback, this jury would include both Beauty.AI’s tech and human judges in order to determine the finalists. There would even be an accompanying hackathon for developers interested in this beauty-defining algorithm.
“They want to have a robot show in Asia in 2017,” Zhavoronkov told Ars last summer. “So we’re going to make a pilot.” The pageant would have involved both human judges and robot judges picking the finalists. But this pageant partnership never happened. Instead, Beauty.AI went on indefinite hiatus shortly after its 2.0 results were released.
For the second iteration of its online-only contest, the team added PIMPL, "Average Face" (built on the hypothesis that the closer the face is to the average face within an ethnic group, the more attractive it is) and "AntiAgeist" (evaluating predicted versus actual chronological age) to its robojudges. This competition ran from April through August 2016, and it attracted 6,000 participants this time. When the results were published, the worldwide press even took note—though likely not for reasons Beauty.AI had anticipated.
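The two new judges rest on simple comparisons, which can be sketched as follows. This is an illustration under stated assumptions, not the team's code: the article does not say how faces were turned into numbers or which direction of age gap was rewarded, so the feature vectors, the function names, and the "looks younger scores higher" convention here are all hypothetical.

```python
import math


def age_gap(predicted_age, actual_age):
    """Stated age minus estimated age.

    A positive value means the face was estimated as younger than the
    entrant's actual age (assumed here to be the favorable direction).
    """
    return actual_age - predicted_age


def distance_to_average(face, group_faces):
    """Euclidean distance from one feature vector to the group's mean vector.

    Under the "Average Face" hypothesis, a smaller distance would mean a
    more attractive face.
    """
    n = len(group_faces)
    mean = [sum(f[i] for f in group_faces) / n for i in range(len(face))]
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(face, mean)))


# Made-up numbers for illustration only.
gap = age_gap(predicted_age=31.0, actual_age=35.0)
group = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
dist_at_mean = distance_to_average([1.0, 1.0], group)
dist_far = distance_to_average([4.0, 5.0], group)
```

Note that the "average" depends entirely on which faces are placed in `group_faces`, so the choice of reference group is itself a judgment made by whoever assembles the data.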
More than 100 countries were represented among the 6,000 entries, but among the 44 winners “nearly all were white, a handful were Asian, and only one had dark skin,” reporter Sam Levin noted for The Guardian in a piece titled, “A beauty contest was judged by AI and the robots didn't like dark skin.” Zhavoronkov told the paper he was shocked by the results, but he had a working hypothesis.
"It happens to be that color does matter in machine vision," Zhavoronkov told Motherboard in an e-mail during this flurry of press. "For some population groups, the data sets are lacking an adequate number of samples to be able to train the deep neural networks.”
Within a week of the Guardian piece, Beauty.AI cancelled its plans for a 3.0 contest at the end of 2016. In the accompanying press release, the team noted its ultimate hopes for the algorithms were for “preserving humans in the youthful, healthy state” for as long as possible, but there was greater demand (and market appeal) for determining appearance and beauty. It simultaneously took shots at “sensational” journalism while admitting the great potential for bias to enter into machine learning, especially when dealing with a subject as culturally-determined as beauty.
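Zhavoronkov's hypothesis, that some population groups had too few samples in the training data, is the kind of thing that can be checked before a model is trained. The sketch below counts samples per group and flags any group that falls below a chosen share of the data. The group labels and the 10 percent threshold are made up for illustration; the article does not describe what checks, if any, the team ran.

```python
from collections import Counter


def underrepresented_groups(labels, min_share=0.10):
    """Return {group: share} for groups below min_share of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        group: n / total
        for group, n in counts.items()
        if n / total < min_share
    }


# Hypothetical label list: one group dominates the training set.
labels = ["group_a"] * 90 + ["group_b"] * 8 + ["group_c"] * 2
flagged = underrepresented_groups(labels)
```

Here `group_b` and `group_c` are flagged. A balanced count is only a first step, though: it says nothing about whether the photos themselves, or the labels attached to them, carry the preferences of the people who chose them.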
“After the racial controversy, we decided to postpone the Beauty.AI 3.0 launch until we develop a set of algorithms that test for bias in AI and develop a set of safeguards,” Zhavoronkov told Ars in February 2017. “Also, we started uncovering many interesting and scary facts that we would never be able to publish, but need to be considered when interpreting the results, when the network is highly biased and the racial and ethnic features are predictive. This work got us into some very new collaborations with the large hospital networks and banks and there is virtually no time to work on the Beauty.AI competitions, because for us healthcare research always take priority.”
The Beauty.AI team then sent along a press release for its new nutraceutical (pharmaceutical-grade nutrients) venture, a “combination of naturally-occurring molecules for life extension” called Geroprotect.
The next contest
For the record, Beauty.AI maintained an emphasis on health even while it allowed beauty to become its higher-profile pursuit. We initially started reporting this piece in the spring of 2016 with September as the target date (hoping to coincide with the 2.0 competition). At the time, there were concerns about methodology even before the results became the bigger issue—for instance, the team was vague on what they planned to do with all the data being gathered; Zhavoronkov told Ars it was kept in a secure but proprietary server and “it would distract from the story if I told you.” Still, the team explicitly said its long-term hope was health, not beauty, even back then.
“We just want to help people look younger and live longer,” Zhavoronkov said last summer. The team hoped to develop pharmaceuticals for longevity, physical robots, and appliances—much like the Naked 3D Fitness Tracker—that could combine to improve quality of life. “We want to know what parts of beauty are linked to a longer life.” The team said Beauty.AI was created to collect worldwide data on facial features that could be later used for longevity products, which isn’t a revolutionary occurrence in the $13 billion youth-obsessed beauty industry. (Mind you, that’s separate from the anti-aging sector of the beauty industry, which actually fell from $2.2 billion in 2010 to $1.9 billion in 2015.)
Today, in addition to nutrients, there are rumbles that the Beauty.AI team will return to its pageantry. Wired UK recently interviewed Georgievskaya, who said a Beauty.AI 3.0 would be happening. The site reported this contest would coincide with a simultaneous launch for Diversity.AI, an organization devoted to “inclusion, balance and neutrality in artificial intelligence” launched by some of the Beauty.AI team. Essentially, the team that stumbled into reinforcement of algorithmic bias now wants to combat it. (And as Wired UK noted, yes, "the contradiction between Diversity.AI’s aim of reducing discrimination and its method—a contest based, by definition, on appearance-related discrimination—did not appear evident" to the team.)
Shortly after that story published, Diversity.AI's website was taken down. "The initiative has not yet been launched," is now the only message publicly available. There's no date attached to any official Beauty.AI 3.0 initiatives, and when Wired UK talked with some of the experts listed on the Diversity.AI advisory board, they weren't aware of any contest and would not voice support for another one. Whether or not either initiative goes forward remains an open question. Ars has been unable to reach the team for clarification.
So for now, instead of revolutionizing the beauty industry with data analysis and reaching the topic’s holy grail—defining beauty in a universal, objective manner—Beauty.AI and its 1.0 and 2.0 competitions only reinforced the limited capabilities of algorithms when it comes to labeling human beings. “There will always be people who not ready to embrace your approach to measuring beauty,” Georgievskaya told Ars in 2016. “But the point is that our robots can see what humans cannot see.” What the humans could not see was that bots (like beauty at large) often involve a degree of subjectivity. What the robots saw, however, were institutional prejudices—not universal beauty.