A Blog by Jonathan Low


Jan 25, 2020

How Rotten Tomatoes Calculates Film Ratings. Hint: Humans Matter

Not scientific, you say? This is art, not algorithm. JL

Simon Van Zuylen-Wood reports in Wired:

The Tomatometer has become a Good Housekeeping Seal for visual entertainment. The Tomatometer is run by a team of “curators” who read every known review from a pool of approved critics, then decide if each is positive or negative. Once a movie has five reviews, it is Tomatometer-eligible. Each film's Tomatometer score is equivalent to the percentage of “positive” reviews it has accumulated. If a movie scores 59% or lower, it's Rotten; at 60% or higher, it's Fresh.
Tim Ryan is an excitable 42-year-old film savant with a mop of reddish hair. In his early twenties, he worked as a newspaper reporter in Rhode Island and spent his downtime bingeing the classics. “Like Godard, and Russian propaganda films,” he says. Eventually he moved to the Bay Area, where the fledgling movie-rating website Rotten Tomatoes was then based. In his quest to devour the entire canon, Ryan had become a Rotten Tomatoes obsessive. When a job opened up at the site in 2004, it felt like a life-changing opportunity. He landed it, and now Ryan compares himself to the Mark Wahlberg character in the critically panned movie Rock Star. He went from “being the biggest fan to being the lead singer.”
Ryan is the site's longest-tenured employee, and he recently committed himself to an ambitious project he'd been chipping away at for a while. When I visited the Rotten Tomatoes offices—now in Beverly Hills—in October, he put it this way: “One thing I've been thinking about is, what if Rotten Tomatoes always existed?” Ryan was going to rate every movie ever made. Or, more precisely, every review of every movie ever made.
The world's first feature film, called The Story of the Kelly Gang, is an hour-plus romp about a band of outlaw Australian bushrangers. Events depicted include cattle theft, bank robbery, and attempted train derailment. It premiered on December 26, 1906, in Melbourne's Athenaeum Hall, to general delirium. A day later came the world's first proper feature-film review, which Ryan tracked down in a digitized version of the Melbourne paper The Age. From the review:
“A conscientious and, on the whole, a creditable, effort has been made to reproduce the tragedies as they occurred, and if there were any imperfections in detail probably few in the hall had memories long enough to detect them.”
The movie played for five sold-out weeks at the Athenaeum, before migrating to a theater in Sydney. So Ryan checked out Sydney's Daily Telegraph, where he found world movie review number two.
“The films are clear and distinct, the chief actors concerned in the bush drama are fairly recognizable, the photographs are taken in ‘Kelly country’ and after due allowance is made for certain acknowledged liberties taken, the illustrated record is probably as satisfactory as anything of the kind procurable at this distant date.”
Then, 112 years after they were first published and immediately forgotten, the reviews were uploaded to Rotten Tomatoes. Ryan interpreted the first review as “Fresh” and the second as “Rotten.” Until further notice, and possibly until the end of time, the internet's authoritative appraisal of The Story of the Kelly Gang will feature one glistening red tomato and one fetid green splat.
Strange as it is, a website that evaluates films via cartoon tomatoes might be the closest thing our fractured, post-gatekeeper culture has to an arbiter of good taste. The site's Tomatometer has become, as one early employee put it, a Good Housekeeping Seal for visual entertainment. Red means good, green means bad. The Tomatometer is run by a team of “curators” who read just about every known review from a gigantic pool of approved critics, then decide if each is positive or negative. Once a movie has five reviews, it is Tomatometer-eligible.
For those who've never paused to wonder what the metric actually means, a tutorial: Each film's Tomatometer score is equivalent to the percentage of “positive” reviews it has accumulated. For example, when John Travolta's 2018 mobster biopic Gotti generated a 0 percent rating, it meant that literally none of the 55 critics who appraised the film had any remotely warm feelings about it. If a movie scores 59 percent or lower, it's Rotten. Sixty percent or higher, it's Fresh.
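The arithmetic is simple enough to write down. What follows is a minimal Python sketch of the rule as the article describes it: five reviews for eligibility, the share of positive verdicts as the score, and 60 percent as the Fresh line. The function and constant names are hypothetical, and the sketch keeps the exact percentage rather than guessing how the site rounds the number it displays.

```python
from typing import Optional, Sequence

# Hypothetical names; the thresholds come from the article.
MIN_REVIEWS = 5          # a movie becomes Tomatometer-eligible at five reviews
FRESH_THRESHOLD = 60.0   # 60 percent or higher is Fresh, anything lower is Rotten


def tomatometer(verdicts: Sequence[bool]) -> Optional[float]:
    """Return the percentage of positive verdicts, or None if not yet eligible.

    Each entry is one curator's call on one review: True for Fresh, False for Rotten.
    Rounding is deliberately not applied (an assumption; the article does not say).
    """
    if len(verdicts) < MIN_REVIEWS:
        return None
    positive = sum(verdicts)  # True counts as 1, False as 0
    return 100 * positive / len(verdicts)


def label(score: float) -> str:
    """Map a Tomatometer percentage to the site's two possible labels."""
    return "Fresh" if score >= FRESH_THRESHOLD else "Rotten"


# Gotti: none of 55 critics was positive.
gotti = tomatometer([False] * 55)
print(gotti, label(gotti))  # 0.0 Rotten
```

Note what the score does not capture: a rave and a grudging pass each count as one True, which is exactly the flattening the critics quoted below complain about.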
The site's founder has said he landed on the name Rotten Tomatoes while watching a movie called Leolo, about a boy who thinks he was conceived when an Italian peasant fell into a cart of semen-covered tomatoes. Of course the name more straightforwardly evokes the supposed old-time practice of hurling fruit at unsatisfactory stage performers. In that spirit, the site also offers a second, more Yelp-like rating called the Audience Score, determined by hundreds of thousands of Rotten Tomatoes users who grade movies from 0.5 to 5.
Tim Ryan's maximalist archival project befits the growth of the site. Founded in 1998 by Berkeley postgrads who wanted to rate Jackie Chan movies, Rotten Tomatoes matured into a powerhouse by proving its usefulness to corporate America. Steve Jobs, an early evangelist, name-checked the site during his keynote presentations. Though routinely denounced by the Hollywood elite, from Meryl Streep to Martin Scorsese, Rotten Tomatoes has proved an irresistible asset to companies that want you to watch movies.
In 2010 it was bought by Flixster, which was bought the following year by corporate overlord Warner Bros., which in 2016 sold most of its stake to new corporate overlord Fandango, which is itself owned by corporate overlord Comcast NBCUniversal. Now, when you browse for showtimes on Fandango, which is the country's dominant ticket seller, you'll see a Tomatometer beside each release. Rent a movie on Google Play, DirecTV, or iTunes—Rotten Tomatoes' corporate partners—there it is again. For studios, the Tomatometer has become a ubiquitous marketing tool, while news coverage of the scores has become its own odd internet subgenre.
As the site's influence grew, it inevitably led to a reckoning. In 2017 producers started blaming low scores for the dismal performance of expensive summer fare—like the Baywatch reboot and the latest terrible Pirates of the Caribbean installment. Casual conspiracy theorists, meanwhile, imagined that Rotten Tomatoes intentionally goosed movie scores according to the wishes of studio bosses. While there is no evidence that curators can be bought, the site's Audience Score is definitely corruptible. In late 2018 and early 2019, it fell prey to a trolling epidemic, as bigoted male comic book fans appeared to bull-rush the site to take down the audience score of superhero movies, like Black Panther and Captain Marvel, whose stars they deemed unacceptably black or female. All of a sudden, along with the rest of the internet, Rotten Tomatoes was not to be trusted. The crowds were not wise.
Still, there is an authoritative allure in the site's numerical scores. As a Rotten Tomatoes user, I reflexively—and nonsensically—trust a Fresh 60 percent Tomatometer over a Rotten 59 percent. Yet the numbers themselves, as I found, can be close to meaningless. In a world of endless choice, on an internet increasingly dictated by predictive algorithms that recommend “for you,” Rotten Tomatoes represents something more analog. And it raises the question: What's the best way to choose? Or, more to the point, who do you trust?
“Is it a review?” This is the question the Rotten Tomatoes curation team asks itself every two weeks, during a meeting called Review of Reviews. On the day I attended, it was led by Haña Lucero-Colin, the site's 27-year-old TV czar. Rotten Tomatoes' office, which it shares with the larger Fandango staff, has a Silicon Valley feel. Walls you can write on. Walls you can remove. Pods, booths, nooks. The orange of Fandango's logo everywhere. But this meeting felt less startup and more extremely random J-school seminar.
The meeting works like this: Curators submit articles that may or may not be reviews, and the room decides if they are. That's it. Rotten Tomatoes will not consider reported features, tweets, or—to its eternal credit—recaps. Today's submissions include a Guardian piece on 30 Rock's overreliance on celebrity guests, a rambling discussion on a culture podcast, and a 2008 Entertainment Weekly piece about the short-lived daytime program The Bonnie Hunt Show. All were swiftly labeled nonreviews.
The slipperiest example of the day was a piece by Matt Zoller Seitz on New York magazine's Vulture site about a new Nancy Drew show on the CW. Robert Fowler, a TV curator, laid out the problem. It seemed to Fowler that when Zoller Seitz started to write about the series, “he decided, ‘Maybe I'm just going to pontificate on the nature of television.’ As is sometimes his wont. In this case, I think it's kind of a byproduct of a very established television critic maybe being a little bored by his subject matter.” Lucero-Colin concurred. “I think he got into that Nancy Drew is Twin Peaks is Nancy Drew is Sabrina time loop and got stuck.” Review or not? Nobody could tell. (The solution: Lucero-Colin emailed Zoller Seitz. He responded concisely: “It's fresh.”)
Meetings like this are crucial to maintaining Tomatometer integrity. Few contemplate this more than Jeff Giles. Bearded, wearing a Henley and a flannel shirt when I met him, he exudes steadiness and chill, which is a good quality to have when you read Joker reviews for a living. A New Hampshire resident who mostly works remotely, Giles began curating for Rotten Tomatoes in 2005. Since then he's also started a pop-culture site and written a 381-page oral history of the soap opera One Life to Live.
Giles, 45, leads the theatrical department. That sounds grander than it is. Of Rotten Tomatoes' four dozen employees, just 12 are curators. Three work on historical reviews. Seven monitor the content fire hose that is peak TV. That leaves just two, including Giles, working full-time on movies.
Giles, who was in Beverly Hills on a regular visit, stared at his laptop while I observed his daily labors. Each curator is responsible for a list of publications. Giles, as eminence grise, handles many of the critics—or “sources,” in Rotten Tomatoes argot—at A-list publications: The New Yorker, The New York Times, Slate. The job: Evaluate a review's freshness, then trawl for a good pull quote to slap on the website. First up on his list is a Hollywood Reporter review of an Indian film called The Wayfarers. The review is meandering and difficult to evaluate. Luckily, it comes with a helpful “bottom line” that makes the decision for Giles: “A slow starter turns into something deeply moving.” OK, then. Fresh. After that we plow through a pretty clear-cut Richard Brody review in The New Yorker entitled “Springtime for Nazis: How the Satire of Jojo Rabbit Backfires.”
Craving a challenge, I ask Giles for a tougher call. He cites a condescending but lighthearted review he had already logged of the Downton Abbey movie. “He seems to think that it didn't need to exist,” Giles remarks of the critic, Anthony Lane, also of The New Yorker. “But it wasn't a painful experience, you know?” Reminder: There are no official Fresh-or-Rotten criteria. No quota for superlatives, no scale for snark. There is only a curator's gut check. Conflicted, and kind of a Downton homer, I was leaning Fresh. Giles agreed. “Sometimes we call those a Gentleman's Fresh.” Benefit of the doubt. But he had forgotten his official assessment: Lane's review was marked Rotten. (I asked Giles what he likes in a reviewer. “Clearly stated opinion,” he said.)
For publications that use letter grades, Giles tends to mark Fresh any review that gets a B- or higher. Speed and shortcuts are appreciated. Kristin Livingstone, who spent a year as a curator, says curators often lob nebulous reviews to their colleagues on a company Slack channel. “Some curators would tell you almost immediately if it was Fresh or Rotten,” she says. “Like, there was no way you read this!”
Curators were expected to rate at least 50 reviews a day, Livingstone says, a pace that allowed little time for contemplation, especially when powering through the site's expanded YouTubed and podcasted criticism. Weekly review counts were shared on a Google spreadsheet. “It felt like a leaderboard, like in Glengarry Glen Ross.” (Rotten Tomatoes says that the target benchmark is 200 reviews per week and that employees face no penalty if they don't hit it.)
Rotten Tomatoes has started to tackle its volume problem by allowing critics to upload and rate their own reviews. About 30 percent now do, but I get the sense many would prefer the Tomatometer didn't exist at all. Time film critic Stephanie Zacharek bemoans the site's inability to reckon with “an amazing performance in a terrible film.” Most critics—apologies to Roger Ebert—aren't in the thumbs-up, thumbs-down business.
The Tomatometer has been further distorted by the triumph of “poptimism”—critical faith in commercial success stories. “TV critics during the '90s were insanely mean,” says Lucero-Colin, who spent last year on a team reviewing reviews of every scripted TV show to premiere in the 1990s. “Every other review was like, ‘This show is crap and we'll never watch it again.’ When you read a lot of TV criticism today, it's much more didactic. It's like, ‘Well, they do this really well. And this is not great. But I still like the star.’” Furthermore, because the Tomatometer doesn't distinguish between raves and Gentleman's Freshes, popcorn crowd-pleasers and classics are often rated identically. (Spider-Man: Into the Spider-Verse: 97 percent; Alfred Hitchcock's Vertigo: 95 percent.) Annual average Tomatometer scores, according to a recent analysis, have never been higher.
Though commonly understood as shorthand for film quality, the Tomatometer, as Alison Willmore, a critic at New York, puts it, is actually a measurement of “consensus”: film criticism as popularity contest. This, conveniently, boosts Rotten Tomatoes' visibility. “Because everything boils down to positive or negative, that's why you get stuff up in the 90s and stuff in the single digits,” says Matt Atchity, the site's former editor, who left in 2017. Rotten Tomatoes' brainier, less popular rival Metacritic culls from a smaller number of reviews and seems to assign a lot more ho-hum scores. “What keeps Rotten Tomatoes popular, what helps keep them in the news, are those extreme numbers,” Atchity says.
So back to The Story of the Kelly Gang. The world's first review, rated Fresh, didn't have anything explicitly bad to say about the movie. And yet adjectives like “creditable” and “conscientious” are not exactly glowing. The second review opened with the assertion that the real-life story of the gang was not a “splendid advertisement” for Australian values. It was rated Rotten. But the critic didn't impugn the film itself, and in fact seemed to think it was pretty well made. The point isn't that Ryan reviewed the films incorrectly. I probably would have done the same. The point is, the Tomatometer forces a false choice: Fresh or Rotten. There is no Underripe or Overripe tomato.
Giles recently heard from a critic who objected to a Fresh rating he'd given a review. “She said, ‘I really didn't like this movie. Can you make it Rotten?’ And I said, ‘Absolutely. However, I have to ask, why did you make it a B-?’ And the response was basically, ‘I hate grading things. It's arbitrary.’” Giles added, “I agree completely.”
On my second day at Rotten Tomatoes, I went to lunch with some of the site's editorial staff. These are the front-facing Tomatopeople, separate from the curators. They interview movie stars. They schmooze at film festivals. They write hot takes for the site. I asked if, as de facto brand ambassadors, they find that people understand Rotten Tomatoes. No, came the reply, they do not. One editor, Jacqueline Coley, said that she tells Uber drivers she's a traveling nurse, so they don't start accosting her about scores she can't control. She also hears complaints about “the algorithm.” Says Coley, incredulous: “We don't have an algorithm!”
Indeed not. This is why review-bombing trolls caused such grief not just to studios but to Rotten Tomatoes itself. When audience scores for The Last Jedi began plummeting to suspiciously low depths a couple of years ago (it's currently at 43 percent, with a Tomatometer score of 91), casual users couldn't know if the criticism was representative of the film-going public or just Gamergate runoff protesting the film's casting inclusivity (or some other niche superfan grievance, for that matter). Absent its reputation for accurate ratings, Rotten Tomatoes is nothing.
To bolster that trust, Rotten Tomatoes fixed an obvious problem: It forbade people from rating movies before they actually came out. It also began verifying the reviews of tomato throwers who could prove they bought their tickets on Fandango. The new verified rating is now the site's default Audience Score. (Rotten Tomatoes says it is working with cinema chains to verify their ticket stubs too, but for now this arrangement obviously benefits … Fandango.) Still, there's nothing stopping people from bombing a movie for nefarious purposes after it comes out.
These changes took place alongside an overhaul of the site's criteria for critics, designed to make its Tomatometer more representative. Prior to August 2018, Tomatometer-approved critics were almost exclusively staff writers from existing publications, who tended to be whiter, maler, and crustier. Since the site changed its policies, it's added roughly 600 new critics—the majority of whom are freelancers and women. But that also means there are now a stunning 4,500 critics, some of whom inevitably will be terrible. A couple of years ago, an approved critic named Cole Smithey, who writes for Colesmithey.com, bragged about intentionally tanking Lady Bird's then-100 percent rating with a negative review.

It's hard to know how much of a difference high or low scores make at the box office. In late 2018, Morning Consult conducted a national poll and found that one-third of Americans look at Rotten Tomatoes before seeing a movie, and 63 percent of those have been deterred by low scores. Whatever the effect, appearance is everything in Hollywood. Nobody wants a green tomato. Studios hold screenings for critics as close to release dates as possible, to delay splats, while disputing rotten ratings to curators like Giles.
“I've noticed over the last year that Certified Fresh is more important for studios and filmmakers,” he says, referring to the little badge movies get if the Tomatometer is 75 percent or higher for a minimum of 40 film reviews. “They know the value we add to their marketing.” The AMC movie chain—the largest in the country—displays the Tomatometer on its websites, but only next to movies that are Certified Fresh.
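The Certified Fresh badge, as Giles describes it, layers two conditions on top of the ordinary score. Here is a minimal Python sketch of just those two conditions; the function name is hypothetical, and any further requirements the site may apply are not described in the article and are not modeled here.

```python
# Hypothetical sketch; only the two thresholds named in the article are used.
CERTIFIED_MIN_SCORE = 75.0   # Tomatometer of 75 percent or higher
CERTIFIED_MIN_REVIEWS = 40   # from a minimum of 40 film reviews


def is_certified_fresh(positive: int, total: int) -> bool:
    """Return True if a film meets the article's two Certified Fresh conditions."""
    if not 0 <= positive <= total:
        raise ValueError("positive must be between 0 and total")
    if total < CERTIFIED_MIN_REVIEWS:
        return False  # too few reviews, regardless of how positive they are
    return 100 * positive / total >= CERTIFIED_MIN_SCORE


print(is_certified_fresh(30, 40))  # True: exactly 75 percent of 40 reviews
print(is_certified_fresh(39, 39))  # False: unanimous, but under 40 reviews
```

The second example shows why the review floor matters to studios: a small, unanimous sample still earns no badge.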
In any case, Fandango did not buy Rotten Tomatoes to discourage people from seeing movies. To that point, the site doesn't have its own boss. Instead, it's led by Fandango's president, a fit, ageless-looking Canadian named Paul Yanover. He started out developing software for animators working on Disney's original Beauty and the Beast, and he doesn't seem like a suit, exactly. But he knows how the popcorn gets buttered. “I think we actually see ourselves as a really useful marketing platform for the studios,” he told me.
Fandango makes money in several ways. It earns a cut of the “convenience fee” you pay when you buy a ticket on its platform. It also strikes licensing agreements with content providers who want to use the Tomatometer.
“Obviously, Rotten Tomatoes practices enormous independence,” Yanover says. “But Fandango, equally so, is a retailer of tickets and streaming.” As he sees it, the missions of Rotten Tomatoes and Fandango are identical: to get you in front of content you'll enjoy.
That, of course, is also the job of Netflix's predictive algorithm. Difference is, Netflix knows your preferences better than the critics do, maybe even better than you know them yourself. Netflix does not show you a Tomatometer when you browse. It doesn't show you any user ratings at all. Instead, it suggests movies and shows it thinks you'll like, based on movies and shows you've already watched. This, of course, is how Spotify's playlists and Facebook's News Feed work; they're content curators too. In our era of digital excess, we're being recommended to all the time. Paralyzed by choice, we'll take the suggestions.
Given the flaws of the Tomatometer, why use Rotten Tomatoes at all? Here's one reason: While the point of Netflix's algorithm is to keep you on its site as long as possible, the intent of Rotten Tomatoes, ultimately, is to get you off the site. Sure, it'd like you to go first to Fandango, but then to the movies or maybe a random Gunsmoke rerun. It will lead you—or not, if the reviews are bad—to whatever you looked up in the first place, presumably of your own volition. “What you kind of hope is that someone will have a list of Rotten Tomatoes critics they kind of like and trust,” says Zacharek, the Time critic. “They'll click on a link and look at a review.” Used properly, Rotten Tomatoes becomes a resource of nearly infinite vastness. Which was kind of the point of the internet in the first place.
Since Tim Ryan started his archival project, Rotten Tomatoes has created roughly 210 pages for old-time movies on its site, thanks to 5,500 ancient reviews he unearthed, many by critics who are all but forgotten. For anyone actually interested in reading movie reviews, there they are.
