A Blog by Jonathan Low


Jun 24, 2012

Tweets at 11: The Next National News Anchor May Be a Robot Thanks to Texts and Twitter

Concerns about society's declining ability to communicate have been rampant for decades. The iconic play and movie "My Fair Lady" exemplifies the genre, and that was set at the end of the 19th century.

But the advent of email, texting and, especially, Twitter has set off a more concerted round of hand-wringing. Disregard for grammar, epistolary shortcuts and the 140-character limit have all conspired, so the thinking goes, to dumb down the craft of writing to its crudest level.

An alternative, far more upbeat view does exist, however. It holds that this technological pidgin has created a kind of global shorthand that algorithms can decipher and reconstruct into a narrative. Some information companies already use this approach to report basic facts such as sports scores and financial results, and some have applied similar analyses to political candidates' speeches. The results are by turns hilarious and frightening, depending on one's view of the sanctity of the political process and society's dependence on its outcomes. There is considerable opportunity for abuse, though the very technological nature of the writing makes its origins easy to identify. But the point has been made: when people complain about the robotic quality of news reporting, it may no longer be a mere figure of speech. JL

David Holmes comments in Fast Company:
Narrative Science has developed an algorithm that can mimic human writing so effectively that… well, just have a look at these two Forbes.com leads and see if you can tell which was written by a robot:

“Take-Two Interactive Software (TTWO) is expected to book a wider loss than a year ago when it reports fourth quarter earnings on Tuesday, May 22, 2012 with analysts expecting a loss of 60 cents per share, down from a loss of 23 cents per share a year ago.”

“Take-Two Interactive shares are trading higher after hours Tuesday following the video game publisher’s financial results for the fiscal fourth quarter ended March 31.”
If you guessed the first one, you probably just got lucky.

It almost feels like magic. But Narrative Science CEO Stuart Frankel says there’s nothing mysterious about the process. First, the algorithm takes a huge amount of data and determines what information is significant and what is noise. This is done based on predetermined parameters. When writing a baseball recap, for example, the algorithm knows that an eighth-inning strikeout stranding a runner on third base is important to the narrative, while a second-inning strikeout with no one on base hardly bears mentioning. Next, the algorithm figures out what kind of narrative it wants to tell. Is it a come-from-behind victory? A shocking upset? Finally, it converts that information into language. This part gets pretty complicated, but once you have the right data and the right story structure, the rest is just Mad Libs.
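Frankel's three steps (separate signal from noise, pick a story angle, fill in the language) can be sketched in a few lines of Python. This is not Narrative Science's code, which the article does not describe. The `Play` record, the `significance` weights, the three story angles and the sentence templates are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Play:
    inning: int
    description: str  # e.g. "a strikeout stranded a runner on third"
    runners_on: int   # runners on base when the play happened
    runs_scored: int


def significance(play: Play) -> int:
    """Hypothetical weighting: runs, runners on base and late innings matter more."""
    late_bonus = 2 if play.inning >= 7 else 0
    return play.runs_scored * 3 + play.runners_on + late_bonus


def pick_angle(home_lead_by_inning: list[int]) -> str:
    """Choose a story structure from the home team's lead after each inning."""
    final = home_lead_by_inning[-1]
    winner_sign = 1 if final > 0 else -1
    # The winner trailed if the lead ever pointed the other way.
    winner_trailed = any(lead * winner_sign < 0 for lead in home_lead_by_inning)
    if winner_trailed:
        return "comeback"
    if abs(final) >= 5:
        return "rout"
    return "close"


# Hypothetical templates: the "Mad Libs" step.
TEMPLATES = {
    "comeback": "{winner} rallied from behind to beat {loser} {w}-{l}.",
    "rout": "{winner} cruised past {loser} {w}-{l}.",
    "close": "{winner} edged {loser} {w}-{l}.",
}


def write_recap(home: str, away: str, home_lead_by_inning: list[int],
                home_runs: int, away_runs: int, plays: list[Play],
                threshold: int = 3) -> str:
    # Step 1: keep only plays whose significance clears the threshold.
    key_plays = [p for p in plays if significance(p) >= threshold]
    # Step 2: decide what kind of story this is.
    angle = pick_angle(home_lead_by_inning)
    # Step 3: fill in the blanks.
    if home_runs > away_runs:
        winner, loser, w, l = home, away, home_runs, away_runs
    else:
        winner, loser, w, l = away, home, away_runs, home_runs
    sentences = [TEMPLATES[angle].format(winner=winner, loser=loser, w=w, l=l)]
    sentences += [f"In inning {p.inning}, {p.description}." for p in key_plays]
    return " ".join(sentences)


if __name__ == "__main__":
    plays = [
        Play(2, "a strikeout ended the inning with the bases empty", 0, 0),
        Play(8, "a two-run double put the Tigers ahead for good", 2, 2),
        Play(8, "a strikeout stranded a runner on third", 1, 0),
    ]
    print(write_recap("Tigers", "Cubs", [-2, -2, -2, -1, -1, -1, -1, 1, 1], 3, 2, plays))
```

With these made-up weights, the eighth-inning strikeout with a runner on base scores 3 and makes the recap, while the second-inning strikeout with the bases empty scores 0 and is dropped, mirroring the distinction Frankel describes.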

The phrase "convincingly human" has probably never been used by the Pulitzer Prize committee, but it's good enough when it comes to analyzing large data sets, or the earnings reports that Narrative Science files for Forbes.com. These reporter-bots are perfect for the kinds of stories journalists don’t tell. Before the year is out, for example, Narrative Science will write between 1.5 and 2 million Little League recaps, something no other publication has the resources or desire to do. "What we’ve been able to do is cover a story for a really large albeit disaggregated audience that would not get coverage otherwise," says CEO Stuart Frankel.

So why does every story written about Narrative Science act like the journalist apocalypse is nigh? Because once Narrative Science can begin collecting enough of the right data, its output will almost surely become competitive with real reporters.

Consider a recent Narrative Science project measuring support for Republican primary candidates on Twitter. Most of the company’s work up to that point dealt with highly structured data: box scores, profit margins and the like. But here, Narrative Science made sense of vast amounts of unstructured data, in this case tweets. And what are tweets if not quotes, the bread-and-butter of traditional journalists? "As we develop technology to extract data from unstructured sources, particularly Twitter, we can pull information from Twitter conversations and source data to generate stories," says Frankel. Relying on tweets to write stories about public opinion trends makes sense. After all, it can’t be much worse than basing stories on notoriously inaccurate exit polls.
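The article does not say how Narrative Science measured support. A crude proxy, assumed here purely for illustration, is each candidate's share of mentions in a batch of tweets, which a template can then turn into a sentence. The alias lists and the `mention_share` and `summarize` helpers are hypothetical, and counting mentions is not the same as measuring support: a tweet attacking a candidate counts just like one praising him.

```python
import re
from collections import Counter

# Hypothetical alias lists for the 2012 Republican primary field.
CANDIDATES = {
    "Romney": {"romney", "mitt"},
    "Santorum": {"santorum"},
    "Gingrich": {"gingrich", "newt"},
    "Paul": {"paul", "ronpaul"},
}


def mention_share(tweets: list[str]) -> dict[str, float]:
    """Each candidate's share of all candidate mentions (one per tweet per candidate)."""
    counts: Counter[str] = Counter()
    for tweet in tweets:
        # Lowercase and split on anything that isn't a letter, so "#Romney2012" yields "romney".
        words = set(re.findall(r"[a-z]+", tweet.lower()))
        for name, aliases in CANDIDATES.items():
            if words & aliases:
                counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in CANDIDATES}


def summarize(tweets: list[str]) -> str:
    """Turn the mention shares into a one-sentence, template-driven lead."""
    shares = mention_share(tweets)
    if not shares:
        return "No candidate mentions found."
    leader = max(shares, key=shares.get)
    return f"{leader} led the Twitter conversation with {shares[leader]:.0%} of candidate mentions."
```

A real system would at least need sentiment or stance detection on top of this, which is exactly the "extract data from unstructured sources" problem Frankel describes.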

But what about breaking news coverage? Journalists like NPR's Andy Carvin use Twitter to wrangle real-time reports from everyday citizens on the ground in hostile areas. But Carvin takes painstaking steps to verify information before labeling it “confirmed.” Can an algorithm be trained to do that?

Maybe. Last December, the Guardian posted a series of data visualizations that tracked how rumors spread and were later debunked on Twitter during the London riots. What the data junkies at the Guardian found was that the Twitter community itself was quite adept at calling bullshit on false information, usually within hours (though journalists certainly played a role in the debunking).

Even whistleblowers nowadays are as likely to leak sensitive information to the Internet as they are to call up a reporter. Once their testimony becomes data, Narrative Science can work its magic. "If the data is there, and a human can write that story using the data, then we can write that story."

NYU journalism professor Clay Shirky predicted the rise of robot journalism in 2009, and wrote that its success will depend on whether audiences can trust a robot to be as authoritative a source as, say, Walter Cronkite. In "A Speculative Post on Algorithmic Authority," Shirky writes:

“There’s a spectrum of authority from 'Good enough to settle a bar bet' to 'Evidence to include in a dissertation defense', and most uses of algorithmic authority right now cluster around the inebriated end of that spectrum, but the important thing is that it is a spectrum, that algorithmic authority is on it, and that current forces seem set to push it further up the spectrum to an increasing number and variety of groups that regard these kinds of sources as authoritative.”

Good journalism isn’t about writing like a human. It’s about trust. And as trust in conventionally authoritative sources continues to erode, Narrative Science's robots may be lying in wait to pick up the slack.

