A Blog by Jonathan Low


Apr 6, 2018

US Court Says Scraping Website Data To Create Fake Profiles Protected By 1st Amendment

Presumptive interpretation: if information is on the internet, it's probably fair game. You've been warned. Again. JL

Mike Masnick reports in Techdirt:

Lots of internet companies have gone after sites that scraped content, including Craigslist, Facebook and LinkedIn. Does it count as "unauthorized access" if you fail to abide by a site's terms of service? You have to show that you are actually harmed. The Supreme Court gives full First Amendment application to the gathering and creation of information. Six courts of appeals have found individuals have a First Amendment right to record matters of public interest to preserve and disseminate ideas. To scrape data from websites rather than manually record does not change the analysis.
It's no secret that the Computer Fraud and Abuse Act (CFAA) is a mess. Originally written by a confused and panicked Congress in the wake of the 1980s movie War Games, it was supposed to be an "anti-hacking" law, but was written so broadly that it has been used over and over again against any sort of "things that happen on a computer." It has been (not so jokingly) referred to as "the law that sticks," because when someone has done something "icky" using a computer, if no other law is found to be broken, someone can almost always find some weird way to interpret the CFAA to claim it's been violated. The two most problematic parts of the CFAA are that it applies to "unauthorized access" or to "exceeding authorized access," and that it covers any "computer... which is used in or affecting interstate or foreign commerce or communications." In 1986 that may have seemed limited. But, today, that means any computer on the internet. Which means basically any computer.
A big question that has come up in multiple CFAA cases is whether it counts as "unauthorized access" or as "exceeding authorized access" if you simply fail to abide by a site's terms of service. This was the way that prosecutors were able to go after Lori Drew, who helped bully a girl on MySpace who later committed suicide. Drew's actions were despicable, but the only law that prosecutors could get to "stick" was that she violated the CFAA by using a fake name to sign up for MySpace, thereby violating its terms of service... and thus getting "unauthorized access" to MySpace's internet-connected computers. There are both criminal (as in the Lori Drew case) and civil components to the CFAA -- and some companies (*cough* Oracle *cough*) have long fought against reforming the CFAA in the belief that they want to be able to use the law. Unfortunately, lots of internet companies, which should know better, have used the CFAA to go after sites that have scraped some content off their sites -- including Craigslist, Facebook and LinkedIn.
There is a case happening now, brought by some researchers and journalists, trying to get the CFAA declared unconstitutional for making scraping of the open internet a crime. In a little-noticed but highly entertaining ruling, the district court let the case proceed, but also made some important points about the CFAA, making it clear that the law should be narrowly applied (which actually harms the "is this unconstitutional" question, since the more limited the law is, the less likely it's unconstitutional). Thanks to Andy Sellars who first spotted the ruling, and has a quick Twitter thread with some highlights.
As noted, the ruling is an entertaining read, even from the opening sentence:
It’s a dangerous business, reading the fine print. Nearly every website we visit features Terms of Service (“ToS”), those endless lists of dos and don’ts conjured up by lawyers to govern our conduct in cyberspace. They normally remain a perpetual click away at the bottom of every web page, or quickly scrolled past as we check the box stating that we agree to them. But to knowingly violate some of those terms, the Department of Justice tells us, could get one thrown in jail. This reading of federal law is a boon to prosecutors hoping to deter cybercrime. Yet it also creates a dilemma for those with more benign intentions. Plaintiffs in this case, for instance, are researchers who wish to find out whether websites engage in discrimination, but who have to violate certain ToS to do so. They have challenged the statute that they allege criminalizes their conduct, saying that it violates their free speech, petition, and due process rights.
The question at play here is whether or not (as the government would like) the case should be dismissed at this point, for failure to show standing. In other words, you can't just say "hey, this law sucks," you have to show that you, as the plaintiff, are actually harmed by the law. This leads to a fairly interesting analysis of the First Amendment and the internet, partly building off the Supreme Court's recent Packingham decision about kicking people off the internet. Here, DC district court Judge John Bates goes deep on how the First Amendment and the internet mix:
At the outset, it is necessary to answer a question that affects both the standing and the merits inquiries in this case: what is the First Amendment status of the Internet? And, more particularly, what powers does the government possess to regulate activity on individual websites? The government bases much of its argument that plaintiffs do not have standing, and that they have not alleged a First Amendment violation, on the premise that this case is about “a private actor’s abridgment of free expression in a private forum.”... This argument finds some support in Supreme Court case law, which has rejected the First Amendment claims of individuals who wished to distribute handbills or advertise a strike in shopping centers against the wishes of the property owners.... Private property, the Court determined, does not “lose its private character merely because the public is generally invited to use it for designated purposes.” ... Why, then, would it violate the First Amendment to arrest those who engage in expressive activity on a privately owned website against the owner’s wishes?
The answer is that, quite simply, the Internet is different. The Internet is a “dynamic, multifaceted category of communication” that “includes not only traditional print and news services, but also audio, video, and still images, as well as interactive, real-time dialogue.”... Indeed, “the content on the Internet is as diverse as human thought.” ... Only last Term, the Supreme Court emphatically declared the Internet a primary location for First Amendment activity: “While in the past there may have been difficulty in identifying the most important places (in a spatial sense) for the exchange of views, today the answer is clear. It is cyberspace . . . .” Packingham v. North Carolina...
With this special status comes special First Amendment protection. The Packingham Court applied public forum analysis to a North Carolina law that banned former sex offenders from using social media websites, employing intermediate scrutiny because the law was content-neutral.... The fact that the statute restricted access to particular websites, run by private companies, did not change the calculus. Consider: on one of the sites the Court treated as an exemplar of social media, LinkedIn, “users can look for work, advertise for employees, or review tips on entrepreneurship,” ...—the same activities in which Mislove and Wilson wish to engage for their research. As the Court warned, the judiciary “must exercise extreme caution before suggesting that the First Amendment provides scant protection for access to vast networks in [the modern Internet].”... The government’s proposed public/private ownership distinction cannot account for the Court’s determination in Packingham that privately-owned sites like Facebook, LinkedIn, and Twitter are part of a public forum, government regulation of which is subject to heightened First Amendment scrutiny. The Internet “is a forum more in a metaphysical than in a spatial or geographic sense, but the same principles are applicable.”
I worry about this argument. It appears to expand on the (faulty, in my belief) argument that we've been seeing in numerous recent cases trying to misuse the Packingham ruling to mean that no platform can ever kick a user off their service, as that would violate their First Amendment rights. The point of the Packingham ruling was not that any individual should be able to demand a "First Amendment right" to use any website, but rather that the government cannot force an individual off the entire internet. Note the big difference: one is about private parties choosing who they can block through technical means. The other is the government blocking people from the entire internet through legal means. But Bates seems sympathetic to the first interpretation. And then there are some weird analogies:
The First Amendment does not give someone the right to breach a paywall on a news website any more than it gives someone the right to steal a newspaper.
But, Judge Bates notes, there's a real difference between a site that has taken technological measures to keep people out and sites that are open to nearly everyone, even if they have terms of service that might be broken. And here, Judge Bates suggests that contractual provisions blocking a user from a site are quite different from getting past a technological measure (heavily quoting Orin Kerr):
What separates these examples from the social media sites in Packingham is that the owners of the information at issue have taken real steps to limit who can access it. But simply placing contractual conditions on accounts that anyone can create, as social media and many other sites do, does not remove a website from the First Amendment protections of the public Internet. If it did, then Packingham—which examined a law that limited access to websites that require user accounts for full functionality—would have come out the other way....; see also Orin S. Kerr, Cybercrime’s Scope: Interpreting “Access” and “Authorization” in Computer Misuse Statutes,... (“Applying a contract-based theory of authorization in a criminal context . . . may be constitutionally overbroad, criminalizing a great deal beyond core criminal conduct, including acts protected by the First Amendment.”). Rather, only code-based restrictions, which “carve[] out a virtual private space within the website or service that requires proper authentication to gain access,” remove those protected portions of a site from the public forum. Orin S. Kerr, Essay, Norms of Computer Trespass, .... Stealing another’s credentials, or breaching a site’s security to evade a code-based restriction, therefore remains unprotected by the First Amendment.
And here's where there's some nice language pointing out that scraping a website is protected activity under the First Amendment:
First, scraping plausibly falls within the ambit of the First Amendment. “[T]he First Amendment goes beyond protection of the press and the self-expression of individuals to prohibit government from limiting the stock of information from which members of the public may draw.” First Nat. Bank of Boston v. Bellotti, 435 U.S. 765, 783 (1978). The Supreme Court has made a number of recent statements that give full First Amendment application to the gathering and creation of information. Additionally, six courts of appeals have found that individuals have a First Amendment right to record at least some matters of public interest, in order to preserve and disseminate ideas. That plaintiffs wish to scrape data from websites rather than manually record information does not change the analysis. Scraping is merely a technological advance that makes information collection easier; it is not meaningfully different from using a tape recorder instead of taking written notes, or using the panorama function on a smartphone instead of taking a series of photos from different positions. And, as already discussed, the information plaintiffs seek is located in a public forum. Hence, plaintiffs’ attempts to record the contents of public websites for research purposes are arguably affected with a First Amendment interest.
Separately, the court notes a "First Amendment interest" in lying to a website, i.e. creating a fake profile:
Second, plaintiffs have a First Amendment interest in harmlessly misrepresenting their identities to target websites. The complaint alleges that plaintiffs’ research requires them to create false employer and job-seeker profiles on employment websites, and to use sock puppets to make it appear to a number of housing and employment sites that multiple people are accessing the information they have made available. Compl. ¶¶ 88–93, 114–21. Because “some false statements are inevitable if there is to be an open and vigorous expression of views in public and private conversation,” and because “[t]he Government has not demonstrated that false statements generally should constitute a new category of unprotected speech,” false claims that are not “made to effect a fraud or secure moneys or other valuable considerations” fall within First Amendment protection.
On the scraping question, Judge Bates (this time, correctly) points out that there's a difference between websites being forced to hand over information that they want kept secret, and information that is publicly available, but that the companies just don't want scraped. And that distinction makes all the difference in the world.
Here, plaintiffs are not asking the Court to force private websites to provide them with information that others cannot get. Instead, they seek only to prevent the government from prosecuting them for obtaining or using information that the general public can access—though they wish to do so in a manner that could have private consequences, such as a website banning them or deleting their accounts.
The court does seem a bit concerned about the fact that none of the plaintiffs has been threatened with a CFAA action, let alone actually facing one. But then quickly notes that there's a low bar for these issues:
The government asserts that plaintiffs cannot meet this test, because “plaintiffs make no allegation that the government has threatened them with CFAA enforcement,” plaintiffs “cite no instances in which the government has enforced the challenged provision for harmless [ToS] violations,” and DOJ “has expressly stated that it has no intention of prosecuting harmless [ToS] violations that are not in furtherance of other criminal activity or tortious conduct.” Def.’s Reply at 13. The government is, for the most part, correct on the facts. The complaint does not allege that plaintiffs have actually been threatened with prosecution. The two cases plaintiffs cite to show that prosecutors have used the Access Provision to punish ToS violations did, in fact, involve harmful conduct. And DOJ’s guidance to federal prosecutors does discourage them—though somewhat tepidly—from bringing CFAA cases based solely on harmless ToS violations. See U.S. Att’y Gen., Intake and Charging Policy for Computer Crimes Matters (“Charging Policy”) (Sept. 11, 2014) ...
However, both Supreme Court and D.C. Circuit precedent create a low standing bar in cases like this one. Because plaintiffs “challenge [a] law[] burdening expressive rights,” and... because their complaint provides “a credible statement . . . of intent to commit violative acts,” plaintiffs may rely on the “conventional background expectation that the government will enforce the law.”
Even better, the court notes the mere chilling effects of the risk of being threatened and/or sued for scraping websites for research and journalism purposes. And, it notes that there are similar historical examples, including the Lori Drew case I mentioned above.
That the government brought the Drew case without enough evidence to ultimately prove the added harm required for a felony conviction, and chose to include a misdemeanor count for harmless ToS violations, lends some credibility to plaintiffs’ fears of prosecution.
The court also points out that just because the DOJ claims it probably wouldn't bring CFAA charges against researchers such as those in this case, that's not nearly enough:
In an attempt to provide such a disavowal, and at the Court’s suggestion, the government filed an affidavit from John T. Lynch, Jr., Chief of the Computer Crime and Intellectual Property Section of the Criminal Division of DOJ. ... He points to the charging factors mentioned above... and states that he “do[es] not expect that the Department would bring a CFAA prosecution based on such facts and de minimis harm,” ... But many things that we do not expect in fact come to pass. An official’s prognostication does not substitute for a declaration of non-prosecution. Moreover, even explicit disavowals are most valuable when they are made “on the basis of the Government’s own interpretation of the statute and its rejection of plaintiffs’ interpretation as unreasonable.” ... Here, the government has implicitly—and in past prosecutions, explicitly—read the Access Provision to include ToS violations.... “[T]o rely upon prosecutorial discretion to narrow the otherwise wide-ranging scope of a criminal statute’s highly abstract general statutory language places great power in the hands of the prosecutor.” Marinello v. United States,
From there, the Court basically punts on the larger Constitutional questions. It is clearly well aware of the myriad CFAA cases out there, and the somewhat haphazard rulings over the years. After going through some examples of the different courts that have ruled on the issue of what the CFAA covers, this one decides to go with a narrow interpretation, more or less recognizing a broad definition would be insane.
The question thus remains whether “exceeds authorized access” refers to access alone or to access, use, and other violations. The Court finds the narrow interpretation adopted by the Second, Fourth, and Ninth Circuits—and by numerous other district judges in this Circuit—to be the best reading of the statute. First, the text itself more naturally reads as limited to violations of the spatial scope of one’s permitted access. To “exceed[] authorized access,” one must have permission to access the computer at issue, and must “use such access”—i.e., one’s authorized presence on the computer—“to obtain or alter information in the computer.” ...Thus, unlike the phrase “unauthorized access” used alongside it in several CFAA provisions, the phrase “exceeds authorized access” refers not to an outside attack but rather to an inside job.... The rest of the definition requires that the information at issue be information “that the accesser is not entitled so to obtain or alter.” ... The key word here is “entitled.” “And, in context, the most ‘sensible reading of “entitled” is as a synonym for “authorized.”’” ... The focus is thus on whether someone is allowed to access a computer at all, in the case of “unauthorized access,” or on whether someone is authorized to obtain or alter particular information, in the case of “exceeds authorized access.” In neither instance does the statute focus on how the accesser plans to use the information.
And with that narrow definition, the court basically can sidestep the larger Constitutional questions:
While the CFAA’s text and legislative history point strongly toward an access-only interpretation of “exceeds authorized access,” a broader reading is not entirely implausible; therefore, constitutional avoidance applies. ... In interpreting the statutory text, the Court need not determine whether plaintiffs’ constitutional arguments would actually win the day. Rather, the Court undertakes “a narrow inquiry” into whether one reading “presents a significant risk that [constitutional provisions] will be infringed.”
Of course, the court notes that if the CFAA were not read so narrowly, the Constitutional concerns seem fairly obvious:
Here, significant risks abound. By providing for both civil and criminal enforcement of websites’ limitless ToS—including enforcement by the same entities that write the ToS—a broader reading of the CFAA “would appear to criminalize a broad range of day-to-day activity” and “subject individuals to the risk of arbitrary or discriminatory prosecution and conviction,” raising Fifth Amendment concerns. ... By incorporating ToS that purport to prohibit the purposes for which one accesses a website or the uses to which one can put information obtained there, the CFAA threatens to burden a great deal of expressive activity, even on publicly accessible websites—which brings the First Amendment into play.... If “exceeds authorized access” is read broadly, plaintiffs claim, the Access Provision could even run afoul of the Fifth Amendment by delegating power to private parties to define restrictions “limitless in time and space,” which can then operate as petty civil and criminal codes.
And, really, it appears that Judge Bates has about zero interest in digging into what a mess that would be:
All of these factors, therefore, lead the Court to adopt a narrow reading of the term “exceeds authorized access.” Just as an individual “accesses a computer ‘without authorization’ when he gains admission to a computer without approval,” an individual “‘exceeds authorized access’ when he has approval to access a computer, but uses his access to obtain or alter information that falls outside the bounds of his approved access.”
From there, the court looks at what the plaintiffs in this case want to do, and this part of the ruling is actually pretty useful, detailing why such scraping of websites is not a CFAA violation:
Applying this standard, it becomes clear that most of plaintiffs’ proposed activities fall outside the CFAA’s reach. Scraping or otherwise recording data from a site that is accessible to the public is merely a particular use of information that plaintiffs are entitled to see. The same goes for speaking about, or publishing documents using, publicly available data on the targeted websites. The use of bots or sock puppets is a more context-specific activity, but it is not covered in this case. Employing a bot to crawl a website or apply for jobs may run afoul of a website’s ToS, but it does not constitute an access violation when the human who creates the bot is otherwise allowed to read and interact with that site.
And then... the Court gets a little Star Wars slap-happy:
The website might purport to be limiting the identities of those entitled to enter the site, so that humans but not robots can get in. See Star Wars: Episode IV – A New Hope (Lucasfilm 1977) (“We don’t serve their kind here! . . . Your droids. They’ll have to wait outside.”). But bots are simply technological tools for humans to more efficiently collect and process information that they could otherwise access manually. Cf. Star Wars: Episode II – Attack of the Clones (Lucasfilm 2002) (“[I]f droids could think, there’d be none of us here, would there?”).
This then leads into a separate issue: whether creating fake accounts for the purpose of research might violate the CFAA, and whether that would be unconstitutional. The plaintiffs argue that the law's "access provision" violates the First Amendment. The DOJ says it's not targeting speech, but conduct. Here, the court rejects the motion to dismiss, but doesn't seem particularly won over by the plaintiffs' arguments either. It basically says that the government needs to show that the law is narrowly tailored to specific government interests, and that such results can't be obtained through other less burdensome means. And here, the government simply failed to provide the evidence (though it may in the future):
At this early stage, the government has not put forward any evidence to show that prosecuting those who provide false information when creating accounts, without more, would advance its interest in preventing digital theft or trespass.
And thus, the court notes that false speech by the plaintiffs (i.e., signing up for fake accounts to do research) is protected by the First Amendment:
At this stage, “absent any evidence that the speech [would be] used to gain a material advantage,” Alvarez, 567 U.S. at 723, plaintiffs’ false speech on public websites retains First Amendment protection,... and rendering it criminal does not appear to advance the government’s proffered interests. Hence, plaintiffs have plausibly alleged an as-applied First Amendment claim, and the motion to dismiss that claim will be denied.
Judge Bates does dismiss another Constitutional claim, concerning whether the Access Provision of the CFAA violates the right to petition, basically saying this appears to be a stretch and is redundant of other claims in the case. Similarly, a Fifth Amendment claim for "vagueness" is dismissed, with the Court noting that it has already adopted a narrow construction of the CFAA, which solves most of the vagueness problem that would allow for too much discretion.
The end result here is something of a mixed bag. Most (though not all) of the Constitutional claims fail, in large part because the court is able to construe the CFAA in a narrow manner. It's good that the Court sees the CFAA to be narrow, but that means the law is more likely to remain on the books (though hopefully with the narrow interpretation remaining in place and actually respected). There are still some First Amendment claims, but it's not clear that those will hold up for long either. But, for now, they survive. Still, the language concerning First Amendment protections in scraping websites as well as in creating fake profiles is nice to see.

