Jan 28, 2019

GDPR Makes It Easier To Get Your Data - But Doesn't Mean You'll Understand It

Big tech companies are complying with GDPR resentfully and as incompletely as they can. This includes reformatting data to make it harder to understand.

Given the profits this data is providing them, their resistance is not surprising. The question is how much further Europe and other governments will go in demanding fuller compliance. JL

Jon Porter reports in The Verge:

The problem is that companies can be stingy about providing data. If your service is “forcing consent” (as Google was recently fined €50 million for doing), then you might not want your users to see how much personal data you’re collecting. (Amazon) took the full 30 days to send a link to download data (the limit imposed by the regulation). Some files (were) ambiguously labeled, while others were stored in formats that tested the limits of what constitutes “commonly used.” GDPR regulations have a long way to go if they want to give (users) control over (their) data. Making it useful means ensur(ing) what’s downloaded is easier for the average person to understand.
If the numerous tech scandals of recent years have taught us anything, it’s that tech companies hold a truly terrifying amount of data about us all. Along with feeling invasive, this data can be outright dangerous when it falls into the wrong hands.
Europe’s response to that risk, put in place as part of the General Data Protection Regulation (GDPR), is the “Right of Access.” The right says that, when requested, any company should be prepared to provide you with your personal data. They should provide it in a way that’s easy for you to read, in a timely manner, and with enough background information for you to understand how they got it and how they use it. The thinking is that once you know what data a company holds about you, you can use that knowledge to make informed decisions about whether you want to provide it, and to hold the company accountable when it gathers data without your consent.
The problem is that companies can often be really stingy about actually providing this data. After all, if your service is essentially “forcing consent” (as Google was recently fined €50 million for doing), then you might not want your users to easily see how much personal data you’re collecting.
I decided to test the “Right of Access” offered by four of the biggest tech companies operating in the EU: Apple, Amazon, Facebook, and Google. What I found suggested that while you can certainly get the raw data, actually understanding it is another matter, which makes it harder to make informed decisions about your data.
According to the UK data protection regulator, the ICO, companies must provide all personal data — defined as any data that relates to an identified or identifiable individual — on request. The information must be provided to the individual in a “concise, transparent, intelligible and easily accessible form, using clear and plain language” in a “commonly used electronic format.” It sounds simple enough, but how did each of the four tech giants do?
It was easy to download my data in the first place. Both Google’s and Apple’s data download services let you pick and choose what data you want to download. Facebook’s doesn’t, but all three are easy to find on their respective websites, and the data arrives quickly. Amazon, by contrast, doesn’t present this as an easy option on its site: getting a single link with all of your Amazon data means digging through the site’s “Contact Us” page to find the option hidden at the end of the list. Once I requested it, it took the full 30 days to receive a link to download my data (the limit imposed by the regulation).
When it actually came time to look at the data I’d received, however, things got messy. Some files were ambiguously labeled, while others were stored in formats that tested the limits of what constitutes “commonly used.” Actually working out what data I was looking at wasn’t nearly as simple as it should be.
Google’s location-tracking data was particularly hard to understand. The company has been repeatedly criticized for tracking Android users, even when they’ve turned off the main location-tracking option in the operating system. Consumer groups across seven European countries have lodged complaints with their data protection watchdogs about it, and downloading your data using GDPR should be a way of checking that a service isn’t using tricks like these to gather any more data than it should. It should be a means of holding companies like Google to account.
Google has admitted that it tracks you even if you turn off Location History.
But when you actually look at the data, this information is very difficult to view and understand. All of my location data from Google was contained within a single 61MB JSON file, and opening it with Chrome revealed a bewildering array of fields labeled “timestampMs,” “latitudeE7,” “longitudeE7,” and estimates about whether I was sitting still or in some kind of transport (I assume).
I don’t doubt that this is all the location history information that Google has associated with my account, but without context, this data is meaningless. It’s a series of numbers that I’d have to import into another piece of software to parse properly before I could even begin to understand them. If the purpose of GDPR is to give people more control over, and understanding of, the data collected about them, then this part of Google’s download has little to offer. JSON files are great if you want to ingest the data into another system, but they’re less helpful if you want to evaluate how much data Google has on you and make informed data privacy decisions.
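For readers willing to do that work, the conversion itself is small. Below is a minimal Python sketch, assuming the export is shaped the way the field names suggest: a top-level `locations` list, `timestampMs` as a string of milliseconds since the Unix epoch, and coordinates stored as whole numbers scaled by 10^7 (the “E7” suffix). The nested `activity` layout and the sample record are assumptions for illustration, and Google has changed its export format over time, so real files may differ.

```python
import json
from datetime import datetime, timezone

# Assumed shape of a Location History export (field names as in the article;
# the nested "activity" layout and the sample values are illustrative only).
SAMPLE = """
{"locations": [
  {"timestampMs": "1546300800000",
   "latitudeE7": 515074000,
   "longitudeE7": -1278000,
   "activity": [{"timestampMs": "1546300801000",
                 "activity": [{"type": "STILL", "confidence": 90},
                              {"type": "IN_VEHICLE", "confidence": 5}]}]}
]}
"""


def readable_points(raw: str):
    """Yield (UTC time, latitude, longitude, most likely activity) per record."""
    data = json.loads(raw)
    for rec in data.get("locations", []):
        # timestampMs is a string holding milliseconds since 1970-01-01 UTC.
        when = datetime.fromtimestamp(int(rec["timestampMs"]) / 1000, tz=timezone.utc)
        # E7 values are degrees multiplied by 10,000,000.
        lat = rec["latitudeE7"] / 1e7
        lon = rec["longitudeE7"] / 1e7
        # Keep whichever activity guess carries the highest confidence.
        best = None
        for entry in rec.get("activity", []):
            for option in entry.get("activity", []):
                if best is None or option["confidence"] > best["confidence"]:
                    best = option
        yield when, lat, lon, (best["type"] if best else None)


for when, lat, lon, activity in readable_points(SAMPLE):
    print(f"{when:%Y-%m-%d %H:%M} UTC  {lat:.5f}, {lon:.5f}  {activity}")
```

With the sample record this prints a single line, `2019-01-01 00:00 UTC  51.50740, -0.12780  STILL`, which is at least something a person can read.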
When it came to other files, it wasn’t even clear what data I was looking at in the first place. A 4GB HTML file called “My Activity” located within the “Ads” folder is presumably showing me something relating to the ad-tracking data that Google has gathered on me, but there are no annotations or metadata here to explain it.
These are, by far, the most confusing files out of the entire data download, and they’re also the most important. They contain the kinds of personal information that potential advertisers would kill for, and Google should make more of an effort to explain what they are. It already provides an Index HTML file to give you an overview of your data, so why not include information in there about the contents of each file?
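Until that happens, a reader can build a crude table of contents themselves. The Python sketch below walks an extracted download folder and lists every file with its size and extension, largest first, which at least shows where the bulk of the data sits. It cannot explain what any file means, and the folder and file names in the usage example are made up for illustration.

```python
import tempfile
from pathlib import Path


def index_download(root: str) -> list[tuple[str, int, str]]:
    """List (relative path, size in bytes, extension) for every file under root,
    largest first, as a rough table of contents for an extracted data export."""
    base = Path(root)
    entries = []
    for path in base.rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            entries.append((path.relative_to(base).as_posix(), path.stat().st_size, ext))
    entries.sort(key=lambda e: e[1], reverse=True)
    return entries


# Usage on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Ads").mkdir()
    Path(tmp, "Ads", "My Activity.html").write_text("<html>" + "x" * 500 + "</html>")
    Path(tmp, "index.html").write_text("<html></html>")
    for rel, size, ext in index_download(tmp):
        print(f"{size:>8}  {ext:6}  {rel}")
```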
Apple fared better than Google in the way it presented its data, although there were still problems. First impressions were very positive: the majority of the data Apple provided was in easy-to-read file types like CSV, TXT, and JPG, with only a couple of JSON files to confuse things.
But once you get into these files, there’s still a lot of information that’s difficult to understand. A file titled “Apple ID Account Information” appeared to contain 11 nearly identical records about my Apple account, all created on exactly the same date in 2014, with no explanation as to what they were. Another CSV file with the ambiguous title “Apps and Service Analytics” appeared to contain a list of every one of my App Store searches, but it had so many empty cells that I only noticed it held any data when I saw its 6.7MB file size.
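A mostly blank spreadsheet is easier to judge by counting filled cells than by scrolling through it. Here is a minimal Python sketch using only the standard `csv` module; the sample rows and column names are invented for illustration and are not Apple’s actual layout.

```python
import csv
import io

# Hypothetical stand-in for a sparse export; real column names will differ.
SAMPLE = """Date,Search Term,Device,Country,Notes
2018-05-01,,,,
2018-05-02,podcasts,,,
2018-05-03,,,,
2018-05-04,weather,iPhone,,
"""


def filled_cells_per_column(text: str) -> dict[str, int]:
    """Count the non-blank cells in each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            # Skip overflow cells (key None) and cells that are empty or whitespace.
            if name in counts and value and value.strip():
                counts[name] += 1
    return counts


for column, filled in filled_cells_per_column(SAMPLE).items():
    print(f"{column:12} {filled} non-empty")
```

Columns that report zero can be ignored, and the ones with counts point straight at where the data actually is.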
The creepiness of being able to listen to all my Alexa requests notwithstanding, Amazon did far better with how it presented its data, although this may just have been because of how comparatively little it holds about me. For the most part, files and folders were clearly labeled, although the company still has some work to do on labeling the contents of its spreadsheets better.
Ironically enough, Facebook actually had the most comprehensible data of the four services. For starters, every single file Facebook gives you is an HTML file. Each is sorted into its own clearly labeled folder, and an index file gives you an overview of what each document contains. The files themselves are clearly laid out and formatted, and browsing them feels almost like browsing a page on Facebook itself, albeit one that’s stored entirely locally on your computer.
Facebook’s download includes a lengthy index file that tells you where to find all of your information.
It’s still terrifying to see the amount of data Facebook has stored on you (and that’s not even getting into the instances of people having found records of all their old calls and SMS messages), but at least you’re well-informed about what exactly this information is, rather than having to guess based on the contents of each file.
At the end of my experiment, I’m left with just over 74GB of data across the four services I contacted. I had 1.1GB from Facebook, 392MB from Amazon, and 254MB from Apple. Although Google had a massive 72.5GB of data for me to download, this overwhelmingly consisted of my Google Drive and Google Photos backups, which came in at 44.3 and 25.7GB, respectively. The rest of my Google data came in at just 2.5GB.
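Adding up the per-service figures is a quick sanity check on the total. This sketch assumes 1GB = 1,000MB; using 1,024MB instead barely changes the result.

```python
# Figures as reported above, converted to gigabytes (assuming 1GB = 1,000MB).
sizes_gb = {
    "Facebook": 1.1,
    "Amazon": 392 / 1000,
    "Apple": 254 / 1000,
    "Google": 72.5,
}
# Google's 72.5GB, split as described: Drive, Photos, and everything else.
google_parts_gb = {"Drive": 44.3, "Photos": 25.7, "Other": 2.5}

total_gb = sum(sizes_gb.values())
google_gb = sum(google_parts_gb.values())

print(f"All four services: {total_gb:.1f}GB")  # All four services: 74.2GB
print(f"Google, recombined: {google_gb:.1f}GB")  # Google, recombined: 72.5GB
```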
After attempting to sift through and understand it all, it’s clear that these companies, and the GDPR regulations that govern them, have a long way to go if they want to give us real control over our data. Being able to download it is one thing, but making it useful means working harder to ensure that what’s downloaded is easier for the average person to understand.
At a minimum, that means providing a better index to tell you what data is contained in what file, but it also means organizing the contents of those files so that they make sense on their own.


