A Blog by Jonathan Low

 

Apr 19, 2018

As the Value of Personal Data Rises, the Battle Over Who Owns It Is Intensifying

Personal information has been monetizable for some time. The question now, as the value of that data becomes clearer, is whether the current ownership model can survive. JL

David Floyd reports in Investopedia:

In the year 2022, $7,600 worth of personal information will be bought and sold per person. Data brokers just aren't that good at it. Matching cookies based on device IDs has a success rate of 2.9%. "The data that are publicly available, which can then be scraped by a broker, only amounts to 10% of the data a user creates. The rich information, such as likes, posts, check-ins, is off bounds." The data in silos or held by giant corporations is underutilized. The threat blockchain and other cryptographic techniques pose to data brokers is more immediate than the threat they pose to platforms.
Blockchain technology has been dragged through the muck in recent months by scammers, charlatans and comedians. A joke cryptocurrency attracted piles of real cash, a kleptocratic regime announced an ICO, and an iced tea company pivoted to bitcoin mining. (For about a minute.) In the wake of all this hype, plunging cryptocurrency prices have hardly helped matters. It's time for a reminder that, while no panacea, blockchain technology is extremely good at doing one, quite useful thing: removing intermediaries.
Take bitcoin, the original token on the original blockchain, which gave people an unprecedented third option for transferring money. Prior to bitcoin's invention, one of two things was possible: handing money over in person, or trusting a middleman such as a bank to do it on your behalf. With bitcoin, you can transfer money remotely without a middleman. This is a first in human history.
The economy harbors plenty of middlemen besides banks, but projects based on bitcoin's core innovation, the blockchain, have the potential to challenge them as well. Take data brokers. These are mostly obscure, with meaningless, x-heavy names like Acxiom, DataLogix, Experian, Ameridex, and the now-famous Equifax. These firms scrape consumer data – financial transfers, social media activity, browsing history, e-commerce purchases, location data – from public sources, or buy it from digital services. (Here's everywhere PayPal customers' data ends up, for example.)
The brokers analyze this data to determine everything from hobbies to creditworthiness to addictions to sexual orientation. They sell it on to advertisers, card issuers, prospective employers and whoever else might be interested. In this way, each individual consumer generates a nice stream of rent for an industry that gives them nothing in return. Equifax Inc. (EFX) earned $488.8 million in profit in 2016, or $3.36 for each of the 145.5 million victims of the data breach it announced in September 2017.
Accounting for the number of players in the industry and the rapid growth in the amount of data users produce, Datawallet – more on them in a bit – estimates that in the year 2022, roughly $7,600 worth of personal information will be bought and sold per person, a quantity the firm's founder and CEO Serafin Lion Engel likens to a universal basic income.
It can only serve that purpose, though, if the money does not go to middlemen, but to the people who actually generate the value. That's hardly the case today. User data, often called the "new oil," more closely resembles the new guano. These nitrogen-rich seabird feces were the most sought-after fertilizer in the world for most of the 19th century. Like data, guano was procured through extraction, rather than transaction. And as with data, the seabirds that produced the stuff were never compensated.
Users of digital services are treated a bit like oblivious gulls who happen to excrete an immensely productive resource, rather than owners of an asset they create. Blockchain technology and related cryptographic techniques could change that, giving us control over our personal data and enabling us to sell it to whomever we please.

"You have the monopoly"

Datawallet is one of the companies trying to bring this change about. The app has mostly gained media attention as a way to earn $5 or $10 a month selling Facebook likes, Amazon purchases, Uber rides and Airbnb trips. Engel expects early adopters to be college students "who are in it for beer money."
But the idea behind Datawallet holds a more fundamental appeal: the ability to control what Engel calls a "self-sovereign wallet," which makes the user the sole owner of their data and the only one with the ability to grant access to it. As Engel puts it, "you have the monopoly over that data about you."
Datawallet is just one among a slew of blockchain-based apps aiming to do away with data middlemen wherever they can be found.
Medicalchain is tackling medical records, giving patients full control over some of their most sensitive information and bypassing the healthcare system's decrepit infrastructure (think fax machines) in the process. Loomia is going after smart textiles, an industry in which other players are eager to harvest and hoard heart rates, geographical movements and even more intimate metrics (think smart mattresses).
Most of these projects are in their very early stages, but if they come to fruition, something unprecedented and rather strange may emerge: empty platforms, places that facilitate commerce in data, but where no one party is doing the facilitating. Henri Pihkala, founder and CEO of Streamr, a blockchain-based platform for live data streams, captures the paradox: "we make a central place which is decentralized."

The tech: Keys, hashes, smart contracts

How does that work? The details vary, but Datawallet's solution is generally representative of the technology that enables these decentralized platforms.
Keys
Say you want to sell some of your personal data – for example, your Facebook activity or Amazon purchases – using Datawallet. You and the purchaser each have a public key and a private key. Public keys are used to encrypt a message, to scramble it so that it looks like gibberish to everyone except the holder of the matching private key, who can use it to decrypt (unscramble) the message.
In order to exchange your private data securely, you encrypt it with the purchaser's public key and send the encrypted data to them. They take the data and decrypt it with their private key. If someone in the middle intercepts the data, all they obtain is an unreadable mess.
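The exchange described above can be sketched with textbook RSA. This is a toy under loud assumptions: the primes are tiny, there is no padding, and the function names (make_keypair, encrypt, decrypt) are hypothetical. Nothing here reflects Datawallet's actual code, and a real system would use a vetted cryptography library with far larger keys.

```python
# Toy textbook RSA: illustration only, NOT secure (tiny primes, no padding).

def make_keypair(p: int, q: int, e: int = 17):
    """Return ((e, n), (d, n)): a public key and its matching private key."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8+
    return (e, n), (d, n)

def encrypt(message: bytes, public_key) -> list:
    """Scramble each byte with the recipient's public key."""
    e, n = public_key
    return [pow(byte, e, n) for byte in message]

def decrypt(ciphertext: list, private_key) -> bytes:
    """Unscramble with the private key that matches the public key used."""
    d, n = private_key
    return bytes(pow(c, d, n) for c in ciphertext)

# The purchaser generates the pair and publishes only the public half.
buyer_public, buyer_private = make_keypair(61, 53)

data = b"liked: hiking, jazz"            # placeholder personal data
sealed = encrypt(data, buyer_public)     # what an eavesdropper would see
opened = decrypt(sealed, buyer_private)  # only the purchaser can do this
```

The seller never needs the purchaser's private key, which is why the public half can be shared openly.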
Hashes
In Datawallet's design, the data exchange itself happens off-chain, since the contents are both too large and too sensitive to broadcast to the central ledger (for Datawallet and most other projects, this ledger is the ethereum blockchain). What does go on the blockchain are hashes of the data. You hash the data you're selling and post the result to the chain, and the buyer hashes the data they receive and posts that result to the chain. If the hashes match, a payment held in escrow is released.

What are hashes and what do they accomplish? They are cryptographic functions that enable quick verification that two sets of data are identical.
They do this by distilling data down to a manageable chunk. No matter how short or long the text you run through SHA-256, the hash function used by bitcoin, you will get a 256-bit digest back, usually written as 64 hexadecimal characters. Here's the hash of the first scene of Hamlet, for example:
91BBAB0B8C574E4071B6AB0458CB891BD01392D58CB7A6D43918DA95E30DC04D
Now if you have the text of Hamlet, you can instantly check that what you've received has not been tampered with – no need to pore over every jot and tittle. Simply hash your text and compare it to the hash of the sender's ostensibly identical text. (This works just as well for web browser data or Amazon purchase histories.)
The comparison is quick because the hashes are short, and it is trustworthy because hash functions are so finicky. Delete the exclamation point in the scene's first line, and that one change yields an unrecognizably different hash:
80DA6F89DDB7BD67BE5D30AE5EA6D74949C55719354D38D97C64DE5FE914029C
This sensitivity to tampering makes hashing central to bitcoin, ethereum and their peers. Thousands of identical copies of a blockchain can be efficiently maintained because they're compared using hashes, rather than through meticulous scans of every block.
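Both properties, fixed output length and sensitivity to the smallest change, are easy to check with Python's standard hashlib module. A minimal sketch: the sample strings are placeholders rather than the exact text hashed above, so the digests will not match the ones printed in this article, and sha256_hex is a hypothetical helper name.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as uppercase hex, the style used above."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()

short = sha256_hex("Who's there?")
long_ = sha256_hex("Who's there?" * 10_000)  # a much longer input
tweaked = sha256_hex("Who's there")          # one character removed

print(len(short), len(long_))  # both digests are 64 hex characters long
print(short == tweaked)        # False: a one-character change alters the digest
```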

Hashes are also useful because data cannot be unhashed. No one using any known technology can take 91BBAB0… and wring Shakespeare back out of it. That makes it relatively safe to broadcast a hash of sensitive information to the blockchain, as Datawallet does.
Blockchain and smart contracts
Even though the data exchange itself does not happen on-chain, the ledger is crucial to decentralized data transfer. Blockchains are immutable public records that remove all doubt about what was traded, at what price, and when. The hashes broadcast to the blockchain either match or they don't, so buyers can't claim they didn't receive data that in fact they did. Nor does anybody need to wonder whether a hacker or spy tampered with the data en route.
Without the need for someone to mediate the exchange, brokers lose their raison d'être. They are replaced by a large number of (ideally) dispersed, competing and mutually distrustful "miners" who post exchanges to the ledger.
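The tamper-evidence of the ledger can be illustrated with a toy hash-linked list of records. This is a sketch under stated assumptions, not how ethereum or bitcoin is implemented: real chains add consensus among miners, digital signatures and much more, and the names block_hash, append_block and is_valid are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, record: str) -> None:
    """Add a record that commits to the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "record": record})

def is_valid(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

ledger = []
append_block(ledger, "seller posts data hash")
append_block(ledger, "buyer posts data hash")
append_block(ledger, "escrow released")

ok_before = is_valid(ledger)                           # chain is intact
ledger[0]["record"] = "seller posts a different hash"  # try to rewrite history
ok_after = is_valid(ledger)                            # the broken link is detected
```

Because each block commits to the one before it, changing an old record invalidates every later link, which is what lets many copies of the ledger be checked against each other cheaply.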


Miners also obviate the need for a bank: blockchain technology's main application has always been as a distributed money transfer platform. Finally, ethereum offers the ability to enforce complex contracts via this same distributed network of miners. You may have paused at the reference to a "payment held in escrow" above. Who's holding the money while it straddles buyer and seller?
No one, it turns out. Ethereum took bitcoin's decentralized money and made it programmable via smart contracts: self-executing bits of code that live on the blockchain. If all the hashes match and all other pre-agreed conditions are met, the money automatically moves from the buyer's account to the seller's. No need for a trusted custodian in the middle.
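The escrow logic can be sketched as an ordinary Python class standing in for a smart contract. Everything here is an assumption for illustration: the class name EscrowContract, its methods and the in-memory balances are hypothetical, and a real contract would be written in an on-chain language and would also need rules for refunds and disputes, which the sketch omits.

```python
import hashlib

class EscrowContract:
    """Hypothetical stand-in for a smart contract; real ones run on-chain."""

    def __init__(self, price: int, balances: dict, buyer: str, seller: str):
        if balances[buyer] < price:
            raise ValueError("buyer cannot fund the escrow")
        balances[buyer] -= price  # funds are locked in the contract itself
        self.locked = price
        self.balances = balances
        self.buyer, self.seller = buyer, seller
        self.posted = {}          # party -> hash posted to the ledger

    def post_hash(self, party: str, data: bytes) -> None:
        """Record the hash of the data a party sent or received."""
        if party not in (self.buyer, self.seller):
            raise ValueError("unknown party")
        self.posted[party] = hashlib.sha256(data).hexdigest()
        self._settle()

    def _settle(self) -> None:
        """Release payment only when both hashes are present and equal."""
        if self.locked == 0:
            return
        if self.buyer not in self.posted or self.seller not in self.posted:
            return
        if self.posted[self.buyer] == self.posted[self.seller]:
            self.balances[self.seller] += self.locked
            self.locked = 0

balances = {"buyer": 10, "seller": 0}
contract = EscrowContract(5, balances, "buyer", "seller")
payload = b"amazon purchases, 2017"    # placeholder data, sent off-chain
contract.post_hash("seller", payload)  # seller commits to what was sent
contract.post_hash("buyer", payload)   # buyer confirms what was received
```

In this sketch, mismatched hashes simply leave the funds locked; how a production contract resolves that case is a design decision the article does not describe.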

Blockchain has problems aplenty

As promising as this technology sounds, not all the kinks have been worked out. Some may never be. Start with scalability.
Expensive and slow
Blockchains are fat, lumbering beasts. Distributed consensus is slow and costly compared to the centralized networks that are currently in operation, so how can blockchain technology compete in a user data market that – for all its shadiness – at least works at scale?
Datawallet sidesteps the problem by transferring data off-chain. On-chain transfers of data troves that might include videos and other large files "would immediately crash the ethereum blockchain," says Engel. And in any case, no one would want that kind of data broadcast to a public ledger.
Medicalchain leaves health records where they are, on regulatory-compliant servers in the patient's home jurisdiction. It simply provides a platform for patients to grant doctors access to their records.

Some projects are trying to scale by tweaking the way distributed networks are structured. Streamr combines a "reputation mechanism" called karma with its blockchain-based token, DATAcoin, to divvy up the work. "We need to assign asymmetric responsibilities to different nodes," says Pihkala, "otherwise we end up with a situation that is typical with current-day blockchains, which is all the data goes to all the nodes, leading to no scalability." The nodes put down a DATAcoin stake, which they lose if they break the rules. Karma, meanwhile, assigns greater responsibility to the most reliable nodes, increasing efficiency without sacrificing too much in the way of decentralization.
Kochava is taking similar ideas and applying them to a blockchain it is building in-house to reduce opacity and fraud in digital advertising. XCHNG, as the platform is called, uses a reputation mechanism and a brutal form of pruning – in which most nodes only hold onto a day's worth of ledger history – to process the huge volume of transactions digital ad delivery demands. Kochava founder and CEO Charles Manning believes the platform could deliver millions of transactions per second. Ethereum can manage around 15 or so, bitcoin far fewer.

Where do you store it?
Every blockchain application faces problems with storage. Monetary transactions and smart contracts might be utterly decentralized, but the data itself either resides in centralized servers, à la Medicalchain (this is largely for regulatory reasons, to be fair), or on users' own storage-constrained devices, à la Datawallet.
A number of projects are trying to enable decentralized storage, including IPFS, BigchainDB and Storj. Engel, Pihkala, and Janett Liriano, the CEO of Loomia, each mention plans to integrate their platforms with one or another of these projects.
You still give up your data
At some point, the quest to establish ownership over your personal data hits a wall. You can encrypt it. You can transfer it directly, shunning intermediaries and keeping it encrypted en route. You can ensure that the buyer pays the agreed amount upon receipt.
But no technological legerdemain can overcome the fact that once the buyer has your data, as Guy Zyskind (co-founder and CEO of Enigma) puts it, "You're done. They can take your data, they can copy it, they can go off-chain and then that's it." Rogue employees, incompetent defenses against hacking, reselling – the unpleasant possibilities abound.

Yet rather incredibly, Zyskind says you can make your data available to use without actually revealing it. Through a technique called secure multiparty computation, his company is building a platform that enables data not only to be stored in an encrypted, distributed form, but to be computed over while still in that encrypted, distributed form.
With an IPFS, a Storj or a BigchainDB, it's possible to keep your data secure and to decentralize it across multiple devices. But if you want to do anything with that data – run it through an algorithm or edit it – you have to decrypt and re-centralize it. In order for a credit rating agency to calculate your creditworthiness, say, they need full access and visibility.
With Enigma, these calculations can be performed without any Equifaxes ever being able to see your decrypted financial data. They would not even have access to the full set of encrypted data: it would be split across multiple nodes in the network.
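One building block often used in secure multiparty computation is additive secret sharing, sketched below. This is a generic illustration, not Enigma's protocol, which the article does not describe. The modulus, the function names share and reconstruct, and the sample balances are all assumptions. The sketch supports only addition; richer computations need considerably more machinery.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an arbitrary choice for this sketch

def share(value: int, n_nodes: int) -> list:
    """Split value into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list) -> int:
    """Recombine shares; any incomplete subset reveals nothing useful."""
    return sum(shares) % PRIME

# Three users' private balances, each split across three nodes.
balances = [1200, 560, 9875]
per_user_shares = [share(b, 3) for b in balances]

# Each node adds up only the shares it holds; it never sees a raw balance.
node_totals = [sum(column) % PRIME for column in zip(*per_user_shares)]

total = reconstruct(node_totals)  # equals sum(balances)
```

No single node holds enough to recover any one balance, yet the combined result is exact.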
Building on this ability, Enigma is working on "secret contracts," smart contracts that obscure their terms and participants. Enigma plans to begin with ethereum, but ultimately, Zyskind says, "we want to be able to augment basically every blockchain with the privacy our technology brings."

Beating the brokers
As much promise as these projects have, none has any ability to prevent a data broker from hoovering up your personal information. They can only try to outcompete the incumbents, offering a better product in the eyes of the data's ultimate buyers. So do they have a chance?
Engel is confident that users with full control over their data will readily shunt data brokers out of the market because, for all the "creepiness" these firms employ in collecting data, they just aren't that good at it. "The data points that are actually set to publicly available, which can then be scraped by a broker, that only amounts to roughly 10% of the data a user creates," he says. "The rich information, such as likes, posts, check-ins, whatever it is, that's off bounds."

Nor is it easy to accurately assign data from different sources to specific individuals. Matching cookies based on device IDs has a success rate of 2.9%, says Engel, so "even if you do have data on your customers in the form of cookies, you will still be wasting 97.1% of your ad budget on people who are not really interested in your product." The industry has only "highly probabilistic and highly experimental" techniques to derive real information about a consumer's interests from data that has a tiny chance of actually being theirs.
When a consumer can simply sell their data, there is no doubt about which bytes belong to whom, and the resulting picture can be incredibly rich: not a website visit tentatively paired with a Facebook like, but an actual, "completely deterministic" web of purchases, browsing patterns and social media activity.
If you're an advertiser, which do you choose?
What about the platforms?
Still, we're ignoring the five elephants in the room. Facebook Inc. (FB), Amazon.com Inc. (AMZN), Alphabet Inc. (GOOG, GOOGL), Apple Inc. (AAPL) and Netflix Inc. (NFLX) are just as interested in your user data as the brokers are. They also control the platforms where you produce it. All of this talk of owning your Facebook likes, Amazon purchases and Google searches glides over the fact that the firms themselves obviously own that data already.
Pihkala emphasizes the potential of a universal, decentralized data marketplace – an "eBay for data streams" – to wreck this model. In other words, to beat the platforms at their own game. "Currently the data in the world is typically in silos or held by giant corporations," he says. "It's being underutilized."
Perhaps, but the threat blockchain and other cryptographic techniques pose to data brokers is much clearer and more immediate than the threat they pose to the platforms.
Then again, Liriano reveals a surprising fact about the smart textile industry. Loomia is building an app that allows consumers to transfer data from sensors in the firm's smart textile product, called Tile, to clothing companies. She expected to encounter fierce resistance when she explained to clothing manufacturers such as L.L.Bean that they were "not going to own all the data." But as it turned out, "they completely understood."
Owning all of that user data would be expensive, the firms reasoned. It would pose a security risk, and it would irritate consumers. More interesting, though, is that they told Liriano, "I want my competitor's information anyway. How useful is it to me, if you can just build it out for me, if I don't know what this competitor's doing? The benefit of this thing is that eventually everybody will be on it, right?"
(Liriano also makes the rare observation that users may not want to sell their data. Loomia's platform would allow data generated by Tile to remain out of reach. Don't count on that idea spreading.)
The Facebooks and Googles of the world clearly don't share apparel companies' reluctance to own user data. But the platforms could be tempted by the chance to gain insights from each other's data. None of them really has direct competitors, but Amazon could certainly find Google's data useful, Google Facebook's, Facebook Netflix's, and so on. Perhaps, in a world of data marketplaces and disintermediated data exchange, the platforms could be persuaded that it's in everyone's interest to let users control their own data. Perhaps the state will help with the persuading.
Hard to say. In the near term, at least, the middlemen and data brokers look vulnerable. When former Equifax CEO Richard Smith testified before the House of Representatives in October 2017, he was asked by Rep. Doris Matsui (D-Calif.), "Do I own my data?" Smith didn't have a satisfactory response. Thanks to blockchain and other cryptographic technologies, the answer might be clear in the near future: constant, creepy data scraping and the occasional catastrophic breach could then be a distant memory.
