A Blog by Jonathan Low


Mar 25, 2016

Creating a Marketplace for Smartphone Sensor Data

It is worth noting that the development of information systems based on mobile data has been hindered, as the following article explains, by concerns about the ethics of doing so. This may well be a first for such a widely used, available and potentially beneficial (to say nothing of profitable) source of data.

Assuming that those concerns can be addressed to the satisfaction of most, if not all, users, the question of how to apply that wisdom becomes even more intriguing. JL

Steve Lohr reports in the New York Times:

The promising future of mobile data is that it can point the way to new insights in health care, transportation, urban planning and the social sciences. The development of software models using mobile sensor data has been hindered because of the absence of labeled data sets. And the reason for that is the lack of a community effort to do that in an ethical, efficient way.
Words and pictures, culled from across the web, have been the digital grist for remarkable gains in computing tasks like image recognition and speech translation. But another huge data resource — sensor data from smartphones — lags behind as a fuel source for major research advances.
CrowdSignals, a new initiative to collect, label and pay for mobile sensor data, seeks to overcome that shortfall. The organization’s approach relies less on innovation in technology than on economics and creating a marketplace for sensor data.
CrowdSignals is taking a first step with a crowdfunding campaign seeking modest contributions, typically $100 or less, from university and corporate researchers. The money is to be used to pay participants in research studies who agree to contribute the sensor data on their smartphones.
“We haven’t even begun to tap the potential of smartphone data,” said Evan Welbourne, a computer scientist who heads the CrowdSignals project.
The promising future of mobile data, experts agree, is that it can point the way to new insights in health care, transportation, urban planning and the social sciences.
But such mobile data is typically costly and time-consuming to collect for researchers, and it is difficult to interpret.
The big improvements in fields like vision recognition have come from vast data sources and steadily improving software techniques in a branch of artificial intelligence called machine learning.
Yet that high-tech wizardry relies on human help. The software is trained on sets of data that have been labeled by humans — identifying, for example, the pictures of cats, dogs and horses.
Building human-labeled data sets is straightforward with pictures and words. Humans can identify images quickly and accurately.
The labeling challenge is trickier with mobile sensor data. For example, a person contributing data has to log in when walking, running or biking. If a researcher wants to correlate mood with activity, the contributor might have to answer a question or two on a touch screen.
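To make the labeling step concrete, here is a minimal sketch of how a contributor's logged activities might be attached to timestamped sensor readings. It is an illustration only, not CrowdSignals' actual software or data format: the class names, the `accel_magnitude` field and the `build_labeled_set` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SensorSample:
    timestamp: float        # seconds since the recording started (assumed unit)
    accel_magnitude: float  # hypothetical accelerometer reading


@dataclass
class ActivityLabel:
    start: float   # when the contributor said the activity began
    end: float     # when the contributor said it ended
    activity: str  # e.g. "walking", "running", "biking"


def label_for(sample: SensorSample, labels: List[ActivityLabel]) -> Optional[str]:
    """Return the activity whose logged interval contains the sample, if any."""
    for label in labels:
        if label.start <= sample.timestamp < label.end:
            return label.activity
    return None


def build_labeled_set(
    samples: List[SensorSample], labels: List[ActivityLabel]
) -> List[Tuple[SensorSample, str]]:
    """Keep only the samples that fall inside a logged activity interval."""
    labeled = []
    for sample in samples:
        activity = label_for(sample, labels)
        if activity is not None:
            labeled.append((sample, activity))
    return labeled
```

Readings that fall outside every logged interval are simply dropped in this sketch, which is one reason unlabeled sensor data is of limited use for training.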
Data from a personal device — a smartphone or smartwatch — is also sensitive, raising issues of data use and privacy, which complicates the collection of sensor data.
The development of software models using mobile sensor data has been hindered, said Deborah Estrin, a professor of computer science at Cornell Tech, because of “the absence of labeled data sets. And the reason for that is the lack of a community effort to do that in an ethical, efficient way.”
Ms. Estrin is one of several computer scientists from universities around the world who are advisers to CrowdSignals, as are researchers at companies like Microsoft and Intel.
Most research using mobile data has tracked relatively small numbers of people in a single city or region, often using proprietary data from mobile carriers or device makers. CrowdSignals is intended to overcome those limitations, said Daniel Gatica-Perez, a researcher at Idiap Research Institute in Switzerland, who is an adviser on the project.
Both Apple and Samsung have introduced software tools for collecting and sharing mobile data. Samsung’s SAMI is a data exchange platform where developers can place their data and work with others. Apple’s ResearchKit allows iPhone users to share their data for use in research projects, often health-related studies.
CrowdSignals is trying to build a marketplace for data with financial incentives for people to participate and label their data. All the data will be anonymized, the organization says.
CrowdSignals’ long-range goal is to have thousands of participants in studies, but it is starting with a pilot project. It hopes to raise $15,000 in the initial crowdfunding round, Mr. Welbourne said, and then recruit 30 to 50 participants whose smartphone data will be collected for a month.
In the pilot, the CrowdSignals software can handle data only from people whose smartphones use Google’s Android operating system. Later, Mr. Welbourne hopes to accommodate users of Apple iPhones as well.
The participants, he said, will be paid a flat rate of about $100 for the month. But the compensation can go up or down, depending on how much personal data they choose to share. In addition, they will receive incentive payments of up to a few dollars for labeling their data.
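As a rough back-of-the-envelope check on the pilot's figures, the sketch below multiplies the flat rate by the low and high ends of the participant range and subtracts the result from the funding goal. It assumes exactly $100 per participant and ignores both the labeling incentives and the adjustments for how much data is shared, since the article gives no precise amounts for either; the function names are illustrative.

```python
FUNDING_GOAL = 15_000         # dollars, the initial crowdfunding target
FLAT_RATE = 100               # dollars per participant for the month (approximate)
PARTICIPANT_RANGE = (30, 50)  # planned size of the pilot


def base_payout(participants: int, flat_rate: int = FLAT_RATE) -> int:
    """Total flat-rate compensation, excluding incentives and adjustments."""
    return participants * flat_rate


def remaining_budget(participants: int, goal: int = FUNDING_GOAL) -> int:
    """Money left over after flat-rate payments for all other costs."""
    return goal - base_payout(participants)


for n in PARTICIPANT_RANGE:
    print(f"{n} participants: ${base_payout(n):,} in flat-rate pay, "
          f"${remaining_budget(n):,} left for other costs")
```

Under these assumptions, flat-rate pay would account for $3,000 to $5,000 of the $15,000 goal.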
Much of the crowdsourced funds, Mr. Welbourne said, will go toward selecting and educating participants. The outreach will be by websites and follow-up emails. CrowdSignals will also have videos and individual Skype calls to explain the project. Even the pilot, Mr. Welbourne said, will try to enlist a diverse group by gender, age, geographic location and ethnicity.
University and corporate researchers who are crowdfunders will get first access to the data set: Academics will pay a $20 license fee and corporate researchers will pay $50. The licenses last five years, but after one year the data set will be free for academic use.
A former researcher at Amazon, Samsung and Nokia, Mr. Welbourne plans to build a business by creating software models for specific industries like health care and mobile device makers, based on the same CrowdSignals data available to all licensees. His Seattle start-up, AlgoSnap, was founded last year.
Mr. Welbourne’s business plan is a bet that the CrowdSignals pilot succeeds. If it does, he intends to spin off CrowdSignals as a nonprofit entity. “It’s a little more appropriate for the data collection and research,” he said.
A challenge for CrowdSignals, assuming it takes off, experts say, will be fostering trust among large numbers of people beyond the early participants.
“Navigating the privacy issue may be an uphill battle,” said Jason I. Hong, a computer scientist at Carnegie Mellon University and an adviser. “But this is a really clever idea over all. If it works, we’ll have larger, richer data sets for researchers in universities and corporations.”

