A Blog by Jonathan Low


Nov 14, 2016

How NASA Is Harnessing Graph Databases To Learn From Past Projects

For any enterprise, even those with far more mundane missions - and far less entailed risk - the goal should be not just storing data but retrieving it in ways that help identify and eliminate similar problems in the future while optimizing potential outcomes. JL

Steven Melendez reports in Fast Company:

(NASA is) storing information in a graph database, one optimized to store information in terms of data records and the connections between them. Such network graphs have become a familiar feature of online social networks. Individual lesson write-ups (are) nodes in the network, as are the topics a machine learning algorithm associated them with. (Neo4j is) used by e-commerce companies to generate automated product recommendations based on relationships between users and products.
NASA famously maintains a "lessons learned" database containing valuable information from its past programs and projects. But the vast system, which has been online since 1994, is not always easy to navigate. Now the agency is modernizing it with help from a tool more familiar to social media than space missions: graph databases.
The change began about a year and a half ago when an engineer, attempting to search "lessons learned" for relevant documents, found the number of possible results overwhelming. "He was getting things that really were not relevant to what he was looking for," David Meza, NASA’s chief knowledge architect, recalls.
Looking to make the database more useful, and help users investigate relationships beyond what basic keyword searches could uncover, Meza experimented with storing the information in a graph database—that is, a database optimized to store information in terms of data records and the connections between them. In recent years, such network graphs have become a familiar feature of online social networks.
The individual lesson write-ups themselves were nodes in the network, as were topics to which the lessons were associated by a machine learning algorithm. And to store and organize that data, Meza turned to Neo4j, a database system that’s specifically designed to store graph data more efficiently than traditional, SQL-powered relational databases. "We frequently have customers telling us that we’re a thousand times faster or a million times faster than a relational database," says Emil Eifrem, CEO of Neo Technology, the San Mateo, California, company behind Neo4j.
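To make the structure concrete, here is a minimal Python sketch of the kind of graph described above. Lessons and topics are nodes, and weighted lesson-to-topic edges stand in for the machine learning algorithm's assignments. It is an in-memory illustration, not Neo4j's API. The class name, lesson IDs, topics and weights are all hypothetical, because the article does not describe NASA's actual schema or scoring.

```python
from collections import defaultdict


class LessonGraph:
    """Hypothetical in-memory stand-in for a lessons/topics graph (not the Neo4j API)."""

    def __init__(self):
        self.topics_of = defaultdict(dict)   # lesson node -> {topic node: weight}
        self.lessons_of = defaultdict(dict)  # topic node -> {lesson node: weight}

    def assign(self, lesson, topic, weight):
        # Store each edge in both directions so traversal from either side is cheap.
        # `weight` is assumed to be a topic-model score between 0 and 1.
        self.topics_of[lesson][topic] = weight
        self.lessons_of[topic][lesson] = weight

    def related_lessons(self, lesson):
        """Rank other lessons by the topics they share with `lesson` (lesson -> topic -> lesson)."""
        scores = defaultdict(float)
        for topic, w1 in self.topics_of[lesson].items():
            for other, w2 in self.lessons_of[topic].items():
                if other != lesson:
                    scores[other] += w1 * w2
        # Highest score first; ties broken by lesson ID for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


g = LessonGraph()
g.assign("lesson-101", "valve contamination", 0.9)
g.assign("lesson-102", "valve contamination", 0.6)
g.assign("lesson-102", "battery fire hazards", 0.4)
g.assign("lesson-103", "battery fire hazards", 0.8)
print(g.related_lessons("lesson-102"))
```

Summing products of edge weights is just one simple ranking choice. In a graph database such a two-hop lookup is a short traversal from the starting node, rather than a join across tables.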
The tool was also notably used by the International Consortium of Investigative Journalists to map connections between people and companies identified in the massive leaked collection of offshore finance data dubbed the Panama Papers. And, says Eifrem, it’s frequently used by e-commerce companies looking to generate automated product recommendations based on relationships between users and products, and by financial institutions looking to identify suspicious sets of transactions, even in cases where the individual transactions aren’t independently odd-looking.
"A fraud ring is all about relationships," says Eifrem.
And at NASA, Neo4j and a graph visualization tool called Linkurious helped Meza’s team build an interface to explore the database of lessons, finding documents relating to particular topics and even uncovering connections between disparate subjects. In one case, Meza says, he came across a puzzlingly strong connection between lessons relating to contaminated fluid valves and those dealing with battery fire risks.
"I couldn’t figure out how valve contamination was actually correlated to fire hazards within batteries," he says. "I realized the topic that talked about battery hazards and fires, there were issues where lead leaked out of the batteries and contaminated the water."
That could let an engineer researching valve contamination issues discover potentially relevant documents about battery issues that might not have popped up in a keyword search.
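One simple way such a link can surface is co-occurrence. Two topics become connected when the same lessons are assigned to both, and a lesson tagged only with the linked topic then becomes reachable from the other. The Python sketch below assumes that approach. The article does not say how Neo4j or Linkurious measured the connection, and the lesson IDs and topic assignments here are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical lesson -> topics assignments (e.g. the output of a topic model).
lesson_topics = {
    "lesson-A": {"valve contamination"},
    "lesson-B": {"valve contamination", "battery fire hazards"},
    "lesson-C": {"battery fire hazards"},
    "lesson-D": {"battery fire hazards", "valve contamination"},
    "lesson-E": {"thermal protection"},
}


def topic_links(lesson_topics):
    """Count how many lessons each pair of topics shares (used as an edge weight)."""
    links = defaultdict(int)
    for topics in lesson_topics.values():
        for a, b in combinations(sorted(topics), 2):
            links[(a, b)] += 1
    return dict(links)


def beyond_keyword(lesson_topics, topic, min_shared=1):
    """Lessons NOT tagged with `topic` but tagged with a topic linked to it."""
    links = topic_links(lesson_topics)
    neighbors = {b if a == topic else a
                 for (a, b), n in links.items()
                 if topic in (a, b) and n >= min_shared}
    return sorted(lesson for lesson, ts in lesson_topics.items()
                  if topic not in ts and ts & neighbors)


print(topic_links(lesson_topics))
print(beyond_keyword(lesson_topics, "valve contamination"))
```

In this toy data a search on valve contamination also returns lesson-C, which is tagged only with battery hazards. That is the kind of result a keyword match on "valve" would miss.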
Meza says he’s now looking at analyzing how the lessons are clustered by time and geographical location, which might help uncover trends in what sorts of issues are being reported or situations where particular NASA sites are reporting more problems of a certain type.
He’s also looking at using Neo4j to store relationships between other types of documents, particularly when one document cites another. As authoritative documents like policy directives change over time, it can take some time for those changes to propagate through chains of documents citing one another, he says.
"With a graph database, I can see being able to find out really quickly which documents could be affected," Meza says.
The tool might also be able to help track how particular NASA research projects influence other research or industrial developments, even indirect cases where a product is influenced by another invention, itself influenced by NASA research, he says.
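The impact query Meza describes and the indirect-influence tracing he mentions both come down to following links through chains of documents. Below is a minimal Python sketch under assumed data: a hypothetical mapping from each document to the documents it cites. The sketch searches it breadth-first so that chains of any length are found and citation cycles do not loop forever. In Neo4j the same question would more likely be a variable-length Cypher pattern (something like MATCH (d)-[:CITES*]->(changed)). The relationship name there is also an assumption.

```python
from collections import defaultdict, deque


def affected_documents(cites, changed):
    """Return {document: hops} for every document that cites `changed`,
    directly (1 hop) or through a chain of citing documents (more hops)."""
    # Reverse the edges: for each document, which documents cite it?
    cited_by = defaultdict(list)
    for citing, cited_docs in cites.items():
        for cited in cited_docs:
            cited_by[cited].append(citing)

    hops = {}
    seen = {changed}
    queue = deque([(changed, 0)])
    while queue:  # breadth-first, so each document gets its shortest chain length
        doc, depth = queue.popleft()
        for citer in cited_by[doc]:
            if citer not in seen:
                seen.add(citer)
                hops[citer] = depth + 1
                queue.append((citer, depth + 1))
    return hops


# Hypothetical citation data: document -> documents it cites.
cites = {
    "procedure-7": ["directive-1"],
    "handbook-3": ["procedure-7"],
    "checklist-9": ["handbook-3", "directive-1"],
    "memo-2": ["standard-5"],
}
print(affected_documents(cites, "directive-1"))
```

Sorting the result by hop count gives a rough review order: documents that cite the changed directive directly come first, followed by those further down the chain.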

