Thursday, August 11, 2016

G-Store: High-Performance Graph Store for Trillion-Edge Processing

Graph processing has become a very hot topic in the research community because of its applications in social networking, the biosciences, recommendation systems, and the World Wide Web. These graphs are growing explosively, reaching billions of vertices and trillions of edges. A simple Google search turns up a number of graph frameworks that have been proposed recently. In this blog post we cover some of the challenges that make it so difficult to process large graphs (billions of nodes and trillions of edges) on a single machine.
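To get a feel for the scale, here is a back-of-the-envelope estimate of how much space a plain edge list needs. This is only a rough sketch: the function name `edge_list_bytes` and the vertex ID widths are assumptions made for illustration, and the numbers ignore any metadata, indexes, or compression.

```python
def edge_list_bytes(num_edges, bytes_per_vertex_id):
    """Size of a raw edge list: each edge stores two vertex IDs."""
    return num_edges * 2 * bytes_per_vertex_id


TRILLION = 10**12
TB = 10**12  # decimal terabyte

# Assumed ID widths: 8-byte, 4-byte, and 2-byte vertex IDs.
for width in (8, 4, 2):
    size_tb = edge_list_bytes(TRILLION, width) / TB
    print(f"{width}-byte IDs: {size_tb:.0f} TB for a trillion edges")
```

Even with small vertex IDs, the raw edge list runs to several terabytes, which is far more than the main memory of a typical single server. That is why the storage format and the I/O path matter so much.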

A recent supercomputing (SC'16) paper titled "G-Store: High-Performance Graph Store for Trillion-Edge Processing" takes on these challenges for a trillion-edge graph. It proposes a number of techniques, including a space-efficient representation of graph data, a hardware-cache-friendly on-disk data layout, and a proactive caching policy, all designed specifically to enable trillion-edge graph processing on a single server machine. On top of these, the slide-cache-rewind technique overlaps I/O, processing, and caching analysis. This strategy also makes sure that no data is discarded until the proactive caching policy has analyzed it.
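To make the idea of a space-efficient representation more concrete, here is a minimal sketch of one way to shrink an edge list: split the adjacency matrix into square tiles and store only small local offsets inside each tile. This is an illustration under stated assumptions, not G-Store's actual on-disk format. The tile width (`TILE_BITS = 16`) and the helper names `tile_edges`, `pack_tile`, and `unpack_tile` are hypothetical.

```python
import struct
from collections import defaultdict

TILE_BITS = 16  # assumed tile width: 2**16 vertices per tile side
MASK = (1 << TILE_BITS) - 1


def tile_edges(edges):
    """Group (src, dst) edges by tile, keeping only 16-bit local offsets."""
    tiles = defaultdict(list)
    for src, dst in edges:
        tile_id = (src >> TILE_BITS, dst >> TILE_BITS)
        tiles[tile_id].append((src & MASK, dst & MASK))
    return tiles


def pack_tile(local_edges):
    """Pack local offsets as unsigned 16-bit pairs: 4 bytes per edge."""
    return b"".join(struct.pack("<HH", s, d) for s, d in local_edges)


def unpack_tile(tile_id, blob):
    """Rebuild global vertex IDs from a tile ID and its packed edges."""
    row, col = tile_id
    return [
        ((row << TILE_BITS) | s, (col << TILE_BITS) | d)
        for s, d in struct.iter_unpack("<HH", blob)
    ]


edges = [(5, 70000), (65536, 3), (5, 70001)]
tiles = tile_edges(edges)
for tile_id, local_edges in sorted(tiles.items()):
    blob = pack_tile(local_edges)
    print(tile_id, len(blob), "bytes", unpack_tile(tile_id, blob))
```

The high bits of each vertex ID are stored once per tile instead of once per edge, so every edge costs 4 bytes here instead of 16 bytes with two 8-byte IDs. Edges in the same tile also touch a bounded range of vertices, which is the kind of locality a cache-friendly layout tries to exploit.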

Watch this space for more updates.

Keywords: trillion-edge graph, external graph engine, semi-external graph processing, extreme graphs.
