Modeling a multilevel index in neoj4


Hi all,

Today, for my lab project, I decided to model an in-graph index in Neo4j and query it with the Cypher Query Language.

The basic problem we try to solve here is the ordering of events in a timeline and asking for ranges of events ordered in time without needing to load the whole timeline, or let an external index like Lucene doing the sorting (which is very costly). So, a simple approach to do this is a multilevel tree, where you attach the domain nodes to the leafs of the index tree and query by traversing through that structure.



Now, to ask for all Events between 2011-01-01 and 2011-01-03 you simply find the starting and ending path (in this case they share the upper part of the tree) for these levels in the index, and then collect the Events hanging off the Day-nodes ordered via the NEXT relationships, following the VALUE relationships, if they exist.


All these five segments of the query structure can be expressed in one single Cypher query:

START root=node:node_auto_index(name = ‘Root’)
MATCH 
  commonPath=root-[:`2011`]->()-[:`01`]->commonRootEnd,
  startPath=commonRootEnd-[:`01`]->startLeaf,
  endPath=commonRootEnd-[:`03`]->endLeaf,
  valuePath=startLeaf-[:NEXT*0..]->middle-[:NEXT*0..]->endLeaf,
  values=middle-[:VALUE]->event
RETURN event.name
ORDER BY event.name ASC

Returning Event2 and Event3. This may seem surprising at first, since we’ve asked for the middle events, but notice that variable length path [:NEXT*0..] includes length 0 and has no upper limit. Because the startLeaf and endLeaf are bound through the previous path definitions, they will be the boundaries of the range.  

Some more examples on this data structure are available as part of the Neo4j Manual in the Cypher Cookbook section.

Happy hacking!

/peter





Want to learn more about graph databases? Click below to get your free copy of O’Reilly’s Graph Databases ebook and discover how to use graph technologies for your application today.

Download My Ebook