Hi all,
Today, for my lab project, I decided to model an in-graph index in Neo4j and query it with the Cypher Query Language.
The basic problem we are trying to solve here is ordering events on a timeline and asking for ranges of events ordered in time, without needing to load the whole timeline or letting an external index like Lucene do the sorting (which is very costly). A simple approach is a multilevel tree, where you attach the domain nodes to the leaves of the index tree and query by traversing that structure.
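To make the shape of such a tree concrete, here is a minimal in-memory sketch in plain Python. It is not the Neo4j API: the `Node` class, the `leaf_for` and `build_index` helpers and the sample dates are all assumptions made for illustration. It also assumes that a day leaf is only created when an event falls on that day, and that neighbouring leaves are chained with a NEXT link.

```python
from datetime import date


class Node:
    """Plain in-memory stand-in for a graph node (hypothetical, not Neo4j)."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # relationship type ("2011", "01", ...) -> child node
        self.next = None    # NEXT link to the following day leaf
        self.events = []    # VALUE links to the domain nodes (event names here)


def leaf_for(root, day):
    """Walk the Year -> Month -> Day path for a date, creating missing nodes."""
    node = root
    for key in (f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"):
        node = node.children.setdefault(key, Node(key))
    return node


def build_index(events):
    """Attach each (date, name) pair to its day leaf and chain the leaves."""
    root = Node("Root")
    leaves = {}
    for day, name in events:
        leaf = leaf_for(root, day)
        leaf.events.append(name)
        leaves[day] = leaf
    # Link the day leaves in chronological order via NEXT.
    ordered = [leaves[d] for d in sorted(leaves)]
    for earlier, later in zip(ordered, ordered[1:]):
        earlier.next = later
    return root


# Hypothetical sample data, inserted out of order on purpose.
root = build_index([
    (date(2011, 1, 3), "Event3"),
    (date(2010, 12, 31), "Event1"),
    (date(2011, 1, 1), "Event2"),
])
```

The point of the design is that ordering lives in the structure itself: once the leaves are chained, a range of events can be read off by following links, with no sort step at query time.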
Now, to ask for all Events between 2011-01-01 and 2011-01-03 you simply find the starting and ending path (in this case they share the upper part of the tree) for these levels in the index, and then collect the Events hanging off the Day-nodes ordered via the NEXT relationships, following the VALUE relationships, if they exist.
All five segments of this query structure can be expressed in a single Cypher query:
START root=node:node_auto_index(name = 'Root')
MATCH
commonPath=root-[:`2011`]->()-[:`01`]->commonRootEnd,
startPath=commonRootEnd-[:`01`]->startLeaf,
endPath=commonRootEnd-[:`03`]->endLeaf,
valuePath=startLeaf-[:NEXT*0..]->middle-[:NEXT*0..]->endLeaf,
values=middle-[:VALUE]->event
RETURN event.name
ORDER BY event.name ASC
This returns Event2 and Event3. That may seem surprising at first, since the query only asks for events attached to middle, but notice that the variable-length path [:NEXT*0..] includes length 0 and has no upper limit, so middle can also match startLeaf and endLeaf themselves. Because startLeaf and endLeaf are bound by the preceding path definitions, they form the boundaries of the range.
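The effect of the zero-length match can be sketched outside of Cypher. The following plain Python model is only an illustration under assumptions: the `Day` class, the `chain` and `events_between` helpers and the three sample days are hypothetical and not part of Neo4j. It walks the NEXT chain from the start leaf to the end leaf and treats both boundaries as valid positions for middle.

```python
class Day:
    """Hypothetical stand-in for a Day leaf with NEXT and VALUE links."""

    def __init__(self, label, events=()):
        self.label = label
        self.events = list(events)  # events hanging off this day (VALUE)
        self.next = None            # following day leaf (NEXT)


def chain(*days):
    """Link the given days in order via NEXT and return them."""
    for earlier, later in zip(days, days[1:]):
        earlier.next = later
    return days


def events_between(start_leaf, end_leaf):
    """Collect events on every leaf from start_leaf to end_leaf, inclusive."""
    found = []
    node = start_leaf
    while node is not None:
        found.extend(node.events)  # 'middle' may be the start or end leaf too
        if node is end_leaf:
            return found
        node = node.next
    return []  # end_leaf is not reachable from start_leaf: nothing matches


# Assumed sample layout: one event on each of three days.
d1231, d0101, d0103 = chain(
    Day("2010-12-31", ["Event1"]),
    Day("2011-01-01", ["Event2"]),
    Day("2011-01-03", ["Event3"]),
)
result = events_between(d0101, d0103)
```

With this sample data, `result` holds Event2 and Event3, and asking for a range whose start and end are the same leaf yields just that day's events, which corresponds to both NEXT segments having length 0.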
Some more examples on this data structure are available as part of the Neo4j Manual in the Cypher Cookbook section.
Happy hacking!
/peter

9 comments:
This is interesting, I like it. I imagine that if you wanted to query for the next 20 days, rather than for a date range, you could do this:
START root=node:node_auto_index(name = 'Root')
MATCH
commonPath=root-[:`2011`]->()-[:`01`]->()-[:`01`]->startLeaf,
valuePath=startLeaf-[:NEXT*0..]->middle-[:NEXT*0..20]->endLeaf,
values=middle-[:VALUE]->event
RETURN DISTINCT event.name
# natural order is by event start date
# would there be a good way to order by end date?
Exactly, very nice use of the potential of Cypher IMHO :)
Thanks!
I have a problem in a clinic setting, where there are multiple patients with multiple events per interval (say 1 to 15 minutes) and multiple state changes per event. Would it be better to have a tree for each patient, or a common tree for all? What impact would the time granularity have, down to the minute or second?
Regards,
Bryan Webb
@bww00 It depends a bit on the amount of data. How big is the total number of events going to be?
@bww00 I think you should just create a test case and try it out. It sounds doable with one big tree for all events, but I wouldn't do that without performance testing.
Hello Peter, thank you for this nice example.
We are trying to use this approach to represent a timeline of frames, with a graph on each frame. However, we are using the Neo4j createRelationshipTo method to create the relationships, and we found it difficult to create the same structure as in this example. The problem arises when you want to use names such as `2011` and `01` as relationship types. As RelationshipTypes are all enums, it is not possible to give the relationships such names; moreover, we would have to create a long list of enum types, each representing a year, a month or a day.
Our solution was to give all relationships the same type (NEXT_LEVEL), and add an attribute with the given year, month or day. In this case, the Cypher query would be something like this:
START root=node:node_auto_index(name = 'Root')
MATCH
commonPath=root-[y:`NEXT_LEVEL`]->()-[m:`NEXT_LEVEL`]->commonRootEnd,
startPath=commonRootEnd-[d1:`NEXT_LEVEL`]->startLeaf,
endPath=commonRootEnd-[d2:`NEXT_LEVEL`]->endLeaf,
valuePath=startLeaf-[:NEXT*0..]->middle-[:NEXT*0..]->endLeaf,
values=middle-[:VALUE]->event
WHERE y.year=2011 and m.month=01 and d1.day=01 and d2.day=03
RETURN event.name
ORDER BY event.name ASC
Is this correct, or is there a better approach?
Regards,
André
Hi, I think the key here is to use DynamicRelationshipType.withName("MONTH_01"), which lets you create relationship types dynamically.
Thanks Peter, I was not aware of the DynamicRelationshipType class.