Friday, January 11, 2013

Batchinsert, Auto-indexing and friend recommendation with Neo4j

This is a guest post by Amit Gupta, following a thread on a Neo4j discussion forum.

Get introduced to your friend of friends

I ran into an interesting problem: in a Facebook-like graph network, I had to find all my first degree friends (friends) who could introduce me to my second degree friends (friends of friends).

For example, let’s say I have friends named ‘X’ and ‘Y’, and ‘Z’ is their common friend. For each of my second degree friends (like Z), I’d like to get back the list of my first degree friends who know them. In this case the output should be [“X”, “Y”]. These are the people who can introduce me to my friends of friends. To try the Cypher query that does this, create the example graph that you can see embedded in the Neo4j Demo console below.

start joe=node:node_auto_index(name = "Joe") match joe-[:knows]->friend-[:knows]->friend_of_friend where not(joe-[:knows]-friend_of_friend) return collect(friend.name), friend_of_friend.name

This query returns:
[Bill, Sara] Ian
[Bill] Derrick
[Sara] Jill
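To make the logic of the query easier to follow, here is a minimal Python sketch of the same traversal on a plain dictionary. It is not Neo4j code. The KNOWS edges are an assumption, reconstructed from the result rows above, and the function name introducers is made up for illustration.

```python
from collections import defaultdict

# Hypothetical "knows" edges, reconstructed from the result rows shown above.
KNOWS = {
    "Joe": ["Bill", "Sara"],
    "Bill": ["Ian", "Derrick"],
    "Sara": ["Ian", "Jill"],
}


def introducers(graph, me):
    """Map each friend of a friend of `me` to the friends who know them."""
    # Direct acquaintances in either direction, mirroring the undirected
    # check not(joe-[:knows]-friend_of_friend) in the query.
    direct = set(graph.get(me, []))
    direct |= {person for person, known in graph.items() if me in known}

    result = defaultdict(list)
    for friend in graph.get(me, []):          # joe-[:knows]->friend
        for fof in graph.get(friend, []):     # friend-[:knows]->friend_of_friend
            # Skipping `me` is a small liberty taken here, not part of the query.
            if fof != me and fof not in direct:
                result[fof].append(friend)    # plays the role of collect(...)
    return dict(result)


if __name__ == "__main__":
    for fof, friends in introducers(KNOWS, "Joe").items():
        print(friends, fof)
```

Grouping by the friend of a friend is what collect does implicitly in the query: every column that is not aggregated acts as the grouping key.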

Create the index to back automatic indexing during batch import

We know that if auto indexing is enabled in the Neo4j configuration, each node that is created will be added to an index named node_auto_index. Now, here’s the cool bit. If we create the manual index at the time of batch import, name it node_auto_index, and then enable auto indexing in Neo4j, the batch-inserted nodes will appear as if they had been auto-indexed. From then on, each time you create a node, it will get indexed as well.
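The trick relies only on the two indexes sharing a name. The toy model below illustrates that idea in plain Python. It is a sketch under stated assumptions, not the Neo4j batch inserter or its API: ToyGraph and all of its methods are hypothetical names invented for this example.

```python
class ToyGraph:
    """In-memory stand-in for a graph with named indexes (not the Neo4j API)."""

    AUTO_INDEX = "node_auto_index"

    def __init__(self, indexable_keys):
        self.nodes = []
        self.indexes = {}  # index name -> {(key, value): [node ids]}
        self.indexable_keys = set(indexable_keys)
        self.auto_indexing = False

    def _add_to_index(self, index_name, node_id, props):
        index = self.indexes.setdefault(index_name, {})
        for key, value in props.items():
            if key in self.indexable_keys:
                index.setdefault((key, value), []).append(node_id)

    def batch_insert(self, rows, index_name):
        """Batch import: nodes are added to a manually named index."""
        for props in rows:
            self.nodes.append(props)
            self._add_to_index(index_name, len(self.nodes) - 1, props)

    def create(self, props):
        """Regular create: indexed only when auto indexing is switched on."""
        self.nodes.append(props)
        node_id = len(self.nodes) - 1
        if self.auto_indexing:
            self._add_to_index(self.AUTO_INDEX, node_id, props)
        return node_id

    def lookup(self, index_name, key, value):
        ids = self.indexes.get(index_name, {}).get((key, value), [])
        return [self.nodes[i] for i in ids]


if __name__ == "__main__":
    graph = ToyGraph(indexable_keys=["name"])
    # Step 1: batch import into a manual index that reuses the auto index name.
    graph.batch_insert([{"name": "Joe"}, {"name": "Bill"}], "node_auto_index")
    # Step 2: enable auto indexing afterwards.
    graph.auto_indexing = True
    graph.create({"name": "xyz"})
    # Old and new nodes are now found through the same index.
    print(graph.lookup("node_auto_index", "name", "Joe"))
    print(graph.lookup("node_auto_index", "name", "xyz"))
```

Because both code paths write to an index with the same name, a lookup cannot tell batch-inserted nodes from nodes created later, which is the effect described above.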

Create nodes and add nodes to index via cypher

My problem was simple. I had batch imported nodes into my graph db and created an index, and I wanted any nodes added to the graph afterwards to be added to that index as well. All of this had to be accomplished through Cypher. Unfortunately, Cypher does not currently support adding nodes to an index directly. I thought of a workaround, which I describe below.

For example, let’s say you have batch imported nodes into your graph and the index is called node_auto_index. Also, in the configuration you have declared name as an indexable property. Now, you can do:

create n = {name : 'xyz', ...} return n;

This will create a node with the name ‘xyz’. To make sure that this node is actually indexed, you can fire this query:

start root= node:node_auto_index('name:xyz') return root;

Voilà, you will find that the node you just created has also been indexed automatically!

/Amit Gupta, @TheObscureInt
