Friday, January 17, 2014

Importing data to Neo4j the spreadsheet way in Neo4j 2.0!




Hi all graphistas out there,


And happy new year! I hope you had an excellent start; let's keep this year rocking with a spirit of graph-love! Our Rik Van Bruggen wrote a lovely blog post in March last year on how to import data into Neo4j using spreadsheets. It is simple and easy to understand, but it was written for Neo4j 1.9.3. Now it’s a new year, and in December we launched a shiny new version of Neo4j, the 2.0.0 release! Baadadadaam! So I thought I had better update his blog post, in the spirit of his work. (Thank you Rik!)



You can still use Michael Hunger's Neo4j CSV batch-importer (now updated for 2.0.0), or look at the other Data Import Options.






If you simply want to use Cypher, Rik’s way is much easier. That’s why I have updated Rik’s old Cypher statements in a new spreadsheet that shows how to import data into Neo4j 2.0.0.


How does it work?



Open the spreadsheet.






The sheet is composed of two parts:


  • columns A, B and C: these contain the data for the Nodes of our graph, using a custom “id”, a “name”, and a “gender” as properties.


  • columns F, G and H: these contain the data for the Relationships of our graph, having a “from-id” (where the relationship starts), a “to-id” (where the relationship ends), and a “relationship type”. Columns F and G reference the nodes and their ids in column A.




And then comes the secret sauce: how to create Cypher statements from this node and relationship information.


For this we use very simple formulas that combine the columns mentioned above with Cypher syntax through string concatenation. Look at columns D and I:




Nodes


We just use this formula to create the Cypher statement:
="MERGE (meetup:Event {id:'"&A3&"', name:'"&B3&"'})"


Instead of CREATE we use MERGE, which is new in 2.0.0: it creates the node if it does not exist yet, and otherwise matches the existing node instead of creating a duplicate. You can read more about it here in the Neo4j Manual.

Output for row 3:



MERGE (meetup:Event {id:'153602002', name:'Meetup Malmö'})
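Since the formula is plain string concatenation, the same statement can be produced outside the spreadsheet as well. Here is a minimal Python sketch that mirrors the column D formula for the event row; the function name event_merge is hypothetical and not part of the spreadsheet.

```python
def event_merge(event_id: str, name: str) -> str:
    """Mirror the column D formula: build the MERGE statement for the meetup Event node."""
    # Same concatenation as ="MERGE (meetup:Event {id:'"&A3&"', name:'"&B3&"'})"
    return "MERGE (meetup:Event {id:'" + event_id + "', name:'" + name + "'})"


print(event_merge("153602002", "Meetup Malmö"))
```

Running it prints the same statement as the row 3 output.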


The next row looks different. Since we know that everyone in the list attends the meetup, we can create the relationship in the same statement: we combine the creation of the “Person” node with connecting it to the meetup node we just created.


="MERGE (_"&A4&":Person {id:'"&A4&"', name:'"&B4&"', gender:'"&C4&"'})-[:ATTENDS]->(meetup)"

Output for row 4:


MERGE (_2:Person {id:'2', name:'Donald Duck', gender:'man'})-[:ATTENDS]->(meetup)

As you can see, the formula takes the id, name and gender properties from columns A, B and C and puts them into a “MERGE” Cypher statement.
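The person formula can be mirrored in the same way. This Python sketch adds one thing the spreadsheet formula does not do: it escapes backslashes and single quotes in the values, on the assumption that Cypher accepts backslash escapes such as \' inside single-quoted strings, so that a name like O'Brien does not break the statement. The helper names cypher_quote and person_merge are hypothetical, and the id is assumed to be a plain number as in the sheet, because it is also used unescaped in the identifier (_2).

```python
def cypher_quote(value: str) -> str:
    """Escape backslashes and single quotes for use inside '...' in Cypher.

    Assumption: the spreadsheet formula does no escaping at all; this is an extra safeguard.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def person_merge(person_id: str, name: str, gender: str) -> str:
    """Mirror the row 4 formula: MERGE a Person node plus its ATTENDS relationship to the meetup."""
    # "_" + id gives every person its own identifier inside the combined query.
    return (
        "MERGE (_" + person_id + ":Person {id:'" + cypher_quote(person_id)
        + "', name:'" + cypher_quote(name)
        + "', gender:'" + cypher_quote(gender) + "'})-[:ATTENDS]->(meetup)"
    )


print(person_merge("2", "Donald Duck", "man"))
```

For the Donald Duck row this prints exactly the row 4 output; the escaping only changes values that contain a quote or a backslash.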

Relationships



Originally Rik used the (now legacy) Neo4j auto-index to look up the nodes to connect. With Neo4j 2.0 we can use a schema index instead and do the same with MATCH.


For this particular dataset we don't strictly need an index on our label and the node property, but since it's possible, I will show you how:



CREATE INDEX ON :Person(id)
When you create an index, you specify the label and the property of the nodes you want to index.
Time to create some more relationships. Let’s look at the Cypher statements that create them.





="WITH 1 as dummy MATCH (p1:Person {id:'"&E4&"'}), (p2:Person {id:'"&F4&"'}) MERGE (p1)-[:"&G4&"]->(p2);"

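We use WITH 1 as dummy because, for the Neo4j browser, all the MATCH and MERGE statements follow each other in one single big query with no separation. The WITH clause marks the boundary between them: Cypher 2.0 requires a WITH between an updating clause such as MERGE and a following MATCH, and WITH 1 as dummy also drops the earlier identifiers, so p1 and p2 can be reused in the next part.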


Output for row 2:


MATCH (p1:Person {id:'2'}), (p2:Person {id:'5'}) MERGE (p1)-[:WORKS_WITH]->(p2);


This one is a little more complicated, as it uses a MATCH clause to find the nodes to connect: we first look up the start node and the end node by their “id” property, and then the MERGE clause creates the relationship, using the relationship type from column G.
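The relationship formula above contains both the WITH 1 as dummy prefix and a trailing semicolon, while the row 2 output has only the semicolon. This Python sketch treats them as two variants: a browser variant with the WITH prefix for the single big query, and a shell variant terminated with a semicolon. That split is an assumption, and relationship_merge is a hypothetical name.

```python
def relationship_merge(from_id: str, to_id: str, rel_type: str, for_browser: bool = True) -> str:
    """Mirror the relationship formula: MATCH two Person nodes by id, then MERGE a relationship.

    Assumed split of the two variants: for_browser=True prefixes "WITH 1 as dummy" so the
    statement can follow earlier clauses in one big browser query; for_browser=False ends
    it with ";" so it stands alone as one neo4j-shell statement.
    """
    statement = (
        "MATCH (p1:Person {id:'" + from_id + "'}), (p2:Person {id:'" + to_id + "'}) "
        + "MERGE (p1)-[:" + rel_type + "]->(p2)"
    )
    if for_browser:
        return "WITH 1 as dummy " + statement
    return statement + ";"


# Shell variant for the row 2 values.
print(relationship_merge("2", "5", "WORKS_WITH", for_browser=False))
```

The relationship type is concatenated into the text rather than passed as a value because Cypher does not accept a relationship type as a query parameter, which is also why the spreadsheet builds the whole statement as a string.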






Then we copy each formula down across all the rows we want to cover.
Having done this, we end up with two columns, each containing a number of Cypher statements. So then what?
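Copying a formula down is simply applying it to every row. Here is a minimal Python sketch of that step, under a few assumptions: the rows are plain tuples laid out like the sheet, only the rows quoted in this post are included (the rest of the sheet is left out), the relationship column uses the browser variant without a trailing semicolon, and node_column and relationship_column are hypothetical names.

```python
# Rows quoted in this post, laid out like the sheet; the other rows are left out.
EVENT = ("153602002", "Meetup Malmö")
PEOPLE = [("2", "Donald Duck", "man")]
RELATIONSHIPS = [("2", "5", "WORKS_WITH")]


def node_column(event, people):
    """Fill the node formula column: the Event row first, then one MERGE per Person row."""
    event_id, event_name = event
    column = ["MERGE (meetup:Event {id:'" + event_id + "', name:'" + event_name + "'})"]
    for person_id, name, gender in people:
        column.append(
            "MERGE (_" + person_id + ":Person {id:'" + person_id + "', name:'" + name
            + "', gender:'" + gender + "'})-[:ATTENDS]->(meetup)"
        )
    return column


def relationship_column(relationships):
    """Fill the relationship formula column: one WITH ... MATCH ... MERGE per row (browser variant)."""
    return [
        "WITH 1 as dummy MATCH (p1:Person {id:'" + from_id + "'}), (p2:Person {id:'" + to_id
        + "'}) MERGE (p1)-[:" + rel_type + "]->(p2)"
        for from_id, to_id, rel_type in relationships
    ]


for statement in node_column(EVENT, PEOPLE) + relationship_column(RELATIONSHIPS):
    print(statement)
```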









The Instructions Sheet


In the first sheet of the spreadsheet, you will find a bunch of instructions. Basically, you need to go through the following steps:
  • download and unzip the Neo4j server
  • copy/paste the Cypher statements from the top part of the Import Sheet into a text file, or directly into the browser window
  • all these statements form a single large Cypher statement, as the browser can currently only execute one Cypher statement at a time
  • drag the file into the browser input area and then execute it


  • If you want to use the Neo4j-Shell to import larger amounts of data, use the approach shown in the second tab, titled “For the shell”
  • It uses one Cypher statement (terminated with a semicolon) per line
  • and a begin / commit block around the statements to speed up the import by running it in a single transaction


  • paste all the statements into a file and run bin/neo4j-shell -file import.txt, or copy and paste them directly into the browser
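Both routes above come down to assembling the generated statements into text. The Python sketch below shows one way to do it. browser_query and shell_script are hypothetical helper names, and two details are assumptions drawn from the instructions rather than from the spreadsheet itself: trailing semicolons are stripped for the browser's single query, and the shell file uses plain begin and commit lines around one semicolon-terminated statement per line.

```python
def browser_query(statements):
    """Join the generated statements into the single large query the browser executes.

    Assumption: the browser variant needs no ";" separators, so any trailing
    semicolon copied from a formula is stripped instead of ending up mid-query.
    """
    parts = [s.strip().rstrip(";").rstrip() for s in statements]
    return "\n".join(part for part in parts if part)


def shell_script(statements):
    """Build the text of a neo4j-shell import file (the "For the shell" approach).

    One Cypher statement per line, each terminated with ";", wrapped in a
    begin/commit block so the whole import runs as a single transaction.
    """
    lines = ["begin"]
    for statement in statements:
        statement = statement.strip()
        if not statement:
            continue  # skip empty cells copied from the sheet
        lines.append(statement if statement.endswith(";") else statement + ";")
    lines.append("commit")
    return "\n".join(lines) + "\n"


print(shell_script([
    "MATCH (p1:Person {id:'2'}), (p2:Person {id:'5'}) MERGE (p1)-[:WORKS_WITH]->(p2)",
]), end="")
```

Saving the text returned by shell_script as import.txt lets you run bin/neo4j-shell -file import.txt as described above; the output of browser_query is what you would paste or drag into the browser.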








And there we go: the dataset gets created, and Neo4j is ready for use. I hope this little overview was useful for you; it sure was useful for me when I got my hands dirty for the first time :) …


Note: Make sure you have the newest version of Java installed on your machine. You can download it here.


(I made that mistake.)


Time to DIY! Good luck!


Cheers,

Pernilla




Monday, January 6, 2014

The Winter GraphGist Challenge

Happy New Year!


We’re happy to announce that we’ve extended December's GraphGist Challenge until January 31, 2014!


This gives you a few more weeks to submit or improve your entries to maximize your chances to WIN a $300 Amazon.com gift card and more prizes in any of the 10 categories. You can win in multiple categories, so feel free to create as many submissions as you like. All entrants will win a Neo4j t-shirt for participating in the challenge.


GraphGists are an easy and fun way to develop an awesome graph data model for your apps, document the actual use-cases as Cypher queries and present the resulting data in a variety of ways.


We want to encourage you to model a realistic graph in one of the domains listed below and present it interactively with the appropriate use-cases in a GraphGist.


The main goal of this challenge is to provide a large variety of graph applications that can be used by anyone to start thinking in graphs for their domain. The GraphGist collection will be a resource of ready-made examples to help new graphistas kickstart their graph apps.


After all,
(graphs)-[:ARE]->(everywhere)


This challenge should also encourage you, our creative Neo4j community, to hone your graph data modeling skills and bring them to the next level.


To provide you with more chances to win, we are offering prizes in the following 10 categories (with some example topics):


  • Education
    • Schools, universities, courses, planning, management


  • Finance
    • Loans, risk, fraud


  • Life Science
    • Biology, genetics, drug research, medicine, doctors, referrals


  • Manufacturing
    • Production line management, supply chain, parts list, product lines


  • Sports
    • Football (both American and association/soccer), baseball, Olympics, public sports, team ranking


  • Resources
    • Energy market, consumption, resource exploration, green energy, climate modeling


  • Retail
    • Recommendations, product categories, price management, seasons, collections


  • Telecommunication
    • Infrastructure, authorization, planning, impact analysis


  • Transport
    • Shipping, logistics, flights, cruises, road/train optimizations, schedules


  • Advanced Graph Gists
    • Free for all, be as creative as you like



Frequently Asked Questions


What is a GraphGist?


A GraphGist is simply a text file with some basic markup for creating interactive graph models just like the official Neo4j online documentation.


The GraphGist text file uses AsciiDoc for its syntax and can be hosted in a version-controlled GitHub Gist or at any other publicly accessible URL. Just enter the URL of your GraphGist text file into the field on the upper right at http://gist.neo4j.org/ and see the result rendered as an interactive page in your browser.


What is the GraphGist creation process?


GraphGists are an easy way to collect your thoughts into an interactive graph model using the power of open source collaboration. They allow you to build living, functional documentation for your apps that you can iterate on, e.g. by using GitHub’s version control. Start by articulating your problem in your domain and then build your demo solution by setting up an example dataset using Neo4j’s Cypher query language.


Ask real questions about your sample dataset and record them in your GraphGist. For example, “What users have access to an application?” or “What are the strongest football teams in a division or conference based on historical game results over the last season?” Then translate your real world questions into Cypher queries and see the results rendered live in your GraphGist webpage!


Finally, publish and share your GraphGist with the Neo4j community and get valuable feedback on optimizing your model and queries.


Why are GraphGists useful for designing a graph data model?


GraphGists give you a fun way to start designing and building your application by creating a living functional specification that also builds your backend data model. It also gives you a way to get valuable feedback on your queries from the Neo4j community.


How do I participate in the challenge?


Develop your GraphGist as described above, then add the GraphGist URL to the Challenge Wiki page and tweet out your creation using the Twitter button on the page. Make sure that the tweet contains the tags #neo4j #graphgist.


Where can I learn more?




What are you waiting for? Get started!