Tuesday, November 26, 2013

Recap: GraphConnect New York and London 2013

Andreas Kollegger teaching Intro to Neo4j course in NYC
GraphClinician Kenny Bastani diagnosing an attendee's graph problem
Peter Olson presenting how Marvel uses Neo4j to graph the Marvel Universe
Dreams do come true: Jim Webber and Ian Robinson thoroughly enjoyed having a TARDIS at GraphConnect London
Ian Robinson teaching Data Modelling and Import in London
Sebastian Verhueghe of Telenor presenting their graphDB use case in London
Packed room to hear Joe Parry of Cambridge Intelligence present
A big thank you to all of the speakers, helpers and attendees who made GraphConnect New York and London 2013 such a graph-tastic success!

We had a great time hacking and learning with graphistas on both sides of the pond and we're already looking forward to future GraphConnect conferences.

In New York and London, we held trainings on the day before each conference. Attendees chose between our Intro to Neo4j and Data Modeling and Import courses, taught by our graphDB experts: Ian Robinson, Michael Hunger, Kenny Bastani, Peter Neubauer, Alistair Jones, Pernilla Lindh, Chris Leishman, and Andreas Kollegger.

Our GraphClinics were also a hit, as our experts gave one-on-one exclusive graph consulting to our attendees.

In New York, we had over 150 attendees, with presentations in three different tracks in addition to our GraphClinic. Talks included Graph Applications for the Enterprise, Data Modeling in Telecoms, Route Finding in Time Dependent Graphs, The Five Graphs of Finance, Graphs Opening Medical Care Information, Natural Language Search with Neo4j and Analyzing Career Paths with College Miner. To see the complete agenda, click here.

The incredible diversity of use cases and companies presented at GraphConnect New York truly demonstrates how graphs are everywhere!

Peter Olson from Marvel Entertainment presented the keynote after lunch on graphing the Marvel Universe. His talk gave an overview of why graphs are such a powerful conceptual framework for modeling intellectual property and how Marvel uses them to represent the 70 years of fictional content from many different media that makes up the Marvel Universe.

Slides from GraphConnect New York 2013 are posted on our Slideshare account: slideshare.net/neo4j

You can watch videos from GraphConnect New York 2013 here: graphconnect.com/graphconnect-new-york-videos/

London was also quite awesome, with almost 200 attendees! We hosted two tracks in the morning and one in the afternoon, and our GraphClinic was open all day. In honor of the 50th anniversary of the Doctor Who television series, we even brought in our very own TARDIS, much to the delight of Jim Webber and Ian Robinson (see right).

Talks included Object Graph Mapping with Spring Data Neo4j 3, Neo4j Theory and Practice, Graph Adoption with Gamesys, The Power of Graphs to Analyze Biological Data, Adoption of a Graph Database in the Insurance Sector and In-Flight Asset Management with Neo4j. Click here to check out the entire agenda.

Once again, graphs are everywhere! It was inspiring to hear how so many people were using graph databases for their unique use cases.

Videos from GraphConnect London will be posted soon, but slides are available here: slideshare.net/neo4j

Wanna see pictures from the conference? Head over to our Flickr.

Stay tuned for updates on GraphConnect 2014! Or better yet, sign up for our newsletter to be among the first to know the dates and locations for GraphConnect 2014.

Feel free to email us (graphconnect@neotechnology.com) with any questions; we're happy to help!

See you at GraphConnect 2014,
Adam

P.S. Below are some of our favorite tweets from the GraphConnect conferences, but check the @GraphConnect and #GraphConnect pages to see what other people thought of the conference.






Thursday, November 21, 2013

Neo4j 2.0.0-RC1 – Final preparations

WARNING: This release is not compatible with earlier 2.0.0 milestones. See details below.

The next major version of Neo4j has been under development for almost a year now, methodically elaborated and refined into a solid foundation. Neo4j 2.0 is now feature-complete. We're pleased to announce the first Release Candidate build is available today.


With that feature-completeness in mind, let’s see what’s on offer...


Cypher Syntax - finishing touches
Two guiding principles of Cypher language design are readability and internal consistency. The basic syntax should be easy to understand, with little ambiguity about the intent of a query. This release has a number of changes that follow those principles, resulting in much nicer looking syntax and clearer semantics.


MATCH with properties
Cypher CREATE and MERGE clauses can have patterns with properties in them, but that syntax wasn’t previously supported in the MATCH clause. Now you can include properties in any pattern. This makes simple queries more concise; a query like this:


MATCH (a:Person) WHERE a.name = "Joe" RETURN a


Can now be written like this:


MATCH (a:Person {name:"Joe"}) RETURN a
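To make the equivalence concrete, here is a minimal Python sketch that models both forms over a hypothetical in-memory list of nodes. The node structure and the helper names `match_where` and `match_props` are illustrative assumptions, not how Neo4j evaluates queries internally.

```python
# Hypothetical in-memory stand-in for a labelled property graph.
nodes = [
    {"labels": {"Person"}, "props": {"name": "Joe"}},
    {"labels": {"Person"}, "props": {"name": "Steve"}},
    {"labels": {"Company"}, "props": {"name": "Joe"}},
]

def match_where(label, predicate):
    """Models MATCH (a:Label) WHERE <predicate> RETURN a."""
    return [n for n in nodes if label in n["labels"] and predicate(n)]

def match_props(label, props):
    """Models MATCH (a:Label {k: v, ...}) RETURN a: every listed property must be equal."""
    return [
        n for n in nodes
        if label in n["labels"]
        and all(n["props"].get(k) == v for k, v in props.items())
    ]

with_where = match_where("Person", lambda n: n["props"].get("name") == "Joe")
with_map = match_props("Person", {"name": "Joe"})
print(with_where == with_map)  # True
print(len(with_map))           # 1 (the Company node named Joe is excluded by the label)
```

The inline property map is simply an equality filter on the pattern, which is why it reads as shorthand for the WHERE version.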


OPTIONAL MATCH
Sometimes your data isn’t all the same – you are looking for a core pattern, but some matches have additional detail attached. We call this additional detail ‘optional’ because it isn’t required to match the core pattern (a bit like an outer join in SQL).


Previously, we expressed optional details with optional relationships, using the -[?]-> syntax. However, this sometimes proved confusing. To resolve the ambiguity, Stefan Plantikow came up with an excellent solution: separate the concerns.


Everything in a MATCH is now required, so the ? operator has been removed. For optional patterns, use the new OPTIONAL MATCH, which either returns matching data, or null if nothing is found.


For example, you can write:


MATCH (a:Person)
OPTIONAL MATCH (a)-[:SPOUSE]->(b)
RETURN a, b


This will find all Person nodes. If a person has a spouse, Neo4j will find them; otherwise b will be null. Lovely!
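The "outer join" comparison can be sketched in a few lines of Python. This is a hypothetical in-memory model of the query above, not Neo4j code. Every person produces at least one row, and `None` stands in for Cypher's null when no outgoing SPOUSE relationship exists.

```python
# Hypothetical data: Person names and directed SPOUSE relationships (start, end).
people = ["Alice", "Bob", "Carol"]
spouse_rels = [("Alice", "Bob")]

def match_optional_spouse(people, rels):
    """Models MATCH (a:Person) OPTIONAL MATCH (a)-[:SPOUSE]->(b) RETURN a, b."""
    rows = []
    for a in people:
        matches = [end for start, end in rels if start == a]
        if matches:
            rows.extend((a, b) for b in matches)  # one row per optional match
        else:
            rows.append((a, None))  # no match: keep a, bind b to null
    return rows

rows = match_optional_spouse(people, spouse_rels)
print(rows)  # [('Alice', 'Bob'), ('Bob', None), ('Carol', None)]
```

Bob gets a null spouse because the pattern is directed and he has no outgoing SPOUSE relationship. Dropping the rows where `b` is `None` would give the result of a plain, required MATCH.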


MERGE for relationships
Cypher has built-in support for ‘get-or-create’: with a single query you can find existing data, or create it if it’s missing. Since this is a common operation for any database, we wanted to make it work very well in Cypher. To get-or-create, you use MERGE for single nodes or MERGE for relationships (but not both at the same time). MERGE for relationships replaces the old CREATE UNIQUE clause.


For example, to get-or-create a relationship between two nodes:


MATCH (a:Person {name: "Joe"}), (b:Person {name: "Steve"})
MERGE (a)-[r:KNOWS]->(b)
RETURN r
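Get-or-create is idempotent: running the same MERGE twice leaves a single relationship. The Python sketch below models this with a hypothetical set of `(start, type, end)` triples. It is a simplification, because a real graph can hold parallel relationships of the same type.

```python
# Hypothetical relationship store: a set of (start, type, end) triples.
relationships = set()

def merge_relationship(start, rel_type, end):
    """Models MATCH (a), (b) MERGE (a)-[r:TYPE]->(b) RETURN r as get-or-create."""
    rel = (start, rel_type, end)
    created = rel not in relationships
    if created:
        relationships.add(rel)  # not found: create it
    return rel, created         # found or just created: return it

print(merge_relationship("Joe", "KNOWS", "Steve"))  # (('Joe', 'KNOWS', 'Steve'), True)
print(merge_relationship("Joe", "KNOWS", "Steve"))  # (('Joe', 'KNOWS', 'Steve'), False)
print(len(relationships))                           # 1
```

The direction is part of the pattern, so merging Steve KNOWS Joe afterwards would create a second, distinct relationship.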


Simpler syntax for MERGE ON MATCH and ON CREATE
When you use a MERGE clause in your query, there are two possible outcomes: Neo4j will either find all existing matching data or create entirely new data. The special sub-clauses ON MATCH and ON CREATE allow you to distinguish between these outcomes.


We’ve simplified the syntax of the ON MATCH and ON CREATE clauses, removing the need to cite an identifier from the related MERGE pattern. Where you used to write:


MERGE (a:Person {name: "Joe"}) ON CREATE a SET a.created = {}


You can now write (dropping the 'a' in ON CREATE):


MERGE (a:Person {name: "Joe"}) ON CREATE SET a.created = {}
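The branching behind ON CREATE and ON MATCH can be modeled in Python as two optional callbacks, of which exactly one runs per MERGE. This is a hypothetical sketch: the `store` dictionary, the callbacks standing in for SET, and the `seen` counter property are all illustrative assumptions.

```python
import time

# Hypothetical node store keyed by (label, name).
store = {}

def merge_node(label, name, on_create=None, on_match=None):
    """Models MERGE (a:Label {name: ...}) ON CREATE SET ... ON MATCH SET ..."""
    key = (label, name)
    if key in store:
        node = store[key]
        if on_match:
            on_match(node)  # existing node found: run the ON MATCH action
    else:
        node = {"label": label, "name": name}
        store[key] = node
        if on_create:
            on_create(node)  # node newly created: run the ON CREATE action
    return node

def set_created(node):
    node["created"] = time.time()

def count_seen(node):
    node["seen"] = node.get("seen", 0) + 1

joe = merge_node("Person", "Joe", on_create=set_created, on_match=count_seen)
merge_node("Person", "Joe", on_create=set_created, on_match=count_seen)
print("created" in joe, joe["seen"])  # True 1
```

The first call creates the node and runs only the create action. The second call finds the same node and runs only the match action.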


NULL != NULL
We’ve changed the way Cypher handles null in important ways. Many expressions now return null for invalid arguments (for example HEAD([]), or slicing such as [][1..3]). We’ve also embraced ternary logic by allowing null to be used as a “maybe” value in expressions with AND, OR and NOT, which makes it easier to compute predicates when information (like property values) is missing.
For more details, refer to the Neo4j Manual section on working with null.
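Ternary logic follows the usual three-valued truth tables, which the Python sketch below models using `None` for null. It illustrates the semantics only and is not Neo4j code. The `tri_eq` helper shows why null = null is not true: comparing anything with null yields null ("maybe").

```python
from typing import Optional

Tri = Optional[bool]  # None plays the role of Cypher's null

def tri_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # false AND anything is false, even null
    if a is None or b is None:
        return None
    return True

def tri_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True  # true OR anything is true, even null
    if a is None or b is None:
        return None
    return False

def tri_not(a: Tri) -> Tri:
    return None if a is None else not a

def tri_eq(a, b) -> Tri:
    """Models =: a comparison involving null is null, so null = null is null."""
    if a is None or b is None:
        return None
    return a == b

print(tri_and(None, False), tri_or(None, True), tri_eq(None, None))  # False True None
```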


Caution: manual upgrade between milestones
Data stores created with any previous milestone version cannot be used with 2.0.0-RC1 unless a manual upgrade is performed. This is due to incompatible changes made to the store files. Please proceed with caution, backing up your data before attempting to manually upgrade.


Manual upgrade (only from 2.0.0-M06, and after you've backed up):
  1. Cleanly shut down the old version (Neo4j 2.0.0-M06)
    $ bin/neo4j stop
  2. Navigate to the database directory
    $ cd data/graph.db
  3. Delete the label scan store (this is the critical part that has a new format). It will be recreated on startup.
    $ rm -rf schema/label
  4. Start with the new version of Neo4j 2.0.0-RC1
    $ bin/neo4j start


To be clear: DO NOT USE THIS RELEASE WITH EXISTING DATA

Of course, as always you can safely upgrade between GA versions like 1.9.5 and the coming 2.0.0.


Breaking changes
While this release is feature-complete, there are some breaking changes since milestone 6.


Breaking changes include:
  • textual status codes, which alter the error response from the transactional endpoint
  • clean-up of deprecated APIs
  • removal of reference node (use labels instead)


For all the details, please refer to the 2.0.0-RC1 changelogs.



Next steps

Now that the Release Candidate is ready, we’d love for you to try it out. Between now and the GA release, we will only be including bug fixes. Please give us feedback about any issues you encounter: report problems on our Google Group and ask questions on Stack Overflow.

Cheers,
Andreas, on behalf of Team Neo


Wednesday, November 13, 2013

Why Graph Databases are the best tool for handling connected data like in Diaspora

Handling connected domains with the “right tool for the job”



Michael Hunger
Sarah Mei recently wrote a great blog post describing the problems she and her colleagues ran into when managing highly connected data using document databases.


Document databases (like other aggregate-oriented databases) do a good job of storing a single representation of an aggregate entity but struggle to handle use-cases that require multiple, different views of the domain. Handling connections between documents tends to be an afterthought that isn’t covered well by the aggregate data model.

Real world use-cases



Sarah described how she worked on a TV show application at Pivotal and discussed the modeling and data management implications that surfaced when the application’s use-case evolved.


The same happened with the Diaspora project, which started out as a Ruby on Rails application using MongoDB.


For both projects, these evolving requirements caused difficulties with the chosen data model and database, which triggered the move to PostgreSQL. A relational database was chosen because it allowed some of the fidelity in the model to return.


Unfortunately, this comes at the cost of queries with a high number of JOINs, which can cause performance issues.


Fortunately there is a data model that embraces rich connections between your domain entities: graph databases.


Live Graph data models of Diaspora and the TV-Show



To show how a graph database would handle these use-cases, we created two live graph data models: one of a social network like Diaspora and one of the TV show application. For each, we set up a small example data set and then expressed the use-cases she mentioned as graph search queries in the graph query language Cypher. These GraphGists allow easy modeling discussions and live exploration of the dataset and use-cases, and they provide a good starting point for your own (forked) variant of the domain model.


Example graph model - TV Shows



To quickly develop the models, we use the typical patterns that we look for in a graph when answering the use-cases described. We call it whiteboard-friendliness :)



Shows, seasons and episodes
(:TVShow)-[:HAS_SEASON]->(:Season)-[:HAS_EPISODE]->(:Episode)


Characters played by actors featured in an episode
(:Episode)   -[:FEATURED_CHARACTER]->(:Character),
(:Character)<-[:PLAYED_CHARACTER  ]- (:Actor)


Users writing reviews for individual episodes
(:User)-[:WROTE_REVIEW]->(:Review)<-[:HAS_REVIEW]-(:Episode)


Using these basic patterns we can quickly create sample data for the domain and also develop the queries used to solve the use-cases. For example:


Listing all the episodes (filmography) of an actor across all shows


MATCH
(actor:Actor)-[:PLAYED_CHARACTER  ]->(character),
(character) <-[:FEATURED_CHARACTER]- (episode),
(show:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->(episode)
WHERE actor.name = "Josh Radnor"
RETURN show.name, episode.name, character.name
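To show what the pattern matcher does here, the Python sketch below follows the same relationships by hand over hypothetical sample data. It goes from the actor to characters, from characters to the episodes that feature them, and from episodes up through seasons to shows. Using names as node identities is a simplification for the sketch.

```python
# Hypothetical sample data following the patterns above:
# (:TVShow)-[:HAS_SEASON]->(:Season)-[:HAS_EPISODE]->(:Episode)
# (:Episode)-[:FEATURED_CHARACTER]->(:Character)<-[:PLAYED_CHARACTER]-(:Actor)
edges = [
    ("How I Met Your Mother", "HAS_SEASON", "HIMYM S1"),
    ("HIMYM S1", "HAS_EPISODE", "Pilot"),
    ("HIMYM S1", "HAS_EPISODE", "Purple Giraffe"),
    ("Pilot", "FEATURED_CHARACTER", "Ted Mosby"),
    ("Purple Giraffe", "FEATURED_CHARACTER", "Ted Mosby"),
    ("Josh Radnor", "PLAYED_CHARACTER", "Ted Mosby"),
]

def outgoing(node, rel_type):
    return [end for start, t, end in edges if start == node and t == rel_type]

def incoming(node, rel_type):
    return [start for start, t, end in edges if end == node and t == rel_type]

def filmography(actor):
    """Returns (show, episode, character) rows, like the RETURN clause of the query."""
    rows = []
    for character in outgoing(actor, "PLAYED_CHARACTER"):
        for episode in incoming(character, "FEATURED_CHARACTER"):
            for season in incoming(episode, "HAS_EPISODE"):
                for show in incoming(season, "HAS_SEASON"):
                    rows.append((show, episode, character))
    return rows

for row in filmography("Josh Radnor"):
    print(row)
```

Each nested loop corresponds to one relationship hop in the MATCH pattern. Cypher handles this nesting, including relationship direction, for you.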


Please check it out in more detail in the live graph model.


Example graph model - Social Network




Users, friends, posts


(:User)-[:FRIEND]->(:User)-[:POSTED]->(:Post)


Posts, comments and commenters


(:User)-[:POSTED]->(:Post)<-[:COMMENTED]-(:User)


Users like posts


(:User)-[:LIKED]->(:Post)


Find the posts made by Rachel’s friends


MATCH (u:User)-[:FRIEND]-(f)-[:POSTED]->(post)
WHERE u.name = "Rachel Green"
RETURN f.name AS friend, post.text as content
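The pattern `-[:FRIEND]-` has no arrow, so it matches friendships stored in either direction. The Python sketch below models that over hypothetical sample data. The names and post texts are illustrative only.

```python
# Hypothetical data: FRIEND relationships are stored with one direction,
# but the undirected pattern -[:FRIEND]- matches them both ways.
friend_rels = [("Rachel Green", "Monica Geller"), ("Ross Geller", "Rachel Green")]
posted = [("Monica Geller", "Post 1"), ("Ross Geller", "Post 2"), ("Joey Tribbiani", "Post 3")]

def friends_of(user):
    """Models (u)-[:FRIEND]-(f): follow FRIEND relationships in either direction."""
    return ([b for a, b in friend_rels if a == user]
            + [a for a, b in friend_rels if b == user])

def friends_posts(user):
    """Models MATCH (u:User)-[:FRIEND]-(f)-[:POSTED]->(post) RETURN f.name, post.text."""
    return [(f, text) for f in friends_of(user) for author, text in posted if author == f]

print(friends_posts("Rachel Green"))
# [('Monica Geller', 'Post 1'), ('Ross Geller', 'Post 2')]
```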


List people who liked posts by Rachel’s friends


MATCH (u:User)-[:FRIEND]-(f)-[:POSTED]->(post)<-[:LIKED]-(liker)
WHERE u.name = "Rachel Green"
RETURN f.name AS friend, post.text as content,
      COLLECT(liker.name) as liked_by
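In this query, COLLECT groups rows implicitly: the non-aggregated columns (friend and content) act as the grouping key, and the likers are gathered into a list per group. The Python sketch below models that grouping over hypothetical sample data. Matching likes to posts by their text is a simplification for the sketch.

```python
from collections import defaultdict

# Hypothetical data: (friend, post text) rows and (liker, post text) LIKED relationships.
friend_posts = [("Monica Geller", "Post 1"), ("Ross Geller", "Post 2")]
liked = [("Chandler Bing", "Post 1"), ("Joey Tribbiani", "Post 1"), ("Phoebe Buffay", "Post 2")]

def likers_by_post(friend_posts, liked):
    """Models RETURN f.name AS friend, post.text AS content, COLLECT(liker.name) AS liked_by."""
    groups = defaultdict(list)
    for friend, text in friend_posts:
        for liker, post in liked:  # (post)<-[:LIKED]-(liker)
            if post == text:
                groups[(friend, text)].append(liker)
    return dict(groups)

print(likers_by_post(friend_posts, liked))
# {('Monica Geller', 'Post 1'): ['Chandler Bing', 'Joey Tribbiani'], ('Ross Geller', 'Post 2'): ['Phoebe Buffay']}
```

Posts without any likes do not appear, because the LIKED relationship is part of the required MATCH pattern.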


Please check it out in more detail in the live graph model.


Graph Databases as a niche technology?



As you can see, it is incredibly easy to model these use-cases with a graph database. So why weren’t they considered? To quote from the article:


But what are the alternatives? Some folks say graph databases are more natural, but I’m not going to cover those here, since graph databases are too niche to be put into production.


This is an interesting observation, as Neo4j is the most widely used graph database and has been running in production setups for more than 10 years now. Neo Technology has more than 100 paying customers (30 of which are Global 2000 companies), and tens of thousands of community users have deployed Neo4j as the database backing production applications. These use-cases span industries from network management, gaming, social, finance and job search to logistics and dating sites.


We can understand why some people may have felt that graph databases were a niche technology in 2010 when Diaspora got started - we actually backed Diaspora on Kickstarter and offered our help at the time - but now the landscape has changed and graph databases are an uncontroversial choice.


Judge for yourself



If you work in a domain with richly connected data, we encourage you to try modeling it as a graph and managing it with a graph database. For more insight into how this works, feel free to check out the freely available O’Reilly book “Graph Databases”.


Also, the offer to support Diaspora still stands! We’re happy to help so please reach out to us if you’re interested. You can also follow the discussion with Sarah on Twitter. Feel free to jump in!

Cheers,

Michael Hunger (@mesirii) 

with help from Kenny Bastani, Mark Needham and Peter Neubauer