Friday, July 15, 2011

Recap: Intro to Graph Databases | Webinar Series #1

Thanks to everyone who attended our Intro to Graph Databases webinar on Wednesday, July 13. We had a great turnout and loads of fantastic questions!



Below are answers to all questions posed during the webinar. For any other questions, be sure to refer to our user list. We have an incredible community that answers your questions ridiculously fast, no matter what time zone.
  • Can keys/values be first class objects? -@PatrickDurusau
No. Keys are just strings, and values can be primitives, strings, or arrays of those. At your domain level, you can easily add methods that convert arbitrary objects into these primitives or into graph structures. Support for JSON types is planned.
  • What is the best Clojure library for Neo4J? Some of them seem quite old, is that because this is a "solved problem" or nobody cares.... -DJ
See here: http://stackoverflow.com/questions/5680976/is-neo4j-a-good-fit-for-clojure
  • How does CAP theorem apply to neo4j? -MK
Neo4j is not partition tolerant. Automatic, domain agnostic graph sharding is still a problem that we have to solve to scale out. (See also the thesis of Alex Averbuch: http://alexaverbuch.blogspot.com/2010/04/me-my-names-alex-im-currently.html)
  • Is Facebook using neo4j for their social graph? -Anonymous
No, but they should :)
  • Can there be multiple relationships between nodes? -Anonymous
Yes, as many as you'd like. For most domains one relationship is enough, as it can be traversed in both directions.
  • So, can Neo both know and love Trinity? -Anonymous
Yes.
  • What does the 4j in neo4j stand for? -Anonymous
"For Java": Neo4j is written in Java and provides a native API for the JVM. Other languages can access Neo4j databases via a RESTful server protocol. There are lots of language bindings for the Neo4j graph database (see: http://wiki.neo4j.org/content/Main_Page#Language_and_framework_bindings).
  • Shouldn't it be "At depth %d => %s\n" in the example? 'np' just a type in the slides, I guess. -DM
That would only have worked on Unix: “\n” is the linefeed character, whereas Windows uses a carriage return followed by a linefeed for line endings. The format string “%n” expands to the appropriate line ending for the current platform. For more details, see the table called “Conversions” at http://download.oracle.com/javase/6/docs/api/java/util/Formatter.html#syntax.
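As a small illustration of the difference, the sketch below formats the same kind of message once with “%n” and once with “\n”. The class name, method names and sample values are made up for this example and are not taken from the webinar slides.

```java
// Hypothetical demo class; names and sample values are illustrative only.
class LineEndingDemo {

    // Uses %n, which java.util.Formatter expands to the platform line separator.
    static String portable(int depth, String name) {
        return String.format("At depth %d => %s%n", depth, name);
    }

    // Uses \n, which is always a single linefeed character, whatever the platform.
    static String linefeedOnly(int depth, String name) {
        return String.format("At depth %d => %s\n", depth, name);
    }

    public static void main(String[] args) {
        System.out.print(portable(1, "Trinity"));
        System.out.print(linefeedOnly(2, "Morpheus"));
    }
}
```

On Unix both lines end the same way. On Windows only the first one ends with the carriage return and linefeed pair that the platform expects.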
  • Is there a performance penalty if there is a large number of nodes linked to a single node or does this not matter? -DM
Yes, the number of connections of a node matters, mostly when the node is first loaded into the cache (cold caches). Nodes with many (> 100 000) connections are sometimes called supernodes. The Neo4j team is currently working on improving that aspect.
  • What if Trinity was in love with Morpheus? Will Trinity's node be returned because she only has the outgoing "LOVES" relationship? There was no check if the node the relationship points to is the start node. -BP
Yes. Any node in the traversed graph (reachable through KNOWS relationships from the start node) that has an outgoing LOVES relationship would be returned.
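To make that behaviour concrete, here is a rough sketch of the same logic on a plain in-memory graph. It is not the Neo4j traversal API: the class, the method and the two adjacency maps are hypothetical, and it assumes that KNOWS is followed in the outgoing direction, breadth first, and that the start node itself is checked too.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model, not the Neo4j API: each map goes from a node
// name to the names that its outgoing relationships of one type point to.
class LoversTraversal {

    static List<String> findLovers(String start,
                                   Map<String, List<String>> knows,
                                   Map<String, List<String>> loves) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            // Only the existence of an outgoing LOVES relationship is checked,
            // not which node it points to.
            if (!loves.getOrDefault(current, Collections.emptyList()).isEmpty()) {
                result.add(current);
            }
            // Follow outgoing KNOWS relationships to nodes not seen before.
            for (String friend : knows.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(friend)) {
                    queue.add(friend);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("Neo", Arrays.asList("Trinity", "Morpheus"));
        knows.put("Morpheus", Arrays.asList("Trinity", "Cypher"));
        Map<String, List<String>> loves = new HashMap<>();
        loves.put("Trinity", Arrays.asList("Morpheus"));
        System.out.println(findLovers("Neo", knows, loves)); // prints [Trinity]
    }
}
```

Trinity is returned even though her LOVES relationship points to Morpheus rather than to the start node, which is exactly the situation the question describes.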
  • If possible, can you please take one simple example and explain the difference in representation between connected database and Graph database. -SR??
What do you mean by a connected database?
  • Is the Cypher query language implemented in the programming language like LINQ or is it String-based? -US
Cypher is implemented in Scala using Scala's parser-combinator library. The parser is string-based, but it renders an object-based expression tree, which is then evaluated using graph matching and several filtering and aggregation steps.
  • Are there other Traversers that return a set of Relations rather than a set of Nodes? -US
Yes, by using our more advanced traversal framework. See our documentation about this traversal framework and its JavaDoc API documentation.
  • Seems you have reinvented the hierarchical data model that was used in the late 60's, 70's and early 80's and was then replaced by the relational model. -AG
Good point. Graph databases such as Neo4j have a lot in common with the navigational databases of old. Even back then the navigational databases often outperformed relational databases. There are thus two questions that you ought to ask: why did the relational databases take over the market? and what has changed since then?

The navigational databases back then were a pain to work with. They lacked good abstractions for working with the data, and required trained specialists to operate. Relational databases took over the market because of their structured query language, which meant that any developer could use them.

A lot has changed since that time. The biggest game changer has been the advent of object-oriented programming. Objects give us the abstraction we need to make graph databases accessible to anyone as a pattern to work with. We are of course working on making Neo4j even more user friendly, with efforts such as Cypher. Having the foundation of a clear object model for the graph makes such efforts possible, and makes navigational databases interesting again.
  • Is the traversal asynchronous? -NV
Right now traversals are synchronous, and so far they have been fast enough. But we've been thinking about providing parallel traversals.
  • How do I get the node that I want to work with from the Graph DB ? I mean how do I get i.e. Mr Andersson? -RE
You can look up the start nodes for your traversal using the integrated indexing framework (e.g. by name). Alternatively, you can create "category" nodes that aggregate nodes of a certain type, tag or category, link the category nodes to the root node, and start your traversal from there.
  • How can you deal with versions of graphs? Or, just versions of key-value properties, for example, if you wanted to keep the date a relationship or a specific property on a node changed while keep the old value (the history, or last version)?
There are several ways of dealing with that, either by cloning parts of the graph or by putting those relationships into array-based properties. One of our engineers created a proof of concept for that (https://github.com/dmontag/neo4j-versioning). You could also look into the theory of Clojure's persistent data structures for creating such a versioning approach, not just for properties but for relationships or whole subgraphs.

Monday, July 11, 2011

Announcing Neo4j 1.4 “Kiruna Stol” GA

Neo4j 1.4 “Kiruna Stol” GA

Over the last three months, we’ve released 6 milestones in our 1.4 series. Today we’re releasing the final Neo4j 1.4 General Availability (GA) package. We’ve seen a whole host of new features going into the product during this time, along with numerous performance and stability improvements. We think this is our best release yet, and we hope you like the direction in which the product is heading.

Cypher Query Language

For some time now Neo4j has supported the Gremlin query language for traversing graphs. To complement that, we’ve introduced a new language called “Cypher”, which provides a humane, DBA-friendly syntax for graph queries. With Cypher, queries can often be expressed easily. For example, this Cypher query finds friends-of-friends concisely:
START user = (people-index, name, "John")
MATCH user-[:FRIEND_OF]->()-[:FRIEND_OF]->fof
RETURN fof
While Cypher can’t yet mutate graphs, it’s a really powerful and interesting way of exploring your data and drawing business intelligence from it.
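For readers who prefer to see the pattern spelled out imperatively, the sketch below walks the same two FRIEND_OF hops over a plain in-memory map. It is only a rough model and not how Cypher is evaluated: the class, the method and the sample names are hypothetical, and the index lookup of the START clause is replaced by passing the user's name directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph: friendOf maps a person's name
// to the names that their outgoing FRIEND_OF relationships point to.
class FriendsOfFriends {

    static List<String> friendsOfFriends(String user, Map<String, List<String>> friendOf) {
        List<String> fofs = new ArrayList<>();
        // First hop: user-[:FRIEND_OF]->()
        for (String friend : friendOf.getOrDefault(user, Collections.emptyList())) {
            // Second hop: ()-[:FRIEND_OF]->fof, one result per matching path
            fofs.addAll(friendOf.getOrDefault(friend, Collections.emptyList()));
        }
        return fofs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friendOf = new HashMap<>();
        friendOf.put("John", Arrays.asList("Sara", "Joe"));
        friendOf.put("Sara", Arrays.asList("Maria"));
        friendOf.put("Joe", Arrays.asList("Steve", "Maria"));
        System.out.println(friendsOfFriends("John", friendOf)); // prints [Maria, Steve, Maria]
    }
}
```

Maria appears twice because she is reachable over two different paths. The sketch keeps one entry per path rather than removing duplicates.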

Automatic Indexing

Neo4j’s index framework is powerful and tightly integrated with the database, but there are times when you’d gladly trade some of that power for a little less effort. And that’s where auto-indexing comes in. With a little configuration the auto-indexing framework will take care of managing the lifecycle of index entries so that they’re consistent with the data in the graph.

The auto-indexing framework allows us to declare that specific properties in nodes and relationships should be indexed. As those properties are created, updated, and removed, so too the corresponding entries in the index change.

To create an auto-index in code:
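// Obtain the auto-indexer for nodes from the index manager.
AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
// Index every node property named "model" and switch auto-indexing on.
nodeAutoIndexer.startAutoIndexingProperty( "model" );
nodeAutoIndexer.setEnabled( true );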
Now any nodes with the property called “model” will appear in the auto-index. This auto-index can be accessed and queried in a similar way to regular indexes:
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
And while we’ve only shown node indexes here, of course the same functionality applies to relationships.
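To show what "managing the lifecycle of index entries" amounts to, here is a toy in-memory sketch of the idea. It is not the Neo4j implementation: apart from borrowing the method name startAutoIndexingProperty for familiarity, every class, field and method in it is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of the auto-indexing idea; this is not Neo4j code.
class AutoIndexSketch {
    private final Set<String> indexedKeys = new HashSet<>();
    private final Map<Long, Map<String, Object>> nodes = new HashMap<>();
    // index: property key -> property value -> ids of nodes holding that value
    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<>();

    void startAutoIndexingProperty(String key) {
        indexedKeys.add(key);
    }

    // Creating or updating a property also updates the index entry.
    void setProperty(long nodeId, String key, Object value) {
        Map<String, Object> props = nodes.computeIfAbsent(nodeId, id -> new HashMap<>());
        Object old = props.put(key, value);
        if (indexedKeys.contains(key)) {
            if (old != null) {
                unindex(nodeId, key, old);
            }
            index.computeIfAbsent(key, k -> new HashMap<>())
                 .computeIfAbsent(value, v -> new HashSet<>())
                 .add(nodeId);
        }
    }

    // Removing a property also removes the corresponding index entry.
    void removeProperty(long nodeId, String key) {
        Map<String, Object> props = nodes.get(nodeId);
        Object old = (props == null) ? null : props.remove(key);
        if (old != null && indexedKeys.contains(key)) {
            unindex(nodeId, key, old);
        }
    }

    Set<Long> get(String key, Object value) {
        return index.getOrDefault(key, Collections.emptyMap())
                    .getOrDefault(value, Collections.emptySet());
    }

    private void unindex(long nodeId, String key, Object value) {
        Map<Object, Set<Long>> byValue = index.get(key);
        if (byValue != null && byValue.containsKey(value)) {
            byValue.get(value).remove(nodeId);
        }
    }

    public static void main(String[] args) {
        AutoIndexSketch db = new AutoIndexSketch();
        db.startAutoIndexingProperty("model");
        db.setProperty(1L, "model", "Saab 9-5");
        db.setProperty(1L, "model", "Volvo V70"); // the old entry is dropped
        System.out.println(db.get("model", "Saab 9-5"));  // prints []
        System.out.println(db.get("model", "Volvo V70")); // prints [1]
    }
}
```

The point of the sketch is that callers only ever touch properties. The index is never written to directly, so it cannot drift out of step with the data.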

Index Improvements

In this release we’ve taken the opportunity to upgrade to Lucene 3.1. This means index operations will be faster thanks to the hard work of the Lucene community. However, the transition to Lucene 3.1 indexes means an irreversible change to index stores – so ensure you’re ready to upgrade your indexes before switching to the 1.4 GA release, and keep a backup of your data!

Self Relationships

When modelling domains, entities which refer to themselves are somewhat common. For example, the self-employed are their own bosses. In previous versions of Neo4j we’d model that with two relationships and a (dummy) node. It worked but it was less expressive than we liked.

Our community has lobbied for this, and we’ve listened. From this release, Neo4j supports relationships whose start and end nodes are the same. So for those lucky people who are their own boss, you can now express subgraphs like:

Albert-BOSS_OF->Albert
And the database will happily persist that structure and make it available for traversals.

Performance Improvements

Down in the engine room our kernel hackers have been busy. Our boffins have implemented new directional caching strategies and tweaked the code path to make small transactions (especially) much more efficient. This all adds up to a substantial performance improvement and a database engine that feels a whole lot snappier.

REST API Improvements

In the Neo4j server, we’ve been pushing the REST API forwards. To that end, we now have batch behaviour exposed so that large bundles of commands can be executed efficiently on the server, thereby paying the network penalty only once.

We’ve also included a paging mechanism for traversers. Any traversals that you execute on the server can return their results in pages rather than as a single large chunk. This means traversers can be terminated early (when enough results have been gathered) and provides more manageable chunks of data to work with, making the server more efficient (as well as taking the pressure off clients).
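The idea behind paging can be sketched in a few lines of plain Java. This is not the server's code or its REST interface: the PagedResults class and its methods are hypothetical, and an ordinary Iterator stands in for a lazily evaluated traversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of paging over lazily produced results; not the server's code.
class PagedResults<T> {
    private final Iterator<T> source;
    private final int pageSize;

    PagedResults(Iterator<T> source, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.source = source;
        this.pageSize = pageSize;
    }

    boolean hasNextPage() {
        return source.hasNext();
    }

    // Pulls at most pageSize items from the source; nothing beyond that is consumed.
    List<T> nextPage() {
        List<T> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && source.hasNext()) {
            page.add(source.next());
        }
        return page;
    }

    public static void main(String[] args) {
        Iterator<Integer> results = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        PagedResults<Integer> pages = new PagedResults<>(results, 3);
        System.out.println(pages.nextPage()); // prints [1, 2, 3]
        System.out.println(pages.nextPage()); // prints [4, 5, 6]
        // A client that has seen enough can simply stop asking for pages here.
    }
}
```

Because each page is pulled on demand, a client that stops after the second page never causes the remaining results to be produced.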

Finally we’ve made REST indexes on a par with the embedded indexes, and from this release the same arbitrary queries and index management supported in the Java APIs are available to our community of RESTafarians too!

Webadmin Improvements

Our browser-heads have been plugging away, knee-deep in CoffeeScript and CSS, and the results show. The Webadmin tool that comes with the server is looking better than ever. With this release, there’s a new index manager tab that allows DBAs to fully control indexes on the graph with a friendly point-and-click interface.

We’ve also expanded the set of consoles available. To complement the existing Gremlin console, we’ve added a Cypher console. This means getting up and running with Cypher is as simple as downloading Neo4j, running the server, and opening a Web browser!

For our RESTafarian friends, we’ve also created a neat HTTP console that allows users to create simplified curl-like commands that can be executed against the server. This is extremely useful as a REPL for exploring the REST API and testing extensions and plugins.

New Server Management Scripts

Ever since we first released Neo4j server we’ve used 3rd party libraries and scripts to help users manage the service. Unfortunately both of the 3rd party wrappers we’ve used have been painful for everyone concerned. So late in the 1.4 release cycle we switched out those wrappers and provided much simpler bash (if you’re on a Unix variant) and command scripts (if you’re on Windows) instead - much nicer!

Ready to go!

The 1.4 GA release is now available through our download page, and the components have been pushed out to the Maven central repository. Remember, we always value your feedback over on the community mailing list, but in the meantime we wish you all happy hacking!

Tuesday, July 5, 2011

Neo4j 1.4 M06 “Kiruna Stol”

It’s been just a week since the Neo4j 1.4 M05 release, and though we’re pleased with the way the feature set has evolved, during testing we found a potential corruption bug in that specific milestone.

To address that issue, this week we’re releasing our sixth milestone towards the 1.4 GA release. This milestone is likely to be the last of the series for the 1.4 release, and if the community feedback is positive we will transition into our GA release shortly.

We’ve also managed to squeeze in a few syntax changes in Cypher, to make your queries smaller and easier to read/write.

In the meantime, download it and give it a workout. Remember that milestone releases are there for early adopters, and while this milestone is feature complete and well tested, it may still contain surprises. As always, remember to play safe: back up your data and test thoroughly before putting a milestone into production.

-- happy hacking!