Tuesday, June 28, 2011

Neo4j 1.4 M05 - "Kiruna Stol" Midsummer Celebration

Update: Possible corruption issue found
During testing for our forthcoming GA release, we discovered a possible corruption bug in this milestone. Although it's fine to use to learn the new APIs, it's not recommended for production use. If you really need to go into production on a milestone release, the 1.4 M04 milestone is preferred.

Midsummer madness!

Extending the festive atmosphere of Midsummer here in Sweden (though sadly not the copious amounts of beer and strawberries), we’re releasing the final milestone build of Neo4j 1.4. The celebration includes: Auto Indexing, neat new features to the REST API, even cooler Cypher query language features, and a bunch of performance improvements. We’ve also rid ourselves of the 3rd-party service wrapper code (yay!) that caused us and our fellow (mostly Mac) users in the community so much anguish!

Refined Auto Indexing

Having a database manage indexes on behalf of users can be awfully convenient. Recently we announced an early access version of our auto indexing framework that provided a rudimentary way to have indexes managed by convention, rather than explicitly. In this milestone release we’ve incorporated your feedback to provide a much more polished user experience.
The auto indexing feature requires some configuration options during startup of your GraphDatabaseService, depending on the functionality you want to enable. There are two AutoIndexers, one for nodes and one for relationships, and each is enabled and configured independently. If you want to auto-index all node properties called “name” and all relationship properties called “since”, the configuration is:

// Enable both auto indexers and name the property keys each one should track
Map<String, String> config = new HashMap<String, String>();
config.put(Config.NODE_AUTO_INDEXING, "true");
config.put(Config.NODE_KEYS_INDEXABLE, "name");
config.put(Config.RELATIONSHIP_AUTO_INDEXING, "true");
config.put(Config.RELATIONSHIP_KEYS_INDEXABLE, "since");

Now, when a transaction finishes successfully, all primitives (nodes and relationships) are checked for changes to those properties, and any changes are added to a dedicated index.
To use that index, simply...
ReadOnlyIndex nodeAutoIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
...and then use nodeAutoIndex as a read-only Index<Node>.
The AutoIndexer interface is the primary API through which you can control the auto indexing functionality at runtime. For more details, see the Javadocs for the available methods and the manual for comprehensive examples.
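To make the commit-time behaviour described above a little more concrete, here is a minimal, stdlib-only sketch of the idea. It is a toy model and not Neo4j's implementation: the class ToyAutoIndexer and its methods are hypothetical names invented for illustration, and entities are reduced to plain long ids.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of convention-based indexing (hypothetical, not Neo4j code). */
class ToyAutoIndexer {
    private final Set<String> indexableKeys;
    // property key -> property value -> ids of the entities holding that value
    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<String, Map<Object, Set<Long>>>();
    private final Map<Long, Map<String, Object>> committed = new HashMap<Long, Map<String, Object>>();
    private final Map<Long, Map<String, Object>> pending = new HashMap<Long, Map<String, Object>>();

    ToyAutoIndexer(String... keys) {
        this.indexableKeys = new HashSet<String>(Arrays.asList(keys));
    }

    /** Records a property change; nothing is indexed until commit(). */
    void setProperty(long id, String key, Object value) {
        Map<String, Object> changes = pending.get(id);
        if (changes == null) {
            changes = new HashMap<String, Object>();
            pending.put(id, changes);
        }
        changes.put(key, value);
    }

    /** Applies pending changes and updates the index for tracked keys only. */
    void commit() {
        for (Map.Entry<Long, Map<String, Object>> entity : pending.entrySet()) {
            Long id = entity.getKey();
            Map<String, Object> props = committed.get(id);
            if (props == null) {
                props = new HashMap<String, Object>();
                committed.put(id, props);
            }
            for (Map.Entry<String, Object> change : entity.getValue().entrySet()) {
                String key = change.getKey();
                Object oldValue = props.put(key, change.getValue());
                if (!indexableKeys.contains(key)) {
                    continue; // key is not configured for auto indexing
                }
                Map<Object, Set<Long>> byValue = index.get(key);
                if (byValue == null) {
                    byValue = new HashMap<Object, Set<Long>>();
                    index.put(key, byValue);
                }
                if (oldValue != null && byValue.containsKey(oldValue)) {
                    byValue.get(oldValue).remove(id); // drop the stale entry
                }
                Set<Long> ids = byValue.get(change.getValue());
                if (ids == null) {
                    ids = new TreeSet<Long>();
                    byValue.put(change.getValue(), ids);
                }
                ids.add(id);
            }
        }
        pending.clear();
    }

    /** Discards pending changes; the index is left untouched. */
    void rollback() {
        pending.clear();
    }

    /** Read-only lookup, loosely mirroring the read-only view of the real auto index. */
    Set<Long> get(String key, Object value) {
        Map<Object, Set<Long>> byValue = index.get(key);
        if (byValue == null || !byValue.containsKey(value)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(byValue.get(value));
    }
}

class ToyAutoIndexerDemo {
    public static void main(String[] args) {
        ToyAutoIndexer indexer = new ToyAutoIndexer("name");
        indexer.setProperty(1L, "name", "Neo");
        indexer.setProperty(1L, "age", 29); // not a tracked key, never indexed
        indexer.commit();
        System.out.println(indexer.get("name", "Neo"));
    }
}
```

The point of the sketch is only the ordering: changes become visible in the index when the transaction commits, and a rolled-back change never reaches it.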

Evolving Cypher

The Cypher query language goes from strength to strength. In this release we now have a powerful (read-only) graph-matching language which can support some pretty sophisticated queries. Here at the Neo4j HQ we’ve had a blast exploring the cineasts.net dataset through Cypher, like so:
START user=(User,login,'micha')
MATCH (user)-[:FRIEND]-(friend)-[r,:RATED]->(movie)
RETURN movie.title, AVG(r.stars), COUNT(*)
ORDER BY AVG(r.stars) DESC, COUNT(*) DESC limit 7
==> +-------------------------------------------------------+
==> | movie.title                 | avg(r.stars) | count(*) |
==> +-------------------------------------------------------+
==> | Forrest Gump                | 5.0          | 2        |
==> | The Matrix Reloaded         | 5.0          | 1        |
==> | The Simpsons Movie          | 5.0          | 1        |
==> | Terminator Salvation        | 5.0          | 1        |
==> | The Matrix                  | 4.8          | 5        |
==> | Meet the Fockers            | 4.0          | 1        |
==> | Madagascar: Escape 2 Africa | 4.0          | 1        |
==> +-------------------------------------------------------+
==> | 7 rows, 9 ms                                          |
==> +-------------------------------------------------------+
The logical operators syntax has been improved and aggregate functions have been built out to enable sorting/slicing/limiting, making Cypher a powerful tool for querying graph data. Want to see it in action? Our Michael Hunger has produced a lovely screencast introduction to Cypher. We'll continue to evolve Cypher, complementing our continued support for Gremlin and an open embrace of language research and development.
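For readers more at home in Java than in Cypher, the RETURN / ORDER BY / LIMIT part of the query above can be read roughly as the following stream pipeline. This is only a sketch of the aggregation semantics, not of how Cypher executes: the graph matching is skipped entirely, each Rating object stands in for one matched RATED relationship, and the classes Rating, MovieStats and RatingSummary, as well as the sample titles, are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** One matched (friend)-[r:RATED]->(movie) row; hypothetical helper type. */
class Rating {
    final String title;
    final int stars;

    Rating(String title, int stars) {
        this.title = title;
        this.stars = stars;
    }
}

/** One result row: movie.title, AVG(r.stars), COUNT(*). */
class MovieStats {
    final String title;
    final double avgStars;
    final long count;

    MovieStats(String title, double avgStars, long count) {
        this.title = title;
        this.avgStars = avgStars;
        this.count = count;
    }
}

class RatingSummary {
    static List<MovieStats> top(List<Rating> ratings, int limit) {
        // Group by title and aggregate, like RETURN movie.title, AVG(r.stars), COUNT(*)
        Map<String, DoubleSummaryStatistics> byTitle = ratings.stream()
                .collect(Collectors.groupingBy(
                        (Rating r) -> r.title,
                        Collectors.summarizingDouble((Rating r) -> r.stars)));

        // ORDER BY AVG(r.stars) DESC, COUNT(*) DESC
        Comparator<MovieStats> order = Comparator
                .comparingDouble((MovieStats m) -> m.avgStars).reversed()
                .thenComparing(Comparator.comparingLong((MovieStats m) -> m.count).reversed());

        return byTitle.entrySet().stream()
                .map(e -> new MovieStats(e.getKey(), e.getValue().getAverage(), e.getValue().getCount()))
                .sorted(order)
                .limit(limit) // LIMIT n
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample ratings, not taken from the cineasts.net dataset
        List<Rating> ratings = Arrays.asList(
                new Rating("Movie A", 5), new Rating("Movie A", 5),
                new Rating("Movie B", 5),
                new Rating("Movie C", 4), new Rating("Movie C", 4));
        for (MovieStats row : top(ratings, 2)) {
            System.out.println(row.title + " | " + row.avgStars + " | " + row.count);
        }
    }
}
```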

Paged Traversers in the REST API

In addition to the batch interface that we released in the previous milestone, the REST API now supports paging traversal results so that clients can iterate through large result sets on the server side, rather than pulling a single, potentially large result set onto the client for processing.
Node representations now contain a URI for creating a paged traverser. The API is similar to the existing REST traverser API, except that the client is able to specify a page size and a lease timeout (since paged traversers are expired over time to recover server-side resources).
This provides much finer granularity of control for REST clients and also has benefits for server performance (particularly since massive response representations don’t have to be created on the heap). If this approach works well, we might generalize it to graph algorithms and plugins in future releases.
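The interplay between page size and lease timeout can be sketched with a small in-memory model. This is a hypothetical stand-in written for illustration, not the server's code or its REST interface: ToyPagedTraverser and its methods are invented names, the clock is passed in explicitly so the behaviour is easy to follow, and renewing the lease on every successful fetch is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy, in-memory stand-in for a server-side paged traverser (not Neo4j code). */
class ToyPagedTraverser<T> {
    private final Iterator<T> results;
    private final int pageSize;
    private final long leaseMillis;
    private long expiresAt;

    ToyPagedTraverser(Iterable<T> results, int pageSize, long leaseMillis, long nowMillis) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.results = results.iterator();
        this.pageSize = pageSize;
        this.leaseMillis = leaseMillis;
        this.expiresAt = nowMillis + leaseMillis;
    }

    /** True once the lease has run out and the server could reclaim the traverser. */
    boolean isExpired(long nowMillis) {
        return nowMillis > expiresAt;
    }

    /** Returns up to pageSize results; an empty list means the traversal is exhausted. */
    List<T> nextPage(long nowMillis) {
        if (isExpired(nowMillis)) {
            throw new IllegalStateException("lease expired");
        }
        List<T> page = new ArrayList<T>();
        while (page.size() < pageSize && results.hasNext()) {
            page.add(results.next());
        }
        // Assumption of this sketch: each successful fetch renews the lease
        expiresAt = nowMillis + leaseMillis;
        return page;
    }
}

class ToyPagedTraverserDemo {
    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("a", "b", "c", "d", "e");
        long now = 0L;
        ToyPagedTraverser<String> traverser = new ToyPagedTraverser<String>(nodes, 2, 1000L, now);
        List<String> page = traverser.nextPage(now);
        while (!page.isEmpty()) {
            System.out.println(page);
            now += 500L; // the client comes back well within the lease
            page = traverser.nextPage(now);
        }
    }
}
```

Only one page is ever materialized at a time, which is the source of the heap savings mentioned above, and a client that stops asking simply lets the lease lapse.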

Performance improvements

A constant theme (and dev team work package) is to make the database engine smaller, faster, and better. To that end we’ve been busy at work under the hood to squeeze out more performance for the 1.4 release. Highlights compared to Neo4j 1.3 include:
  • 25% smaller memory footprint so that even more objects can be crammed into the valuable heap space caches.
  • Getting typed relationships using node.getRelationships(TYPE,INCOMING/OUTGOING) and getting directed relationships with node.getRelationships(OUTGOING/INCOMING) are around 4x faster due to a new cache implementation.
  • Small transactions (e.g. creating a single node) are now also around 4x faster.
  • Traversing performance is around 50% better than previously measured.
In addition to these, we have measured a number of other modest double-digit improvements in things like adding to indexes, retrieving uncached relationships, and getting relationships from heavily connected nodes, all of which add up to a generally snappier experience.

What’s next?

This is our final milestone build and we’re feature-frozen for the 1.4 release. In the coming days we’re asking for your feedback (and bug reports) while we perform deep QA on the stack to make sure our 1.4 GA is flawless.

Download, feedback

As always, the download is available on the Neo4j Web site, and the individual components are available on Maven central.
Happy hacking and give us your feedback!

4 comments:

Anonymous said...

auto indexing.... i love you guys... i really do :)
I'm starting a project these days using Neo4j (great product, congrats!) and right now I'm managing indexes manually, along with a million other things (find_or_create for nodes and relationships, create_or_update for nodes and relationships, node merging and other stuff...). Now I just have to find a way to optimize it (A-->(3000 nodes)-->B takes 4 seconds for a traversal with max depth = 3, which is far too much as there will be hundreds per second :( )

Andreas Kollegger said...

We'd be happy to help you improve that traversal. I've created a discussion on our help site with your text. Please follow up there with some more details about your domain model and the traversal you're running.

Cheers,
Andreas

Anonymous said...

Hi Andreas, cool, I'll update you there with all the various info :)

One more thing that would be very useful is a way to increment or decrement a numeric value with a single call. For example, say you have a property on a relationship such as the total number of check-ins that a user (one node) has made at a place (the other node). You don't actually care whether there are 10, 100 or 1000; you just need the value in order to show it (so it doesn't make sense to model each check-in as a separate node with its own relationships, which would probably only add complexity).

Now let's say that a user checks in at a place. You first need to find the user's node and the place's node, then check whether there is already a relationship between them. If there is, you read the relationship's properties and write back checkin_count increased by 1; otherwise you create the relationship.

It would be nice to do the update with a single call like:
set_relationship_properties(rel, {"checkin_count" => ":check_in_count+1"})

(i used :check_in_count, but it could be anything that neo4j can understand and change it automatically)

That way it would avoid the case where other calls update the property at the same time, in between reading it and writing it back. I read around the docs, but I didn't see a way to do this.

Markus Gatol said...

Great stuff, thanks to the entire neo team and all participants for their effort!

One thing though: does anybody know about when the new Python driver will be released?