Tuesday, June 28, 2011

Neo4j 1.4 M05 - "Kiruna Stol" Midsummer Celebration

Update: Possible corruption issue found
During testing for our forthcoming GA release, we discovered a possible corruption bug in this milestone. Although it's fine for learning the new APIs, it's not recommended for production use. If you really need to go into production on a milestone release, the 1.4 M04 milestone is preferred.

Midsummer madness!

Extending the festive atmosphere of Midsummer here in Sweden (though sadly not the copious amounts of beer and strawberries), we’re releasing the final milestone build of Neo4j 1.4. The celebration includes: Auto Indexing, neat new features to the REST API, even cooler Cypher query language features, and a bunch of performance improvements. We’ve also rid ourselves of the 3rd-party service wrapper code (yay!) that caused us and our fellow (mostly Mac) users in the community so much anguish!

Refined Auto Indexing

Having a database manage indexes on behalf of users can be awfully convenient. Recently we announced an early access version of our auto indexing framework that provided a rudimentary way to have indexes managed by convention, rather than explicitly. In this milestone release we’ve incorporated your feedback to provide a much more polished user experience.
The auto indexing feature requires some configuration options during startup of your GraphDatabaseService, depending on the functionality you want to enable. There are two AutoIndexers, one for nodes and one for relationships, and each is enabled and configured independently. To auto index all node properties called “name” and all relationship properties called “since”, the configuration is:

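Map<String, String> config = new HashMap<String, String>();
// Enable the node auto indexer and index every node property called "name"
config.put(Config.NODE_AUTO_INDEXING, "true");
config.put(Config.NODE_KEYS_INDEXABLE, "name");
// Enable the relationship auto indexer for properties called "since"
config.put(Config.RELATIONSHIP_AUTO_INDEXING, "true");
config.put(Config.RELATIONSHIP_KEYS_INDEXABLE, "since");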

Now, when a transaction finishes successfully, all primitives are checked for changes to those properties and added to a dedicated index.
To use that index, simply...
ReadOnlyIndex nodeAutoIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
...and then use nodeAutoIndex as a read-only Index<Node>.
The AutoIndexer interface is the primary API through which you can control the auto indexing functionality at runtime. See the Javadocs for the available methods and the manual for comprehensive examples.
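
To make the commit-time behaviour described above a little more concrete, here is a toy model in plain Java. It is only a sketch and does not use the Neo4j API or reflect how Neo4j implements auto indexing: the class AutoIndexSketch and its methods are hypothetical names, and it deliberately ignores details such as removing stale entries when a property value changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of commit-time auto indexing; NOT the Neo4j implementation. */
public class AutoIndexSketch {
    // Property keys configured as indexable (compare NODE_KEYS_INDEXABLE).
    private final Set<String> indexableKeys;
    // key -> value -> ids of the nodes carrying that key/value pair.
    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<>();
    // Property changes staged by the current "transaction": {nodeId, key, value}.
    private final List<Object[]> pending = new ArrayList<>();

    public AutoIndexSketch(String... keys) {
        indexableKeys = new HashSet<>(Arrays.asList(keys));
    }

    /** Stage a property change; nothing is indexed until commit(). */
    public void setProperty(long nodeId, String key, Object value) {
        pending.add(new Object[] { nodeId, key, value });
    }

    /** On a successful finish, index only the changes to configured keys. */
    public void commit() {
        for (Object[] change : pending) {
            String key = (String) change[1];
            if (!indexableKeys.contains(key)) {
                continue; // this property is not auto indexed
            }
            index.computeIfAbsent(key, k -> new HashMap<>())
                 .computeIfAbsent(change[2], v -> new TreeSet<>())
                 .add((Long) change[0]);
        }
        pending.clear();
    }

    /** Discard staged changes, as a failed transaction would. */
    public void rollback() {
        pending.clear();
    }

    /** Read-only lookup, loosely mirroring the read-only view of the auto index. */
    public Set<Long> get(String key, Object value) {
        Map<Object, Set<Long>> byValue = index.get(key);
        Set<Long> hits = (byValue == null) ? null : byValue.get(value);
        return hits == null
                ? Collections.<Long>emptySet()
                : Collections.unmodifiableSet(hits);
    }
}
```

In this model, a call such as setProperty(1L, "name", "Neo") only becomes visible through get("name", "Neo") after commit(), and properties whose keys were not configured never reach the index.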

Evolving Cypher

The Cypher query language goes from strength to strength. In this release we now have a powerful (read-only) graph-matching language which can support some pretty sophisticated queries. Here at the Neo4j HQ we’ve had a blast exploring the cineasts.net dataset through Cypher, like so:
START user=(User,login,'micha')
MATCH (user)-[:FRIEND]-(friend)-[r,:RATED]->(movie)
RETURN movie.title, AVG(r.stars), COUNT(*)
ORDER BY AVG(r.stars) DESC, COUNT(*) DESC limit 7
==> +-------------------------------------------------------+
==> | movie.title                 | avg(r.stars) | count(*) |
==> +-------------------------------------------------------+
==> | Forrest Gump                | 5.0          | 2        |
==> | The Matrix Reloaded         | 5.0          | 1        |
==> | The Simpsons Movie          | 5.0          | 1        |
==> | Terminator Salvation        | 5.0          | 1        |
==> | The Matrix                  | 4.8          | 5        |
==> | Meet the Fockers            | 4.0          | 1        |
==> | Madagascar: Escape 2 Africa | 4.0          | 1        |
==> +-------------------------------------------------------+
==> | 7 rows, 9 ms                                          |
==> +-------------------------------------------------------+
The logical operators syntax has been improved and aggregate functions have been built out to enable sorting/slicing/limiting, making Cypher a powerful tool for querying graph data. Want to see it in action? Our Michael Hunger has produced a lovely screencast introduction to Cypher. We'll continue to evolve Cypher, complementing our continued support for Gremlin and an open embrace of language research and development.

Paged Traversers in the REST API

In addition to the batch interface that we released in the previous milestone, the REST API now supports paging traversal results, so that clients can step through a large result set that stays on the server rather than pulling a single, potentially large result set onto the client for processing.
Node representations now contain a URI for creating a paged traverser. The API is similar to the existing REST traverser API, except that the client is able to specify a page size and a lease timeout (since paged traversers expire over time to recover server-side resources).
This provides much finer granularity of control for REST clients and also has benefits for server performance (particularly since massive response representations don’t have to be created on the heap). If this approach works well, we might generalize it to graph algorithms and plugins in future releases.
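
The idea of a page size combined with a lease timeout can be sketched as a small toy model in plain Java. This is not Neo4j server code and not the REST API itself: the class PagedTraverserSketch is a hypothetical name, time is passed in explicitly to keep the example deterministic, and renewing the lease on every successful access is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model of a leased, paged result stream; NOT Neo4j server code. */
public class PagedTraverserSketch<T> {
    private final Iterator<T> results; // lazily produced traversal results
    private final int pageSize;        // maximum number of items per page
    private final long leaseMillis;    // how long the traverser may sit idle
    private long expiresAt;            // point in time at which the lease lapses

    public PagedTraverserSketch(Iterator<T> results, int pageSize,
                                long leaseMillis, long nowMillis) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        this.results = results;
        this.pageSize = pageSize;
        this.leaseMillis = leaseMillis;
        this.expiresAt = nowMillis + leaseMillis;
    }

    /** True once the lease has lapsed and the resources could be reclaimed. */
    public boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAt;
    }

    /** Returns the next page (possibly empty), or fails if the lease expired. */
    public List<T> nextPage(long nowMillis) {
        if (isExpired(nowMillis)) {
            throw new IllegalStateException("lease expired");
        }
        List<T> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && results.hasNext()) {
            page.add(results.next());
        }
        // Assumption for this sketch: each successful access renews the lease.
        expiresAt = nowMillis + leaseMillis;
        return page;
    }
}
```

Only one page is ever materialized at a time, which is the property that keeps large response representations off the heap; an abandoned traverser simply stops being usable once its lease runs out.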

Performance improvements

A constant theme (and dev team work package) is to make the database engine smaller, faster, and better. To that end we’ve been busy at work under the hood to squeeze out more performance for the 1.4 release. Highlights compared to Neo4j 1.3 include:
  • 25% smaller memory footprint so that even more objects can be crammed into the valuable heap space caches.
  • Getting typed relationships using node.getRelationships(TYPE,INCOMING/OUTGOING) and getting directed relationships with node.getRelationships(OUTGOING/INCOMING) are around 4x faster due to a new cache implementation.
  • Small transactions (e.g. creating a single node) are now also around 4x faster.
  • Traversing performance is around 50% better than previously measured.
In addition to these, we have measured a number of other modest double-digit improvements in things like adding to indexes, retrieving uncached relationships, and getting relationships from heavily connected nodes, all of which add up to a generally snappier experience.

What’s next?

This is our final milestone build and we’re feature-frozen for the 1.4 release. In the coming days we’re asking for your feedback (and bug reports) while we perform deep QA on the stack to make sure our 1.4 GA is flawless.

Download, feedback

As always, the download is available on the Neo4j Web site, and the individual components are available on Maven central.
Happy hacking and give us your feedback!

Friday, June 10, 2011

“Kiruna Stol” 1.4 - Milestone 4


Hi everyone,

We’re on the fast track to the next major Neo4j release, “Kiruna Stol”. With today’s milestone, we’ve added some brand new features, some experimental aspects that we’re looking for feedback on, and of course numerous enhancements to everyone’s favorite graph database.

Cypher - An expressive graph query language

To allow expressive and efficient querying of the graph store without having to write traversers in code, we’re releasing the first iteration of a new query language, code-named “Cypher”.
Cypher is designed to be a humane query language, suitable for both developers and (importantly, we think) operations professionals who want to make ad-hoc queries on the database. Its constructs are based on English prose and neat iconography, which helps to make it (somewhat) self-explanatory.
For example, here’s a query which finds a user called John in an index and then traverses the graph looking for friends of John’s friends (though not his direct friends) before returning both John and any friends-of-friends that it’s found.

In this next example, we take a list of users (by node ID) and traverse the graph looking for those other users that have an outgoing follows relationship, returning only those followed users who are older than 18.



Even at this early stage the Cypher documentation is growing, and it has its own section in the Neo4j Manual.

This is how it looks in the webadmin console:


Batch-oriented REST API

The Neo4j Server now contains a new (and experimental) REST API to support batching of REST operations. With this API it’s possible to send many commands to the database in a single request and have them executed as part of a single transaction, reducing both network and transactional overheads.
The API is activated by POSTing JSON encoded requests to http://.../db/data/batch
An example HTTP entity body for the request is shown below, containing a list of commands to execute:

Request results are returned as a JSON array, each element of which contains the return data, status code, originating request and the specified request id of the related command. Any erroneous commands stop the execution and roll back the transaction. When this happens, the HTTP response provides feedback on the command that caused the problem.
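
As a rough illustration of what a client might do, the sketch below assembles a list of commands into a JSON array and POSTs it to the batch endpoint. It assumes each command is a JSON object with the fields method, to, body and id; please check the manual for the exact format. The class BatchRequestSketch is a hypothetical helper, and the batch URL has to be supplied by the caller. For brevity, the method and to values are not JSON-escaped, so they must not contain quotes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds and sends a batch request body. */
public class BatchRequestSketch {
    private final List<String> commands = new ArrayList<>();

    /**
     * Queue one command. The request id is simply the command's position.
     * bodyJson must already be valid JSON, or null if there is no body.
     */
    public BatchRequestSketch add(String method, String to, String bodyJson) {
        int id = commands.size();
        StringBuilder sb = new StringBuilder();
        sb.append("{\"method\":\"").append(method).append("\",");
        sb.append("\"to\":\"").append(to).append("\",");
        if (bodyJson != null) {
            sb.append("\"body\":").append(bodyJson).append(",");
        }
        sb.append("\"id\":").append(id).append("}");
        commands.add(sb.toString());
        return this;
    }

    /** The HTTP entity body: a JSON array of the queued commands. */
    public String toJson() {
        return "[" + String.join(",", commands) + "]";
    }

    /** POST the batch to the given URL and return the HTTP status code. */
    public int post(String batchUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(batchUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Accept", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(toJson().getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

Because the whole batch runs in a single transaction, a client using something like this would inspect the response before assuming that any individual command took effect.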

Documentation and Tutorials

The Neo4j documentation project continues apace. The site at http://docs.neo4j.org has become the authoritative place for all Neo4j documentation, some of which is even being generated automatically from running code to ensure it’s up to date and consistent.

To help get to grips with Neo4j, we’re also publishing tutorials to the docs site, so get stuck in!

Faster, Better, Smarter

As always, we’ve looked for opportunities to make our footprint smaller and our code run faster. In this iteration, we’ve lowered the memory profile of nodes, relationships and properties so they use less of that valuable heap space. Also, the BatchInserterIndex now keeps its memory usage in check by committing indexed data in batches of a configurable size.

With the addition of relationships that can refer to the same start and end node (loops), we gained significant modelling expressivity. In this milestone we’ve shrunk the memory overhead on loop relationships so that it only takes up extra space for those nodes that actually have loops on them.
For traversals, we’ve built in some caching optimizations that yield useful traversal performance improvements. For even more performance (and less memory overhead), we’ve also removed some unnecessary checks and object allocations during traversal.

Bug Bashing

We’ve been busy refining the code and ironing out kinks in this iteration. We’ve fixed several issues that can arise in highly concurrent scenarios, together with a list of fixes for other annoyances that we think will add real value for our community:

  • Database locks are now released on Transaction afterCompletion(). This means that database objects can be used as monitors in a distributed environment such as HA setups.
  • Added directions on how to configure garbage collection logging.
  • Fixed a bug where the RRDB (the store where metrics data is kept) was not shut down properly, causing problems on Windows machines.

Start your engines!

The release is now available in Maven central, and bundled up on the Web ready to download here.

Happy hacking!

Your Neo4j team and Andres "Cypher" Taylor.