Tuesday, October 11, 2011

Announcing Neo4j 1.5 “Boden Bord” Milestone 2 - the Autumnal Fruits of our Labor

As the last of the summer sunshine leaves us and the northern winter approaches, at Neo HQ we've been huddled around our laptops for warmth, busy packing in all manner of new functionality for the forthcoming Neo4j 1.5 release. In our last milestone release before GA, we're opening the floodgates and letting out a feature-complete stack. And there's a lot in here too!

Safety First!

In the 1.5 M02 release, the format of the data store has significantly changed. This means you need to be careful when upgrading from previous versions of Neo4j, and you'll have to have plenty of disk space available for the conversion process.
Before using this milestone, you need to cleanly shut down your database and back it up. Similarly, Neo4j HA clusters must be shut down, and their stores must be upgraded one at a time before being brought back up. As always, think very carefully before using a milestone build for production use.
One final notice: if online backup is instructed to perform an incremental backup on an old store version, it will default to a full backup instead, moving your existing backup out of harm’s way first.
With all that in mind, let’s break open the new features!

Store size and IO reduction

Now you’ve read the safety card, we can see the reason for the changed store layout: to substantially reduce our on-disk footprint. In typical cases we've seen stores shrink by 30%, and in some cases by even more.
To shrink the store format on disk, we've had to change how we inline some data (much like our short-string optimization) so that many properties are now stored in-line with their respective nodes/relationships rather than being in a separate store.
As a Neo4j user you won't notice any changes to the API, but you’ll see performance improve because our new store results in fewer IO operations for the same functionality. This is especially valuable on virtualized servers where IO latency can be unpredictable and expensive.

Cypher enhancements

The team behind the Cypher query language continues to innovate at a ferocious pace, which has meant some powerful upgrades to the syntax. Some existing queries might have to be migrated. In this release Cypher’s been extended and refined as follows:
  • Relationships can be made optional
  • New predicates for iterables: ALL/ANY/NONE/SINGLE to refine filtering on returned subgraphs
  • New path functions: NODES/RELATIONSHIPS/LENGTH return respectively the nodes, relationships or length of a path
  • Parameters for literals, index queries and node/relationship ids
  • Shortest path support
  • The pattern matcher will, where possible, eliminate subgraphs early by using the predicates from the WHERE clause, providing faster response times
  • Relationships can be bound
  • IS NULL for painless null checking
  • New aggregate function COLLECT, which combines multiple result rows into a single list of values
Cypher’s capabilities and expressiveness continue to improve, and they’re fueled by your feedback so take these features for a test drive.
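For readers who haven't met the iterable predicates before, here is a minimal sketch of what ALL/ANY/NONE/SINGLE and COLLECT mean, written as a plain-Java analogy over an in-memory list. It is not Cypher syntax and not Neo4j API; the class name, method names and sample data are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java analogy only: this is neither Cypher syntax nor Neo4j API.
class IterablePredicates {

    // ALL: every element satisfies the predicate.
    static <T> boolean all(List<T> items, Predicate<T> p) {
        return items.stream().allMatch(p);
    }

    // ANY: at least one element satisfies the predicate.
    static <T> boolean any(List<T> items, Predicate<T> p) {
        return items.stream().anyMatch(p);
    }

    // NONE: no element satisfies the predicate.
    static <T> boolean none(List<T> items, Predicate<T> p) {
        return items.stream().noneMatch(p);
    }

    // SINGLE: exactly one element satisfies the predicate.
    static <T> boolean single(List<T> items, Predicate<T> p) {
        return items.stream().filter(p).count() == 1;
    }

    // COLLECT: gather one value per result row into a single list.
    static <R, V> List<V> collect(List<R> rows, Function<R, V> extract) {
        return rows.stream().map(extract).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Cy"); // hypothetical data
        System.out.println(all(names, n -> n.length() <= 3));
        System.out.println(single(names, n -> n.startsWith("B")));
        System.out.println(collect(names, String::length));
    }
}
```

In Cypher itself these predicates apply to the nodes or relationships of a matched path rather than to a Java list, so treat the above purely as a statement of the semantics.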

HA improvements

Like our ongoing kernel work, HA is a part of our stack that isn’t readily visible to users, but it’s something we obsess over. In this release we’ve focused attention on performance and robustness, particularly for cloud deployments. The upshot is that running and operating a Neo4j HA cluster - even in unpredictable environments such as multi-region clusters on Amazon - is more solid and refined than in previous releases.

Webadmin

Our wonderful Webadmin dashboard for Neo4j server is getting slicker by the release. This time around our Web team has pulled out the stops to deliver some fantastic new visualization tooling. And we think you’ll agree the new eye candy is rather nice too.

As you can see, the tool looks more sophisticated and slick than previous versions. But this beauty isn’t just skin-deep; there’s some real substance in what you can do with this release. In particular, we’re fond of:
  • Rule-based visualization style and icons in the visualization so that rich and compelling domain models can be easily shared with domain experts
  • Visualization profiles so that you can inspect data from different perspectives

Community features

Our 1.5 releases marked the beginning of a dedicated community team at Neo HQ who have championed a plethora of new features (and squished a few bugs) in the 1.5 M02 release, based on feedback from our fabulous open source community.
In the Neo4j server, there have been some subtle changes to the REST index API which make the API clearer. For example, now to create an index you POST an indexable payload to the URI of the index, rather than encoding that payload as part of the URI itself. Functionality hasn't changed, but don't forget to check that your client library supports the change.
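As a rough sketch of the "payload in the body, not in the URI" shape, the snippet below builds (but does not send) such a request with the JDK's own HTTP client. The server address, the index name `people` and the JSON field names `key`, `value` and `uri` are assumptions made for illustration; check the REST documentation for your release for the exact payload.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: the URI, index name and JSON field names below are assumptions.
class IndexRequestSketch {

    // Assumed location of a node index called "people" on a local server.
    static final String INDEX_URI = "http://localhost:7474/db/data/index/node/people";

    // Builds the JSON body by hand; no escaping is done, so keep inputs simple.
    static String payload(String key, String value, String nodeUri) {
        return "{\"key\":\"" + key + "\",\"value\":\"" + value
                + "\",\"uri\":\"" + nodeUri + "\"}";
    }

    // The key and value travel in the request body; the URI names only the index.
    static HttpRequest indexRequest(String key, String value, String nodeUri) {
        return HttpRequest.newBuilder(URI.create(INDEX_URI))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload(key, value, nodeUri)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = indexRequest("name", "Ann",
                "http://localhost:7474/db/data/node/1"); // hypothetical node URI
        System.out.println(request.method() + " " + request.uri());
        // Sending it would be: HttpClient.newHttpClient().send(request, ...)
    }
}
```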
Also added is support for HTTP authentication and authorization in the server. By implementing the org.neo4j.server.rest.security.SecurityRule interface (and adding a line of configuration) developers can easily intercept each HTTP request and check whether it's permitted for the targeted URI. A simple scheme might simply check an access token against a third-party directory service, while more sophisticated schemes might implement node-by-node level authorization policies. Either way, the API is short and neat and allows you to plumb Neo4j into your existing security infrastructure.
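To give a feel for the simple access-token scheme, here is a self-contained sketch. Note that `RequestRule` is a hypothetical stand-in declared locally so the example compiles on its own; it is not the real `org.neo4j.server.rest.security.SecurityRule` interface, whose methods may differ. The header name `X-Access-Token` and the in-memory token set (standing in for a directory service lookup) are likewise assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a per-request security hook; NOT the actual
// org.neo4j.server.rest.security.SecurityRule interface.
interface RequestRule {
    String forUriPath();                                        // path prefix the rule guards
    boolean isAuthorized(String path, Map<String, String> headers);
}

// Permits a request only if it carries a known access token.
class TokenRule implements RequestRule {
    private final Set<String> validTokens; // stands in for a directory service

    TokenRule(Set<String> validTokens) {
        this.validTokens = validTokens;
    }

    @Override
    public String forUriPath() {
        return "/db/data"; // assumed path prefix
    }

    @Override
    public boolean isAuthorized(String path, Map<String, String> headers) {
        if (!path.startsWith(forUriPath())) {
            return true; // rule does not apply to this URI
        }
        String token = headers.get("X-Access-Token"); // assumed header name
        return token != null && validTokens.contains(token);
    }

    public static void main(String[] args) {
        TokenRule rule = new TokenRule(new HashSet<>(Collections.singleton("s3cret")));
        Map<String, String> headers = new HashMap<>();
        headers.put("X-Access-Token", "s3cret");
        System.out.println(rule.isAuthorized("/db/data/node/1", headers));
    }
}
```

A node-by-node policy would follow the same pattern, with the check inspecting the targeted path in more detail instead of a single token.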
In 1.5 M02, we also upgraded to Gremlin 1.3, the Groovy-based scripting language available via the Gremlin Plugin. Enjoy all the performance and syntax improvements from the TinkerPop community!
For a full list of community issues fixed, see https://github.com/neo4j/community/issues?milestone=1&state=closed and thank you all for participating.

Download...enjoy!

If all this seems like something you might like, then the Neo4j 1.5 M02 release is now available from our download site. Developer dependencies are also available from the Maven repository.
Since this is our last milestone before the 1.5 GA release, we really need your feedback. If you find things you like, things you detest, things that are puzzling, things that are wrong or even bugs, please get involved in the community and let us know.

9 comments:

Anonymous said...

Congrats and thanks for the release. I'll wait for the final one as an AWS user. Please work on traversal times too; I have a simple one which takes just too much time (3 seconds, where it should be a lot less)

Peter Neubauer said...

Hi there,
could you give us a bit more info on the data, graph size and type of traversal you are doing so we can help you speed things up?

Ashutosh said...

Does upgrading from 1.5 M01 also require a backup-restore?

Anonymous said...

Hi Peter, sure, you can find all the info here: http://help.neo4j.org/discussions/problems/3-4-second-traversal with a graph file in order to show you
Unfortunately with 1.4 it's the same problem, I haven't tried 1.5 out yet
The same scenario at a "larger" scale (at the moment just 200'000 relationships and 60'000 nodes) takes something like 20-30 seconds or more for some traversals... using a depth of 2 decreases the required time a lot, but I need 3...and very fast (a few ms :( )
My worry is that if it's so slow now, then when there are millions of nodes and hundreds of millions of relationships (obviously not all connected, but if you consider 3rd depth it'll take a few million at least...see linkedin) it will take forever :(

Mattias said...

Ashutosh: what exactly do you mean by backup-restore? The database format has changed so a migration must be performed. Please do a backup first just for safety though. http://docs.neo4j.org/chunked/milestone/deployment-upgrading.html

Ashutosh said...

Thanks Mattias, that's what I meant to ask - whether the data format has changed from M01 to M02. We were trying out a graph with over 9M edges, and would like to use Gremlin 1.3

Anonymous said...

1.5 looks good. Our project is currently on 1.4.1 but we'd be interested in upgrading to 1.5. However, our company's infrastructure/deployment team is uncomfortable installing/supporting milestone versions of any application/server. Can you please give an idea of when we can expect a "stable release" of 1.5?

Peter Neubauer said...

Hi there,
1.5 is already overdue, but we have found one problem in QA that we MUST fix before release, and it is not trivial. So, any week now 1.5 will arrive. Sorry for the vague dates, but we are working on it :)

Anonymous said...

Hi Peter, any info on the "bug" about slowness I posted above? Is there any improvement on 1.5 about it?