Wednesday, January 25, 2012

Released Neo4j 1.6 GA “Jörn Kniv”!

Three milestones later and we’re proud and happy to announce the release of Neo4j 1.6 GA.

We are excited about a host of great new features, all ready to be used. Let's get to it.

Highlights

What features have been included in this release?
  • Cloud - Public beta on Heroku of the Neo4j Add-on
  • Cypher - Supports older Cypher versions, better pattern matching, better performance, improved API
  • Web admin - Full Neo4j Shell commands, including versioned Cypher syntax.
  • Kernel - Improvements, for instance the ability to ensure that key-value pairs for entities are unique.
  • Lucene upgrade - Now version 3.5.

Also, there have been many improvements behind-the-scenes:
  • Infrastructure - Our library repositories have moved to Amazon, providing significantly faster download times.
  • Quality - High availability now features better logging and operational support.
  • Process - Better handling of breaking changes in our API and of deprecated features.

If you want more info on all of this - sure you do - please keep reading. Here is a rundown of the major new features in Neo4j 1.6.

Heroku Public Beta

The public beta of the Neo4j Add-on for Heroku is available. We're taking a careful approach with our cloud services, evaluating the best supporting infrastructure and user experience in preparation for a general release in the coming months. Already, we've been pleased with the positive response.

Documentation on how to get started with the Heroku Neo4j Add-on can be found at the Heroku DevCenter. We’ll be posting additional guides for getting started on Heroku with Neo4j.

For pioneering adopters, we welcome you to join our Neo4j Heroku Challenge. You can win fabulous prizes while proudly blazing a path into the cloud for our community.

Latest on Cypher

Most of the work in Cypher for this release has been internal changes that are not immediately visible to an end user. The type system has been rebuilt and revamped, and a second, simpler, pattern matcher has been added. The first change makes the Cypher code base faster to work with, and the second makes your queries faster.

End-user-facing changes include the ability to get all shortest paths, the COALESCE function, column aliasing, and the ability for variable-length relationships to introduce an iterable of the relationships.

Moreover, array properties have been supported in Neo4j for a long time, but until now it wasn’t possible to query on them. This release makes it possible to filter on array properties in Cypher. We have also improved aggregation performance.

Finally, there are two breaking changes: the syntax for the ALL/NONE/ANY/SINGLE predicates has changed, and the ExecutionResult is now a read-once, forward-only iterable.
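To illustrate what a read-once, forward-only iterable means in practice, here is a minimal sketch in plain Java. It does not use the Neo4j API: `ReadOnceResult`, `count` and `materialize` are hypothetical names made up for this example, and the real ExecutionResult may behave differently on a second pass (this stand-in simply yields nothing). The point is only that rows which are needed more than once should be copied into a list during the first pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadOnceSketch {

    // Hypothetical stand-in for a read-once result; NOT Neo4j's ExecutionResult.
    public static final class ReadOnceResult<T> implements Iterable<T> {
        private final Iterator<T> source;

        public ReadOnceResult(Iterator<T> source) {
            this.source = source;
        }

        // Always hands back the same underlying iterator, so rows that
        // have been consumed once are gone for every later loop.
        @Override
        public Iterator<T> iterator() {
            return source;
        }
    }

    // Counts the rows that are still available; consumes them in the process.
    public static <T> int count(Iterable<T> rows) {
        int n = 0;
        for (T ignored : rows) {
            n++;
        }
        return n;
    }

    // Copies the remaining rows into a list that can be iterated repeatedly.
    public static <T> List<T> materialize(Iterable<T> rows) {
        List<T> copy = new ArrayList<>();
        for (T row : rows) {
            copy.add(row);
        }
        return copy;
    }

    public static void main(String[] args) {
        ReadOnceResult<String> result =
                new ReadOnceResult<>(Arrays.asList("a", "b", "c").iterator());
        System.out.println(count(result)); // first pass sees all three rows
        System.out.println(count(result)); // second pass finds nothing left

        ReadOnceResult<String> again =
                new ReadOnceResult<>(Arrays.asList("a", "b", "c").iterator());
        List<String> rows = materialize(again); // copy once, reuse freely
        System.out.println(count(rows) + " " + count(rows));
    }
}
```

Code that previously looped over a result twice would, under these assumptions, need the `materialize` step (or a second query) after upgrading.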

New to Cypher? Then you should watch this updated "Introduction to Cypher" screencast by Alistair Jones:



New on the web admin

I’m quite happy to announce that the web admin interface has initial support for Cypher calls directly in the data browser. It’s so sweet to be able to query your way around the node space! And the Cypher console now supports full Neo4j Shell commands.
Moreover, Gremlin has been updated to version 1.4, with major improvements and bug fixes.

Kernel changes

This release includes a popular feature request: the ability to ensure that key-value pairs for entities are unique!

If you look up entities (nodes or relationships) using an external key, you’ll want exactly one entity to correspond to each value of the key.  For example, if you have nodes representing people, and you look these up using Social Security Number (SSN), you’ll want exactly one node for each SSN.  This is easily achieved if you load all your data sequentially, because you can add a new node each time you meet a value of the key (a new SSN).  However, up to now, it has been awkward to maintain this uniqueness when multiple processes are adding data simultaneously (via web requests for example).

Since this is a common use-case, we’ve improved the API to make it easy to enforce entity uniqueness for a given key-value pair. At the index level, we’ve added a new method, putIfAbsent, which ensures that only one entity will be indexed for the key-value pair, even if lots of threads are using the same key-value pair at the same time. Alternatively, if you’d prefer to work with nodes or relationships rather than with the underlying indexes, there’s a higher-level API provided by UniqueFactory. This makes it easy to retrieve an entity using get-or-create semantics, i.e. it returns a matching entity if one exists, otherwise it creates one. Again, this mechanism is thread-safe: no matter how many threads call getOrCreate simultaneously, only one entity will be created for each key-value pair. This functionality is also exposed through the REST API, via a ?unique query parameter.
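The get-or-create idea can be sketched with nothing but the Java standard library. The example below is an analogy, not the Neo4j API: it uses `ConcurrentHashMap.putIfAbsent` in place of the index method of the same name, and `UniqueEntitySketch`, `Person` and `distinctIdsUnderContention` are hypothetical names invented for illustration. It shows why an atomic put-if-absent is enough to guarantee one entity per key, even when many threads race on the same SSN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class UniqueEntitySketch {

    // Hypothetical stand-in for a node; NOT a Neo4j type.
    public static final class Person {
        public final long id;
        public final String ssn;

        Person(long id, String ssn) {
            this.id = id;
            this.ssn = ssn;
        }
    }

    // Plays the role of the index: maps an SSN to exactly one entity.
    private final ConcurrentMap<String, Person> bySsn = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Get-or-create: return the entity already registered for the key,
    // otherwise register a new one. putIfAbsent is atomic, so when several
    // threads race, exactly one candidate wins and the losers are discarded.
    public Person getOrCreate(String ssn) {
        Person existing = bySsn.get(ssn);
        if (existing != null) {
            return existing;
        }
        Person candidate = new Person(nextId.incrementAndGet(), ssn);
        Person winner = bySsn.putIfAbsent(ssn, candidate);
        return winner != null ? winner : candidate;
    }

    // Calls getOrCreate for the same key from many threads at once and
    // reports how many distinct entities the callers ended up holding.
    public static int distinctIdsUnderContention(int threads, String ssn) {
        UniqueEntitySketch index = new UniqueEntitySketch();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // line all workers up before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(index.getOrCreate(ssn).id);
            });
            workers.add(t);
            t.start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Every caller should end up with the same entity.
        System.out.println(distinctIdsUnderContention(16, "123-45-6789"));
    }
}
```

Note that in this sketch a losing thread may build a candidate that is thrown away; only the entity that reaches the map first is ever returned to callers.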

Lucene upgrade

Neo4j uses Apache Lucene as the default implementation for its indexing features - this allows you to find “entry points” into the graph before starting graph-based queries.  Lucene is an actively developed project in its own right, and is constantly being enhanced and improved.  In this Neo4j release, we’re taking the opportunity to upgrade to a newer stable release of Apache Lucene, so that all users get the benefits of recent enhancements in Lucene.  We’ve moved to Lucene 3.5; for details on all the changes, have a look at their changelog.

Breaking changes and deprecations

We’re introducing a new way to handle breaking changes. They will be flagged in the change logs as “BREAKING CHANGE.”

Where we do introduce a breaking change, we will continue to support the older functionality for two GA releases. This would typically give you six months' notice, allowing you to adopt new GA releases quickly while leaving plenty of time to develop against the new API. This policy applies to published and stable APIs, including Cypher.

In the same vein, we now have a deprecated feature: Cypher execution is now part of the core REST API, so the Cypher plugin is deprecated.

This policy does not cover third-party add-ons (like Gremlin from Tinkerpop) which have their own release strategy.

Looking Forward

Community member Pablo Pareja Tobes organized a poll around feature requests, which really helps us prioritize our development focus. Thanks to everyone for making their voice heard!

Here are the results:
  • Filter relationships natively by their name (supernodes issue) - 20
  • Sharding and horizontal scalability - 24
  • Mandatory node types - 6
  • Node insertion with checking of uniq external (get_or_create) - 17
  • N-ary relationships - 12

Let's consider each of these features more closely.

Sharding

The write-scaling complement to high-availability, sharding distributes a graph across multiple machines in a cluster. We (and many others) have researched the general graph sharding problem for years. This year, we're embarking upon a pragmatic approach to sharding, providing the benefit without obsessing about academic perfection.

Supernodes

In Twitter-culture, you'd call these the "Ashton Kutcher" nodes, the nodes in a graph with an extreme number of connections. We've been working on a branch that has a promising approach for mitigating the performance challenge of traversing these supernodes.

Node types

In Neo4j, there is no schema, only structure. Relationships indicate the effective type of the connected Nodes, and Indexes imply membership in a set. Often, though, it would be helpful to know the designated type of a Node. So, we're considering the appropriate way to introduce just enough schema. If you have any thoughts or desires to share, please chime in on the issue page.

Unique indexing

Indexes provide a quick look-up for sets of Nodes or Relationships. With unique indexes, Neo4j will guarantee that only one Node is mapped to a given key-value pair, providing support for domain-specific identifiers. This new feature is available now with 1.6 GA.

N-ary relationships

Neo4j's property graph model restricts a relationship to connecting two nodes. In some domains, it is useful to consider relationships having multiple end-points. For now, we think this is best solved with domain-specific solutions.

Fixes and details

Of course, this release includes a slew of bug fixes. For details about all the fixes and additions please read the various CHANGES.txt files included in the packaging.

Also, an impressive array of community-contributed development has been included in this release. Thank you all for the good ideas and pull requests - we really appreciate them!

Go for it

Your feedback is of great value and we would love for you to join our community mailing list.
Neo4j 1.6 is ready - download now and get involved!

Björn Granvik et al
Director of Engineering @ Neo Technology

7 comments:

Hendy Irawan said...

Super Awesome !

Glad to have 1.6 released :-)

Anonymous said...

what's the ETA on sharding? Is it Summer 2012 or more like end of year?

Shahzada Hatim said...

Congrats on the release.

I think a more feature complete Cypher would be the killer feature. I would really like insertion and update of data with cypher.

Granted that we should not just copy the old paradigms, but I think some thing transitional is a _must_

Björn Granvik said...

Hendy Irawan and Shahzada Hatim: Thanks!

Also Shahzada, I agree with you on Cypher. I have a crush for that piece of code and we will keep extending it. :)

As far as transitional or not: At one point I thought that a more SQL like approach would make it easier for all those programmers out there in the world to switch. However, personally I believe we have to break new ground. We will just keep toiling away to see what comes out. Stay tuned.

Ashwin Jayaprakash said...

Cypher looks great. It'll be interesting to see what sharding will look like.

Keep it up!

Björn Granvik said...

Thanks Ashwin!

Lee Sandberg said...

Great release of Neo4j.