Thursday, October 21, 2010

Neo4j 1.2 Milestone 2 - One More Step

With a pebble in your shoe, every step distracts you from where you're going. For a short walk, the pebble might be tolerable. But, the longer the walk, the more the pebble becomes a problem. When strutting around with Neo4j, you should be comfortable, get where you're going quickly, and look good doing it. With this release, a little sand is being emptied from the Neo4j track shoes.

This Milestone 2 release features an integrated indexing API, faster graph operations, and fixes for shutdown problems.

Community shout-out

Thanks to the awesome Neo4j community for helping to get this second milestone ready. From feedback to code contributions, the project continues to flourish with the help of an active community of contributors.

Integrated Indexing

The indexing API always felt a little bolted-on-the-side. We've thought about that, iterated over some refinements, discussed with the community and come up with a similar API that is a more natural part of the GraphDatabaseService. The new indexing had been available as a laboratory component, so some of you may already have been using it. Now, it is part of the official release.

What's different? Well, the operations are much the same, but now the GraphDatabaseService has been paired with an IndexManager that provides Indexes through a more fluent API. Indexes can now refer to Nodes or Relationships, use values from multiple keys, and do compound queries. It's quite powerful.

Also, a subtle but significant benefit of the integration: the index service participates in the shutdown of the GraphDatabaseService. You no longer have to worry about having a separate index shutdown.
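To make the shape of this concrete, here is a rough, self-contained sketch of the idea: a database object hands out named indexes for nodes or relationships, the indexes answer single-key and compound lookups, and one shutdown call closes everything. Every type and method name below (ToyGraphDb, ToyIndex, forNodes, getAll, and so on) is a made-up stand-in for illustration and is not the Neo4j API; see the wiki page for the real interfaces.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Neo4j classes. */
final class IntegratedIndexSketch {

    /** Hypothetical index mapping (key, value) pairs to entities of type T. */
    static final class ToyIndex<T> {
        private final Map<String, Map<Object, Set<T>>> entries = new HashMap<>();
        private boolean open = true;

        void add(T entity, String key, Object value) {
            requireOpen();
            entries.computeIfAbsent(key, k -> new HashMap<>())
                   .computeIfAbsent(value, v -> new LinkedHashSet<>())
                   .add(entity);
        }

        /** Exact lookup on a single key. */
        Set<T> get(String key, Object value) {
            requireOpen();
            Map<Object, Set<T>> byValue = entries.get(key);
            Set<T> hits = (byValue == null) ? null : byValue.get(value);
            return (hits == null) ? Collections.<T>emptySet()
                                  : Collections.unmodifiableSet(hits);
        }

        /** Compound lookup: entities that match every key/value pair given. */
        Set<T> getAll(Map<String, Object> criteria) {
            Set<T> result = null;
            for (Map.Entry<String, Object> c : criteria.entrySet()) {
                Set<T> hits = get(c.getKey(), c.getValue());
                if (result == null) {
                    result = new LinkedHashSet<>(hits);
                } else {
                    result.retainAll(hits);
                }
            }
            return (result == null) ? Collections.<T>emptySet() : result;
        }

        boolean isOpen() { return open; }

        void close() { open = false; }

        private void requireOpen() {
            if (!open) throw new IllegalStateException("index is shut down");
        }
    }

    /** Hypothetical database facade that owns the indexes it hands out. */
    static final class ToyGraphDb {
        private final Map<String, ToyIndex<Long>> nodeIndexes = new HashMap<>();
        private final Map<String, ToyIndex<Long>> relationshipIndexes = new HashMap<>();

        ToyIndex<Long> forNodes(String name) {
            return nodeIndexes.computeIfAbsent(name, n -> new ToyIndex<>());
        }

        ToyIndex<Long> forRelationships(String name) {
            return relationshipIndexes.computeIfAbsent(name, n -> new ToyIndex<>());
        }

        /** One shutdown call also closes every index; no separate index shutdown. */
        void shutdown() {
            nodeIndexes.values().forEach(ToyIndex::close);
            relationshipIndexes.values().forEach(ToyIndex::close);
        }
    }
}
```

In this toy version, entities are plain Long ids and a compound query is a simple intersection of exact matches; the real index framework is considerably richer than that.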

Read more about the integrated indexing over on the wiki page for the Index Framework.

Oh, the original indexing is still available, and the two can actually live side-by-side. You can transition over whenever you're ready.

Performance Improvements

Where the integrated indexing removes some irritation, the kernel improvements put more spring in your step. The changes are behind-the-scenes optimizations to caching and some prep work for high-availability (the kernel is now HA-Ready™). No tweaking needed, just bump up to 1.2.M02 and enjoy better performance from your graph operations.

Shutdown, Now. Really.

In the previous milestone, the GraphDatabaseService had two problems with actually shutting down when asked to do so. And, worse, nobody noticed until the release was out. Community members brought it to our attention on the mailing list, and even started investigating the causes. A shout-out of thanks to our alert and good-looking contributors.

The usual problem response unfolded: investigate, replicate, fix-ate, then validate. The fixes for both shutdown problems are included in this release. Your JVM should now exit as expected. And, there is now an integration test which spawns a JVM to make sure the problem doesn't happen again.
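For the curious, a test of that kind can be sketched with nothing but the JDK: start a child JVM, wait for it with a timeout, and treat a timeout as a failure. The helper below is an illustrative assumption about how such a check might look, not the actual Neo4j integration test; the class and method names are invented, and it needs Java 9 or later for Redirect.DISCARD.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: checks that a spawned JVM exits on its own. */
final class JvmExitCheck {

    /**
     * Starts a child JVM with the given arguments and reports whether it
     * exited by itself within the timeout. A child that hangs is killed.
     */
    static boolean exitsWithin(long timeout, TimeUnit unit, String... jvmArgs) {
        // Reuse the java binary that is running the current process.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.addAll(Arrays.asList(jvmArgs));

        Process child;
        try {
            child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so a chatty child cannot block on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException("could not start child JVM", e);
        }

        try {
            boolean exited = child.waitFor(timeout, unit);
            if (!exited) {
                child.destroyForcibly(); // clean up the hung JVM before reporting failure
            }
            return exited;
        } catch (InterruptedException e) {
            child.destroyForcibly();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child JVM", e);
        }
    }
}
```

In a real test the arguments would be something like "-cp", the current class path, and the name of a small main class that opens the database, does a little work, and calls shutdown(). If a stray non-daemon thread keeps the child alive, exitsWithin returns false and the test fails.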

This experience prompted some reflection about testing.


While the joy of writing tests may be debatable, everyone appreciates the benefits of having comprehensive testing. All of the Neo4j components have unit tests. There are machines conducting long-running concurrency and performance testing. Now, there is an increasing suite of integration tests to check inter-component operations and even full JVM startup/shutdown behavior. Hooray.

You probably do testing as well. Probably, you have to set up some of the same test fixtures, test harnesses, or other test infrastructure as everyone else who is working with a graph. Probably, it's similar to what is used for testing the components. So, we've started to think about testing as a deliverable.

As a first small step in that direction, you'll find a "tests" directory in the milestone download. The code there shows common practice for unit testing a graph application. Looking forward, we may provide base classes and common utilities to make testing so easy to do that even the most begrudging test author won't mind doing it.

Lace up

Try out the new milestone and let us know what you think. Bring up any "more of this", or "less of that" comments on the mailing list. Together, we'll keep taking out the pebbles.



Wednesday, October 20, 2010

Spring Data and Neo4j

A serious obstacle to the adoption of alternative ('nosql') databases such as graph databases today is the limited support in middleware for multiple database backends.

In the Neo4j team, we've seen this over and over with community members and customers: while few developers LOVE relational databases, the support in popular middleware and development environments for using an RDBMS is just better and more mature than anything out there for the nosql alternatives.

Well, we think nosql databases deserve to be first-class members in modern software development. So we set out to change that by joining up with VMware's SpringSource to build truly great support for graph databases and other nosql stores into Spring -- the dominant middleware for the JVM platform.

Announcing the Neo4j Spring Data collaboration

Today, here at the annual Spring conference SpringOne 2GX in Chicago, SpringSource is showing for the first time the work they have been doing together with the Neo4j team around Spring Data.

Spring Data is a project to provide convenient support from the Spring framework for non-relational databases. In his keynote today, Rod Johnson, CEO of SpringSource and SVP of Middleware at VMware, introduced Spring Data and demonstrated how the Neo4j graph database is used in Spring Data!

Here's a copy of the deck (well, the Spring Data and Neo4j part of the deck) that Rod used in the keynote. It's brief (10 slides) but it does a good job of introducing the basics of what Spring Data is and what it means for developers who want to use, for example, Neo4j in a Spring environment.

What is Neo4j Spring Data?

The work we've been doing with SpringSource/VMware is split into two projects:

  • Datastore Graph, which provides an annotation-driven POJO-graph mapping library. It is to Neo4j what Hibernate is to an RDBMS.
  • The Neo4j Roo Add-On, which provides a plugin that enables users to easily build Roo applications that are backed by Neo4j.

While these projects are not considered GA quite yet, the Datastore Graph project is rapidly maturing and is already being used in a commercial context. Both Datastore Graph and the Roo Add-On are expected to reach 1.0 before the end of the year.

How do I get started?

The projects are developed in the open on public git repositories with regularly pushed maven builds. So it's easy to get your hands on this software.

For detailed instructions, please check out Neo4j and SpringOne on our wiki. It will quickly get you up and running with either one of these two projects.

Monday, October 11, 2010

Announcing Neo4j 1.2.M01 and a focus on product usability

Ever since we first started working on Neo4j (crazily enough, that was more than 10 years ago!), we've had a relentless focus on robustness and performance. While this work never stops, I think it's fair to say that Neo4j 1.1 is an incredibly robust and highly performant graph database.

But as we reflect on the past year and the amazing interest we've seen in 'nosql' and graph databases, it's clear to us that the world needs not only a very fast but also an amazingly simple graph database. And there's none out there today.

Now, don't get me wrong, I think Neo4j is very easy to use for a select group of people, in particular those highly skilled in Java or another language on top of the JVM.

But if you're, for example, a PHP hacker or a .NET programmer, or if you prefer Java but find that embedded databases don't fit well with your architecture -- in all these cases and more, getting up and running with Neo4j is a greater hassle than it should be.

Product Usability

We're significantly upping the ante on product usability and making this our number one focus area. Starting today, we're rolling out a number of changes that fall into two broad categories: quick releases of new features and improved feedback on whether those features were useful.

Changes so that we can quickly get new features in the hands of users:

  • The most important change is that we're moving to a time-boxed, bi-weekly release schedule: every two weeks starting today, there will be a new publicly available Neo4j milestone release.
  • This means that we will move to a layered release scheme where we provide three branches with different "velocity": daily snapshots, bi-weekly milestones and roughly twice-a-year stable releases. For more information, check out the release scheme on the wiki.

Changes so that we can get feedback on whether the new features work:

  • As of this first milestone release, we include a component we call the Usage Data Collector (UDC). Inspired by Eclipse's Usage Data Collector, the Neo4j UDC sends a ping back home when users run Neo4j.
  • Starting today, there will be a simple feedback form (powered by Crowdsound) on every Neo4j web page and in our web admin tool. From there you can easily suggest new features, vote on existing suggestions or just generally give us feedback.
  • Not news, but it's worth mentioning that we always listen in on twitter (user @neo4j or tag it with #neo4j) and the community mailing list.

Immediate usability changes:

  • We've been hesitant to make any changes without having a proper feedback loop in place. But we've taken an obvious step: the main download of Neo4j is now renamed from "Apoc" to, well, Neo4j. Hopefully that's a bit more intuitive!
  • Furthermore, if you live in JVM land you'll be happy to hear that the Neo4j artifacts are now published to the Maven central repo. This means that Maven users no longer need to add the Neo4j repository to their configuration when depending on Neo4j releases. It'll Just Work(tm) out of the box. The same goes for other Maven-compatible tools.

The Usage Data Collector

A couple years ago Mike Milinkovich -- the Executive Director of the Eclipse Foundation -- wrote a blog post titled Collecting Usage Data in Eclipse. He wrote:

open source projects have a particular challenge in getting to know their users: we don’t ask people to register, and we don’t have even the most basic information we need to help improve our software. We lack the stats to make good decisions.

These "stats to make good decisions" are one of the traditional challenges of open source. In commercial enterprise software, there are registration forms, mandatory contact information, and timed trials. In the consumer space, every successful web site known to man is measuring what features people use in order to prioritize development efforts.

With open source, it's harder to get real, quantitative facts on whether the features you roll out actually solve a real problem for your users. We normally rely on subjective perceptions of feedback from community mailing lists and similar forums. That feedback is great but also difficult to use as a reliable measure of whether we're actually progressing or just expending energy building features that provide marginal value at best.

UDC closes that gap by giving us statistics not only on, for example, how many people downloaded Neo4j last week, but also on how many of those people actually fired it up to try it out at least once. That's a huge improvement.

Technically, UDC is a separate component that is loaded as a kernel extension if it's available on the class path. Once loaded, it waits 10 minutes (to minimize noise from unit tests) and then sends a ping with very basic data (kernel and Java version, the store id and download site) once every 24 hours.
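The timing described above maps naturally onto a scheduled executor. The following is only a sketch of that schedule under our own assumptions, not the UDC source: the class name, the payload field names, and the injected sender are all invented for illustration, and no network code is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Illustrative sketch of a delayed, recurring usage ping; not the UDC code. */
final class UsagePingSketch {

    /** Builds the small payload. The field names are made up for this sketch. */
    static Map<String, String> payload(String kernelVersion, String storeId, String source) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("kernel", kernelVersion);
        data.put("java", System.getProperty("java.version"));
        data.put("storeId", storeId);
        data.put("source", source);
        return data;
    }

    /**
     * Schedules the ping: first after initialDelay, then once per interval.
     * The sender is passed in, so how the data travels is left open here.
     */
    static ScheduledExecutorService start(Map<String, String> payload,
                                          Consumer<Map<String, String>> sender,
                                          long initialDelay, long interval, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "usage-ping");
            t.setDaemon(true); // a ping timer should never keep the JVM alive
            return t;
        });
        timer.scheduleAtFixedRate(() -> sender.accept(payload), initialDelay, interval, unit);
        return timer;
    }
}
```

With the numbers from the text, the call would be start(data, sender, 10, 24 * 60, TimeUnit.MINUTES). A short-lived unit test JVM exits long before the first ping fires, which is the point of the initial delay.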

If you want more details, check out the UDC wiki page or the source code.

Furthermore, we've tried to make it as easy as possible to disable UDC. See the wiki page. In fact, since we've put all the "calling home" code in a completely separate component, removing that component guarantees that UDC won't ever be activated. But we hope that most users will help us make better decisions by leaving it on.

We realize that any automatic data collection system is a huge privacy concern. And in the end if we get a strong negative reaction from our community we will remove UDC from the community download. But we strongly feel that it's crucial for us to have continuous feedback on whether the steps we take are actually moving us closer to building that amazingly simple graph database or whether we're just expending energy on features no one wants.

Other changes in 1.2.M01

On the kernel level, we've also worked a lot on lowering the memory footprint to minimize GC load. Neo4j 1.2.M01 also sports a new infrastructure for 'kernel extensions' that get automatically loaded during bootstrapping, which enables seamless integration of instrumentation and other non-core extensions. For more information, check out the Neo4j changelog here and the kernel changelog here.
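As a rough illustration of what "loaded if it's on the class path" means, the sketch below probes for extension classes by name at startup and quietly skips any that are absent. This is an assumption-laden stand-in, not the kernel's actual discovery mechanism: the Extension interface and all names are hypothetical, and Class.forName is used only because it keeps the example in one file (a provider registry such as java.util.ServiceLoader is the more usual JDK tool for this).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of optional, class-path-driven extension loading. */
final class ExtensionBootstrapSketch {

    /** Hypothetical extension contract; not a Neo4j interface. */
    interface Extension {
        void load();
    }

    /** Instantiates each named class that is present and skips the rest. */
    static List<Extension> loadAvailable(List<String> classNames) {
        List<Extension> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<?> type = Class.forName(name);
                if (!Extension.class.isAssignableFrom(type)) {
                    continue; // present, but not an extension
                }
                Extension ext = (Extension) type.getDeclaredConstructor().newInstance();
                ext.load();
                loaded.add(ext);
            } catch (ClassNotFoundException absent) {
                // Not on the class path: this extension is simply never activated.
            } catch (ReflectiveOperationException broken) {
                throw new IllegalStateException("cannot start extension " + name, broken);
            }
        }
        return loaded;
    }

    /** Example extension used only to exercise the loader. */
    static final class HelloExtension implements Extension {
        static int loads = 0;

        @Override
        public void load() {
            loads++;
        }
    }
}
```

The design consequence is the one described for UDC: if the jar containing an extension is removed, its class cannot be found, so the core starts without it and needs no configuration change.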

All in all, we're very excited about the usability focus, the new release scheme and the first milestone in the 1.2 release. Please download and check it out for yourselves!