Thursday, March 24, 2011

Neo4j 1.3 “Abisko Lampa” M05 - Preparing for arrival

Chris Gioran
 It’s that last mile, having traveled far but now closing in on our destination. Time to check around for all our belongings, stretch a bit, and make sure we look presentable. Milestone 5. Well, what have we got?

A fresh look to Webadmin

Our big news is a fresh new UI for Webadmin. A complete rewrite of the data browser makes it faster and easier to use, featuring both the classic tabular view and a new graph visualization for navigating nodes and relationships with a neat point-and-click interface.
We’re also proud, in a geeky way, to see the first of many keyboard shortcuts in the Webadmin interface. Simply press ‘s’ to get to the data browser search bar from any page. Though we won’t be making any promises about full vi/emacs keybindings! Since Webadmin is new and shiny, we know there are a few rough edges. We’d love to get your feedback on it so we can make it perfect for the 1.3 GA release.

We’ve got your back on Windows

Our Windows support continues to be strong. Users will be happy to find full backup functionality available on Windows with this release, bringing parity to all of our supported platforms. Windows as an afterthought is now, well, itself an afterthought.

In the engine room

Naturally we’ve taken the opportunity to fix up some rough spots in the core engine. We’ve squashed a few issues in graphdb and the graph algorithms package, and taken some time to improve our Lucene indexing. A big thanks to our community and customers for being awesome at providing feedback so we can do this!

Next stop, Abisko Lampa...

With this release, Neo4j 1.3 is feature complete. Now, we’ll be hunkering down at Neo HQ (and in our satellite offices around Europe and the USA) to tighten up every single component, preparing for the general availability of Abisko Lampa. We think 1.3 is going to be an awesome release and we’re working hard to make it so.

Now go get the milestone,
read more about it,
and let us know what you think.

EDBT Uppsala 2011 - Graph Processing everywhere.

Yesterday, I had the great honor to be invited to present Graph Databases and data-local processing at the EDBT in Uppsala. Thanks a lot to Pierre Sennellart, Anastasia Ailamaki and Tore Risch for inviting me to this great event and to Erik Zeitler for a good student singing session!

Overall, it was impressive to see so many experts in the field of database technology in one place. The space seems to be really picking up steam, with cloud computing platforms, the various efforts to merge the benefits of distributed data storage approaches like key-value stores with those of RDBMS, and much else.

But what was really interesting from my perspective is that there were a lot of different talks that mentioned, implicitly or explicitly, the processing of graph structures and data-local analytics as an interesting approach to complex problem solving. Jeff Ullman was talking about Map-Reduce extensions which included graph-like data structures. Susan B. Davidson gave a great talk on provenance and privacy, which involved a lot of workflow modeling that turns into graphs with access control on the workflow modules. I wonder if something like modeling ACLs in a graph could be of help here. Also, there was a lot of interest in the processing of GIS and spatial problems, like automatically discovering segments of movements in GPS traces, where at least some of the methods are a very interesting fit for graphs.

All in all, it is encouraging to see that there is a lot of research around the processing of complex data structures, not only in scaling out but also in coming up with ways to traverse graphs and express queries that do not require the whole dataset to be touched. And the Property Graph Model is catching on, with a lot of databases (Neo4j, HBase, Redis, OrientDB etc.) implementing support for it and thus making querying with Pipes (Dataflow) and Gremlin (Graph Query Language) a universal option to express rich queries on graphy data.

For this event, I prepared a graphy mindmap (click to open it), and had the chance to write on a real BLACKBOARD. Extremely cool and historic!

This is a great time to be in the graph database space!



Tuesday, March 22, 2011

Strategies for Scaling Neo4j

While working a little late into the evening, I've got round to writing up the excellent discussion on scaling Neo4j that happened on the mailing list, including Mark Harwood's useful design heuristic.

Hope you'll find it useful.


Friday, March 11, 2011

Neo4j 1.3 “Abisko Lampa” M04 - Size really does matter

Note: The store format changed in this milestone to allow for large databases. This means that you manually need to run an upgrade before your old database works with Neo4j 1.3.M04 and later. Please see me after class for more info. Wait, no: I mean see the latest documentation.
Champion Andrés Taylor (in the middle)

Today we passed the fourth milestone on the tracks towards Neo4j 1.3 Abisko Lampa. By far the greatest change in this release is how much data Neo4j can store for you (hint: approximately 128 million truckloads). We have also improved our quality assurance on Windows, and extended our REST API to cover more of what Neo4j embedded can do.

Big store opening

A database can now contain 32 billion nodes, 32 billion relationships and 64 billion properties. Before this, you had to make do with a puny 4 billion nodes, 4 billion relationships and 4 billion properties. Finally, every single person on Earth can have their own personal node! And did we mention this is happening without adding even one byte to the size of your database?
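
As a rough sanity check on those numbers, here is a small sketch of the arithmetic. It assumes the limits come from the width of the record identifiers: 32 bits before, and 35 bits (nodes, relationships) or 36 bits (properties) after, with "billion" read as 2^30. Those bit widths and the class name are assumptions made for illustration; this post does not state them.

```java
final class StoreCapacity {
    // Number of distinct record ids addressable with the given number of bits.
    static long capacity(int idBits) {
        return 1L << idBits;
    }

    // Express a capacity in units of 2^30 ("binary billions").
    static long inBinaryBillions(long capacity) {
        return capacity >> 30;
    }

    public static void main(String[] args) {
        // Assumed id widths (illustrative, not taken from the post).
        int oldBits = 32;
        int nodeBits = 35;
        int relationshipBits = 35;
        int propertyBits = 36;

        System.out.println("old limit:     " + inBinaryBillions(capacity(oldBits)));          // 4
        System.out.println("nodes:         " + inBinaryBillions(capacity(nodeBits)));         // 32
        System.out.println("relationships: " + inBinaryBillions(capacity(relationshipBits))); // 32
        System.out.println("properties:    " + inBinaryBillions(capacity(propertyBits)));     // 64
    }
}
```

Under that assumption, three extra id bits are all it takes to go from 4 to 32 binary billions.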

This is a big change in the store format, though. It is very important to upgrade a database to the new format in a carefully controlled fashion. Shut down the database cleanly, then add the configuration parameter as described in the latest documentation. When starting back up, your database will be checked for compatibility with the new format. If it passes, it will be upgraded; otherwise you will receive a message explaining the unsatisfied constraints.

Please note, however: while the upgrade happens in a safe manner, it is one way! So, if you request an upgrade and it succeeds, your store will be unreadable by previous versions of Neo4j.

(Un)-broken Windows

Not that Windows is broken, or that Neo4j was broken on Windows. But our tests themselves were less than fantastic. We now continuously run our full suite of quality assurance tests on both Windows and Linux before releasing to our Snapshot repository. In doing this work, we have uncovered a number of bugs that stopped our tests from running cleanly on Windows, and we have fixed them, ensuring in the process that the core of Neo4j is unproblematic.

Unfortunately, the same cannot be said about the backup component, which currently cannot perform the complete range of its operations on Windows. Until we fix it, for full backups you will need to cleanly shut down Neo4j and copy the database files. After this, you can do incremental backups, just like you’d expect.

Querious REST indexes

We added a few index-related operations to the Neo4j REST API: advanced queries and also cleaner index remove operations. Previously, you could only use an exact look-up for indexed nodes or relationships. With the advanced query, any extra features exposed by a particular index framework can be queried.

For instance, using a lucene-index, let’s create a node, index it, then query:

# create an empty node
curl -X POST -H Accept:application/json http://localhost:7474/db/data/node
# create a node index named “my-nodes” backed by lucene
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"my-nodes", "config":{"type":"fulltext","provider":"lucene"}}' http://localhost:7474/db/data/index/node
# index the node (assuming the “create” step revealed node 1)
curl -X POST -H Content-Type:application/json -d '"http://localhost:7474/db/data/node/1"' http://localhost:7474/db/data/index/node/my-nodes/the_key/the_very_long_value
# query the index, expecting to find node 1 in the list
curl -H Accept:application/json "http://localhost:7474/db/data/index/node/my-nodes/the_key?query=the_very*"

To drop that node from the index, try one of the new remove operations:

# remove this way (removing node 1 with exact key/value)
curl -X DELETE -H Accept:application/json http://localhost:7474/db/data/index/node/my-nodes/the_key/the_very_long_value/1
# talk this wa-ay (removing node 1)
curl -X DELETE -H Accept:application/json http://localhost:7474/db/data/index/node/my-nodes/1

For details, read up on the REST API page.
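
The URL patterns used by the curl calls above can be sketched in Java as well. This is a minimal illustration that only builds the request URIs and does not send anything; the class and method names are hypothetical, and the index name, key and value are assumed to be URL-safe path segments, as they are in the example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class IndexUris {
    // Base URI of the REST API, as used in the curl examples.
    static final String BASE = "http://localhost:7474/db/data";

    // URI for an advanced query against one key of a node index.
    // Only the query string is encoded; path segments are assumed URL-safe.
    static URI query(String index, String key, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(BASE + "/index/node/" + index + "/" + key + "?query=" + encoded);
    }

    // URI for removing a node from an index for an exact key/value pair.
    static URI removeExact(String index, String key, String value, long nodeId) {
        return URI.create(BASE + "/index/node/" + index + "/" + key + "/" + value + "/" + nodeId);
    }

    // URI for removing a node from an index regardless of key and value.
    static URI removeNode(String index, long nodeId) {
        return URI.create(BASE + "/index/node/" + index + "/" + nodeId);
    }

    public static void main(String[] args) {
        System.out.println("GET    " + query("my-nodes", "the_key", "the_very*"));
        System.out.println("DELETE " + removeExact("my-nodes", "the_key", "the_very_long_value", 1));
        System.out.println("DELETE " + removeNode("my-nodes", 1));
    }
}
```

Encoding the query string matters once a query contains characters such as spaces or colons, which are not safe to paste into a URL as-is.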

The little things that matter

We have also made a number of less dramatic changes:

  • Minor bug fixes in ShortestPath in graph-algo
  • Some bug fixes in lucene-index regarding batch insertion
  • All manpages are included in the manual
  • Added Dijkstra to the list of graph algorithms to be used when finding paths
  • Minor improvements and bugfixes to the webadmin interface

Measure up

Ready to see how Milestone 4 measures up? Then go get it!

As always, we're grateful to the community for contributions and feedback. Please let us know how things go with this release. Abisko isn't far away now, so we'll start preparing for the big arrival.


Tuesday, March 8, 2011

Neo4j Spatial, Part1: Finding things close to other things

Geography is a natural domain for graphs and graph databases. So natural, in fact, that early map users of Neo4j simply rolled their own map support. However, it takes some effort to deal with spatial indexes, geometries and topologies, and so, since September 2010, the Neo4j Spatial project has been providing early access releases enabling a wide range of convenient and powerful geographic capabilities in the Neo4j database. Some of these have already been used in production projects during the last year. As we move forward and continue to refine the APIs, we will publish a series of blogs introducing users to various aspects of this powerful, yet simple, geographic processing framework: from proximity searches, through routing, to complex GIS Topology analysis.

One of the simplest and most intuitive places to start is to ask the question: how do I find things close to other things? This is exactly the question answered by location-based services on the web, as well as a number of existing spatial databases. In the NoSQL area, CouchDB released GeoCouch in 2009, and MongoDB released their geohashing index in 2010. Both answer exactly this question.

Unlike these other NoSQL databases, Neo4j started with support for complex Geometries in 2010. While simple proximity searches have been possible, they have only recently become simple and intuitive. In this example we will demonstrate just how simple they are.

The complete example

// Initialize database
GraphDatabaseService graph = new EmbeddedGraphDatabase("db");
SpatialDatabaseService db = new SpatialDatabaseService(graph);
SimplePointLayer layer = db.createSimplePointLayer("neo-text");

// Add locations
for (Coordinate coordinate :
     makeCoordinateDataFromTextFile("NEO4J-SPATIAL.txt")) {
    layer.add(coordinate);
}

// Search for nearby locations
Coordinate myPosition = new Coordinate(13.76, 55.56);
List<SpatialDatabaseRecord> results =
    layer.findClosestPointsTo(myPosition, 10.0);
The above code initializes the spatial index, adds a number of location points to the index and then performs a proximity query. In fact, it does everything you need: you could just copy it into a new class's main method and it would run. Well, almost. You do need to get your data from somewhere, and for the above code we wrote the method makeCoordinateDataFromTextFile(). This simply returns an iterator of Coordinate objects representing the locations to add to the index. Before getting into that, let's start by explaining the rest of the code in a little more detail.


The new API added for the 0.5 release of Neo4j Spatial simplifies working with point locations. Looking at the above code, we see the following steps are involved:

Initialize the SimplePointLayer

This is a map Layer, or collection of indexed points, and is created with the code:
SimplePointLayer layer = db.createSimplePointLayer("neo-text");
Neo4j Spatial works with all kinds of spatial Geometries, including Points, LineStrings and Polygons. For this example, since we are working with Points, we need do no more than create a SimplePointLayer to get access to Point capabilities and proximity searches. In further blog posts we will delve deeper into what a Layer really is and how to deal with much more complex data.

Adding Points

To add a single Point to the database, you could call:
layer.add(13.0, 55.6);
This will add a Point at longitude 13.0 and latitude 55.6, inside the city of Malmö, Sweden, coincidentally close to where the Neo4j core development team is. While this is simple, internally the code will work with actual Point objects, made from Coordinate objects. And when dealing with large amounts of data, you will quite likely work with these too. The code we used in the main example calls a method that produces an Iterator of Coordinates, and adds those to the layer:
for (Coordinate coordinate :
     makeCoordinateDataFromTextFile("NEO4J-SPATIAL.txt")) {
    layer.add(coordinate);
}
The code in makeCoordinateDataFromTextFile can be read from the unit test code in TestSimplePointLayer, and simply reads some ASCII Art from a file, and makes x and y coordinates for each pixel of the text. The file we used contained the following:
.#       # #########   #####         #       ###               #####   ########      #     #########   #####       #     #        
 ##      # #          #     #       ##         #              #     #  #       #    # #        #         #        # #    #        
 # #     # #         #       #     # #         #             #         #       #   #   #       #         #       #   #   #        
 #  #    # #         #       #    #  #         #              ##       #       #  #     #      #         #      #     #  #        
 #   #   # #######   #       #   #   #         #     #####      ###    ########   #######      #         #      #######  #        
 #    #  # #         #       #  #    #         #                   ##  #          #     #      #         #      #     #  #        
 #     # # #         #       # #########       #                     # #         #       #     #         #     #       # #        
 #      ## #          #     #        #    #   #               #     #  #         #       #     #         #     #       # #        
 #       # #########   #####         #     ###                 #####   #         #       #     #       #####   #       # #########
When this is exposed to a mapping system, through Neo4j's support for common GIS tools like GeoServer and uDig, we can see the text written over the south of Sweden, with the top-left corner of the N in the town of Malmö.
OK, this is not a real world example, but it is certainly cool!
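
To give a feel for the text-to-points step, here is a minimal, self-contained sketch. It is not the code from TestSimplePointLayer: the class, the Point stand-in (used instead of the real Coordinate class so that the sketch has no dependencies), the origin of (13.0, 55.6) and the step of 0.01 degrees per character are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class AsciiArtPoints {
    // Minimal stand-in for a coordinate: x is longitude, y is latitude.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Turn every '#' into a point. The column index moves east (x grows) and
    // the row index moves south (y shrinks), starting from the top-left origin.
    static List<Point> toPoints(List<String> rows, double originX, double originY, double step) {
        List<Point> points = new ArrayList<>();
        for (int row = 0; row < rows.size(); row++) {
            String line = rows.get(row);
            for (int col = 0; col < line.length(); col++) {
                if (line.charAt(col) == '#') {
                    points.add(new Point(originX + col * step, originY - row * step));
                }
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // A tiny two-row "image" instead of the full text file.
        List<String> rows = List.of("# #", " # ");
        for (Point p : toPoints(rows, 13.0, 55.6, 0.01)) {
            System.out.println(p.x + ", " + p.y);
        }
    }
}
```

Each resulting point could then be handed to the layer, for instance with layer.add(p.x, p.y) as shown earlier.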

Proximity Search

So, the last thing to do is search for points nearby. The following code from the main example does just that:
Coordinate myPosition = new Coordinate(13.76, 55.56);
List<SpatialDatabaseRecord> results =
    layer.findClosestPointsTo(myPosition, 10.0);
We have decided we want to know what is near the point at (13.76, 55.56) which is somewhere in the middle of this map. Perhaps we are tourists travelling around the countryside of Southern Sweden, and, bored of endless fields of brilliant yellow canola flowers, we ask the map for something more relaxing to do than getting a stiff neck while sitting in the car. We tell the map where we are:
Coordinate myPosition = new Coordinate(13.76, 55.56);
We don't want to travel more than 10km further, so we limit the search to 10km from our position:
List<SpatialDatabaseRecord> results =
    layer.findClosestPointsTo(myPosition, 10.0);
The results we get are already sorted by distance, so we could just pick the first and go there:
Coordinate closest = results.get(0).getGeometry().getCoordinate();
However, you might be curious to see everything in the 10km search limit, so let's pop that onto another map layer:
Yes, just what the doctor ordered!
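
For readers who want to see what a proximity search boils down to, here is a brute-force sketch in plain Java. It is not how Neo4j Spatial does it, since the library uses a spatial index rather than scanning every point. The class and method names are hypothetical, and measuring distance with the haversine great-circle formula on a spherical Earth is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BruteForceProximity {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in km between two lon/lat positions (haversine formula).
    static double distanceKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Keep the points within maxKm of (lon, lat), nearest first.
    // Each point is a two-element array: {longitude, latitude}.
    static List<double[]> closest(List<double[]> points, double lon, double lat, double maxKm) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            if (distanceKm(lon, lat, p[0], p[1]) <= maxKm) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingDouble((double[] p) -> distanceKm(lon, lat, p[0], p[1])));
        return hits;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {14.50, 55.56}); // far to the east, outside 10 km
        points.add(new double[] {13.76, 55.60}); // a few km to the north
        points.add(new double[] {13.80, 55.56}); // a few km to the east
        for (double[] p : closest(points, 13.76, 55.56, 10.0)) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

A linear scan like this is fine for a handful of points, but it touches every point on every query, which is exactly what a spatial index avoids.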

What's next?

This example showed one specific API. But as mentioned in the beginning, this is layered on top of a much more powerful set of capabilities, like:
  • Adding spatial intelligence to existing graph database models
  • Working with more complex geometries, and performing more complex spatial queries
  • The Open Street Map graph data model and performing queries specific to OSM
We will continue to explore some of these in upcoming blog posts. One of the most important questions developers need to deal with is how to adapt their existing data to Neo4j Spatial. The example in this blog does not show you the graph structure at all. But of course it is, in fact, creating a special graph structure to support both the points and the index, and you can access that graph using normal Neo4j APIs. However, this is not the most useful and intuitive way to go. What should you do if you already have your own graph, and part of the graph already represents locations? How can you adapt your model to Neo4j Spatial so that you can index your existing graph? How do you perform proximity searches and other spatial queries on your own data model?

Luckily for us, Neo4j Spatial was designed from the beginning to deal with exactly those questions, and so in the next blog post that is precisely what we will show you.

While there is a lot of functionality in Neo4j Spatial, the different ways of exposing it (REST, Java, GeoTools, Neo4j Index API) are not set in stone. We very much welcome your feedback on how you think they should look! Just comment on the Neo4j mailing list or directly to the author.