Friday, February 25, 2011

Neo4j 1.3 M03 - With (short) strings attached!

Today marks the release of our third milestone (M03) towards the Neo4j 1.3 "Abisko Lampa" release, which is now available for download. It's been a busy iteration with several parallel streams of work, ranging from low-level footprint optimizations to high-level usability, sysadmin and operations support. We've also been thrilled by fantastic threads on the mailing list and by how Neo4j has been received out in the community during this release cycle. When we're slogging away on the code it's great to know that we have such a fabulous community working with us at every step - thanks folks!

Kernel changes

Usually we don't talk much about the things going on in the kernel since we're just constantly doing computing science stuff to make it better. But for this release we're really excited about what's going on down in the engine room.

We're particularly enthusiastic about our new improvements for short strings. This upgrade allows our storage engine to inline common short strings (things like Java enum names, or Ruby symbols), meaning we get much more compact data files and faster reads/writes for short strings since we don't have to use the DynamicStringStore. If you'd like to know more (including what's considered a short string) then head over to Tobias Ivarsson's blog posting.

We're really grateful to all the people out in the community who took the time to run our statistics gathering application on their databases. From that, we know that our short-string implementation really meets your needs.

In keeping with our focus on usability, this iteration also gave us a chance to reflect on things that make our users' experience better once they're in deployment. In particular we now support both full and incremental backups over the network using our backup tool (which incidentally is based on the same rock-solid codebase that we use for Neo4j High Availability).

Server improvements

The server product is coming along steadily. In this release we've focussed on improving usability as we work towards our high availability (HA) server release for 1.3. In particular we've fixed a few of those visual niggles in our Webadmin tool and made our configuration file way simpler and DRY. We've also added a small RESTful discovery API, so if you're building your own tools around Neo4j server, it's easy to look up what services are available and where they're hosted.

We've also split out the API for managed server plugins into its own separate small jar which you can pull from a Maven repo and get started on building cool server plugins easily (don't worry, you don't need to use Maven, we just like the Maven repository system). Naturally the server also benefits from all the goodness being poured into the kernel and embedded database too!

Devteam gossip

Last time round we announced that we'd moved all the Neo4j code over to GitHub, which has been largely successful, though we experienced a bumpy patch or two with our build system.

We're now looking to deploy a new continuous delivery system which will take code in, and spit out fully QA'd builds of our products several times per day. In doing this we're hoping to simplify our own housekeeping (no more Maven madness) and be able to provide rapid turnaround for features to the community. So over the next iteration look out for announcements on the mailing list asking for your input on what you'd like to see.

Until then, thanks for hanging out in the coolest block in the NOSQL neighborhood!

-- Jim

Wednesday, February 23, 2011

Jim Webber: Scaling Neo4j with Cache Sharding and Neo4j HA

Jim Webber discusses solutions for availability, scalability and very large data sets in Neo4j in his new blog post. Read it over here: Scaling Neo4j with Cache Sharding and Neo4j HA. As always, the Neo4j community and team will be happy to hear your thoughts and help you out on the mailing list.

Wednesday, February 16, 2011

Jim Webber: On Sharding Graph Databases

Jim Webber has been on the Neo4j team for a while now, working on making Neo4j even more awesome. Today he pushed out a blog post named On Sharding Graph Databases, to be followed by future postings. Head over to read and comment, or discuss the topic on the mailing list and of course on Twitter.

Monday, February 14, 2011

Announcing Neo4j on Windows Azure

Peter Neubauer, Magnus Mårtensson

Neo4j has a ‘j’ appended to the name. And now it is available on Windows Azure? This proves that in the most unlikely of circumstances sometimes beautiful things can emerge. Microsoft has promised Java to be a valued “first class citizen” on Windows Azure. In this blog post we will show that it is no problem at all to host a sophisticated and complex server product such as the Neo4j graph database server on Windows Azure. Since Neo4j has a REST API over HTTP you can speak to this server from your regular .NET (or Java) applications, inside or outside of the cloud, just as easily as you speak to Windows Azure Storage.


Intro

This first version (1.0 "JFokus") of our deployment is a bit simplified in some areas. Still, it is a complete and fully functioning deployment of Neo4j to Windows Azure. We are already working on the next major release (2.0) which will be much more turn-key; just upload the application to Windows Azure and launch.
Furthermore we have serious plans to use this approach, Neo4j in Windows Azure, on a live project where we are backing a server application with complex graph calculations. We will layer spatial and social graphs in combined searches on the server side and serve condensed search results to the client applications outside of the Cloud.
This project is not a toy; it’s the real deal and it runs very smoothly – Java runs with little or no hassle on Windows Azure!


If you are a .NET developer reading this post

What we have enabled for You, dear .NET developer, is to leverage a really powerful graph database and make it available in Your Windows Azure applications!


You can think of Neo4j as a high-performance schema-free graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.
The data model consists of Nodes, typed Relationships between Nodes and Key-Value pairs on both Nodes and Relationships, called Properties. This is how the Matrix characters and their relationships could look in a Neo4j data model:




How to communicate with it? It is very straightforward: Neo4j communicates using a REST-based API over HTTP. This means that you can communicate with it just as easily as you can with standard Windows Azure Storage.


What we have done

The fact of the matter is that Neo4j has been running on Windows for a long time. What we have done in this project is to host it on Windows Azure. We have taken into account such things as dynamic port allocation, and the subsequent version will also automatically handle storage backups. The following steps are involved in the deployment of version 1.0:

  • Upload a Java Runtime Environment (JRE) to Windows Azure Blob Storage.
  • Upload Neo4j to Windows Azure Blob Storage.
  • Upload the deployment of the Neo4j Windows Azure hosting project to Windows Azure – which will launch the install automatically.

The install will:
  • Download from Windows Azure Blob Storage to our Windows Azure server instance, and deploy, both the JRE and the Neo4j Server.
  • Configure diagnostics on the Windows Azure server instance to also include the Neo4j logs in the diagnostics collections.
  • Modify the configuration of Neo4j to listen to a run time assigned port, to point to the database storage location and to know the location of the JRE etc.

That completes the install. Next Windows Azure will launch Neo4j – and we receive MAGIC!


Brief comments

This version has a few manual deployment steps too many, which we will mitigate in the subsequent versions of this project.
Diagnostics in Windows Azure could not be simpler; Neo4j logs its activity, as most servers do, to a configurable directory. Windows Azure can include custom directories in the standard diagnostics collections, which is easily configured on the machine at startup. This means you can reach the Neo4j diagnostics output for debugging and monitoring.

We will also store the data files of the graph database in a blob in Windows Azure Storage. This will make the database automatically triple-redundantly backed up with automatic failover. This is built into Windows Azure with no extra effort on our part.
Let’s go into a bit more technical detail below. If this is not your cup of tea, scroll to the end for the summary!


How we have done it


Solution

There is much less code in this solution than you might think. All we need is a hosting project which will host Neo4j in Windows Azure. It also takes care of downloading, installing and configuring Neo4j.
Apart from the tests in our solution we have (in alphabetical order from the screen shot):

  • CollectDiagnosticsData: A small project to trigger diagnostics transfer from our Cloud instance to Cloud storage. This is only used for debug purposes and is not a part of the deployed solution. The trigger is fired from a console window on your local machine when and if you want to view the logs of the application.
  • Diversify.WindowsAzure.ServiceRuntime: A general library that enhances testability in the Windows Azure SDK.
  • Neo4j.Azure.Server: The Windows Azure deployment definition project. This is the thing that is packed up and deployed to Windows Azure. It acts as a bag with configuration for the projects that make up the application.
  • Neo4jServerHost: A Windows Azure Worker Role project that hosts Neo4j.

Configuration


Having the application configuration settings separate from your code in Windows Azure is key. We have coded our solution so that all external links and configuration settings are extracted from the code and declared in the Service Definition file* of our Windows Azure Solution. Having done that, we can specify the associated configuration values in the Service Configuration file*.
This gives us the ability to, for instance, upgrade the version of Neo4j simply by replacing the zip file in blob storage and modifying a few configuration values. No code change required.

As a general rule of thumb you want to make your Windows Azure deployments as configurable as possible to enable easy in-place upgrading of your service in the future.


Installation


This is the bit that is more complex in version 1.0 than we’d like. ;~)
The installation of Neo4j involves manually uploading the Neo4j and JRE artifacts to Windows Azure Blob Storage before deployment. Sure, it’s a fairly normal approach for this type of deployment, but it can be made more accessible for a demo application such as this. Again, this project is a complete and fully functioning version of Neo4j in Windows Azure, but there exists no application that cannot be improved. We want the next version (2.0) to be turn-key in the sense that you should only need to download and launch, nothing more, to get a fully functioning server!
Please note that you can also use another approach for installation in Windows Azure which is to use a so called startup task.


Running the server


When the solution is installed we are ready to launch Neo4j. A batch file is executed in order to launch through a standard Process.Start() operation.
There should perhaps be more to say here at launch but there really isn’t. It is this simple.
The hosting application kicks off the Neo4j server instance in Windows Azure. All of the configuration of the server is done in the installation steps prior to starting the server.


The Web administration

When the server is running, head over to http://localhost:7474/ to see the web administration:




It gives you access to the main performance measures, a data browser, a scripting console using the Gremlin graph scripting language to test out ideas, and monitoring details regarding the server.

The port on which an application runs in your local Development Emulator is set dynamically. 7474 is the default Neo4j port in the server's configuration files. The Windows Azure hosting project reads the allocated port and writes it to the config before it launches our server. In my case (Magnus), the dynamic port on my local dev machine was 5100, so for me the correct link was http://localhost:5100/. Try that, or check the console output when you run the demo to see which port your instance launches on. Fortunately, the dynamic port selected by the Compute Emulator on the local machine seems to stay the same over time.
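
The port rewrite described above can be sketched in a few lines. The real hosting project is a .NET Worker Role, so the Ruby below only illustrates the idea and is not the project's code. The property names (`org.neo4j.server.webserver.port`, `org.neo4j.server.database.location`) and the helper `set_server_port` are assumptions made for this sketch; check them against the properties file shipped with your Neo4j server.

```ruby
# Illustrative only: rewrite the port line of a Java-style properties text.
# PORT_KEY is an assumed property name; check your neo4j-server.properties.
PORT_KEY = 'org.neo4j.server.webserver.port'

# Returns a copy of +config_text+ with the port property set to +port+.
# If the property is missing, it is appended on a new line.
def set_server_port(config_text, port)
  line = "#{PORT_KEY}=#{port}"
  pattern = /^#{Regexp.escape(PORT_KEY)}\s*=.*$/
  if config_text =~ pattern
    config_text.sub(pattern, line)
  else
    config_text.chomp + "\n" + line + "\n"
  end
end

default_config = <<~CONFIG
  # location of the database directory (assumed property name)
  org.neo4j.server.database.location=data/graph.db
  org.neo4j.server.webserver.port=7474
CONFIG

# 5100 is the port the Compute Emulator happened to assign in the post.
puts set_server_port(default_config, 5100)
```

Doing the rewrite on the text of the file, just before launch, keeps the server itself unaware of Windows Azure: it simply starts with whatever port its configuration says.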


How do I connect - The Neo4j REST API

The REST API to the Neo4j server is built to be self-explanatory and easy to consume, and is normally mounted at http://localhost:7474/db/data. You can find the docs here. A basic request to the data root URI of your new Neo4j server using curl looks like


curl -H Accept:application/json http://localhost:7474/db/data/

and gives the response

{
  "node" : "http://localhost:7474/db/data/node",
  "node_index" : "http://localhost:7474/db/data/index/node",
  "relationship_index" : "http://localhost:7474/db/data/index/relationship",
  "reference_node" : "http://localhost:7474/db/data/node/0",
  "extensions_info" : "http://localhost:7474/db/data/ext",
  "extensions" : {
  }
}

This describes the whole database and gives you further URLs to discover indexes, the reference data node, extensions and other good information. A REST representation of the first node (without any properties) looks like:

curl http://localhost:7474/db/data/node/0

{
  "outgoing_relationships" : "http://localhost:7474/db/data/node/0/relationships/out",
  "data" : {
  },
  "traverse" : "http://localhost:7474/db/data/node/0/traverse/{returnType}",
  "all_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/all/{-list|&|types}",
  "property" : "http://localhost:7474/db/data/node/0/properties/{key}",
  "self" : "http://localhost:7474/db/data/node/0",
  "properties" : "http://localhost:7474/db/data/node/0/properties",
  "outgoing_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/out/{-list|&|types}",
  "incoming_relationships" : "http://localhost:7474/db/data/node/0/relationships/in",
  "extensions" : {
  },
  "create_relationship" : "http://localhost:7474/db/data/node/0/relationships",
  "all_relationships" : "http://localhost:7474/db/data/node/0/relationships/all",
  "incoming_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/in/{-list|&|types}"
}
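
The `{key}` and `{-list|&|types}` placeholders in the node representation are URI templates that a client fills in before following a link. Here is a minimal sketch of how that expansion might be done on the client side, using only Ruby's standard library. The `expand` helper is hypothetical and handles just the two placeholder forms shown here, not URI templates in general.

```ruby
require 'json'
require 'erb'

# Hypothetical helper: fills in the two placeholder forms seen in the
# node representation. {name} takes a single value; {-list|SEP|name}
# joins an array of values with SEP.
def expand(template, values)
  template.gsub(/\{(?:-list\|([^|]*)\|)?(\w+)\}/) do
    separator, name = $1, $2
    value = values.fetch(name)
    if separator
      Array(value).map { |v| ERB::Util.url_encode(v.to_s) }.join(separator)
    else
      ERB::Util.url_encode(value.to_s)
    end
  end
end

# A trimmed copy of the node representation returned by the server.
body = <<~JSON
  {
    "self" : "http://localhost:7474/db/data/node/0",
    "property" : "http://localhost:7474/db/data/node/0/properties/{key}",
    "all_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/all/{-list|&|types}"
  }
JSON

node = JSON.parse(body)
puts expand(node['property'], 'key' => 'name')
puts expand(node['all_typed_relationships'], 'types' => %w[friends colleagues])
```

Reading the links out of the response instead of hard-coding paths means a client keeps working when the server is mounted on a different host or port, such as the dynamically assigned one on Windows Azure.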

In order to get started, please go over to the main Neo4j wiki page. For the server, there is a good getting started guide, or you can look at some of the projects using Neo4j.

What can I do with it?

Building applications with the Neo4j Server is really easy. Either you can just use the raw REST API to insert and update your data, or use one of the bindings to Ruby, .NET, PHP and other languages to start interacting with Neo4j.

Neo4j really shines when it comes to deep traversals of your data and analysis of different aspects of your domain. The flexibility of a graph really helps in a lot of scenarios, not only social networking as in the following example.

As a small example, this is what you do to build a sample LinkedIn-like social network, execute a Shortest Path query against it and build a recommendation engine on top of that (taken from Max de Marzi’s Neography Ruby bindings for the Neo4j Server). Install them with
gem install neography

A small Ruby example (let’s say in a file called linkedin.rb):
require 'rubygems'
require 'neography'

@neo = Neography::Rest.new

def create_person(name)
  @neo.create_node("name" => name)
end

def make_mutual_friends(node1, node2)
  @neo.create_relationship("friends", node1, node2)
  @neo.create_relationship("friends", node2, node1)
end

def suggestions_for(node)
  @neo.traverse(node, "nodes", {"order" => "breadth first",
                                "uniqueness" => "node global",
                                "relationships" => {"type" => "friends", "direction" => "in"},
                                "return filter" => {
                                  "language" => "javascript",
                                  "body" => "position.length() == 2;"},
                                "depth" => 2})
end

johnathan = create_person('Johnathan')
mark = create_person('Mark')
phill = create_person('Phill')
mary = create_person('Mary')
luke = create_person('Luke')

make_mutual_friends(johnathan, mark)
make_mutual_friends(mark, mary)
make_mutual_friends(mark, phill)
make_mutual_friends(phill, mary)
make_mutual_friends(phill, luke)

puts "Johnathan should become friends with #{suggestions_for(johnathan).map{|n| n["data"]["name"]}.join(', ')}"


After executing this code with Ruby:
ruby linkedin.rb

You should get the resulting recommendation
Johnathan should become friends with Mary, Phill
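
To see why Mary and Phill come back, it can help to run the same idea without a server. The sketch below is plain Ruby with an in-memory adjacency list. It is not how Neo4j executes the traversal, and the `FRIENDS` hash and `friends_at_depth` helper are made up for illustration. It walks the friendship graph breadth first, visits each person at most once (the "node global" uniqueness) and keeps only the people found at exactly depth 2.

```ruby
# The friendships from linkedin.rb as an undirected adjacency list.
FRIENDS = Hash.new { |hash, key| hash[key] = [] }

[%w[Johnathan Mark], %w[Mark Mary], %w[Mark Phill],
 %w[Phill Mary], %w[Phill Luke]].each do |a, b|
  FRIENDS[a] << b
  FRIENDS[b] << a
end

# Breadth-first walk from +start+; returns the people first reached
# at exactly +depth+ hops, each person being visited only once.
def friends_at_depth(start, depth)
  visited = { start => true }
  frontier = [start]
  depth.times do
    next_frontier = []
    frontier.each do |person|
      FRIENDS[person].each do |friend|
        next if visited[friend]
        visited[friend] = true
        next_frontier << friend
      end
    end
    frontier = next_frontier
  end
  frontier
end

puts "Johnathan should become friends with #{friends_at_depth('Johnathan', 2).join(', ')}"
```

Mark is excluded because he is already a direct friend (depth 1), and Luke is excluded because he is three hops away, which is what the `position.length() == 2` return filter expresses in the server-side traversal.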

You can of course see the increase of data in the Web dashboard at http://localhost:7474, too.

There are a number of other cool examples, for instance an IMDB simulation with recommendations against a Neo4j server instance. Enjoy!

.NET Client library

If you want to talk to a Neo4j instance from your .NET code you will of course need a client library that knows how to communicate with the REST API. There is a blog post, Neo4j .NET Client over HTTP using REST and json, that discusses this concept and what would be required to create such a client library. There is also an existing library, Neo4RestNet, which is certainly a very good place to start if you want to communicate this way.

Note: It would be nice to teach Neo4j to use another form of communication more easily consumed by .NET code where perhaps the library pieces are more evolved. We are currently looking into this and will keep you posted.

I want to play with it. Where can I get it?

Glad you like it and happy that you want to give it a spin!
If you want to look at our Windows Azure solution you only need to

  • Download the Visual Studio 2010 Neo4j Windows Azure hosting project.

If you are aiming to test run our solution either locally on your machine or in the cloud you need a few more pieces of the puzzle. (Again this is version 1.0 and it involves a few more manual steps than we’d like.)

  • Download Neo4j.
  • Download a Java Runtime Environment.
  • Upload Neo4j and the JRE to Windows Azure Blob Storage (or, to test this on your local machine, just use your local Development Storage Emulator).
  • Launch the hosting project in Visual Studio.
  • Configure the solution with your own Windows Azure Storage credentials.
  • Deploy Neo4j to your Windows Azure account (or hit F5 to run it in your local Development Fabric Emulator).
The source of the Service Definition files, Service Configuration files, Development Storage and Development Fabric Emulators are part of the Windows Azure Visual Studio tools project for Neo4j that you can download and install from here.

Summary


During the coding and testing of this project a few experiences are inescapable:

  • Java runs very well on Windows Azure. In fact, if you are able to run your Java application on a regular Windows Server it will run on a Windows Azure instance, with a little tweaking and fiddling to make this happen, of course.
  • Fiddling with folders and paths in your Windows Azure applications to let everything find where everything else is takes some getting used to. Extracting configuration settings is an absolute must! You have to handle this well in order to do run-time configuration changes down the road.
  • It is advised to pack the JRE alongside the Java application you are deploying to reduce the number of steps required to install the server application on startup.

In version 2.0 of this project we hope to make the Visual Studio Solution much more turn-key. All you should need to do to test drive this application is to download the solution and launch it. Instantly you should have a running Neo4j server! We intend to do this by downloading the JRE and Neo4j server directly from http://neo4j.org. We will also look into securing the database files and adding multiple server instances that collaborate. This last bit, in Cloud-lingo, is called “scaling out”.
Another thing on our list is to make this Java server bark in a different tongue. ;~) But more about this is to come down the line.


If you do look at this project and have comments or feedback feel free to contact us @noopman and @peterneubauer. Hope you will enjoy this new and shiny toy as much as we do!

Cheers,

Magnus Mårtensson – Business Responsible Cloud @ Diversify
Peter Neubauer – VP Product Management @ Neo Technology

Magnus: As a .NET Architect and Cloud specialist I am continuously searching for new tools for my toolbox. There are enormous amounts of great tools out there – and Neo4j is one that outshines the bulk of them. Having the power of a graph database at your fingertips is a fantastic power to harness. With this easy deploy to Windows Azure graph data is no longer a stranger in the .NET field.

Peter: The Neo4j community has seen a lot of interest from the .NET developer community lately. Working with Azure as a Platform-as-a-Service hosting environment for Neo4j finally gives .NET developers the possibility to use all the great features and performance gains of Neo4j on a Microsoft-supported infrastructure. The prospect of a solid NoSQL offering in the space of graph databases is very exciting for the project.
It has been a pleasure to work in collaboration between Diversify and Neo4j and with Microsoft on this project and we are very thankful for this opportunity to have fun with a great and unexpected technology combination.

Thursday, February 10, 2011

Neo4j 1.3 "Abisko Lampa" M02 - Moving Day


Peter Neubauer


Hi everyone,

It's milestone release time again. For this time, let me just summarize what we've been up to the last two weeks, guided by the distant light of Abisko Lampa.

First of all, the community (that's YOU!) has been fantastic over the last two weeks. Thanks for helping with questions, comments and remarks making Neo4j better - keep it up!

Forwarding address: http://github.com/Neo4j

We've packed up and moved our source repositories over to GitHub, now living at https://github.com/Neo4j. While settling in to our new digs, we've re-arranged things into multi-module projects with top-level parents that make checking out and building a bit simpler.

Being sentimental, we're keeping the old Subversion repository in read-only mode, for historical tours.

Stop by for a visit on GitHub, feel free to fork away and send in pull requests. We are always thankful for contributions - and you will get a Neo4j baseball cap from us!

Chris Gioran aka DigitalStain

Re-arranging the furniture

Our components are now in the same room together, easy to grab and build if you want to work from the source. What's in that room? Neo4j 1.3.M02 now contains:

  • neo4j-kernel: core graphdb engine

  • neo4j-graph-algo: optimized graph algorithm library

  • neo4j-kernel-com: low-level communication implementations between Neo4j instances

  • neo4j-ha: high availability system

  • neo4j-lucene-index: a Lucene based index provider

  • neo4j-management: monitoring and JMX exposure facilities

  • neo4j-udc: usage data collector subsystem


Also, note that the version numbers of all core components now match the release (1.3.M02), dropping the previous prefixing of distinct component versions.


    New backup library


    Taking advantage of the High Availability communication features, we've re-implemented the Neo4j backup library. Both incremental and full backups are now pretty simple, just borrow this little code snippet:

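    import org.neo4j.backup.OnlineBackup;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    GraphDatabaseService db = new EmbeddedGraphDatabase("db");
    // full backup of the running instance into "backupdir"
    OnlineBackup.from( "localhost" ).full( "backupdir" );
    // do stuff, then fetch only the changes made since the last backup
    OnlineBackup.from( "localhost" ).incremental( "backupdir" );
    db.shutdown();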

    Web administrivia

    The server's Webadmin UI has been spruced up: we've added a bit of spit-n-polish to repair some blemishes and bashed out some small dents.

    Performance improvements


    Log rotation within the kernel has been significantly sped up, which will improve overall performance especially when small transactions are executed.


    High Availability setup improvements


    There have been a number of efforts to wrap the HA setup of Neo4j for different environments. Andreas Ronge has done a great job exposing a simple interface in the HA scripts for Neo4j JRuby, and there are now some really useful Chef and Vagrant recipes.

    Upgrade to Gremlin 0.7 in the server


    Our good friend, the graph cruncher Marko Rodriguez, has a revamped Groovy-based incarnation of the Gremlin graph language - an easy way to describe graph operations on top of the Groovy scripting language. With this, you can do supercool stuff on graphs directly in the Webadmin tool, check the state of your graph, or prototype stuff.



    That's all for now, and let us know any comments, issues or thoughts you have!

    /peter