Monday, February 14, 2011

Announcing Neo4j on Windows Azure

Peter Neubauer, Magnus Mårtensson

Neo4j has a ‘j’ appended to the name. And now it is available on Windows Azure? This proves that, in the most unlikely of circumstances, beautiful things can sometimes emerge. Microsoft has promised that Java will be a valued “first class citizen” on Windows Azure. In this blog post we will show that it is no problem at all to host a sophisticated and complex server product such as the Neo4j graph database server on Windows Azure. Since Neo4j has a REST API over HTTP, you can speak to this server from your regular .NET (or Java) applications, inside or outside of the cloud, just as easily as you speak to Windows Azure Storage.


Intro

This first version (1.0 "JFokus") of our deployment is a bit simplified in some areas. Still, it is a complete and fully functioning deployment of Neo4j to Windows Azure. We are already working on the next major release (2.0), which will be much more turn-key: just upload the application to Windows Azure and launch.
Furthermore we have serious plans to use this approach, Neo4j in Windows Azure, on a live project where we are backing a server application with complex graph calculations. We will layer spatial and social graphs in combined searches on the server side and serve condensed search results to the client applications outside of the Cloud.
This project is not a toy; it’s the real deal, and it runs very smoothly – Java runs with little or no hassle on Windows Azure!


If you are a .NET developer reading this post

What we have enabled for You, dear .NET developer, is to leverage a really powerful graph database and make it available in Your Windows Azure applications!


You can think of Neo4j as a high-performance schema-free graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.
The data model consists of Nodes, typed Relationships between Nodes and Key-Value pairs on both Nodes and Relationships, called Properties. This is how the Matrix characters and their relationships could look in a Neo4j data model:
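
In plain Ruby terms, the same three building blocks can be sketched as below. This is only an illustration of the model, not the Neo4j API: the `Node` and `Relationship` structs, the `outgoing` helper, and the specific characters, relationship types and property values are all assumptions chosen for the example.

```ruby
# Plain-Ruby illustration of the property-graph model (NOT the Neo4j API).
# Nodes and Relationships both carry key-value Properties.
Node = Struct.new(:properties)
Relationship = Struct.new(:type, :from, :to, :properties)

# Illustrative nodes; names and property values are assumptions.
neo      = Node.new({ "name" => "Thomas Anderson" })
trinity  = Node.new({ "name" => "Trinity" })
morpheus = Node.new({ "name" => "Morpheus" })

# Typed, directed relationships between nodes, each with its own properties.
relationships = [
  Relationship.new("KNOWS", neo, trinity, { "context" => "illustrative" }),
  Relationship.new("KNOWS", neo, morpheus, {}),
  Relationship.new("LOVES", trinity, neo, {})
]

# Returns the nodes reached from `node` over outgoing relationships of `type`.
def outgoing(relationships, node, type)
  relationships.select { |rel| rel.from.equal?(node) && rel.type == type }
               .map(&:to)
end

puts outgoing(relationships, neo, "KNOWS").map { |n| n.properties["name"] }.join(", ")
```

Note that properties live on relationships as well as on nodes, which is what distinguishes this model from a simple adjacency list.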




How to communicate with it? It is very straightforward: Neo4j communicates using a REST-based API over HTTP. This means that you can communicate with it just as easily as you can with standard Windows Azure Storage.


What we have done

The fact of the matter is that Neo4j has been running on Windows for a long time. What we have done in this project is to host it on Windows Azure. We have taken into account such things as dynamic port allocation and the subsequent version will also automatically handle storage backups. The following steps are involved in the deploy of version 1.0:

  • Upload a Java Runtime Environment (JRE) to Windows Azure Blob Storage.
  • Upload Neo4j to Windows Azure Blob Storage.
  • Upload the deployment of the Neo4j Windows Azure hosting project to Windows Azure – which will launch the install automatically.

The install will:
  • Download from Windows Azure Blob Storage to our Windows Azure server instance, and deploy, both the JRE and the Neo4j Server.
  • Configure diagnostics on the Windows Azure server instance to also include the Neo4j logs in the diagnostics collections.
  • Modify the configuration of Neo4j to listen to a run time assigned port, to point to the database storage location and to know the location of the JRE etc.
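
The configuration step in the last bullet can be sketched roughly as follows. The real hosting project is a .NET Worker Role, so this Ruby version is only an illustration of the idea; the helper name `apply_settings` is hypothetical, and the property keys are assumed to match the ones in the server's properties file.

```ruby
# Rewrites key=value lines in a Java-style properties text.
# Keys found in the text are overwritten; missing keys are appended.
def apply_settings(properties_text, overrides)
  remaining = overrides.dup
  lines = properties_text.lines.map do |line|
    line += "\n" unless line.end_with?("\n")
    key = line.split("=", 2).first.strip
    if !line.lstrip.start_with?("#") && remaining.key?(key)
      "#{key}=#{remaining.delete(key)}\n"
    else
      line
    end
  end
  # Append any settings that were not already present in the file.
  remaining.each { |key, value| lines << "#{key}=#{value}\n" }
  lines.join
end

# Example input; the key names and values are assumptions for illustration.
original = "# web server settings\norg.neo4j.server.webserver.port=7474\n"
updated = apply_settings(original,
                         "org.neo4j.server.webserver.port" => 5100,
                         "org.neo4j.server.database.location" => "C:/neo4j/data/graph.db")
puts updated
```

Keeping the rewrite as a pure text-to-text function makes it easy to test without touching the file system; reading and writing the actual file would wrap around it.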

That completes the install. Next Windows Azure will launch Neo4j – and we receive MAGIC!


Brief comments

This version has a few manual deployment steps too many, which we will reduce in subsequent versions of this project.
Diagnostics in Windows Azure could not be simpler. Neo4j logs its activity, as most servers do, to a configurable directory. Windows Azure can include custom directories in the standard diagnostics collections, and this is easily configured on the machine at startup. This means you can reach the Neo4j diagnostics output for debugging and monitoring.

We will also store the data files of the graph database in a blob in Windows Azure Storage. This will make the database automatically triple-redundantly backed up with automatic failover. This is built into Windows Azure with no extra effort on our part.
Let’s go into a bit more technical detail below. If this is not your cup of tea, scroll to the end for the summary!


How we have done it


Solution

There is much less code in this solution than you might think. All we need is a hosting project that hosts Neo4j in Windows Azure. It also takes care of downloading, installing and configuring Neo4j.
Apart from the tests in our solution we have (in alphabetical order from the screen shot):

  • CollectDiagnosticsData: A small project to trigger diagnostics transfer from our Cloud instance to Cloud storage. This is only used for debug purposes and is not a part of the deployed solution. The trigger is fired from a console window on your local machine when and if you want to view the logs of the application.
  • Diversify.WindowsAzure.ServiceRuntime: A general library that enhances testability in the Windows Azure SDK.
  • Neo4j.Azure.Server: The Windows Azure deployment definition project. This is the thing that is packed up and deployed to Windows Azure. It acts as a bag with configuration for the projects that make up the application.
  • Neo4jServerHost: A Windows Azure Worker Role project that hosts Neo4j.

Configuration


Keeping the application configuration settings separate from your code in Windows Azure is key. We have coded our solution to extract all external links and configuration settings from the code and put them in the Service Definition file* of our Windows Azure Solution. Once that is done, we can specify the associated configuration values in the Service Configuration file*.
This gives us the ability to, for instance, upgrade the version of Neo4j simply by replacing the zip file in blob storage and modifying a few configuration values. No code change required.

As a general rule of thumb you want to make your Windows Azure deployments as configurable as possible to enable easy in place upgrading of your service in the future.


Installation


This is the bit that is more complex in version 1.0 than we’d like. ;~)
The installation of Neo4j involves manually uploading the artifacts of Neo4j and the JRE to Windows Azure Blob Storage before deploying. Sure, it’s a fairly normal approach for this type of deployment, but it can be made more accessible for a demo application such as this. Again, this project is a complete and fully functioning version of Neo4j in Windows Azure, but there exists no application that cannot be improved. We want the next version (2.0) to be turn-key in the sense that you should only need to download the solution and launch it to get full functionality!
Please note that you can also use another approach for installation in Windows Azure, which is to use a so-called startup task.


Running the server


When the solution is installed we are ready to launch Neo4j. A batch file is executed through a standard Process.Start() operation to launch the server.
There should perhaps be more to say here at launch but there really isn’t. It is this simple.
The hosting application kicks off the Neo4j server instance in Windows Azure. All of the configuration of the server is done in the installation steps prior to starting the server.


The Web administration

When the server is running, head over to http://localhost:7474/ to see the web administration:




It gives you access to the main performance measures, a data browser, a scripting console using the Gremlin graph scripting language to test out ideas, and monitoring details regarding the server.

The port on which an application runs in your local Development Emulator is set dynamically. 7474 is the default Neo4j port in the server's configuration files. The Windows Azure hosting project reads the allocated port and writes it to the config before it launches our server. In my case (Magnus), the dynamic port on my local dev machine was 5100, so for me the link http://localhost:5100/ was correct. Try that, or check the console output when you run the demo to see which port your instance launches on. Fortunately, the dynamic port selected by the Compute Emulator on the local machine seems to stay the same over time.


How do I connect - The Neo4j REST API

The REST API to the Neo4j server is built to be self-explanatory and easy to consume, and is normally mounted at http://localhost:7474/db/data. You can find the docs here. A basic request to the data root URI of your new Neo4j server using cURL looks like

curl -H Accept:application/json http://localhost:7474/db/data/

and gives the response
{
"node" : "http://localhost:7474/db/data/node",
"node_index" : "http://localhost:7474/db/data/index/node",
"relationship_index" : "http://localhost:7474/db/data/index/relationship",
"reference_node" : "http://localhost:7474/db/data/node/0",
"extensions_info" : "http://localhost:7474/db/data/ext",
"extensions" : {
}
}

This describes the whole database and gives you further URLs to discover indexes, the reference data node, extensions and other good information. A REST representation of the first node (without any properties) looks like:

curl http://localhost:7474/db/data/node/0

{
"outgoing_relationships" : "http://localhost:7474/db/data/node/0/relationships/out",
"data" : {
},
"traverse" : "http://localhost:7474/db/data/node/0/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/0/properties/{key}",
"self" : "http://localhost:7474/db/data/node/0",
"properties" : "http://localhost:7474/db/data/node/0/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/0/relationships/in",
"extensions" : {
},
"create_relationship" : "http://localhost:7474/db/data/node/0/relationships",
"all_relationships" : "http://localhost:7474/db/data/node/0/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/in/{-list|&|types}"
}

In order to get started, please go over to the main Neo4j Wiki page. For the server, there is a good getting started guide, or you can look at some of the projects using Neo4j.
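
Because each response from the REST API lists the URIs of related resources, a client can discover endpoints rather than hard-code them. Here is a minimal Ruby sketch using only the standard library. The helper names `fetch_json` and `discoverable_links` are hypothetical, and the sample document is a trimmed copy of the root response shown earlier, so the demo runs without a server.

```ruby
require "json"
require "net/http"
require "uri"

# Performs a GET with an "Accept: application/json" header and parses the body.
# Needs a running server, so it is not called in the offline demo below.
def fetch_json(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["Accept"] = "application/json"
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Keeps only the entries whose value is a URI string, e.g. "node".
def discoverable_links(document)
  document.select { |_name, value| value.is_a?(String) && value.start_with?("http") }
end

# Offline demo: the root response from the article, trimmed to three entries.
sample_root = JSON.parse(<<~DOC)
  {
    "node" : "http://localhost:7474/db/data/node",
    "reference_node" : "http://localhost:7474/db/data/node/0",
    "extensions" : { }
  }
DOC

links = discoverable_links(sample_root)
puts links["reference_node"]
```

Against a live server, `discoverable_links(fetch_json("http://localhost:7474/db/data/"))` would be the equivalent call, with the host and port adjusted to whatever your instance was assigned.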

What can I do with it?

Building applications with the Neo4j Server is really easy. Either you can just use the raw REST API to insert and update your data, or use one of the bindings to Ruby, .NET, PHP and other languages to start interacting with Neo4j.

Neo4j really shines when it comes to deep traversals of your data and analysis of different aspects of your domain. The flexibility of a graph really helps in a lot of scenarios, not only social networking as in the following example.

As a small example, this is what you do to build a sample LinkedIn-like social network, execute a breadth-first traversal against it to find the people exactly two hops away, and base a simple recommendation engine on the result (taken from Max de Marzi’s Neography Ruby bindings for the Neo4j Server). Install them with
gem install neography

A small Ruby example (let’s say in a file called linkedin.rb):
require 'rubygems'
require 'neography'

# Creates a REST client with Neography's default settings.
@neo = Neography::Rest.new

# Creates a node with a "name" property.
def create_person(name)
  @neo.create_node("name" => name)
end

# Friendship is mutual, so create a "friends" relationship in each direction.
def make_mutual_friends(node1, node2)
  @neo.create_relationship("friends", node1, node2)
  @neo.create_relationship("friends", node2, node1)
end

# Breadth-first traversal that returns only the nodes exactly two hops away.
def suggestions_for(node)
  @neo.traverse(node, "nodes", {"order" => "breadth first",
                                "uniqueness" => "node global",
                                "relationships" => {"type" => "friends", "direction" => "in"},
                                "return filter" => {
                                  "language" => "javascript",
                                  "body" => "position.length() == 2;"},
                                "depth" => 2})
end

johnathan = create_person('Johnathan')
mark = create_person('Mark')
phill = create_person('Phill')
mary = create_person('Mary')
luke = create_person('Luke')

make_mutual_friends(johnathan, mark)
make_mutual_friends(mark, mary)
make_mutual_friends(mark, phill)
make_mutual_friends(phill, mary)
make_mutual_friends(phill, luke)

puts "Johnathan should become friends with #{suggestions_for(johnathan).map{|n| n["data"]["name"]}.join(', ')}"


After executing this code with Ruby:
ruby linkedin.rb

You should get the resulting recommendation
Johnathan should become friends with Mary, Phill
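
To see why Mary and Phill come back but Luke does not, the traversal can be mimicked in plain Ruby without a server. This is a rough in-memory sketch of a breadth-first search limited to depth 2, not what Neo4j does internally; `build_graph` and `friends_of_friends` are hypothetical helper names.

```ruby
# Builds an undirected adjacency list from pairs of mutual friends.
def build_graph(pairs)
  graph = Hash.new { |hash, key| hash[key] = [] }
  pairs.each do |a, b|
    graph[a] << b
    graph[b] << a
  end
  graph
end

# Breadth-first search that visits each person once ("node global"
# uniqueness) and returns those found exactly two hops from the start.
def friends_of_friends(graph, start)
  depth = { start => 0 }
  queue = [start]
  until queue.empty?
    person = queue.shift
    next if depth[person] == 2 # do not expand beyond depth 2
    graph[person].each do |friend|
      next if depth.key?(friend)
      depth[friend] = depth[person] + 1
      queue << friend
    end
  end
  depth.select { |_person, hops| hops == 2 }.keys
end

# The same friendships as in linkedin.rb above.
friendships = [
  %w[Johnathan Mark], %w[Mark Mary], %w[Mark Phill],
  %w[Phill Mary], %w[Phill Luke]
]

suggestions = friends_of_friends(build_graph(friendships), "Johnathan")
puts "Johnathan should become friends with #{suggestions.join(', ')}"
```

Mark is a direct friend (one hop), Mary and Phill are both reached through Mark (two hops), and Luke is three hops away via Phill, so he falls outside the depth limit.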

You can of course see the increase of data in the Web dashboard at http://localhost:7474, too.

There are a number of other cool examples, for instance an IMDB simulation with recommendations against a Neo4j server instance. Enjoy!

.NET Client library

If you want to talk to a Neo4j instance from your .NET code you will of course need a client library that knows how to communicate with the REST API. The blog post “Neo4j .NET Client over HTTP using REST and json” discusses this concept and what would be required to create such a client library. There is also an existing library that is a very good place to start if you want to communicate this way: Neo4RestNet

Note: It would be nice to teach Neo4j to use another form of communication that is more easily consumed by .NET code, where perhaps the library pieces are more evolved. We are currently looking into this and will keep you posted.

I want to play with it. Where can I get it?

Glad you like it and happy that you want to give it a spin!
If you want to look at our Windows Azure solution you only need to

  • Download the Visual Studio 2010 Neo4j Windows Azure hosting project.

If you are aiming to test run our solution either locally on your machine or in the cloud you need a few more pieces of the puzzle. (Again this is version 1.0 and it involves a few more manual steps than we’d like.)

  • Download Neo4j.
  • Download a Java Runtime Environment.
  • Upload Neo4j and the JRE to Windows Azure Blob Storage (or just use your local Development Storage Emulator to test this on your local machine).
  • Launch the hosting project in Visual Studio.
  • Configure the solution with your own Windows Azure Storage credentials.
  • Deploy Neo4j to your Windows Azure account (or hit F5 to run it in your local Development Fabric Emulator).
The source of the Service Definition files, Service Configuration files, Development Storage and Development Fabric Emulators are part of the Windows Azure Visual Studio tools project for Neo4j that you can download and install from here.

Summary


During the coding and testing of this project a few experiences are inescapable:

  • Java runs very well on Windows Azure. In fact, if you are able to run your Java application on a regular Windows Server it will run on a Windows Azure instance, with a little tweaking and fiddling to make this happen, of course.
  • Fiddling with folders and paths in your Windows Azure applications to let everything find where everything else is takes some getting used to. Extracting configuration settings is an absolute must! You have to handle this well in order to do run-time configuration changes down the road.
  • It is advisable to pack the JRE alongside the Java application you are deploying, to reduce the number of steps required to install the server application on startup.

In version 2.0 of this project we hope to make the Visual Studio Solution much more turn-key. All you should need to do to test drive this application is to download the solution and launch it. Instantly you should have a running Neo4j server! We intend to do this by downloading the JRE and Neo4j server directly from http://neo4j.org. We will also look into securing the database files and adding multiple server instances that collaborate. This last bit, in Cloud-lingo, is called “scaling out”.
Another thing on our list is to make this Java server bark in a different tongue. ;~) But more about this is to come down the line.


If you do look at this project and have comments or feedback feel free to contact us @noopman and @peterneubauer. Hope you will enjoy this new and shiny toy as much as we do!

Cheers,

Magnus Mårtensson – Business Responsible Cloud @ Diversify
Peter Neubauer – VP Product Management @ Neo Technology

Magnus: As a .NET Architect and Cloud specialist I am continuously searching for new tools for my toolbox. There are enormous amounts of great tools out there – and Neo4j is one that outshines the bulk of them. Having the power of a graph database at your fingertips is a fantastic power to harness. With this easy deploy to Windows Azure graph data is no longer a stranger in the .NET field.

Peter: The Neo4j community has seen a lot of interest from the .NET developer community lately. Working with Azure as a Platform-as-a-Service hosting environment for Neo4j finally gives .NET developers the possibility to use all the great features and performance gains of Neo4j on a Microsoft-supported infrastructure. The prospect of a solid NoSQL offering in the space of graph databases is very exciting for the project.
It has been a pleasure to work in collaboration between Diversify and Neo4j and with Microsoft on this project and we are very thankful for this opportunity to have fun with a great and unexpected technology combination.

20 comments:

rja said...

Wahoo! Great news!

Richard Banks said...

Excellent stuff :-)

Anonymous said...

This is incredible, when is the version 2 coming?

Peter Neubauer said...

Do not despair, we are already on it. Can't promise a date yet though. Anything special you are looking for?

Emilio said...

Hi Peter, thanks for the reply.
Simplification of the deployment process is what would be great. Although it is not too complicated at this stage, we always want more =D

Peter Neubauer said...

Hi Emilio,
Next steps are indeed simplification, better control over ports and security, and a high-availability setup. Let's see when we get to it! If you need it right now, feel free to contact me off-blog!

Magnus Mårtensson said...

I am also working on managing Windows Azure Blob Storage Drives in connection with this deployment. That will be up as a fix within days.

Emilio said...

Does this mean that it will scale out on Azure?

If I understand right, one thing is Storage Scale (add more space) and other thing is Compute Scale (add more CPU power), am I right?

Will neo4j Support both over Windows Azure?

Magnus Mårtensson said...

@Emilio: Scale out on the cloud means to add more identical instances to increase the workload we can take on the server. In Neo4j's case it uses write master/read slaves as a Command and Query Separation Pattern to handle increased load. Also the read slaves will be "eventually consistent" if I understand it correctly. We will be looking into deploying many readers and one master writer to Azure.

As for storage, one single Windows Azure Drive stored in Blob Storage can be up to 1TB in size. It'll be some time before we hit that limit. If that happens we have to shard data.

We will also use Blob Snapshot functionality to get the read slaves up and running fast.

Cheers,

Magnus

blog said...

Regarding scaling Neo4j, see this blog post: Scaling Neo4j with Cache Sharding and Neo4j HA.

Emilio said...

Thank you for the detailed answers.
Are there any plans for a Pay as you go service for Azure?

Magnus Mårtensson said...

Hey Emilio!

Not right now but there could be... if there are more takers out there?

I am also considering putting the code up on a code share and invite people who wants to try this and maybe even develop a few features on version 2.0 to come along and do that. What are your thoughts on that?

Cheers,

Magnus

Emilio said...

Hi Magnus,

Opening a code repository would be a great idea.

The Azure/neo4j combination may have few followers for now, but the hype will definitely take off soon.

I'm learning all I can about neo4j for now (coming from the MS world), so I hope I can contribute something soon.

Let us know when the repo is up

Peter Neubauer said...

Guys,
we can put up a repo over at https://github.com/neo4j/, WDYT?

/peter

Emilio said...

Peter, I think its the best place to have the deployment code and related files.

It would be good to put a TODO list to know what has been done and whats missing

Anonymous said...

I have deployed Neo4j on Windows Azure but am not able to access it. It's working fine on localhost, but when I try it on Windows Azure it is not accessible.
Do we need to change: Neo4j Data Uri and Neo4j Admin Uri in .cscfg file?
Can we access it through VIP: http://XXX.cloudapp.net:port
?

Vikas Yadav said...

In Production, we replaced "Neo4j Admin Uri" and "Neo4j Data Uri" with DNS which we got while deploying Neo4j.Azure.Server on Windows AZURE.
I have DNS Name as "http://vyNeo4jSvc4.cloudapp.net"
But when I tried to access it from DNS. Its not working for me. Neo4j.Azure.Server has deployed Neo4j service successfully as I got message in logs. Just one error: "Unable to locate jvm. Could not find HKLM\SOFTWARE\JavaSoft\Java Runtime Environment/CurrentVersion entry in windows registry.; TraceSource 'WaWorkerHost.exe' event
"
But still its saying service is running fine.
Please suggest what can be issue?

Tatham said...

@Vikas You need to use the service endpoints assigned by Azure, not statically configure them, else they won't get mapped out through the firewalls and load balancers.

luke said...

@Vikas I fixed the "Unable to locate jvm" error by updating my version of Java, creating a new "jre6.zip" from it and putting it on blob storage.

Anonymous said...

Thanks, great article! I also found there is an update to the solution that works for Visual Studio 2010 and 2012. It also hooks up the CloudDrive and uploads the binaries to Azure Storage. Check it out:

http://ideanotion.net/neo4j-and-azure-deployment-improvement/