Friday, July 30, 2010

The top seven news in Neo4j 1.1

The Neo4j graph database release 1.1 has just arrived, so here's some information on the new things that have been included. The main points are the additions of monitoring support, an event framework and a new traversal framework to the kernel. Then two useful components have been added to the default distribution (called "Apoc"): graph algorithms and online backup.

1. Graph algorithms

Since the previous release, the graph algorithm component has been promoted to the default Neo4j distribution package. Here you'll find implementations of algorithms that will help you find the shortest paths, all simple paths or, if you want, all paths between two nodes. You can also use the Dijkstra algorithm to handle weighted paths, or why not the cool A* algorithm, which is useful in a geospatial context among other uses:

[Image: A* node space]

The above image is from an A* routing example project with code in Ruby or Java.

Since the previous release of the graph algorithm component, much work has been done to improve memory efficiency and speed. It now also uses the new traversal framework under the hood. As a developer, your starting point is the GraphAlgoFactory, which provides access to the algorithms.
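To give a feel for what a weighted shortest-path search does, here's a minimal sketch of Dijkstra's algorithm in plain JavaScript. It is not the Neo4j graph-algo implementation and doesn't go through GraphAlgoFactory; the adjacency-list format, the function name dijkstra and the tiny road network are assumptions made up for this illustration.

```javascript
// Illustrative sketch only: plain JavaScript, not the Neo4j graph-algo API.
// graph: { nodeName: [ { to: nodeName, cost: number }, ... ] } (assumed format)
function dijkstra(graph, start, goal) {
  var dist = {};      // best known cost from start to each node
  var previous = {};  // predecessor on the best known path
  var done = {};      // nodes whose cost is final
  dist[start] = 0;

  while (true) {
    // Pick the unfinished node with the lowest known cost.
    var current = null;
    for (var name in dist) {
      if (!done[name] && (current === null || dist[name] < dist[current])) {
        current = name;
      }
    }
    if (current === null) return null;   // goal is unreachable
    if (current === goal) break;
    done[current] = true;

    // Relax all outgoing relationships of the current node.
    var edges = graph[current] || [];
    for (var i = 0; i < edges.length; i++) {
      var candidate = dist[current] + edges[i].cost;
      if (dist[edges[i].to] === undefined || candidate < dist[edges[i].to]) {
        dist[edges[i].to] = candidate;
        previous[edges[i].to] = current;
      }
    }
  }

  // Walk the predecessor chain backwards to build the path.
  var path = [];
  for (var node = goal; node !== undefined; node = previous[node]) {
    path.unshift(node);
  }
  return { path: path, cost: dist[goal] };
}

var roads = {
  A: [ { to: "B", cost: 4 }, { to: "C", cost: 1 } ],
  C: [ { to: "B", cost: 2 } ],
  B: [ { to: "D", cost: 5 } ]
};
var result = dijkstra(roads, "A", "D");  // path A, C, B, D with cost 8
```

The detour via C wins because 1 + 2 is cheaper than the direct cost of 4 from A to B. A* works along the same lines, but also uses an estimate of the remaining cost to decide which node to look at next.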

2. Event framework

The Neo4j 1.1 kernel includes support for a simple but powerful event framework, which allows you to hook into and react to any substantial change of the graph. For example, let's say that you have a UI widget that displays a specific property on a node. In previous releases, you'd have to manually add code to refresh the widget wherever you modify that specific node. With the 1.1 release, you instead write a simple listener that detects changes to that node and repaints the widget.

You can listen to the following events:

  • beforeCommit
  • afterCommit
  • afterRollback

This will allow you to perform actions such as:

  • read changes before commit
  • modify the transaction
  • decide if the transaction should be committed or not

The most powerful use of the event framework is likely in framework-like components that implement horizontal concerns such as validation and integration of data. You can imagine writing a simple component that automatically keeps a Neo4j IndexService up to date with the graph, so you won't have to manually maintain indices.
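To illustrate the idea of the three hooks, here's a conceptual sketch in plain JavaScript. It is not the Neo4j event API (which is Java, see the apidocs); createStore, register and commit are hypothetical names invented for this example. It shows a handler that can veto a commit, and a "widget repaint" that only happens after a successful commit.

```javascript
// Conceptual sketch in plain JavaScript; this is NOT the Neo4j event API.
// All names here (createStore, register, commit) are hypothetical.
function createStore() {
  var data = {};
  var handlers = [];
  return {
    register: function (handler) { handlers.push(handler); },
    get: function (key) { return data[key]; },
    // changes: { key: newValue, ... } collected during a "transaction"
    commit: function (changes) {
      try {
        // beforeCommit may inspect or modify the changes, or throw to veto.
        handlers.forEach(function (h) {
          if (h.beforeCommit) h.beforeCommit(changes);
        });
      } catch (e) {
        handlers.forEach(function (h) {
          if (h.afterRollback) h.afterRollback(changes);
        });
        return false;
      }
      for (var key in changes) data[key] = changes[key];
      handlers.forEach(function (h) {
        if (h.afterCommit) h.afterCommit(changes);
      });
      return true;
    }
  };
}

var log = [];
var store = createStore();
store.register({
  beforeCommit: function (changes) {
    if (changes.name === "") throw new Error("name must not be empty");
  },
  afterCommit: function (changes) { log.push("repaint: " + changes.name); },
  afterRollback: function () { log.push("rolled back"); }
});

var first = store.commit({ name: "Trinity" });   // accepted
var second = store.commit({ name: "" });         // vetoed by beforeCommit
```

The validation logic lives in one place instead of being scattered over every piece of code that modifies the data, which is the point of using the hooks for horizontal concerns.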

This is where you can hook into the database life cycle:

  • shutdown event
  • kernel panic - for instance when the disk is full

In this case, a typical use is to make sure layers on top of Neo4j are properly shut down before the database shuts down.

3. Traversal framework

To extend the possibilities for how to traverse a graph, a new traversal framework has been introduced. It's still in an early stage and has been included in the 1.1 release to gather feedback from users. Even though it's already very useful, be prepared for the API to change somewhat!

One main design goal of the new traversal framework has been to increase the flexibility in how a traverser can be controlled. Examples of improvements compared to the old traversal framework are:

  • The user can select in which order relationships will be followed, which opens up for best-first traversals and fine grained traversal control, e.g. weighted traversals. Breadth-first and Depth-first traversals are just trivial examples of a global static branch selection policy implemented for convenience.
  • Paths now play a central role: the current path during traversal is exposed in the traversal context and paths can be returned as the traversal result.
  • For convenience, traversal results can be returned as nodes, relationships or paths.
  • The uniqueness constraints in a traversal now have a wide range of possibilities, like visiting nodes only once, visiting relationships only once (but possibly revisiting nodes), visiting the same nodes and relationships but in different paths and so on.
  • To reduce the memory footprint of the traversal, uniqueness constraints on nodes or relationships can be set to only guarantee uniqueness among the most recently visited nodes, with a configurable count.
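To make a few of the points above more concrete, here's a conceptual sketch of a tiny traverser in plain JavaScript. It is not the Neo4j traversal API; the function traverse, its option names and the sample graph are made up for this illustration. It shows a pluggable branch order, paths as the traversal result, node-global uniqueness and pruning at a maximum depth.

```javascript
// Conceptual sketch in plain JavaScript, not the Neo4j traversal API.
// graph: { node: [neighbour, ...] }; all option names are hypothetical.
function traverse(graph, start, options) {
  var maxDepth = options.maxDepth;         // prune: do not expand beyond this depth
  var depthFirst = options.order === "depth first";
  var visited = {};                        // node-global uniqueness
  var frontier = [[start]];                // each entry is a path (array of nodes)
  var paths = [];
  visited[start] = true;

  while (frontier.length > 0) {
    // Branch selection: newest path first (depth first) or oldest first (breadth first).
    var path = depthFirst ? frontier.pop() : frontier.shift();
    paths.push(path);
    if (path.length - 1 >= maxDepth) continue;   // pruned, do not expand

    var node = path[path.length - 1];
    var neighbours = graph[node] || [];
    for (var i = 0; i < neighbours.length; i++) {
      if (visited[neighbours[i]]) continue;      // visit every node only once
      visited[neighbours[i]] = true;
      frontier.push(path.concat(neighbours[i]));
    }
  }
  return paths;
}

var knows = { neo: ["morpheus"], morpheus: ["trinity", "cypher"], cypher: ["smith"] };
var breadthFirst = traverse(knows, "neo", { order: "breadth first", maxDepth: 2 });
var depthFirstPaths = traverse(knows, "neo", { order: "depth first", maxDepth: 3 });
```

The only difference between the two orders is which end of the frontier the next path is taken from. A best-first traversal would instead pick the most promising path according to some cost function.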

There's more to it, but the above list should suffice for this blog post! Let's look at a code example to get a view of how the new traversal framework is used.

As seen from this example the framework uses a fluent API:

for ( Path position : Traversal.description()
        .depthFirst()
        .relationships( KNOWS )
        .relationships( LIKES, Direction.INCOMING )
        .prune( Traversal.pruneAfterDepth( 5 ) )
        .traverse( myStartNode ) )
{
    System.out.println( "Path from start node to current position is " + position );
}

The traversal descriptions are immutable and can be reused to create new traversal descriptions. Here's an example of how this is done:

static final TraversalDescription FRIENDS_TRAVERSAL = Traversal.description()
        .relationships( KNOWS )
        .depthFirst()
        .uniqueness( Uniqueness.RELATIONSHIP_GLOBAL );
// ...
// Don't go further than depth 3
for ( Path position : FRIENDS_TRAVERSAL
        .prune( Traversal.pruneAfterDepth( 3 ) )
        .traverse( myNode ) ) {}
// Don't go further than depth 4
for ( Path position : FRIENDS_TRAVERSAL
        .prune( Traversal.pruneAfterDepth( 4 ) )
        .traverse( myNode ) ) {}

Please note again that the API is likely to change before going final in the next release. We would love feedback on the new traversal framework! Just head over to the mailing list (which is a great place to hang out if you want to learn more about graph databases!) or say something on Twitter.

4. Monitoring

Neo4j now supports monitoring over JMX, so you can use a tool like JConsole to inspect what's happening in a live Neo4j instance. For example, say that your Neo4j-backed web site has been up and running for two days since the last restart. You can then go in and get statistics on how many transactions have been committed, the number of transactions open right now, the total number of transactions that have been rolled back, and LOTS more. Here's an image of the transaction information that is available:

[Image: JMX transaction information]

Find out more on the wiki!

If you're using the Neo4j REST server (also see this blog post) there's a lot to look forward to with regard to monitoring. Namely, there's the new Neo4j webadmin project. At the moment it includes:

  • Lifecycle management
  • Monitoring of memory usage, disk usage, cache status and database primitives (nodes, relationships and properties)
  • JMX overview
  • Data browsing
  • Advanced data manipulation via Gremlin console
  • Server configuration
  • Online backups

Here's how the webadmin tool looks at the moment:

[Image: Neo4j webadmin dashboard]

Currently, the webadmin tool builds on the Neo4j REST layer. In subsequent releases, it will be adapted to also work with embedded Neo4j.

5. Kernel

Much of the news in the Neo4j kernel has already been mentioned, but there are still some points to be addressed:

  • Read operations are not required to be performed inside a transaction any more. This can for example help a lot when doing traversals and gives you more options on the architecture side of things. Note however that in order to read uncommitted data, the read operations still have to be carried out in the corresponding transactional context.
  • At startup, the Neo4j kernel will look at the available amount of RAM and heap space and configure itself accordingly. In most cases there should be no more need to use a detailed configuration. If you have added a lot of data, a restart of Neo4j will let the automatic configuration catch up on what's happened and optimize the configuration.
  • At the creation of a new database, the block sizes for strings and arrays on the storage level can be configured. These settings can't be changed after the database has been created. If you know your data very well, this could be useful if the default settings don't cut it for you.
  • The GraphDatabaseService can now be accessed from every node and relationship, so you don't need to pass the instance around or inject it or whatever you did before.
  • For your convenience, a helpers package has been added to the kernel (previously it was a separate component named "commons"). The most interesting part is the collection helpers. By using them, creating a traverser that returns your domain objects instead of nodes/relationships is a breeze. There are other goodies in there as well, take a look!

6. Index

Other than bug fixes and performance optimizations, the integrated Lucene index has got some new features:

  • Improved support for removing indexes.
  • Index lookups can be performed without being in a transaction.
  • Exact lookups can be carried out (even when using a fulltext service).
  • Indexing of array values. If a value is an array it's split up and each value in that array is indexed separately.
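The array indexing is easiest to picture with a small sketch. The following plain JavaScript is not the Neo4j/Lucene index API; indexProperty and lookup are hypothetical names, and the index here is just an in-memory object. It only shows the idea that every item of an array value gets its own index entry.

```javascript
// Conceptual sketch in plain JavaScript; not the Neo4j/Lucene index API.
// index maps "key:value" to the list of node ids that carry that value.
function indexProperty(index, nodeId, key, value) {
  // An array is split up, so every item gets its own index entry.
  var values = Array.isArray(value) ? value : [value];
  values.forEach(function (item) {
    var entry = key + ":" + item;
    if (!index[entry]) index[entry] = [];
    index[entry].push(nodeId);
  });
}

function lookup(index, key, value) {
  return index[key + ":" + value] || [];
}

var index = {};
indexProperty(index, 1, "tags", ["graph", "database"]);
indexProperty(index, 2, "tags", "graph");
var graphNodes = lookup(index, "tags", "graph");       // [1, 2]
var databaseNodes = lookup(index, "tags", "database"); // [1]
```

A lookup on any single item of the array finds the node, without the caller having to know that the property was an array to begin with.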

7. Online backup

Of course you want to back up your Neo4j database while it's running, and in this release the online backup component is included in the default distribution package. This is an example of how to use it from your code:

EmbeddedGraphDatabase graphDb = getTheGraphDbFromApp();
String location = "/var/backup/neo4j-db";
// this will include the integrated lucene indexes as well
Backup backup = Neo4jBackup.allDataSources( graphDb, location );
backup.doBackup();

Conclusion

If you haven't already played with Neo4j there are now some more reasons to do so! And if you have, the fun will be even greater now!

The main starting points are:

Wednesday, June 30, 2010

Neo4j development news

The development of the Neo4j graph database speeds up more and more and it's time to track some of the news! This post will focus on some of the contributions from the core team, while I leave the wealth of contributions from other community members for a later post.

Videos

We've created a page containing videos. At the moment there's:

  • Robert Scoble interviewing Emil Eifrem
  • Getting started with Neo4j screencast
  • Getting started with Neoclipse screencast

Head over to watch the videos: neo4j.org/doc/video/

Screenshots

It's great fun to visualize graph data models, so the team collected some images on a web page. At the moment there are social networks, road networks, product/category hierarchies and some more examples. Please comment on what you'd like to see in there!

Take a look at the screenshots here: neo4j.org/doc/screenshots/

[Image: road network]

Event feed and other feeds

To make it easier to keep track of upcoming Neo4j events, we've set up a feed (web version). This is intended for all Neo4j events, not only those initiated by the Neo4j team members. Just get in touch if you're going to do a Neo4j presentation/workshop/whatever.

While at it, we also created a few other feeds, found at: neo4j.org/community/feeds/. At the moment you'll find for example Neo4j questions and commits from contributing projects. Expect more feeds to be added there soon!

The new traversal framework

The new traversal framework has been integrated into the kernel, and is ready to try out. The best starting point at the moment is the apidocs, but there's also some information on the wiki. You can read up on the mailing list discussions on this topic, and this thread too.

The new event framework

Neo4j is getting a new event framework as well, which is documented on the wiki and in the apidocs. There have been quite a few mailing list discussions on this topic.

REST API news

General information on the REST API is found at the component site and in the wiki.

Monitoring: JMX support

Neo4j now has JMX support, so that you can connect to it using JConsole:

[Image: JConsole]

It exposes for example cache sizes:

[Image: cache sizes]

More information in the Monitoring and Deployment wiki page.

Graph algorithms

Design and performance of the graph-algo component has been improved. Read up on it on its component site.

The component contains implementations of common graph algorithms like shortest paths, all paths, all simple paths, Dijkstra and A*.

Indexing

You can now index array properties as well, where every item in the array will be indexed.

Configurable block size

In special cases it may be useful to create a Neo4j database with other block size settings than the defaults. Head over to the configuration wiki page for the details. Note that these settings can only be applied at database creation.

Friday, May 7, 2010

Mashups with the Facebook Graph API and Neo4j

In case you didn't notice already, graph databases like Neo4j are hot nowadays. People ask questions, write about them, also in the contexts of NOSQL and RDF. Recently Twitter open sourced their graphdb implementation, targeted at shallow, distributed graphs. And then Facebook revealed their new Graph API using the Open Graph Protocol.

Today, we're going to show you how easy it is to use the Facebook Graph API to mash up data from Facebook with data in a locally hosted graph database!

It's movie time!

Let's say you want to see a movie with one of your friends. Wouldn't it be neat to have a service that uses the Facebook social graph to collect movies your friend liked, and combines this with IMDB data to produce a movie suggestion? Turns out that an app like that is pretty straightforward with a graph database.

The first step is to connect to Facebook to fetch a list of your friends, so that's where the app will start out:


Next a list of your friends will show up:

Now, just click one of your friends and a movie suggestion will be generated:

Under the Hood

What we need to do is simply to let our mashup talk to both the Facebook Graph API and the IMDB API. Uh-oh - IMDB doesn't have a public API that you can throw requests at. Well, that's simple enough: we'll just import the data into a local Neo4j graph database and then access it through the Facebook Graph API!

So, let's see how to solve this. Here's the basic structure of our app:

MovieNight.js is the mashup itself, embedded in the web page. It uses the Facebook Graph API to get information about the friends of the visitor and the movies that your friends like. SuggestionEngine.js uses the Graph API to talk to a Neo4j database containing movie information (a small example data set from IMDB). The movie suggestion is based on what movies your friend has liked in the past. It simply tries to find other movies starring some actor from the liked ones.
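The suggestion idea itself fits in a few lines. Here's an illustrative sketch in plain JavaScript against in-memory data; it is not the code from SuggestionEngine.js, and the function name suggestMovies and the two lookup objects are assumptions standing in for the graph requests.

```javascript
// Illustrative sketch only; this is not the code from SuggestionEngine.js.
// actorsIn: movie title -> actor names, actedIn: actor name -> movie titles
// (hypothetical in-memory stand-ins for the two graph lookups).
function suggestMovies(likedMovies, actorsIn, actedIn) {
  var suggestions = [];
  likedMovies.forEach(function (liked) {
    (actorsIn[liked] || []).forEach(function (actor) {
      (actedIn[actor] || []).forEach(function (movie) {
        // Skip movies the friend already likes and avoid duplicates.
        if (likedMovies.indexOf(movie) === -1 && suggestions.indexOf(movie) === -1) {
          suggestions.push(movie);
        }
      });
    });
  });
  return suggestions;
}

var actorsIn = { "Wild Things (1998)": ["Bacon, Kevin", "Dillon, Matt (I)"] };
var actedIn = {
  "Bacon, Kevin": ["Woodsman, The (2004)", "Wild Things (1998)"],
  "Dillon, Matt (I)": ["Wild Things (1998)"]
};
var suggested = suggestMovies(["Wild Things (1998)"], actorsIn, actedIn);
// suggested is ["Woodsman, The (2004)"]
```

In graph terms this is a traversal two steps out from every liked movie: first to the actors, then on to the other movies those actors had a role in.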

Using the same Graph API to connect to both Facebook and the Neo4j graph database backend makes for convenience: it means that you can use tools written for Facebook for locally hosted data as well - and that's what we're doing here. To download the source, go to the download page.

Facebook data

To get your friends from Facebook, just use the common Facebook graph API:

FB.api('/me/friends', function(response) {
  friends = response.data;
  // Load friends into UI
  friend_list.empty();
  for ( var i = 0; i < friends.length; i ++ ) {
    add_friend( friends[i] ); // write to UI
  }
});

Getting the movies a friend likes is very similar to getting the friends list:

FB.api("/" + friend.id + "/movies", function(result) {
  /* handle the response here */
});

For more information, see the Graph API documentation.

Neo4j data

To connect to the Neo4j graph server we had to hack the connect-js library slightly, as it's hard coded to send requests to facebook.com. What we added is the possibility to add prefixes for different data sources. It still defaults to graph.facebook.com etc., but makes a "fb:" prefix available to make your code easier to read. To hook in a data source, we modify the FB.init() call like this:

FB.init({
  appId : '', // NOTE: create an appid and add it here
  status : true, cookie : true, xfbml : true,
  // time to add our IMDB backend to the mix
  external_domains : {
    imdb : 'http://localhost:4567/'
  }
});

Now we're able to send requests to our own server as well, using code similar to the following:

FB.api("imdb:/path/to/data/in/graph", function(data) {
// data is available here :)
});
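How a prefix gets turned into a full URL can be sketched like this. Note that this is a hypothetical illustration in plain JavaScript and not the actual change we made to connect-js; the function name resolveUrl and the https:// scheme for the Facebook default are assumptions.

```javascript
// Hypothetical sketch of prefix-based routing; not the actual connect-js patch.
// domains maps a prefix such as "imdb" to the base URL of a data source.
function resolveUrl(path, domains) {
  var match = /^([a-z]+):(\/.*)$/.exec(path);
  var prefix = match ? match[1] : "fb";      // no prefix means Facebook
  var rest = match ? match[2] : path;
  var base = domains[prefix];
  if (!base) throw new Error("Unknown data source: " + prefix);
  // Avoid a double slash when joining base and path.
  return base.replace(/\/$/, "") + rest;
}

var domains = {
  fb: "https://graph.facebook.com/",   // assumed default
  imdb: "http://localhost:4567/"
};
var facebookUrl = resolveUrl("/me/friends", domains);
var imdbUrl = resolveUrl("imdb:/56/acted_in", domains);
```

With this kind of lookup, code written against the Facebook Graph API keeps working unchanged, and only the requests carrying a prefix end up at another server.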

So now that we can send requests, what can we do with the Neo4j backend here? Here's a comprehensive list showing precisely that in some detail (all requests are GET from http://localhost:4567):

Get Actor (or Movie) by Id
Request: /56
Response:
{
"name": "Bacon, Kevin",
"id": 56
}
Extended information about Actor(/Movie)
Request: /56?metadata=1
Response:
{
"name": "Bacon, Kevin",
"id": 56,
"metadata": {
"connections": "http://localhost:4567/56/acted_in"
},
"type": "actor"
}
All the Movies an Actor had a Role in
Request: /56/acted_in
Response:
{
"data": [
{
"id": 57,
"title": "Woodsman, The (2004)"
},
{
"id": 59,
"title": "Wild Things (1998)"
}
// tons of movies here ...
]
}
Get (Actor or) Movie by Id
Request: /59
Response:
{
"title": "Wild Things (1998)",
"year": "1998",
"id": 59
}
Extended information about (Actor/)Movie
Request: /59?metadata=1
Response:
{
"title": "Wild Things (1998)",
"year": "1998",
"id": 59,
"metadata": {
"connections": "http://localhost:4567/59/actors"
},
"type": "movie"
}
All the Actors that have a Role in this Movie
Request: /59/actors
Response:
{
"data": [
{
"id": 56,
"name": "Bacon, Kevin"
},
{
"id": 528,
"name": "Dillon, Matt (I)"
}
// loads of actors here ...
]
}
Search for Actors with "bacon" in their name
Request: /search?q=bacon&type=actor
Response:
[
{
"name": "Bacon, Kevin",
"id": 56
},
{
"name": "Bacon, Travis",
"id": 14242
}
// more bacons here ...
]
Search for Movies with "wild things" in their title
Request: /search?q=wild%20things&type=movie
Response:
[
{
"title": "Wild Things (1998)",
"year": "1998",
"id": 59
},
{
"title": "River Wild, The (1994)",
"year": "1994",
"id": 74
}
// more wild movies here ...
]

Ok, but how do we use this stuff then?! Well, that's what we're going to look into right away, to see the Facebook Graph API used from JavaScript with a Neo4j/IMDB backend. To get started, here's how to perform a search:

self.movie_info = function( movie_name, callback ) {
  // The search API uses commas for AND-type searches, spaces become OR, so for
  // the movie names, we switch spaces out for commas.
  movie_name = movie_name.replace(/ /g, ",");
  FB.api("imdb:/search", {type:'movie', q:movie_name }, callback );
};

The request to get the movies an actor has acted in goes like this:

FB.api("imdb:/" + actor.id + "/acted_in", function( result ) {
  for (var i = 0; i < result.data.length; i++) {
    var movie = result.data[i];
    // do something with the movie here!
  }
});

To get all actors in a movie, simply use the following request:

FB.api("imdb:/" + movie.id + "/actors", function(result) {
  for (var i = 0; i < result.data.length; i++) {
    var actor = result.data[i];
    // do something with the actor here!
  }
});

Actually, these three different requests are all our small suggestion engine needs to fulfill its task. Have a look at SuggestionEngine.js to see the full code.

How to create a Graph API service on top of Neo4j

Let's take a closer look at the movie backend now. It's built using the Neo4j Ruby bindings. In our example data set we have Actors and Movies connected through Roles, here's how these look in Ruby code:

class Movie; end

class Role
  include Neo4j::RelationshipMixin
  property :title, :character
end

class Actor
  include Neo4j::NodeMixin
  property :name
  has_n(:acted_in).to(Movie).relationship(Role)
  index :name, :tokenized => true
end

class Movie
  include Neo4j::NodeMixin
  property :title
  property :year
  index :title, :tokenized => true

  # defines a method for traversing incoming acted_in relationships from Actor
  has_n(:actors).from(Actor, :acted_in)
end

The code above is from the backend/model.rb file. On the Neo4j level, this is the kind of structure we'll have:

By defining indexes on Actor and Movie we can later use the find method on the classes to perform searches.

Our next step is to expose this model over the Graph API, where we'll use Sinatra and WEBrick to do the heavy lifting. The application is defined in the backend/neo4j_app.rb file - we'll dive into portions of that code right here. To begin with, how do we return data for an Actor or Movie by Id?

get '/:id' do # show a node
  content_type 'text/javascript'
  node = node_by_id(params[:id])
  props = external_props_for(node)
  props.merge! metadata_for(node) if params[:metadata] == "1"
  json = JSON.pretty_generate(props)
  json = callback_wrapper(json, params[:callback])
  json
end

The Sinatra route above uses a few small utility functions, let's look into them as well. The first one is very simple, but useful if we want to extend the URIs to allow for requesting for example /{moviename}/actors and not only numeric IDs.

def node_by_id(id)
  node = Neo4j.load_node(id) if id =~ /^(\d+)$/
  halt 404 if node.nil?
  node
end

The next function returns the properties of a node, while filtering out those that have a name starting with a "_" character. It also adds the node id to the result.

def external_props_for(node)
  ext_props = node.props.delete_if{|key, value| key =~ /^_/}
  ext_props[:id] = node.neo_id
  ext_props
end

Then there's a function that gathers metadata for a node, including a link to the list of connections to other nodes, and the type of the node.

def metadata_for(node)
  if node.kind_of? Actor
    connections = url_for(node, "acted_in")
  elsif node.kind_of? Movie
    connections = url_for(node, "actors")
  end
  metadata = { :metadata => { :connections => connections }, :type => node.class.name.downcase }
end

There are a couple more utility functions, but we'll skip them here as they are unrelated to Neo4j.

Next up is getting the relationships from an Actor or Movie. The code will only care about valid paths, that is, paths ending in /acted_in or /actors. In other cases, an empty data set is returned. Other than that, it simply delegates the work to the domain classes, by doing node.send(relationship) to get the relationships. Using the send method in Ruby here is equivalent to the statements node.acted_in or node.actors.

get '/:id/:relation' do # show a relationship
  content_type 'text/javascript'
  node = node_by_id(params[:id])
  data = []
  [ :acted_in, :actors ].each do |relationship|
    if params[:relation] == relationship.to_s and node.respond_to? relationship
      data = node.send(relationship)
    end
  end
  data = data.map{|node| node_data(node)}
  json = JSON.pretty_generate({:data => data})
  json = callback_wrapper(json, params[:callback])
  json
end

When viewing the relationships, we only want to show the most basic node info, so there's a utility function to do that as well:

def node_data(node)
  data = { :id => node.neo_id }
  [ :name, :title ].each do |property|
    data.merge!({ property => node[property] }) unless node[property].nil?
  end
  data
end

Searching is basically handled by adding indexes to the model (see the code further above). So what's left to do in the application is some sanity checks, delegating the search to the model and finally formatting the output properly. Here goes:

get '/search' do
  content_type 'text/javascript'
  q = params[:q]
  type = params[:type]
  halt 400 unless q && type
  result = case type
           when 'actor'
             Actor.find(to_lucene(:name, q))
           when 'movie'
             Movie.find(to_lucene(:title, q))
           else
             []
           end
  json = JSON.pretty_generate(result.map{|node| external_props_for(node)})
  json = callback_wrapper(json, params[:callback])
  json
end

Wrap up

Here are some major takeaways from this post:

  • Graphs are going mainstream, as evidenced by initiatives like the Facebook Graph API.
  • It's often convenient to look at your data in the form of a graph, and with recent support in graph databases like Neo4j, it's easy to use different data sources in tandem through the Graph API.
  • Exposing data through the Graph API is simple if you have a graphdb backend.

And once you put your data in a graphdb, you can of course do more advanced graphy things too, like finding shortest paths, routing with A*, modeling of complex domains and whatnot. Just get started!

Example source code

To get the source code of the example, go to the download page.

Credits

Here are the guys who wrote the code of the example:

Tuesday, April 13, 2010

The Neo4j REST Server - Part1: Get it going!

Introduction

As requested and wished for by many, Neo4j has finally got its own standalone server mode, based on interaction via REST. The code is still very fresh and not thoroughly tested, but I thought I might write up some first documentation on it, based on the Getting Started with REST Wiki page.

Installation

The first version of the distribution can be downloaded from here: zip, tar.gz. After unpacking, you just go to the unpacked directory and run (on OSX/Linux - see the wiki entry for details on Windows)
$ ./bin/neo4j-rest start
which will start the Neo4j REST server at port 9999 and put the database files under a directory neo4j-rest-db/ (created lazily with the first request). Now, let's point our browser to http://localhost:9999 (not Internet Explorer: it doesn't send any useful Accept headers and will get JSON back; this will be fixed later) and we will see the following:



Things seem to be running! The reason for the HTML interface is that the browser sends Accept: text/html. Setting the Accept header to application/json instead will produce
peterneubauer$ curl -H Accept:application/json -H Content-Type:application/json -v http://localhost:9999
* About to connect() to localhost port 9999 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9999 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 zlib/1.2.3
> Host: localhost:9999
> Accept:application/json
> Content-Type:application/json
<
* Connection #0 to host localhost left intact
* Closing connection #0
{
"index":"http://localhost:9999/index",
"node":"http://localhost:9999/node",
"reference node":"http://localhost:9999/node/0"
}

Now, with "200 OK" this is a good starting point. We can see full references to the interesting starting points - the reference node and the index subsystem. Let's check out the reference node:
peterneubauer$ curl -H Accept:application/json -H Content-Type:application/json -v http://localhost:9999/node/0
* About to connect() to localhost port 9999 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9999 (#0)
> GET /node/0 HTTP/1.1
> User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 zlib/1.2.3
> Host: localhost:9999
> Accept:application/json
> Content-Type:application/json
>
{
"incoming typed relationships":"http://localhost:9999/node/0/relationships/in/{-list|&|types}",
"incoming relationships":"http://localhost:9999/node/0/relationships/in",
"all relationships":"http://localhost:9999/node/0/relationships/all",
"create relationship":"http://localhost:9999/node/0/relationships",
"data":{},
"traverse":"http://localhost:9999/node/0/traverse/{returnType}",
"property":"http://localhost:9999/node/0/properties/{key}",
"self":"http://localhost:9999/node/0",
"properties":"http://localhost:9999/node/0/properties",
"all typed relationships":"http://localhost:9999/node/0/relationships/all/{-list|&|types}",
"outgoing typed relationships":"http://localhost:9999/node/0/relationships/out/{-list|&|types}",
"outgoing relationships":"http://localhost:9999/node/0/relationships/out"
}
Which gives us some info about what the Node 0 can do, how to get its relationships and properties and the syntax of how to construct queries for getting properties, creating relationships etc.

Insert some data

According to RESTful thinking, data creation is handled by POST, updates by PUT. Let's insert a node:
peterneubauer$ curl -X POST -H Accept:application/json -v localhost:9999/node
* About to connect() to localhost port 9999 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9999 (#0)
> POST /node HTTP/1.1
> User-Agent: curl/7.19.7 (i386-apple-darwin10.2.0) libcurl/7.19.7 zlib/1.2.3
> Host: localhost:9999
> Accept:application/json
>
{
...
"self":"http://localhost:9999/node/1",
"data":{},
...
}
Resulting in a new node with the URL localhost:9999/node/1 (described by the "self" property in the JSON representation) and no properties set ("data":{}). The Neo4j REST API tries to be explicit about possible further destinations, which makes it self-describing even for new users. It also abstracts away the server instance, which should make dealing with multiple Neo4j servers easier in the future. We can see the URIs for traversing, listing properties and relationships. The PUT semantics on properties work like for nodes.
We delete the node again with
curl -X DELETE  -v localhost:9999/node/1

and get 204 - No Content back. The Node is gone and will give a 404 - Not Found if we try to GET it again.

The Matrix

Now with properties encoded in JSON we can easily start to create our little Matrix example:



First we create the nodes, posting the properties of each node as JSON along with the request:
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Mr. Andersson"}' -v localhost:9999/node
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Morpheus"}' -v localhost:9999/node
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Trinity"}' -v localhost:9999/node
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Cypher"}' -v localhost:9999/node
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"Agent Smith"}' -v localhost:9999/node
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"name":"The Architect"}' -v localhost:9999/node

We get http://localhost:9999/node/1, http://localhost:9999/node/2, http://localhost:9999/node/3 and so on back as the new URIs. Now we can connect the persons. In order to create relationships, we do a POST on the originating node and post the relationship data along with the request (whitespace and other special characters in the JSON have to be quoted or escaped for the shell, which can ruin readability a bit):
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/1","type":"ROOT"}' -v http://localhost:9999/node/0/relationships
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/2","type":"KNOWS"}' -v http://localhost:9999/node/1/relationships
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/3","type":"KNOWS"}' -v http://localhost:9999/node/2/relationships
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/4","type":"KNOWS"}' -v http://localhost:9999/node/2/relationships
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/5","type":"KNOWS"}' -v http://localhost:9999/node/4/relationships
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/6","type":"CODED BY"}' -v http://localhost:9999/node/5/relationships
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"to":"http://localhost:9999/node/1","type":"LOVES"}' -v http://localhost:9999/node/3/relationships
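If you'd rather build these requests from code, the pattern behind the curl calls can be sketched as a small helper. This is an illustrative JavaScript function and not part of any Neo4j client library; the name relationshipRequest is made up, and the helper only builds the request description, leaving the actual sending to the HTTP client of your choice.

```javascript
// Illustrative helper, not part of any Neo4j client library.
// It only builds the request; sending it is left to the HTTP client of your choice.
function relationshipRequest(baseUrl, fromId, toId, type) {
  return {
    method: "POST",
    url: baseUrl + "/node/" + fromId + "/relationships",
    headers: { "Accept": "application/json", "Content-Type": "application/json" },
    body: JSON.stringify({ to: baseUrl + "/node/" + toId, type: type })
  };
}

var base = "http://localhost:9999";
var requests = [
  relationshipRequest(base, 0, 1, "ROOT"),
  relationshipRequest(base, 1, 2, "KNOWS"),
  relationshipRequest(base, 3, 1, "LOVES")
];
```

Note how the relationship is always created from the originating node: the URL points at the start node, while the target node travels as a full URI in the "to" field of the body.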

Now, pointing our browser at http://localhost:9999/node/3/relationships/all will list all relationships of Trinity:



Our first traversal

To start with, the Neo4j default Traverser framework (updated to be more powerful than the current) is supported in REST, and other implementations like Gremlin and Pipes to follow. The documentation on the traversals is in the making here. There are a number of different parameters:
http://localhost:9999/node/3/traverse/node specifies a return type of "node", returning node references. There are other return types such as relationship, position and path, returning relationships, positions and paths respectively. The Traverser description is pluggable and has default values - a full description looks like
{
"order": "depth first",
"uniqueness": "node path",
"relationships": [
{ "type": "KNOWS", "direction": "out" },
{ "type": "LOVES" }
],
"prune evaluator": {
"language": "javascript",
"body": "position.node().getProperty('date')>1234567;"
},
"return filter": {
"language": "builtin",
"name": "all"
},
"max depth": 2
}

Note the pluggable descriptions of the "return filter" (what to include in the result) and the "prune evaluator" (where to stop traversing). Right now only JavaScript is supported for writing these more complicated constructs, but other languages are coming. Very cool. To finish, let's get all the nodes at depth 1 from Trinity via a trivial traversal:
curl -X POST -H Accept:application/json -H Content-Type:application/json -d '{"order":"breadth first"}' -v http://localhost:9999/node/3/traverse/node

This just returns all nodes reachable over relationships of any type at depth one (the default) as a JSON array of node descriptions as above, in this case http://localhost:9999/node/1 and http://localhost:9999/node/2.
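To make that result concrete, here is a minimal in-memory sketch in plain Python 3 of a breadth-first traversal limited to depth one. It is not how the REST server is implemented; the edge list simply mirrors the relationships created with curl above, and the function name `traverse_nodes` and its `max_depth` parameter are made up for this illustration.

```python
from collections import deque

# Relationships created with curl above: (start node, type, end node).
RELATIONSHIPS = [
    (0, "ROOT", 1),
    (1, "KNOWS", 2),
    (2, "KNOWS", 3),
    (2, "KNOWS", 4),
    (4, "KNOWS", 5),
    (5, "CODED BY", 6),
    (3, "LOVES", 1),
]


def traverse_nodes(start, max_depth=1):
    """Breadth-first traversal that ignores relationship type and direction.

    Returns the ids of all nodes reached within max_depth hops,
    excluding the start node, in the order they were discovered.
    """
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for source, _rel_type, target in RELATIONSHIPS:
            if source == node:
                neighbour = target
            elif target == node:
                neighbour = source
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, depth + 1))
    return found


# Trinity is node 3 in the example data.
print(sorted(traverse_nodes(3)))
```

Starting from node 3, the sketch reaches node 2 over the incoming KNOWS relationship and node 1 over the outgoing LOVES relationship, the same two nodes the REST call reports.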

Summary

Having the Neo4j REST API, and with it the Neo4j REST Server, coming along is great news for everyone who wants to use a graph database over the network, especially PHP or .NET clients that have no good Java bindings. A first client wrapper for .NET by Magnus Mårtensson from Jayway is already underway, and a first PHP client is on Al James' GitHub.
This will even pave the way for higher-level sharding and distribution scenarios and can be used in many other ways. Stay tuned for a deeper explanation of the different traversal possibilities with Neo4j and REST in an upcoming post!

Tuesday, March 23, 2010

CTO of Amazon: "Neo4j absolutely ROCKS!"

There was a NOSQL smackdown at the South by Southwest conference hosted by the changelog. It's a fun 45 minute clip with lots of good-humored banter between representatives from Cassandra, MongoDB, CouchDB and SimpleDB.

The latter is represented by none other than Werner Vogels, the CTO of Amazon, godfather of Dynamo (which inspired all current key-value stores) and probably the top expert in the world at building large-scale distributed systems.

The best part is at the end (around minute 45), when the participants were asked to name their favorite NOSQL systems, except their own. Werner Vogels concluded by saying:

"For anything with multiple relationships, multiple connections, Neo4j absolutely ROCKS!"

A statement we wholeheartedly agree with. :) Listen to the entire clip here.

Modeling categories in a graph database

Storing hierarchical data can be a pain when using the wrong tools. However, the Neo4j open source graph database is a good fit for this kind of problem, and this post will show you an example of how it can be used. To top it off, today it's time to have a look at the Neo4j Python language bindings as well.

Introduction

A little background info for newcomers: Neo4j stores data as nodes and relationships, with key/value style properties on both. Relationships connect two different nodes to each other, and are typed and directed. Relationships can be traversed in both directions (the direction can also be ignored when traversing if you like). You can create any relationship types; they are identified by their name.

For a quick introduction to the Neo4j Python bindings, have a look at the Neo4j.py component site. There are also slides and video from a PyCon 2010 presentation by Tobias Ivarsson of the Neo4j team, who also contributed the Python code for this blog post.

This blog post only contains simplified snippets of code. To get the full working source code - which exposes a domain layer on top of the underlying graph data - go to:

If you take a look at a site like stackoverflow.com you will find many questions on how to store categories or, generally speaking, hierarchies in a database. In this blog post, we're going to look at how to implement something like what's asked for here using Neo4j. However, using a graph database will allow us to bring the concept a bit further.

Data model

It may come as a surprise to some readers, but even though we're using a graph database here, we'll use a common Entity-Relationship Diagram. The entities we want to handle in this case are categories and products. The products hold attribute values, and we want to be able to define types and constraints on these attributes. The attributes that products can hold are defined on categories and inherited by all descendants. Products, categories and attribute types are modeled as entities, while the attributes have been modeled as relationships in this case. Categories may contain subcategories and products. So this is the data model we end up with:

What can't be expressed nicely in the ER-Diagram are the attribute values, as the actual names of those attributes are defined as data elsewhere in the model. This mix of metadata and data may be a problem when using other underlying data models, but for a graph database, this is actually how it's supposed to be used. When using an RDBMS with its underlying tabular model, the Entity-Attribute-Value model is a commonly suggested way of dealing with the data/metadata split. However, this solution comes with some downsides and hurts performance a lot.

That was it for the theoretical part, let's get on to the practical stuff!

Node space

What we want to do is to transfer the data model to the node space - that's Neo4j lingo for a graph database instance, as it consists of nodes and relationships between nodes. What we'll do now is to simply convert some of the terminology from the Entity-Relationship model to the Neo4j API:

ER-model        Neo4j
Entity          Node
Relationship    Relationship
Attribute       Property

That wasn't too hard, was it?! Let's put some example data in the model and have a look at it (click for big image):

The image above gives an overview; the rest of the post will get into implementation details and good practices that can be useful.

Getting to the details

When a new Neo4j database is created, it already contains one single node, known as the reference node. This node can be used as a main entry point to the graph, and next we'll show a useful pattern for this.

In most real applications you'll want multiple entry points to the graph, and this can be done by creating subreference nodes. A subreference node is a node that is connected to the reference node with a special relationship type, indicating its role. In this case, we're interested in having a relationship to the category root and one to the attribute types. So this is how the subreference structure looks in the node space:

Now someone may ask: Hey, shouldn't the products have a subreference node as well?! But, for two reasons, I don't think so:

  1. It's redundant as we can find them by traversing from the category root.
  2. If we want to find a single product, it's more useful to index them on a property, like their name. We'll save that one for another blog post, though.
Note that when using a graph database, the graph structure lends itself well to indexing.

As the subreference node pattern is such a nice thing, we added it to the utilities. The node is lazily created the first time it's requested. Here's what's needed to create an ATTRIBUTE_ROOT typed subreference node:

import neo4j
from neo4j.util import Subreference
attribute_subref_node = Subreference.Node.ATTRIBUTE_ROOT(graphdb)

... where graphdb is the current Neo4j instance. Note that the subreference node itself doesn't have a "node type", but is implicitly given a type by the ATTRIBUTE_ROOT typed relationship leading to the node.

The next thing we need to take care of, is connecting all attribute type nodes properly with the subreference node. This is simply done like this:

attribute_subref_node.ATTRIBUTE_TYPE(new_attribute_type_node)

Doing this every time a new attribute type is added makes the nodes easily discoverable from the ATTRIBUTE_ROOT subreference node:

Similarly, we want to have a subreference node for categories, and in this case we also want to add a property to the subreference node. Here's how this looks in Python code:

category_subref_node = Subreference.Node.CATEGORY_ROOT(graphdb, Name="Products")
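The lazy "get or create" behaviour of subreference nodes can be sketched without Neo4j.py at all. The following plain Python 3 snippet is only an illustration of the pattern under stated assumptions: the `Node` class, its `relate` method and the `subreference` function are hypothetical stand-ins and are not part of the Neo4j.py API.

```python
class Node:
    """A tiny stand-in for a graph node: properties plus outgoing relationships."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.outgoing = {}  # relationship type -> list of end nodes

    def relate(self, rel_type, other):
        """Connect this node to another one and return the other node."""
        self.outgoing.setdefault(rel_type, []).append(other)
        return other


def subreference(reference_node, rel_type, **properties):
    """Return the node behind rel_type, creating it on the first request."""
    existing = reference_node.outgoing.get(rel_type)
    if existing:
        return existing[0]
    return reference_node.relate(rel_type, Node(**properties))


reference_node = Node()
attribute_root = subreference(reference_node, "ATTRIBUTE_ROOT")
category_root = subreference(reference_node, "CATEGORY_ROOT", Name="Products")

# Asking again returns the same node instead of creating a second one.
print(subreference(reference_node, "ATTRIBUTE_ROOT") is attribute_root)
```

As in the post, the subreference node carries no type of its own here; its role is given only by the relationship type that leads to it from the reference node.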

This is how it will look after we added the first actual category, namely the "Electronics" one:

Now let's see how to add subcategories. Basically, this is what's needed to create a subcategory in the node space, using the SUBCATEGORY relationship type:

computers_node = graphdb.node(Name="Computers")
electronics_node.SUBCATEGORY(computers_node)

To fetch all the direct subcategories under a category and print their names, all we have to do is to fetch the relationships of the corresponding type and use the node at the end of the relationship, just like this:

for rel in category_node.SUBCATEGORY.outgoing:
    print rel.end['Name']

There's not much to say regarding products, the product nodes are simply connected to one category node using a PRODUCT relationship:

But how to get all products in a category, including all its subcategories? Here it's time to use a traverser, defined by the following code:

class SubCategoryProducts(neo4j.Traversal):
    types = [neo4j.Outgoing.SUBCATEGORY, neo4j.Outgoing.PRODUCT]
    def isReturnable(self, pos):
        if pos.is_start: return False
        return pos.last_relationship.type == 'PRODUCT'

This traverser will follow outgoing relationships for both SUBCATEGORY and PRODUCT type relationships. It will filter out the starting node and only return nodes reached over a PRODUCT relationship. This is then how to use it:

for prod in SubCategoryProducts(category_node):
    print prod['Name']
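For readers without a Neo4j installation at hand, the same idea can be sketched with plain Python 3 dictionaries. This is an assumption-laden illustration, not the Neo4j.py traverser: only the "Electronics" and "Computers" categories come from this post, while every other name and the `products_in` function are invented for the example.

```python
# Hypothetical sample data: category -> direct subcategories.
SUBCATEGORY = {
    "Electronics": ["Computers", "Cameras"],
    "Computers": ["Laptops"],
}

# Hypothetical sample data: category -> products directly in it.
PRODUCT = {
    "Cameras": ["Example Camera"],
    "Computers": ["Example Desktop"],
    "Laptops": ["Example Laptop"],
}


def products_in(category):
    """Yield every product in a category and, depth first, in its subcategories."""
    # Nodes reached over a PRODUCT relationship are returned ...
    for product in PRODUCT.get(category, []):
        yield product
    # ... while SUBCATEGORY relationships are only followed, not returned.
    for child in SUBCATEGORY.get(category, []):
        yield from products_in(child)


print(list(products_in("Electronics")))
```

Like the traverser above, the sketch follows both relationship types but only returns what was reached over a PRODUCT relationship, so category names never show up in the result.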

At the core of our example is the way it adds attribute definitions to the categories. Attributes are modeled as relationships between a category and an attribute type node. The attribute type node holds information on the type - in our case only a name and a unit - while the relationship holds the name, a "required" flag and in some cases a default value as well. From the viewpoint of a single category, this is how it is connected to attribute types, thus defining the attributes that can be used by products down that path in the category tree:

Our last code sample will show how to fetch all attribute definitions which apply to a product. Here we'll define a traverser named categories which will find all categories for a product. The traverser is used by the attributes function, which will yield all the ATTRIBUTE relationships. A simple example of usage is also included in the code:

def attributes(product_node):
    """Usage:
    for attr in attributes(product):
        print attr['Name'], " of type ", attr.end['Name']
    """
    for category in categories(product_node):
        for attr in category.ATTRIBUTE:
            yield attr

class categories(neo4j.Traversal):
    types = [neo4j.Incoming.PRODUCT, neo4j.Incoming.SUBCATEGORY]
    def isReturnable(self, pos):
        return not pos.is_start
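The inheritance of attribute definitions down the category tree can also be sketched in plain Python 3, independent of Neo4j.py. All data below is hypothetical (the category, product and attribute names are invented for the illustration); only the shape of the lookup mirrors the code above: walk from the product up through its categories and collect the attribute definitions found on each one.

```python
# Hypothetical sample data: child category -> parent category.
PARENT = {"Laptops": "Computers", "Computers": "Electronics"}

# Hypothetical sample data: product -> the category it belongs to.
PRODUCT_CATEGORY = {"Example Laptop": "Laptops"}

# Hypothetical attribute definitions attached to categories.
ATTRIBUTES = {
    "Electronics": [{"Name": "weight", "Required": True}],
    "Laptops": [{"Name": "display size", "Required": False}],
}


def categories(product):
    """Yield the product's category, then each ancestor category in turn."""
    category = PRODUCT_CATEGORY.get(product)
    while category is not None:
        yield category
        category = PARENT.get(category)


def attributes(product):
    """Yield every attribute definition that applies to the product."""
    for category in categories(product):
        for attr in ATTRIBUTES.get(category, []):
            yield attr


for attr in attributes("Example Laptop"):
    print(attr["Name"], "required" if attr["Required"] else "optional")
```

The product picks up the definition from its own category first and then those of the ancestors, which is what "inherited by all descendants" means in the data model.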

Let's have a final look at the attribute types. Seen from the viewpoint of an attribute type node things look this way:

As the image above shows, it's really simple to find out which attributes (or categories) are using a specific attribute type. This is typical when working with a graph database: connect the nodes according to your data model, and you'll be fine.

Wrap up

Hopefully you had some fun diving into a bit of graph database thinking! These should probably be your next stops on the way forward:

Friday, March 19, 2010

Neo4j meetup at Twitter HQ Wed Mar 31 (NEW DATE & TIME)

Update: NEW DATE & TIME! The meetup has been moved to Wed Mar 31 because the speaker (Emil) is unable to speak. (And what good is a meetup if you're unable to speak?) For more info, see below. Sorry for the late notice and hope you can make it on Wednesday instead!

If you are in the San Francisco Bay Area, you should mark down Wed Mar 31st (originally Thu Mar 25th) in your calendar. Why? Because Twitter is hosting a Neo4j meetup! So if you're interested in NOSQL, graph databases, Neo4j or beer (or all of the above) then please register and join us at the Twitter HQ in San Francisco.

The meetup will start with pizza at 6.30pm and then we get going with a Neo4j presentation at 7pm, followed by beer, discussions and code in random order.

When: Wednesday, Mar 31 2010 at 6.30pm (moved from Thursday, Mar 25 2010 at 6.45pm)
Where: Twitter HQ at 795 Folsom ave, 6th floor, San Francisco

Please register here, then bring your laptop and an appetite for graphs. Be there or be [ ]!

Thursday, February 25, 2010

Access control lists the graph database way

In many contexts you need to handle user permissions to access, create or change some kind of resources. A common example is a file system, and that's what we are going to dive into in this blog post. We're going to use Ruby bindings for the Neo4j graph database to create a small - but working - example application.

Preparation

To set up the environment for this example on Ubuntu, I used the following commands:

sudo apt-get install jruby
sudo jruby -S gem install neo4j

To import the libraries, the following code was used:

require 'rubygems'
require 'neo4j'
require 'neo4j/extensions/find_path'

Heading for the node space

So user permissions, what are they all about? Obviously it's about users, and usually user groups as well. We'll abstract this away a bit and use the term principals, which can be single users or groups.

The other side of user permissions are the resources which are to be protected. In our case we'll have a file system, so there will be folders and files. Here we'll use the term content.

Let's start out building a graph to support the application from what we have gathered so far! When working with a graph it's beneficial to think in a graphy manner, so that's where we'll begin. Graphs are presumably about connecting things, so our first step is to create some relationships. Neo4j comes with a built-in reference node, which is easily accessible at all times. We use this to create our own "subreference nodes", one for principals and one for content. This is how our graph looks so far:

To create (and get) the subreference nodes, we use this function:

def get_or_create_sub_ref( name )
  result = Neo4j.ref_node.rels.outgoing( name ).nodes.first
  if ( result.nil? )
    result = Neo4j::Node.new :name => name.to_s.capitalize.gsub("_", " ")
    Neo4j.ref_node.rels.outgoing( name ) << result
  end
  return result
end

This function is then called whenever we need to use a subreference node. The important parts here are:

  • ref_node: the built-in reference node
  • rels: relationships connected to a node
  • outgoing: the direction of the relationship (the relationships are always directed, but you can choose to ignore the direction in traversals)
  • ( name ): the type of relationships to follow (the type can be ignored in traversals as well, but in our case we want to use it)
  • nodes: the nodes in the other end of the relationships
  • first: the first node found - there should only be one subreference node of each type

If the subreference node isn't found, it will be created and connected to the reference node. As you can see, we're adding a property with the key name to the nodes as well, which is there solely for the purpose of visualization (the images in this post are created using Neoclipse).

Basic structure

For the principals part, we are going to connect the top-level ones to the corresponding subreference node using a PRINCIPAL type of relationship. Other than that, there's just users and groups, so let's use an IS_MEMBER_OF_GROUP relationship type to encode that. This is how that looks in the graph:

And here's the code to create it:

def new_principal( name, member_of_groups = [] )
  principal = Neo4j::Node.new
  principal[ :name ] = name
  if member_of_groups.empty?
    get_or_create_sub_ref( :PRINCIPALS ).rels.outgoing( :PRINCIPAL ) << principal
  else
    for group in member_of_groups
      principal.rels.outgoing( :IS_MEMBER_OF_GROUP ) << group
    end
  end
  return principal
end

If a new principal isn't a member of any group, it's added as a top-level principal, connected to the principals subreference node. Otherwise, it's simply added to the groups.

With Neo4j all operations on the graph have to be encapsulated in a transaction, so this is how we'll call the above function:

Neo4j::Transaction.run do
  all_principals = new_principal( "All principals" )
  root = new_principal( "root", [ all_principals ] )
  regular_users = new_principal( "Regular users", [ all_principals ] )
  user1 = new_principal( "user1", [ regular_users ] )
  user2 = new_principal( "user2", [ regular_users ] )
end

For the content part, things are very similar to the principals part. The main difference is that in this case, an item can have only a single parent item. Here's the graphical view on that:

And this is the code to create the structure:

def new_content( name, parent = nil )
  content = Neo4j::Node.new
  content[ :name ] = name
  if ( parent.nil? )
    get_or_create_sub_ref( :CONTENT_ROOTS ).rels.outgoing( :CONTENT_ROOT ) << content
  else
    parent.rels.outgoing( :HAS_CHILD_CONTENT ) << content
  end
  return content
end

Similar to how the principals were created, this is the code to create the content data:

Neo4j::Transaction.run do
  root_folder = new_content( "Root folder" )
  temp_folder = new_content( "Temp", root_folder )
  home_folder = new_content( "Home", root_folder )
  user1_home_folder = new_content( "user1 home", home_folder )
  user2_home_folder = new_content( "user2 home", home_folder )
  a_file = new_content( "MyFile.pdf", user1_home_folder )
end

At the core

Now that we have the basic structure in place, what's left regarding our data is a small but crucial part: the permissions information! We're using a simple scheme: adding security relationships with optional boolean flags for read and write permission. Not much to say here, this is what we want the full graph to look like (click for a bigger version):

A small function will help us add the security information:

def apply_security( content, principal, map_with_flags )
  security_relationship = Neo4j::Relationship.new( :SECURITY, principal, content )
  map_with_flags.each_pair {|key, value| security_relationship[ key ] = value}
end

It's time to add the security data:

Neo4j::Transaction.run do
  apply_security( root_folder, root, { "w" => true } )
  apply_security( root_folder, all_principals, { "r" => true } )
  apply_security( temp_folder, all_principals, { "w" => true } )
  apply_security( user1_home_folder, regular_users, { "r" => false, "w" => false } )
  apply_security( user1_home_folder, user1, { "r" => true, "w" => true } )
  apply_security( user2_home_folder, user2, { "r" => true, "w" => true } )
end

To check the permission for some action by an actual principal for some content, there's some work to do. This is the algorithm we use to retrieve a permission flag:

  1. Move from the content node and upwards through the file system structure and investigate each level for permission information.
  2. On each level, see if there are any principals related to or identical with the principal concerned.
  3. Make sure to use the permission information from the principal closest to the principal concerned.
  4. If permission information was found, return it; otherwise, continue traversing to the next level in the file system.

In the code for this, we'll use a function named depth_of_principal() to calculate the distance between the principal we have traversed to and the principal concerned. More on that later, here's the code to check the permissions:

def has_access( content, principal, flag )
  # walk upwards through the file system structure
  for current_content in content.incoming( :HAS_CHILD_CONTENT ).depth( :all )
    lowest_score = nil
    lowest_modifier = nil
    # look at every security relationship pointing at this level
    for rel in current_content.rels.incoming( :SECURITY )
      rel_principal = rel.start_node
      if !rel[ flag ].nil?
        score = depth_of_principal( rel_principal, principal )
        if !score.nil?
          modifier = rel[ flag ]
          # prefer the closest principal; on a tie, prefer granting access
          if lowest_score.nil? || score < lowest_score ||
              ( score == lowest_score && modifier )
            lowest_score = score
            lowest_modifier = modifier
          end
        end
      end
    end
    # stop at the first level that carries applicable permission information
    if !lowest_modifier.nil?
      return lowest_modifier
    end
  end
  return false
end

Here's our function to check the distance between principals (and to see if they're on the same path at all).

def depth_of_principal( principal, reference_principal )
  result = reference_principal.outgoing( :IS_MEMBER_OF_GROUP ).depth( :all ).path_to( principal )
  return result.nil? ? nil : result.size
end

Finally, we want to see that everything works, so here's a utility function to print permission information:


def print_has_access( content, principal, flag )
  print principal[ :name ] + " +" + flag.upcase + " access to " + content[ :name ] + "? " +
    has_access( content, principal, flag ).to_s + "\n"
end

And here's how to use the function:

Neo4j::Transaction.run do
  print_has_access( home_folder, root, "w" )
  print_has_access( home_folder, user1, "w" )
  print_has_access( a_file, root, "r" )
  print_has_access( a_file, user2, "r" )
  print_has_access( a_file, user1, "w" )
end
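To follow the permission algorithm without a running Neo4j instance, here is a rough port to plain Python 3 that keeps the example data in dictionaries. It is a sketch under stated assumptions, not a translation of the neo4j.rb traversal API: the walk here starts at the content item itself, and the distance between a principal and itself is taken to be 0. The dictionary layout is invented for the illustration; the names and flags are the ones used in this post.

```python
MEMBER_OF = {  # principal -> groups it is a member of
    "root": ["All principals"],
    "Regular users": ["All principals"],
    "user1": ["Regular users"],
    "user2": ["Regular users"],
}

PARENT = {  # content -> parent content
    "Temp": "Root folder",
    "Home": "Root folder",
    "user1 home": "Home",
    "user2 home": "Home",
    "MyFile.pdf": "user1 home",
}

SECURITY = {  # content -> {principal: permission flags}
    "Root folder": {"root": {"w": True}, "All principals": {"r": True}},
    "Temp": {"All principals": {"w": True}},
    "user1 home": {
        "Regular users": {"r": False, "w": False},
        "user1": {"r": True, "w": True},
    },
    "user2 home": {"user2": {"r": True, "w": True}},
}


def depth_of_principal(principal, reference_principal):
    """Group-membership hops from reference_principal to principal, or None."""
    frontier, depth, seen = [reference_principal], 0, {reference_principal}
    while frontier:
        if principal in frontier:
            return depth
        next_frontier = []
        for member in frontier:
            for group in MEMBER_OF.get(member, []):
                if group not in seen:
                    seen.add(group)
                    next_frontier.append(group)
        frontier, depth = next_frontier, depth + 1
    return None


def has_access(content, principal, flag):
    """Walk up the content tree and use the closest applicable permission."""
    current = content  # assumption: the content item itself is checked first
    while current is not None:
        best_score, best_value = None, None
        for rel_principal, flags in SECURITY.get(current, {}).items():
            if flag not in flags:
                continue
            score = depth_of_principal(rel_principal, principal)
            if score is None:
                continue  # this permission is for an unrelated principal
            value = flags[flag]
            # Prefer the closest principal; on a tie, prefer granting access.
            if best_score is None or score < best_score or (score == best_score and value):
                best_score, best_value = score, value
        if best_value is not None:
            return best_value
        current = PARENT.get(current)
    return False


for content, principal, flag in [
    ("Home", "root", "w"),
    ("Home", "user1", "w"),
    ("MyFile.pdf", "root", "r"),
    ("MyFile.pdf", "user2", "r"),
    ("MyFile.pdf", "user1", "w"),
]:
    print(principal, "+" + flag.upper(), "access to", content + "?", has_access(content, principal, flag))
```

With this data the sketch grants root write access to Home through the root folder, denies user2 read access to MyFile.pdf because of the explicit "Regular users" denial on the user1 home folder, and lets user1's own permission override that denial because user1 is closer to itself than the group is.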

Next steps

The full source code is found here

Here are a few useful resources to help you on your way:

Thanks for reading - any feedback is welcome!

Tuesday, February 16, 2010

The top 10 ways to get to know Neo4j

Today is a big day in Neo4j land because after ten long years of development and seven years of commercial 24/7 production we just announced Neo4j 1.0!

We're very excited about this and this post will outline the ten most interesting and fun ways of getting started with Neo4j. Without further ado, let's go!
  1. Wait, what is Neo4j?

    Neo4j is a graph database, that is, it stores data as nodes and relationships. Both nodes and relationships can hold properties in a key/value fashion. Here's a small example:

    You can navigate the structure either by following the relationships or use declarative traverser features to get to the data you want.

  2. Introduction

    For a high-level, 9-minute cocktail-party introduction to Neo4j, check out this interview with Emil Eifrem:

    (blip.tv)

    To watch a longer introduction, see the no:sql(east) 2009 presentation by Emil Eifrém

  3. Handling complexity

    Most applications will not only have to scale to huge volumes, but also scale to the complexity of the domain at hand. Typically, there may be many interconnected entities and optional properties. Even simple domains can be complex to handle because of the queries you want to run on them, for example to find paths. Two coding examples are the social network example (partial Ruby implementation) and the Neo4j IMDB example (Ruby variation of the code). For more examples of different domains modeled in a graph database, visit the Domain Modeling Gallery.

  4. Storing objects

    The common domain implementation pattern when using Neo4j is to let the domain objects wrap a node, and store the state of the entity in the node properties. To relieve you from the boilerplate code needed for this, you can use a framework like jo4neo (intro, blog posts), where you use annotations to declare properties and relationships, but still have the full power of the graph database available for deep traversals and other graphy stuff. Here's a code sample showing jo4neo in action:

    public class Person {
        // used by jo4neo
        transient Nodeid node;
        // simple property
        @neo String firstName;
        // helps you store a java.util.Date to neo4j
        @neo Date date;
        // jo4neo will index for you
        @neo(index=true) String email;
        // many to many relation
        @neo Collection roles;

        /* normal class oriented
         * programming stuff goes here
         */
    }

    Another way to persist objects is by using the neo4j.rb Neo4j wrapper for Ruby. Time for a few lines of sample code again:

    require "rubygems"
    require "neo4j"

    class Person
      include Neo4j::NodeMixin
      # define Neo4j properties
      property :name, :salary, :age, :country

      # define a one-way relationship to any other node
      has_n :friends

      # adds a Lucene index on the following properties
      index :name, :salary, :age, :country
    end
  5. REST API

    Of course you want a RESTful API in front of the graph database as well. There's been plenty of work going on in that area and here are some options:

    • The neo4j.rb Ruby bindings come with a REST extension.
    • The neo4jr-simple Ruby wrapper has the neo4jr-social example project, which exposes social network data over a REST API.
    • Similarly, the Scala bindings have a companion example project which will show you how to set up a project exposing your data over REST.
    • Last but not least, Jim Webber has joined up with the core Neo4j team to create a kick-ass REST API. The current code base is only in the laboratory but a lot of people are already kicking its tires.
  6. Language bindings

    The Neo4j graph engine is written in Java, so you can easily add the jar file and start using the simple and minimalistic API right away. Your first stop should be the Getting started guide, or if you want to add a package of useful add-on components to the mix, go for Getting started with Apoc. Other language bindings:

  7. Frameworks

    Work is being done on using Neo4j as backend of different frameworks. Follow the links to get more information!

  8. Tools

    • Shell: a command-line shell for browsing and manipulating the graph.
    • Neoclipse: Eclipse plugin (and standalone application) for Neo4j. Visual interface to browse and edit the graph.
    • Batch inserter: tool to bulk upload big datasets quickly.
    • Online backup: performs backup of a running Neo4j instance.
  9. Query languages

    Beyond using Neo4j programmatically, you can also issue queries using a query language. These are the supported options at the moment:

    • SPARQL: Neo4j can be used as a triple- or quadstore, and has SAIL and SPARQL implementations. Go to the components site to find out more about the related components.
    • Gremlin: a graph-based programming-language with different backend implementations in the works as well as a supporting toolset.
  10. Inspiration

    Have a look at the Neo4j in the wild page to see what others are doing with Neo4j. Here's a selection:

Hopefully this post was a good starting guide to the Neo4j ecosystem. As always, please ask any questions on the mailing list or come hang out with us in the #neo4j channel on IRC.

Sunday, February 7, 2010

Yay! The Graph Processing Infrastructure is starting to emerge!





Hi all,
over the last few months, the Tinkerpop team has been venturing into the big task of starting a unified ecosystem for the world of graphs and related projects and products. Now, I am proud to say that things seem to be getting some traction and are seeing increasing contributions from outside the core team, mainly the awesome Marko Rodriguez:



Logo contributed by Ketrina Yim
  • The JUNG graph library got adapted to Gremlin
  • HyperGraphDB is being adapted to work with Gremlin
  • a REST API based on the awesome work of Jim Webber and the Neo4j team is in the making by Michael Hunger and Pavel Yaskevich
So, here is the current project ecosystem - great work of everyone involved!

Gremlin:
  • mainly driven by Marko Rodriguez
  • a library and standalone, single-user Java project, defining a number of data models - to start with the Property Graph Model (PGM) and the General Document Model (GDM), soon to be broken out of the core Gremlin code.
  • Adapters to different underlying graph implementations, from Neo4j to SAIL, integrating anything from Sesame to a live LinkedData SAIL
  • Adapters to other interesting graph frameworks like JUNG, suggested by Seth@Automenta
  • A Turing complete scripting language for querying, modification and transformation of PGM and GDM compliant data structures
  • All selectors are XPath-based in syntax
  • Pluggable external path elements and function implementation.

RESTling:
Webling:
  • Driven mainly by Pavel Yaskevich via financing from Neo Technology
  • A web based visual end-user interface to Restling
  • A web based terminal supporting execution of Gremlin operations and logic
  • Visualization support with graph libraries
  • Multi-user support
  • Via REST support to connect to remote Restling instances
Gargamel:
  • Driven mainly by Marko Rodriguez
  • an execution framework primarily targeted at Bulk Synchronous Parallel graph algos
  • A number of highly parallel base graph algos integrated into Gremlin to use this framework
  • A communication framework for execution of gremlin tasks on different (partitioned or replicated) graph instances, firstly using LinkedProcess (financed by the LANL) and XMPP, but replaceable with e.g. an Erlang-implementation (kudos to Ingo Schramm for suggesting it) or RESTling- based communication for optimization of different aspects like inter-process communication during execution

All in all, I just wanted to express my excitement over the whole emerging community around Gremlin, Neo4j and graphs in general! It is thrilling to see that the easy use of graphs and the internet-scale processing of complex data structures is starting to take shape in an open world, getting the different views on graphy data onto one page and providing a broader audience the possibility to use graph structures in the real world.

/peter neubauer