Thursday, February 25, 2010

Access control lists the graph database way

In many contexts you need to handle user permissions to access, create or change some kind of resources. A common example is a file system, and that's what we are going to dive into in this blog post. We're going to use Ruby bindings for the Neo4j graph database to create a small - but working - example application.

Preparation

To set up the environment for this example on Ubuntu, I used the following commands:
sudo apt-get install jruby
sudo jruby -S gem install neo4j
To import the libraries, the following code was used:
require 'rubygems'
require 'neo4j'
require 'neo4j/extensions/find_path'

Heading for the node space

So user permissions, what are they all about? Obviously it's about users, and usually user groups as well. We'll abstract this away a bit and use the term principals, which can be single users or groups.
The other side of user permissions are the resources which are to be protected. In our case we'll have a file system, so there will be folders and files. Here we'll use the term content.
Let's start out building a graph to support the application from what we have gathered so far! When working with a graph it's beneficial to think in a graphy manner, so that's where we'll begin. Graphs are presumably about connecting things, so our first step is to create some relationships. Neo4j comes with a built-in reference node, which is easily accessible at all times. We use this to create our own "subreference nodes", one for principals and one for content. This is how our graph looks so far:

To create (and get) the subreference nodes, we use this function:
def get_or_create_sub_ref( name )
  result = Neo4j.ref_node.rels.outgoing( name ).nodes.first
  if ( result.nil? )
    result = Neo4j::Node.new :name => name.to_s.capitalize.gsub("_", " ")
    Neo4j.ref_node.rels.outgoing( name ) << result
  end
  return result
end
This function is then called whenever we need to use a subreference node. The important parts here are:
  • ref_node: the built-in reference node
  • rels: relationships connected to a node
  • outgoing: the direction of the relationship (the relationships are always directed, but you can choose to ignore the direction in traversals)
  • ( name ): the type of relationships to follow (the type can be ignored in traversals as well, but in our case we want to use it)
  • nodes: the nodes in the other end of the relationships
  • first: the first node found - there sould only be one subreference node of each type
If the subreference node isn't found, it will be created and connected to the reference node. As you can see, we're adding a property with the key name to the nodes as well, which is there solely for the purpose of visualization (the images in this post are created using Neoclipse).

Basic structure

For the principals part, we are going to connect the top-level ones to the corresponding subreference node using a PRINCIPAL type of relationship. Other than that, there's just users and groups, so let's use a IS_MEMBER_OF_GROUP relationship type to encode that. This is how that looks in the graph:

And here's the code to create it:
def new_principal( name, member_of_groups = [] )
  principal = Neo4j::Node.new
  principal[ :name ] = name
  if member_of_groups.empty?
    get_or_create_sub_ref( :PRINCIPALS ).rels.outgoing( :PRINCIPAL ) << principal
  else
    for group in member_of_groups
      principal.rels.outgoing( :IS_MEMBER_OF_GROUP ) << group
    end
  end
  return principal
end
If a new principal isn't member of any groups, it's added as a top-level principal, connected to the principals subrefererence node. In other case, it's simply added to the groups.
With Neo4j all operations on the graph have to be encapsulated in a transaction, so this is how we'll call the above function:
Neo4j::Transaction.run do
  all_principals = new_principal( "All principals" )
  root = new_principal( "root", [ all_principals ] )
  regular_users = new_principal( "Regular users", [ all_principals ] )
  user1 = new_principal( "user1", [ regular_users ] )
  user2 = new_principal( "user2", [ regular_users ] )
end
For the content part, things are very similar to the principals part. The main difference is that in this case, an item can have only a single parent item. Here's the graphical view on that:

And this is the code to create the structure:
def new_content( name, parent = nil )
  content = Neo4j::Node.new
  content[ :name ] = name
  if ( parent.nil? )
    get_or_create_sub_ref( :CONTENT_ROOTS ).rels.outgoing( :CONTENT_ROOT ) << content
  else
    parent.rels.outgoing( :HAS_CHILD_CONTENT ) << content
  end
  return content
end
Similar to how the principals were created, this is the code to create the content data:
Neo4j::Transaction.run do
  root_folder = new_content( "Root folder" )
  temp_folder = new_content( "Temp", root_folder )
  home_folder = new_content( "Home", root_folder )
  user1_home_folder = new_content( "user1 home", home_folder )
  user2_home_folder = new_content( "user2 home", home_folder )
  a_file = new_content( "MyFile.pdf", user1_home_folder )
end

At the core

Now that we have the basic structure in place, what's left regarding our data is a small but crucial part: the permissions information! We're using a simple scheme: adding security relationships with optional boolean flags for read and write permission. Not much to say here, this is what we want the full graph to look like (click for a bigger version):

A small function will help us add the security information:
def apply_security( content, principal, map_with_flags )
  security_relationship = Neo4j::Relationship.new( :SECURITY, principal, content )
  map_with_flags.each_pair {|key, value| security_relationship[ key ] = value}
end
It's time to add the security data:
Neo4j::Transaction.run do
  apply_security( root_folder, root, { "w" => true } )
  apply_security( root_folder, all_principals, { "r" => true } )
  apply_security( temp_folder, all_principals, { "w" => true } )
  apply_security( user1_home_folder, regular_users, { "r" => false, "w" => false } )
  apply_security( user1_home_folder, user1, { "r" => true, "w" => true } )
  apply_security( user2_home_folder, user2, { "r" => true, "w" => true } )
end
To check the permission for some action by an actual principal for some content, there's some work to do. This is the algorithm we use to retrieve a permission flag:
  1. Move from the content node and upwards through the file system structure and investigate each level for permission information.
  2. On each level, see if there are any principals related to or identical with the principal concerned.
  3. Make sure to use the permission information from the principal closest to the principal concerned.
  4. If permission information was found, return it; otherwise, continue traversing to the next level in the file system.
In the code for this, we'll use a function named depth_of_principal() to calculate the distance between the principal we have traversed to and the principal concerned. More on that later, here's the code to check the permissions:
def has_access( content, principal, flag )
  for current_content in content.incoming( :HAS_CHILD_CONTENT ).depth( :all )
    lowest_score = nil
    lowest_modifier = nil
    for rel in current_content.rels.incoming( :SECURITY )
      rel_principal = rel.start_node
      if !rel[ flag ].nil?
        score = depth_of_principal( rel_principal, principal )
        if !score.nil?
          modifier = rel[ flag ]
          if lowest_score.nil? || score < lowest_score ||
            ( score == lowest_score && modifier )
            lowest_score = score
            lowest_modifier = modifier
          end
        end
      end
    end
    if !lowest_modifier.nil?
      return lowest_modifier
    end
  end
  return false
end
Here's our function to check the distance between principals (and to see if they're on the same path at all).
def depth_of_principal( principal, reference_principal )
  result = reference_principal.outgoing( :IS_MEMBER_OF_GROUP ).depth( :all ).path_to( principal )
  return result.nil? ? nil : result.size
end
Finally, we want to see that everything works, so here's a utility function to print permission information:
def print_has_access( content, principal, flag )
  print principal[ :name ] + " +" + flag.upcase + " access to " + content[ :name ] + "? " +
    has_access( content, principal, flag ).to_s + "\n"
end
And here's how to use the function:
Neo4j::Transaction.run do
  print_has_access( home_folder, root, "w" )
  print_has_access( home_folder, user1, "w" )
  print_has_access( a_file, root, "r" )
  print_has_access( a_file, user2, "r" )
  print_has_access( a_file, user1, "w" )
end

Next steps

The full source code is found here
Here's a few useful resources to help you on your way:
Thanks for reading - any feedback is welcome!

Tuesday, February 16, 2010

The top 10 ways to get to know Neo4j

Today is a big day in Neo4j land because after ten long years of development and seven years of commercial 24/7 production we just announced Neo4j 1.0!

We're very excited about this and this post will outline the ten most interesting and fun ways of getting started with Neo4j. Without further ado, let's go!
  1. Wait, what is Neo4j?

    Neo4j is a graph database, that is, it stores data as nodes and relationships. Both nodes and relationships can hold properties in a key/value fashion. Here's a small example:

    You can navigate the structure either by following the relationships or use declarative traverser features to get to the data you want.
  2. Introduction

    For a high-level, 9 minutes cocktail-party introduction of Neo4j, check out this interview with Emil Eifrem:

    (blip.tv)
    To watch a longer introduction, see the no:sql(east) 2009 presentation by Emil Eifrém
  3. Handling complexity

    Most applications will not only have to scale to a huge volumes, but also scale to the complexity of the domain at hand. Typically, there may be many interconnected entities and optional properties. Even simple domains can be complex to handle because of the queries you want to run on them, for example to find paths. Two coding examples are the social network example (partial Ruby implementation) and the Neo4j IMDB example (Ruby variation of the code). For more examples of different domains modeled in a graph database, visit the Domain Modeling Gallery
  4. Storing objects

    The common domain implementation pattern when using Neo4j is to let the domain objects wrap a node, and store the state of the entity in the node properties. To relieve you from the boilerplate code needed for this, you can use a framework like jo4neo (intro, blog posts), where you use annotations to declare properties and relationships, but still have the full power of the graph database available for deep traversals and other graphy stuff. Here's a code sample showing jo4neo in action:
    public class Person {
      //used by jo4neo
      transient Nodeid node;
      //simple property
      @neo String firstName;
      //helps you store a java.util.Date to neo4j
      @neo Date date;
      // jo4neo will index for you
      @neo(index=true) String email;
      // many to many relation
      @neo Collection roles; 
    
      /* normal class oriented
      * programming stuff goes here
      */
    }
    Another way to persist objects is by using the neo4j.rb Neo4j wrapper for Ruby. Time for a few lines of sample code again:
    require "rubygems"
    require "neo4j"
    
    class Person
      include Neo4j::NodeMixin
      # define Neo4j properties
      property :name, :salary, :age, :country
    
      # define an one way relationship to any other node
      has_n :friends
    
      # adds a Lucene index on the following properties
      index :name, :salary, :age, :country
    end
  5. REST API

    Of course you want a RESTful API in front of the graph database as well. There's been plenty of work going on in that area and here are some options:
    • The neo4j.rb Ruby bindings comes with a REST extension.
    • The neo4jr-simple Ruby wrapper has the neo4jr-social example project, which exposes social network data over a REST API.
    • Similarly, the Scala bindings has a companion example project which will show you how to set up a project exposing your data over REST.
    • Last but not least, Jim Webber has joined up with the core Neo4j team to create a kick-ass REST API. The current code base is only in the laboratory but a lot of people are already kicking its tires.
  6. Language bindings

    The Neo4j graph engine is written in Java, so you can easily add the jar file and start using the simple and minimalistic API right away. Your first stop should be the Getting started guide. Other language bindings:
    • Python - see PyCon 2010 too!
    • Ruby
    • Clojure
    • Scala
    • Java object mapping
    • For Groovy, see the next section on frameworks.
    • Integration with PHP is worked on as well.
  7. Frameworks

    Work is being done on using Neo4j as backend of different frameworks. Follow the links to get more information!
  8. Tools

    • Shell: a command-line shell for browsing the graph and manipulate it.
    • Neoclipse: Eclipse plugin (and standalone application) for Neo4j. Visual interface to browse and edit the graph.
    • Batch inserter: tool to bulk upload big datasets quickly.
    • Online backup: performs backup of a running Neo4j instance.
  9. Query languages

    Beyond using Neo4j programmatically, you can also issue queries using a query language. These are the supported options at the moment:
    • SPARQL: Neo4j can be used as a triple- or quadstore, and has SAIL and SPARQL implementations. Go to the components site to find out more about the related components.
    • Gremlin: a graph-based programming-language with different backend implementations in the works as well as a supporting toolset.
  10. Inspiration

    Have a look at the Neo4j in the wild page to see what others are doing with Neo4j. Here's a selection:
Hopefully this post was a good starting guide to the Neo4j ecosystem. As always, please ask any questions on the mailing list or come hang out with us in the #neo4j channel on IRC.

Sunday, February 7, 2010

Yay! The Graph Processing Infrastructure is starting to emerge!





Hi all,
in the last months, the Tinkerpop team has been starting to venture into the big task of starting a unified ecosystem for the world of graphs and related projects and products. Now, I am proud to say that it seems things are starting to get some traction and see increasing contributions from outside the core team, mainly the awesome Marko Rodriguez:



Logo contributed by Ketrina Yim
  • The JUNG graph library got adapted to Gremlin
  • HyperGraphDB is being adapted to work with Gremlin
  • a REST API based on the awesome work of Jim Webber and the Neo4j team is in the making by Michael Hunger and Pavel Yaskevich
So, here is the current project ecosystem - great work of everyone involved!

Gremlin:
  • mainly driven by Marko Rodriguez
  • a library and standalone, single-user Java project, defining a
  • number of data models - to start with the Property Graph Model (PGM) and the
  • General Document Model (GDM) , soon to be broken out of the core Gremlin code.
  • Adapters to different underlying graph implementations, from Neo4j to SAIL, integrating anything from Sesame to a live LinkedData SAIL
  • Adapters to other interesting graph frameworks like JUNG, suggested by Seth@Automenta
  • A Turing complete scripting language for querying, modification and transformation of PGM and GDM compliant data structures
  • All selectors are XPath-based in syntax
  • Pluggable external path elements and function implementation.

RESTling:
Webling:
  • Driven mainly by Pavel Yaskevich via financing from Neo Technology
  • A web based visual end-user interface to Restling
  • A web based terminal supporting execution of Gremlin operations and logic
  • Visualization support with graph libraries
  • Multi-user support
  • Via REST support to connect to remote Restling instances
Gargamel:
  • Driven mainly by Marko Rodriguez
  • a execution framework primarily targeted at Bulk Synchronous Parallel graph algos
  • A number of highly parallel base graph algos integrated into Gremlin to use this framework
  • A communication framework for execution of gremlin tasks on different (partitioned or replicated) graph instances, firstly using LinkedProcess (financed by the LANL) and XMPP, but replaceable with e.g. an Erlang-implementation (kudos to Ingo Schramm for suggesting it) or RESTling- based communication for optimization of different aspects like inter-process communication during execution

All in all, I just wanted to express my excitement over the whole emerging community around Gremlin, Neo4j and graphs in general! It is thrilling to see that the easy use of graphs and the internet-scale processing of complex data structures is starting to take shape in an open world, getting the different views on graphy data onto one page and providing a broader audience the possibility to use graph structures in the real world.

/peter neubauer